Ephemeris · Issue 008 · Sun 26 Apr 2026 · Zürich

“The brain decides; another set of hands does the work.”

— Today's issue is about long-running agents and the scaffolding around them

01 / 08 · Anthropic Engineering

Architecture · Agentic systems

A harness, for the long haul.

Anthropic publishes the architectural patterns it found while building agents that have to run for hours, not minutes — checkpointing, context discipline, and a careful split between planner and executor.

Long-running agents fail in ways short ones do not. They run out of room. They forget the goal. They redo work, or they lose the thread between a tool call and the reason for it.

The fix isn't a longer context window — it's a harness that decides what stays and what leaves. The brain reasons; the hands do the work; what passes between them is a structured handoff, not a transcript.

The post is the clearest account yet of what lets coding agents run for hours rather than minutes. If you maintain one, read it twice.
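
A minimal sketch of the handoff-plus-checkpoint idea, not Anthropic's implementation. The `Handoff` fields, the JSON checkpoint file, and the `run` loop are all assumptions made for illustration; the point is only that the executor receives a small structured record instead of the full transcript, and that a restarted run picks up where the last one stopped.

```python
import json
from dataclasses import dataclass, asdict, field
from pathlib import Path


@dataclass
class Handoff:
    """What the planner passes to the executor: a task, not a transcript."""
    goal: str                                       # the overall objective, carried on every step
    step: str                                       # the single action to perform now
    done: list[str] = field(default_factory=list)   # completed steps, so work is not redone


def save_checkpoint(handoff: Handoff, path: Path) -> None:
    """Persist the handoff so a restarted run resumes instead of starting over."""
    path.write_text(json.dumps(asdict(handoff)))


def load_checkpoint(path: Path) -> Handoff:
    """Rebuild the handoff from the JSON written by save_checkpoint."""
    return Handoff(**json.loads(path.read_text()))


def run(goal: str, steps: list[str], execute, path: Path) -> Handoff:
    """Drive the steps in order, skipping any that an earlier run already finished."""
    handoff = load_checkpoint(path) if path.exists() else Handoff(goal, step="")
    for step in steps:
        if step in handoff.done:
            continue                    # finished before the restart; do not redo it
        handoff.step = step
        execute(handoff)                # the "hands" see only the structured handoff
        handoff.done.append(step)
        save_checkpoint(handoff, path)  # checkpoint after every completed step
    return handoff
```

Calling `run` a second time with the same checkpoint path executes only the steps missing from `done`, which is the "redo work" failure the piece describes, avoided in the simplest possible way.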


02 / 08 · OpenAI · via @denissexy

Bulletin · Cloud-resident agents · Free until May

Product · Workspace agents

A standing agent, on the company clock.

OpenAI rolls out Workspace Agents inside ChatGPT — long-lived cloud agents with scheduling, persistent memory, and Slack integration. Free for business tiers through May 2026.

Workspace Agents differ from a chat session in three ways the marketing copy does not lean on but the engineering implies. They live across days. They wake on a schedule, not a message. And they share state with the team's tools.
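
To make "wakes on a schedule" and "lives across days" concrete, here is a small sketch under stated assumptions. It does not use or describe any OpenAI API; `next_wake`, `wake`, and the JSON memory file are hypothetical names chosen for illustration.

```python
import json
from datetime import datetime, time, timedelta
from pathlib import Path


def next_wake(now: datetime, at: time) -> datetime:
    """Return the next occurrence of the daily wake time strictly after `now`."""
    candidate = datetime.combine(now.date(), at)
    if candidate <= now:
        candidate += timedelta(days=1)  # today's slot has passed; use tomorrow's
    return candidate


def wake(memory_path: Path, now: datetime) -> dict:
    """One scheduled run: load memory from earlier days, record this run, save it back."""
    if memory_path.exists():
        memory = json.loads(memory_path.read_text())
    else:
        memory = {"runs": []}           # first run ever: start with empty memory
    memory["runs"].append(now.isoformat())
    memory_path.write_text(json.dumps(memory))
    return memory
```

The trigger is the clock rather than a message, and the only thing connecting one run to the next is the state written to disk.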

For a founder, the interesting question is not whether this beats a custom agent on capability — it does not — but whether the operational tax of running your own falls below the price of letting OpenAI host one. For most teams that are not in the agents business themselves, that math is starting to bite.

The free window through May is short on purpose. Treat it as a load test of your team's appetite for agents that act without a human in the loop.

— Filed by Ephemeris desk · Sun 26 Apr


03 / 08 · Vercel

Dev tools · Build perf

A monorepo, taught to skip itself.

Vercel ships Turborepo 2.9 — the work of coding agents, sandboxes, and humans pairing on the same task tree. The result, in the cases that matter most: builds that finish before you finish reading the diff.

96% · Faster on the cached path

2.9 · Released this week

3× · Agents · sandboxes · humans
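
The "cached path" rests on a general idea: fingerprint a task's inputs and skip the work when the fingerprint has been seen before. The sketch below illustrates that idea only. It is not Turborepo's implementation, and `fingerprint`, `run_task`, and the in-memory `cache` dict are hypothetical names.

```python
import hashlib


def fingerprint(inputs: dict[str, bytes]) -> str:
    """Hash file names and contents in a stable order to identify a task's inputs."""
    digest = hashlib.sha256()
    for name in sorted(inputs):         # sorted so dict ordering cannot change the hash
        digest.update(name.encode())
        digest.update(b"\0")            # separator between name and content
        digest.update(inputs[name])
        digest.update(b"\0")
    return digest.hexdigest()


def run_task(inputs: dict[str, bytes], build, cache: dict[str, str]) -> tuple[str, bool]:
    """Return (output, cache_hit). `build` runs only when the inputs are new."""
    key = fingerprint(inputs)
    if key in cache:
        return cache[key], True         # identical inputs: reuse the stored output
    output = build(inputs)
    cache[key] = output
    return output, False
```

An unchanged package produces the same fingerprint, so its build is never invoked; changing a single byte in any input produces a new key and a real run.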


04 / 08 · Cloudflare

Field report · Agent traffic

Status · Open standard

Web · Agent infra

Is your site agent-ready?

Cloudflare ships a scoring system that grades a site on how legibly it presents itself to AI agents — robots.txt clarity, semantic markup, content negotiation, predictable URLs. A diagnostic, not a verdict.

llms.txt · 38

Sitemap · 71

Semantic HTML · 54

Stable URLs · 82

API surface · 26
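
One way to read a scorecard like this as a diagnostic is to rank the checks from weakest to strongest. The sketch below uses the five sub-scores listed in this card. The equal weighting in `unweighted_mean` is an assumption made here for illustration; the card does not say how Cloudflare combines the checks, or whether it combines them at all.

```python
def weakest_first(scores: dict[str, int]) -> list[tuple[str, int]]:
    """Order checks from lowest score to highest, so the biggest gaps come first."""
    return sorted(scores.items(), key=lambda item: item[1])


def unweighted_mean(scores: dict[str, int]) -> float:
    """Plain average of the sub-scores. Equal weighting is an assumption."""
    return sum(scores.values()) / len(scores)


# Sub-scores as listed in the card.
scores = {
    "llms.txt": 38,
    "Sitemap": 71,
    "Semantic HTML": 54,
    "Stable URLs": 82,
    "API surface": 26,
}

for check, value in weakest_first(scores):
    print(f"{check}: {value}")
```

Ranked this way, API surface and llms.txt are the first places to look, which matches the "diagnostic, not a verdict" framing.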


05 / 08 · Fly.io

infrastructure / agents

a place to put claude.

Thomas Ptacek explains what Sprites actually are and why Fly.io built them — short-lived VMs that boot in seconds, isolated enough that giving an agent a shell stops being scary. A 13-minute read worth the time.

# spin up a sprite, drop into it, run a coding agent
$ sprite create --image debian:trixie
created sprite quiet-fog-3247
$ sprite shell quiet-fog-3247
root@quiet-fog-3247 # claude
claude> read the repo and propose a refactor plan
# the agent has root in a vm. you have nothing to lose.


06 / 08 · Anthropic Engineering

Research · Eval awareness

When the model knows it is being watched.

Anthropic measures how Opus 4.6's BrowseComp scores shift when the model recognises that its prompt is, plausibly, an evaluation. The gap is small, real, and it complicates every benchmark you read.
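
A sketch of the comparison in its simplest form: split runs by whether the model signalled that it suspected an evaluation, then compare accuracy across the two groups. This is not Anthropic's methodology. The `Run` record and the `suspected_eval` flag are hypothetical, and how such a flag would be detected is outside the sketch.

```python
from dataclasses import dataclass


@dataclass
class Run:
    correct: bool          # did the model answer the item correctly
    suspected_eval: bool   # did the model indicate it suspected an evaluation


def accuracy(runs: list[Run]) -> float:
    """Fraction of runs answered correctly."""
    return sum(r.correct for r in runs) / len(runs)


def awareness_gap(runs: list[Run]) -> float:
    """Accuracy when the model suspected an eval minus accuracy when it did not."""
    aware = [r for r in runs if r.suspected_eval]
    unaware = [r for r in runs if not r.suspected_eval]
    if not aware or not unaware:
        raise ValueError("need at least one run in each group")
    return accuracy(aware) - accuracy(unaware)
```

A non-zero gap would mean the headline score mixes capability with self-recognition, which is the complication the piece points at.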

Specimen · Browse-style benchmark

"The model behaves differently when it suspects the prompt is an eval. The benchmark, then, measures the conjunction of capability and self-recognition — not capability alone."


07 / 08 · PostHog

playbook · agent ops

Cowork, made actually useful.

PostHog's Charles Cook documents the small operational moves that turn Claude Cowork from a curiosity into a standing colleague — context files, scheduled jobs, narrowly scoped permissions, a written brief per task.

briefs/ · one markdown per recurring task

context/ · company-shaped facts the agent can re-read

cron · 07:30 daily: read inbox, draft replies

scope · read-only on prod, write only to /drafts

review · a human sees output before any send
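
The scope and review rows can be sketched as two small guards. This is an illustration under assumptions, not PostHog's setup or a Claude Cowork API: `is_allowed` and `send` are hypothetical helpers, and treating every read as permitted is a simplification of "read-only on prod".

```python
from pathlib import PurePosixPath
from typing import Optional


def is_allowed(action: str, path: str) -> bool:
    """Reads are allowed anywhere; writes only under /drafts; all else is denied."""
    if action == "read":
        return True
    if action == "write":
        target = PurePosixPath(path)
        return (
            target.is_absolute()
            and ".." not in target.parts              # block /drafts/../prod escapes
            and target.parts[:2] == ("/", "drafts")   # must sit under /drafts
        )
    return False  # unknown actions are denied by default


def send(draft: str, approved_by: Optional[str]) -> str:
    """Refuse to send anything a human has not reviewed."""
    if not approved_by:
        raise PermissionError("a human must review the draft before any send")
    return f"sent (approved by {approved_by})"
```

Denying by default keeps the permission list short: anything not explicitly granted, including a new action nobody thought about, is refused.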


08 / 08 · GitHub · via @ProductsAndStartups

Tool · Claude API

Audit your tokens, line by line.

Bayram Annakov publishes a Claude skill that reads your API usage and surfaces what is actually burning tokens — context bloat, accidental model upgrades, prompts that no longer earn their cost. Drop-in, open source.

01 · Bloat · Prompts whose cached prefix grew past the point of paying back.

02 · Upgrade drift · Calls that quietly route to a more expensive model than intended.

03 · Stale context · Files re-sent on every turn that the agent never reads twice.

04 · Redundant tools · Tool definitions kept in the system prompt that no run uses.
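
Two of these checks, upgrade drift and redundant tools, reduce to simple set and counter operations over a usage log. The sketch below is not the skill's code. The `Call` record and its fields are a hypothetical log shape, and it flags any model mismatch rather than only costlier ones, since deciding "more expensive" would need a price table that is not given here.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Call:
    intended_model: str                                      # model the caller meant to use
    actual_model: str                                        # model the request actually ran on
    tools_defined: list[str] = field(default_factory=list)   # tools declared in the prompt
    tools_used: list[str] = field(default_factory=list)      # tools the run actually invoked


def upgrade_drift(calls: list[Call]) -> Counter:
    """Count calls per (intended, actual) pair where the two models differ."""
    return Counter(
        (c.intended_model, c.actual_model)
        for c in calls
        if c.intended_model != c.actual_model
    )


def redundant_tools(calls: list[Call]) -> set[str]:
    """Tools defined in at least one call but used in none of them."""
    defined = {tool for c in calls for tool in c.tools_defined}
    used = {tool for c in calls for tool in c.tools_used}
    return defined - used
```

Both functions look at the whole log rather than a single call, because a tool that goes unused in one run may still earn its place in another.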


End of Issue 008

That's today.

Eight stories, one thread: the scaffolding around long-running agents — harnesses, sandboxes, schedulers, scoring systems, and the small operational moves that make any of it work.

Sources, today

Anthropic Engineering · OpenAI · Vercel · Cloudflare · Fly.io · PostHog · GitHub · @denissexy · @ProductsAndStartups

Rubric

AI tools you could adopt this week · creative software · dev tools & agentic coding · privacy & security · research with a practical kernel.

Issue

008 · Sun 26 Apr 2026 · Zürich · 08:00.
