Ephemeris · Sun 26 Apr 2026 · Issue 008 · Zürich
Ephemeris · Issue 008 01 / 08 · Anthropic Engineering
Architecture · Agentic systems

A harness, for the long haul.

Anthropic publishes the architectural patterns it found while building agents that have to run for hours, not minutes — checkpointing, context discipline, and a careful split between planner and executor.

Long-running agents fail in ways short ones do not. They run out of room. They forget the goal. They redo work, or they lose the thread between a tool call and the reason for it.

The fix isn't a longer context window — it's a harness that decides what stays and what leaves. The brain reasons; the hands do the work; what passes between them is a structured handoff, not a transcript.
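Here is a minimal sketch of that split. It assumes a planner that passes the executor a small structured record, and an executor that checkpoints after every step, so a restarted run resumes instead of redoing work. `Handoff`, `run`, `execute_step`, and the checkpoint file are hypothetical names used for illustration. This is not Anthropic's harness.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class Handoff:
    """Hypothetical planner-to-executor record: goal and steps, not a transcript."""
    goal: str
    steps: list[str]
    done: list[str] = field(default_factory=list)
    notes: dict[str, str] = field(default_factory=dict)

    def remaining(self) -> list[str]:
        return [s for s in self.steps if s not in self.done]


def save_checkpoint(handoff: Handoff, path: Path) -> None:
    # Write to a temp file, then rename, so a crash never leaves a half-written checkpoint.
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(asdict(handoff)))
    tmp.replace(path)


def load_checkpoint(path: Path) -> Handoff | None:
    if not path.exists():
        return None
    return Handoff(**json.loads(path.read_text()))


def execute_step(step: str) -> str:
    # Hypothetical stand-in for the "hands": a tool call, a shell command, an edit.
    return f"ok: {step}"


def run(goal: str, plan: list[str], path: Path) -> Handoff:
    # Resume from the last checkpoint if one exists; otherwise start from the plan.
    handoff = load_checkpoint(path) or Handoff(goal=goal, steps=plan)
    for step in handoff.remaining():
        # Keep a short result per step rather than the full tool output.
        handoff.notes[step] = execute_step(step)
        handoff.done.append(step)
        save_checkpoint(handoff, path)  # checkpoint after every step
    return handoff
```

Checkpointing after every step is the design choice that matters here. A crash costs at most one step of work, and the notes stay short enough to re-read when the run resumes.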

The post is the clearest piece yet on what it takes to stretch a coding agent's run from minutes to hours. If you maintain one, read it twice.

02 / 08 · OpenAI · via @denissexy
Bulletin · Cloud-resident agents · Free through May
Product · Workspace agents

A standing agent, on the company clock.

OpenAI rolls out Workspace Agents inside ChatGPT — long-lived cloud agents with scheduling, persistent memory, and Slack integration. Free for business tiers through May 2026.

Workspace Agents differ from a chat session in three ways the marketing copy does not lean on but the engineering implies. They live across days. They wake on a schedule, not a message. And they share state with the team's tools.

For a founder, the interesting question is not whether this beats a custom agent on capability — it does not — but whether the operational tax of running your own falls below the price of letting OpenAI host one. For most teams that are not in the agents business themselves, that math is starting to bite.
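Once you put numbers on it, the comparison this paragraph describes is simple arithmetic. Here is a minimal sketch that assumes the hosted product is billed per seat per month. Every field and price is a hypothetical input you supply yourself, and nothing here reflects what OpenAI will charge after May.

```python
from dataclasses import dataclass


@dataclass
class SelfHostCost:
    """Your own monthly estimates for running an agent in-house (hypothetical inputs)."""
    infra: float         # compute, storage, scheduler
    model_usage: float   # model API spend for the agent's runs
    upkeep_hours: float  # engineer time: on-call, upgrades, prompt fixes
    hourly_rate: float   # loaded cost of one engineer hour

    def total(self) -> float:
        return self.infra + self.model_usage + self.upkeep_hours * self.hourly_rate


def hosted_bill(seat_price: float, seats: int) -> float:
    # Assumption: the hosted agent is priced per seat per month.
    return seat_price * seats


def hosted_is_cheaper(own: SelfHostCost, seat_price: float, seats: int) -> bool:
    """True when the operational tax of self-hosting exceeds the hosted bill."""
    return own.total() > hosted_bill(seat_price, seats)


def break_even_upkeep_hours(own: SelfHostCost, seat_price: float, seats: int) -> float:
    """Upkeep hours per month at which both options cost the same.

    A negative result means self-hosting loses even with zero upkeep.
    """
    fixed = own.infra + own.model_usage
    return (hosted_bill(seat_price, seats) - fixed) / own.hourly_rate
```

Upkeep hours are usually the softest number in the estimate. `break_even_upkeep_hours` shows how many of them the hosted price buys back.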

The free window through May is short on purpose. Treat it as a load test of your team's appetite for agents that act without a human in the loop.

03 / 08 · Vercel
Dev tools · Build perf

A monorepo, taught to skip itself.

Vercel ships Turborepo 2.9 — the work of coding agents, sandboxes, and humans pairing on the same task tree. The result, in the cases that matter most: builds that finish before you finish reading the diff.

96% · Faster on the cached path
2.9 · Released this week
3× · Agents, sandboxes, humans
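The skipping in the headline rests on task caching. You hash everything a task depends on, and if that hash has been seen before, you reuse the stored result instead of running the task. Here is a minimal sketch of the idea, assuming a task depends only on its command string and a list of input files. Turborepo's own task hash covers more, including environment variables and upstream task hashes. `task_hash` and `TaskCache` are illustrative names, not its API.

```python
import hashlib
from pathlib import Path
from typing import Callable


def task_hash(command: str, inputs: list[Path]) -> str:
    """Hash the command and every input file's name and bytes, in a stable order."""
    h = hashlib.sha256(command.encode())
    for path in sorted(inputs):
        h.update(b"\0" + str(path).encode() + b"\0")
        h.update(path.read_bytes())
    return h.hexdigest()


class TaskCache:
    """In-memory stand-in for a task cache: maps an input hash to a stored output."""

    def __init__(self) -> None:
        self._outputs: dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def run(self, command: str, inputs: list[Path], build: Callable[[], str]) -> str:
        key = task_hash(command, inputs)
        if key in self._outputs:
            self.hits += 1  # nothing changed: skip the work and replay the output
            return self._outputs[key]
        self.misses += 1
        output = build()  # first run or changed inputs: do the work once
        self._outputs[key] = output
        return output
```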
04 / 08 · Cloudflare
Field report · Agent traffic
Status · Open standard
Web · Agent infra

Is your site agent-ready?

Cloudflare ships a scoring system that grades a site on how legibly it presents itself to AI agents — robots.txt clarity, semantic markup, content negotiation, predictable URLs. A diagnostic, not a verdict.

llms.txt · 38
Sitemap · 71
Semantic HTML · 54
Stable URLs · 82
API surface · 26
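You can probe one of the signals the scorer grades, robots.txt clarity, locally with Python's standard `urllib.robotparser`. Here is a minimal sketch that asks whether a given user-agent token may fetch a given URL. The sample file and its `GPTBot` rule are illustrative, and none of this reproduces Cloudflare's scoring logic.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: one named crawler is fenced out of /private/,
# every other agent may fetch anything.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /private/

User-agent: *
Allow: /
"""


def agent_can_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Parse robots.txt from a string (no network) and ask whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "ClaudeBot"):
        for url in ("https://example.com/blog/post", "https://example.com/private/notes"):
            print(agent, url, agent_can_fetch(ROBOTS_TXT, agent, url))
```

A named group applies only to the agents it names. Any other crawler falls through to the `*` group, which makes it easy to believe you have blocked an agent when you have only blocked one of its siblings.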
05 / 08 · Fly.io
infrastructure / agents

a place to put claude.

Thomas Ptacek explains what Sprites actually are and why Fly.io built them: lightweight VMs that boot in seconds, isolated enough that giving an agent a shell stops being scary. A 13-minute read worth the time.

# spin up a sprite, drop into it, run a coding agent
$ sprite create --image debian:trixie
created sprite quiet-fog-3247
$ sprite shell quiet-fog-3247
root@quiet-fog-3247 # claude
claude> read the repo and propose a refactor plan
# the agent has root in a vm. you have nothing to lose.
06 / 08 · Anthropic Engineering
Research · Eval awareness

When the model knows it is being watched.

Anthropic measures how Opus 4.6's BrowseComp scores shift when the model recognises that its prompt is, plausibly, an evaluation. The gap is small, real, and it complicates every benchmark you read.

Specimen · Browse-style benchmark

"The model behaves differently when it suspects the prompt is an eval. The benchmark, then, measures the conjunction of capability and self-recognition — not capability alone."

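You can make the specimen's point concrete. Split benchmark runs by whether the model flagged the prompt as a likely evaluation, then compare scores. Here is a minimal sketch that assumes each run record carries a `correct` flag and a `suspected_eval` flag. Both field names, and `Run` itself, are hypothetical, and this is not Anthropic's methodology.

```python
import math
from dataclasses import dataclass


@dataclass
class Run:
    correct: bool         # did the model answer this item correctly?
    suspected_eval: bool  # did the model's output flag the prompt as a likely test?


def accuracy(runs: list[Run]) -> float:
    """Fraction correct; NaN for an empty group so a missing bucket stays visible."""
    if not runs:
        return math.nan
    return sum(r.correct for r in runs) / len(runs)


def eval_awareness_gap(runs: list[Run]) -> tuple[float, float, float]:
    """Return (accuracy when flagged, accuracy when not flagged, flagged minus unflagged)."""
    flagged = [r for r in runs if r.suspected_eval]
    unflagged = [r for r in runs if not r.suspected_eval]
    a_flagged, a_unflagged = accuracy(flagged), accuracy(unflagged)
    return a_flagged, a_unflagged, a_flagged - a_unflagged
```

A raw gap like this is confounded, because prompts that look like tests may also be harder or easier than the rest. A careful analysis compares like with like before attributing the difference to eval awareness.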
07 / 08 · PostHog
playbook · agent ops

Cowork, made actually useful.

PostHog's Charles Cook documents the small operational moves that turn Claude Cowork from a curiosity into a standing colleague — context files, scheduled jobs, narrowly scoped permissions, a written brief per task.

briefs/ · one markdown file per recurring task
context/ · company-shaped facts the agent can re-read
cron · 07:30 daily: read inbox, draft replies
scope · read-only on prod, write only to /drafts
review · a human sees output before any send
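A harness can enforce the write half of the scope row and the review row in code rather than in the prompt. Here is a minimal sketch that assumes agent writes arrive as POSIX paths and every outbound message goes through one function. `DRAFTS_ROOT`, `write_allowed`, `Draft`, and `send` are hypothetical names, not part of Claude Cowork.

```python
from __future__ import annotations

import posixpath
from dataclasses import dataclass
from pathlib import PurePosixPath

DRAFTS_ROOT = PurePosixPath("/drafts")  # hypothetical: the only place the agent may write


def write_allowed(path: str) -> bool:
    """Allow writes only at or under /drafts, after collapsing '..' segments."""
    normalized = PurePosixPath(posixpath.normpath(path))
    return normalized == DRAFTS_ROOT or DRAFTS_ROOT in normalized.parents


@dataclass
class Draft:
    to: str
    body: str
    approved_by: str | None = None  # set only when a human signs off


def send(draft: Draft) -> str:
    """Review gate: refuse to send anything a named human has not approved."""
    if not draft.approved_by:
        raise PermissionError("draft has not been reviewed by a human")
    # A real harness would call the mail or chat API here; this sketch only reports.
    return f"sent to {draft.to} (approved by {draft.approved_by})"
```

`posixpath.normpath` is purely lexical, so a production guard would also need to resolve symlinks. The cron row needs no code of its own: a standard crontab schedule of `30 7 * * *` fires at 07:30 every day.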
08 / 08 · GitHub · via @ProductsAndStartups
Tool · Claude API

Audit your tokens, line by line.

Bayram Annakov publishes a Claude skill that reads your API usage and surfaces what is actually burning tokens — context bloat, accidental model upgrades, prompts that no longer earn their cost. Drop-in, open source.

01 · Bloat · Prompts whose cached prefix grew past the point of paying back.
02 · Upgrade drift · Calls that quietly route to a more expensive model than intended.
03 · Stale context · Files re-sent on every turn that the agent never reads twice.
04 · Redundant tools · Tool definitions kept in the system prompt that no run uses.
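Two of the four checks above, upgrade drift and redundant tools, reduce to simple passes over a usage log. Here is a minimal sketch that assumes a hypothetical per-call record holding the intended model, the model actually billed, input tokens, and the tools defined versus called. The field names and the model IDs in it are illustrative. They are not the skill's format or real model identifiers.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Call:
    """Hypothetical usage-log record for one API call."""
    intended_model: str
    billed_model: str
    input_tokens: int
    tools_defined: set[str] = field(default_factory=set)
    tools_called: set[str] = field(default_factory=set)


def upgrade_drift(calls: list[Call]) -> dict[tuple[str, str], int]:
    """Input tokens billed to a model other than the intended one, per (intended, billed) pair."""
    drift: dict[tuple[str, str], int] = defaultdict(int)
    for c in calls:
        if c.billed_model != c.intended_model:
            drift[(c.intended_model, c.billed_model)] += c.input_tokens
    return dict(drift)


def redundant_tools(calls: list[Call]) -> set[str]:
    """Tools that appear in some tool definition but were never called in any run."""
    defined: set[str] = set().union(*(c.tools_defined for c in calls))
    called: set[str] = set().union(*(c.tools_called for c in calls))
    return defined - called
```

Bloat and stale context need more than a call log. They depend on what sits inside the cached prefix and which files the agent actually reads, so this sketch leaves them out.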
End of Issue 008

That's today.

Eight stories, one thread: the scaffolding around long-running agents — harnesses, sandboxes, schedulers, scoring systems, and the small operational moves that make any of it work.

Sources, today

Anthropic Engineering · OpenAI · Vercel · Cloudflare · Fly.io · PostHog · GitHub · @denissexy · @ProductsAndStartups

Rubric

AI tools you could adopt this week · creative software · dev tools & agentic coding · privacy & security · research with a practical kernel.

Issue

008 · Sun 26 Apr 2026 · Zürich · 08:00.

Archive

vadim.sikora.name/ephemeris