Ephemeris · Issue 014 · Saturday · 02 May 2026 · Zürich

Ephemeris · Issue 014 · 01 / 07 · Anthropic Engineering
Hiring · Evals

A take-home Opus can't just solve.

Anthropic's hiring team rebuilt their take-home from scratch — twice — after Opus 4 and then Opus 4.5 walked through earlier versions in minutes. The post is the design log of an interview problem engineered against frontier models, with the constraints, the failures, and the lessons for anyone still asking "implement a cache" in 2026.

"The bar isn't 'can a human solve this?' anymore. It's 'can a human solve this in a way the model still can't shortcut?' That's a different design problem." — Anthropic Engineering

Ephemeris · Issue 014 · 02 / 07 · OpenAI · Global Affairs
Security · Policy

Five moves to defend the next decade.

OpenAI lays out a five-part action plan for cybersecurity in the AI era — democratising AI-powered defence, hardening critical systems, and aligning with national-security partners. Worth reading less for the policy framing than for the specific posture they're proposing AI labs and customers adopt.

5 · Pillar action plan
Defenders, not just attackers
26 · The year cyber broke even

Ephemeris · Issue 014 · 03 / 07 · Anthropic Engineering
API · Agents

Tools the agent discovers at runtime.

Claude can now find, learn, and execute tools dynamically — instead of having every JSON schema crammed into the system prompt at session start. The implication, if you're building agents: stop pre-loading the world. Let the model ask for a manual when it needs one.

Tool catalogues that don't fit in a context window. Discovery first, exec second. The new primitive shifts agent design from menus to libraries.
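
A minimal sketch of the discovery-first pattern, under loud assumptions: this is not Anthropic's API, and `ToolCatalog`, `search`, and `load` are hypothetical names invented for illustration. The idea it shows is that the model first sees only one-line summaries, and a full JSON schema enters the context only for the tool it picks.

```python
from dataclasses import dataclass, field


@dataclass
class Tool:
    name: str
    summary: str   # one line, cheap to show the model
    schema: dict   # full JSON schema, loaded only on demand


@dataclass
class ToolCatalog:
    """Hypothetical catalogue: search returns summaries, load returns one schema."""
    tools: dict[str, Tool] = field(default_factory=dict)

    def register(self, tool: Tool) -> None:
        self.tools[tool.name] = tool

    def search(self, query: str, limit: int = 3) -> list[tuple[str, str]]:
        # Naive keyword overlap; a real system would likely use a proper ranker.
        words = set(query.lower().split())
        scored = []
        for tool in self.tools.values():
            overlap = len(words & set(tool.summary.lower().split()))
            if overlap:
                scored.append((overlap, tool.name, tool.summary))
        # Best overlap first, name as a stable tie-breaker.
        scored.sort(key=lambda item: (-item[0], item[1]))
        return [(name, summary) for _, name, summary in scored[:limit]]

    def load(self, name: str) -> dict:
        # Only at this point does the full schema reach the context window.
        return self.tools[name].schema


catalog = ToolCatalog()
catalog.register(Tool("send_email", "send an email to a recipient",
                      {"type": "object", "properties": {"to": {"type": "string"}}}))
catalog.register(Tool("resize_image", "resize an image file to given dimensions",
                      {"type": "object", "properties": {"width": {"type": "integer"}}}))

hits = catalog.search("resize this image")   # discovery first
schema = catalog.load(hits[0][0])            # exec second, with one schema loaded
```

The context cost scales with `limit`, not with the size of the catalogue, which is the "library, not menu" shape described above.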

EPHEMERIS // ISSUE 014 // 04 / 07 // VERCEL
$ DEV · GUARDRAILS

$ ship --agent --with-judgment_

Vercel publishes a working framework for "agent responsibly" — the difference between leveraging AI and relying on it. Concrete guardrails for code reviews on agent-generated PRs, sandboxing for runtime tool use, and the human-in-the-loop checkpoints that actually matter.

# reviewing an agent-authored PR — the new checklist
$ agent.diff --explain               # model summarises intent
$ agent.tests --run --strict         [passed 247 / 247]
$ agent.scope --check                [bounded to /apps/web]
$ agent.prod-keys --grep              [NONE — required]
$ human.review --required-on-merge    [approved by you]
# merge unblocked. shipped.
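
The checklist above can be read as a merge gate: every automated check passes and a human approval is present, or the merge stays blocked. The sketch below is one possible wiring, not Vercel's tooling; `PullRequest`, its fields, `merge_blockers`, and the secret pattern are all hypothetical and chosen for illustration.

```python
import re
from dataclasses import dataclass

# Illustrative pattern only; a real scanner would use a maintained ruleset.
SECRET_PATTERNS = [re.compile(r"(?i)prod[_-]?(api[_-]?)?key\s*[=:]")]


@dataclass
class PullRequest:
    changed_files: list[str]
    diff_text: str
    tests_passed: int
    tests_total: int
    human_approved: bool


def merge_blockers(pr: PullRequest, allowed_prefix: str) -> list[str]:
    """Return the reasons a merge is blocked; an empty list means unblocked."""
    blockers = []
    if pr.tests_passed != pr.tests_total:
        blockers.append("tests failing")
    if any(not path.startswith(allowed_prefix) for path in pr.changed_files):
        blockers.append("changes outside allowed scope")
    if any(pattern.search(pr.diff_text) for pattern in SECRET_PATTERNS):
        blockers.append("possible production key in diff")
    if not pr.human_approved:
        blockers.append("human review missing")
    return blockers


ok_pr = PullRequest(["apps/web/page.tsx"], "+ const title = 'hi'", 247, 247, True)
bad_pr = PullRequest(["apps/api/db.ts"], "+ PROD_API_KEY = 'abc123'", 246, 247, False)
```

Returning the list of reasons, rather than a bare boolean, keeps the human reviewer informed about which guardrail tripped.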

Ephemeris · Issue 014 · 05 / 07 · Cloudflare
Reliability · Infra

A network that fails small, on purpose.

Cloudflare wraps up its multi-quarter "Code Orange" reliability initiative — a top-to-bottom rewrite of how dependencies, blast radii, and recovery paths are designed across the edge. The post-mortem-meets-postcard is required reading for anyone running infrastructure at a scale where the phrase "blast radius" comes up.

Code Orange started after a year in which a handful of incidents took out far more of the network than any one of them deserved to. The fix wasn't another runbook. It was an internal mandate to redesign for failure modes that stay small: every dependency labelled, every fault domain bounded, every recovery path practiced before the alarm fires.

The result, the team writes, isn't a network that doesn't fail. It's a network where any single failure stops being a CNN headline. That distinction is the entire point. If you operate something with a control plane and a data plane, this is the post-mortem-as-playbook you wanted.
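
To make "every dependency labelled, every fault domain bounded" concrete, here is a small sketch that computes a blast radius from a labelled dependency map. It is an illustration only, not Cloudflare's method: the service names, the `DEPENDS_ON` map, and `blast_radius` are hypothetical.

```python
from collections import deque

# Hypothetical dependency map: service -> services it hard-depends on.
DEPENDS_ON = {
    "dashboard": {"api"},
    "api": {"config-store", "auth"},
    "auth": {"config-store"},
    "edge-proxy": set(),      # serves from local state, no hard dependency
    "config-store": set(),
}


def blast_radius(failed: str, depends_on: dict[str, set[str]]) -> set[str]:
    """Every service that transitively hard-depends on `failed`, plus itself."""
    # Invert the edges so we can walk from a failure to its dependents.
    dependents: dict[str, set[str]] = {name: set() for name in depends_on}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(service)

    # Breadth-first walk over the inverted graph.
    affected, queue = {failed}, deque([failed])
    while queue:
        current = queue.popleft()
        for service in dependents.get(current, ()):
            if service not in affected:
                affected.add(service)
                queue.append(service)
    return affected


radius = blast_radius("config-store", DEPENDS_ON)
```

In this toy map a `config-store` failure reaches `api`, `auth`, and `dashboard` but not `edge-proxy`; a check like this can run in CI to flag a change that quietly widens a fault domain.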

Ephemeris · Issue 014 · 06 / 07 · The Batch · Issue 351
Models · Open weights

Kimi K2.6 goes for the long session.

Moonshot AI's updated Kimi handles longer autonomous coding sessions and scales up its multi-agent orchestration relative to its predecessor. Open weights, with measurable gains where most coding agents still drift after an hour. Worth pulling for anyone whose coding-agent budget is starting to look like a salary.

  1. Longer autonomous coding sessions before drift.
  2. Multi-agent orchestration scaled up.
  3. Open weights — pull, host, evaluate at home.
  4. Pricing that pressures closed leaders.

Ephemeris · Issue 014 · 07 / 07 · Fly.io Blog
Agents · Infrastructure

Better agents, with the right plumbing.

Fly walks through using MorphLLM as the editing brain inside an agent loop on Fly Machines — fast file rewriting, scoped sandboxes, and a deployment shape that keeps cost predictable when you're running dozens of agent attempts in parallel. A practical take on the "where does the agent actually live?" question.

The edit step

MorphLLM specialises in one thing — fast, accurate file rewrites — so the orchestrator doesn't burn frontier-model tokens on mechanical patches.

The host shape

Fly Machines spin up per attempt, isolated and disposable. No long-lived "agent server" — just a fleet of short-lived sandboxes.
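
The host shape described above, many short-lived and isolated attempts with a cap on parallelism, can be sketched locally. This is a stand-in, not the Fly Machines API: temporary directories play the role of sandboxes, and `run_attempt`, `run_fleet`, and `max_parallel` are hypothetical names.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def run_attempt(attempt_id: int) -> dict:
    """One agent attempt in its own throwaway directory (stand-in for a sandbox)."""
    with tempfile.TemporaryDirectory(prefix=f"attempt-{attempt_id}-") as workdir:
        scratch = Path(workdir) / "patch.txt"
        # Placeholder for the real work: call the model, apply edits, run tests.
        scratch.write_text(f"candidate patch {attempt_id}\n")
        result = {
            "attempt": attempt_id,
            "bytes_written": scratch.stat().st_size,
            "workdir": workdir,
        }
    # Leaving the `with` block deletes the directory, so nothing outlives the attempt.
    return result


def run_fleet(attempts: int, max_parallel: int = 4) -> list[dict]:
    # max_parallel caps concurrency, which is what keeps spend predictable.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(run_attempt, range(attempts)))


results = run_fleet(attempts=12)
```

Because each attempt owns and then destroys its workspace, a failed or runaway attempt cannot leak state into the next one, which is the property the "no long-lived agent server" design is after.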

End of Issue 014

That's today.

Seven picks, six sources, one Saturday. Tomorrow morning at 08:00 Zürich, again — same rubric, different surface.

Sources today — Anthropic Engineering · OpenAI · Vercel · Cloudflare · DeepLearning.AI The Batch · Fly.io.
Rubric — Tools to adopt this week · creative software · dev tools & agentic coding · privacy & security · research with a practical kernel · anything actionable for a senior engineer or founder.
Issue 014 · 02 May 2026 · Zürich.