EPHEMERIS No. 003 Zürich

Tue 21·Apr 26

Today the calendar turned on a new Opus, durable workflows gone GA, and an agent that quietly rented a shopfront.

01 Opus 4.7 arrives · Anthropic
02 Codex eats the desktop · OpenAI
03 Durable execution, GA · Vercel
04 eBPF catches the circle · GitHub
05 Past monkey-patching · Sentry
06 2.5s → 94ms · PostHog
07 Eval noise vs model gap · Anthropic
08 Infrastructure for agents · Vercel
09 Automating alignment · Import AI
10 An agent opens a shop · via @seeallochnaya

01 / 10 · Anthropic · Model Release

Shipping today · Anthropic

Opus 4.7 lands with sharper hands.

The new Opus ships with notably better multi-step reasoning, larger image inputs (~3.75 MP), and more consistent tool use across long runs. The short version: the coding delta is real, and so is the lift on everything adjacent (vision, agents, planning), which means the gap between "runs a demo" and "runs in prod" just got narrower again.

+13%: Coding benchmarks
3.75MP: Max image resolution
4.7: Version this month

Read the release →

02 / 10 · OpenAI · Developer Tools

// release-notes / codex.app / 2026-04-16

Codex eats the desktop.

The macOS and Windows Codex apps just folded in browser control, computer use, image generation, and plugins. It is less "IDE assistant" and more "the harness you run your workflow in" — the kind of ship note you print out and pin above the monitor so everyone stops asking what Codex is for.

#	Feature	What it unlocks
01	Computer use	Driving GUI apps from inside an agent loop — clicks, keystrokes, file pickers.
02	In-app browser	Codex can now read the page you just shoved at it without a screenshot dance.
03	Image generation	Inline renders, not a separate tool round trip.
04	Memory	Preferences, project context, prior runs — persistent across sessions.
05	Plugins	Third-party surfaces on the same harness, no SDK dance to integrate.

Read the changelog →

03 / 10 · Vercel · Engineering · Primitive

SHEET 03 · Rev. A · Workflows (GA)

Durable execution, without the orchestrator.

Vercel Workflows is generally available: write long-running TypeScript or Python functions that survive crashes, retries, and regional failures — no Temporal, no BullMQ, no separate control plane to babysit. For anyone running AI agents that step out for hours, this is the primitive.

// fig 1 — a step that sleeps 30 minutes now costs nothing & resumes on the right instance.
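For readers new to the idea, here is a minimal sketch of what "durable" means in practice: each step's result is journaled, and a rerun after a crash skips whatever already finished. This is not Vercel's API. `run_workflow`, the `journal` dict (a stand-in for real durable storage), and the step names are all hypothetical, chosen only to illustrate the replay pattern.

```python
from typing import Any, Callable

Step = tuple[str, Callable[[], Any]]


def run_workflow(journal: dict[str, Any], steps: list[Step]) -> dict[str, Any]:
    """Run named steps in order, recording each result in the journal.

    `journal` stands in for durable storage (an assumption for this sketch).
    If a step raises, earlier results stay recorded, so calling this again
    with the same journal resumes at the failed step instead of starting over.
    """
    for name, fn in steps:
        if name in journal:
            continue  # finished during an earlier attempt; do not re-run
        journal[name] = fn()  # an exception here leaves prior entries intact
    return journal


if __name__ == "__main__":
    attempts = {"count": 0}

    def fetch_order() -> str:
        return "order-123"  # hypothetical step

    def charge_card() -> str:
        attempts["count"] += 1
        if attempts["count"] == 1:
            raise RuntimeError("simulated crash")
        return "charged"

    store: dict[str, Any] = {}
    plan: list[Step] = [("fetch_order", fetch_order), ("charge_card", charge_card)]
    try:
        run_workflow(store, plan)
    except RuntimeError:
        pass  # first attempt dies after fetch_order was journaled
    run_workflow(store, plan)  # second attempt only runs charge_card
    print(store)
```

A production system would persist the journal outside the process and handle sleeps and timers as well; the sketch only shows why a retry does not repeat completed work.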

Read the blueprint →

04 / 10 · GitHub · Engineering · Infrastructure

The Engineering Register Vol. XVII · № 03 Infrastructure Desk

How GitHub uses eBPF to catch the circular dep.

A kernel-level tripwire now detects circular dependencies during deploys — long before the pipeline notices, and long before an on-call does.

By Lawrence Gripper & Aleksey Levenstein · Published April 16

Deploy-time safety tends to live in CI — sequenced tests, policy checks, gates. GitHub's team argues that by the time a dependency cycle shows up in CI, the graph has already drifted: the problem is lurking in the runtime behaviour of the deploy tooling, not the YAML.

So they put the check where the runtime is: in the kernel, via eBPF. The probe watches the syscalls that the deploy orchestrator actually makes, and trips the second a cycle closes, before any downstream job gets the chance to start misbehaving.

The post is a good read on when to stop reaching for more CI policy and start reaching for an observer that lives closer to the metal. It also lands with a frankness you rarely see from infra teams: the motivation was an outage, not a conference talk.

Net: if you've got a control plane that talks to a lot of small services and orchestration gremlins, consider whether eBPF is the missing tracer — not just for performance, but for correctness.

Continue on the GitHub blog →

05 / 10 · Sentry · Engineering · Node.js

Gallery of Node.js Observability · Panel vii

A better way than monkey-patching.

Sentry walks through Node's diagnostics_channel API: how a library can voluntarily emit telemetry from inside its own call sites instead of being hot-swapped from outside. Less fragile, less magic, and a path to observability that does not break on the next minor release.

Author: Sentry Engineering · Tags: javascript, sdk, opentelemetry · Posted: 2026-04-13

Read the field notes →

06 / 10 · PostHog · Engineering · Rust

TAKE 01 · P99 · FEATURE FLAGS SVC

Rayon meets Tokio, and p99 blinks.

PostHog's feature-flag service was bleeding tail latency. The fix wasn't a magic allocator: it was separating the Tokio async runtime from the Rayon CPU pool, adding real backpressure, and rewriting dependency resolution with Kahn's algorithm. p99 fell from 2.5 s to 94 ms, roughly 26-fold. Wins that size rarely come from one trick; this one is a masterclass in where to look.
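Kahn's algorithm is the one piece of that fix that fits in a few lines. The sketch below is in Python rather than PostHog's Rust, and it is not their code: `evaluation_order` and the flag names are hypothetical. It orders flags so that each one is evaluated only after the flags it depends on, and it reports a cycle instead of looping forever.

```python
from collections import deque


def evaluation_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Return flags in an order where dependencies come first (Kahn's algorithm).

    `depends_on` maps a flag to the set of flags it requires.
    Raises ValueError if the dependencies contain a cycle.
    """
    nodes = set(depends_on)
    for required in depends_on.values():
        nodes |= required

    indegree = {node: 0 for node in nodes}  # unmet dependencies per flag
    dependents: dict[str, list[str]] = {node: [] for node in nodes}
    for flag, required in depends_on.items():
        for dep in required:
            indegree[flag] += 1
            dependents[dep].append(flag)

    # Start with flags that need nothing; sorting keeps the output deterministic.
    ready = deque(sorted(node for node in nodes if indegree[node] == 0))
    order: list[str] = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for child in sorted(dependents[node]):
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)

    if len(order) != len(nodes):
        raise ValueError("dependency cycle detected")
    return order


if __name__ == "__main__":
    flags = {
        "checkout_v2": {"new_cart"},
        "new_cart": {"beta_user"},
        "beta_user": set(),
    }
    print(evaluation_order(flags))
```

Each flag and each dependency edge is visited once, so the cost grows linearly with the size of the graph, which is part of why it suits a hot evaluation path.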

p99 · feature-flag evaluation · before / after

Read the postmortem →

07 / 10 · Anthropic · Engineering · Evals

CONFIDENTIAL · EVAL DESK

TO: Anyone running agentic-coding benchmarks

FROM: Anthropic Engineering

RE: The gap you think you're measuring

DATE: April 2026

Your eval noise may be bigger than the model gap.

Quantifying infrastructure noise in agentic coding evals: the team ran the same benchmark under varying shell, container, and filesystem configurations — and the spread between setups, on a single model, was large enough to swallow the advertised delta between top models.

The practical takeaway: before you ship a blog post comparing Opus 4.7 to your last-month run, pin your harness. Shell version, sandbox mode, network policy, temp-file cleanup. If you're not doing it, you are publishing noise.

Then measure again.
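One way to act on the memo, sketched under loud assumptions: fingerprint the harness settings so every score is tagged with the setup that produced it, then check whether the spread across setups is at least as large as the gap you want to report. The function names, config keys, and scores below are all hypothetical and are not taken from Anthropic's write-up.

```python
import hashlib
import json
from statistics import mean


def harness_fingerprint(config: dict) -> str:
    """Short, stable hash of the harness settings, independent of key order."""
    canonical = json.dumps(config, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def gap_is_swallowed(runs_a: list[float], runs_b: list[float]) -> bool:
    """True when the spread across setups is at least the gap between models.

    Each list holds one model's scores, one score per harness configuration.
    Spread is measured as max minus min, a deliberately crude choice.
    """
    gap = abs(mean(runs_a) - mean(runs_b))
    spread = max(max(runs_a) - min(runs_a), max(runs_b) - min(runs_b))
    return spread >= gap


if __name__ == "__main__":
    # Hypothetical harness settings, mirroring the memo's checklist.
    config = {
        "shell": "bash 5.2",
        "sandbox": "strict",
        "network": "off",
        "tmp_cleanup": True,
    }
    print(harness_fingerprint(config))

    # Made-up scores for two models across three setups each.
    print(gap_is_swallowed([70.0, 74.0, 72.0], [73.0, 75.0, 77.0]))
```

If the check comes back true, the honest headline is the harness, not the model.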

Signed, Anthropic Engineering

Read the memo →

08 / 10 · Vercel · Engineering · Manifesto

a quiet argument, from Vercel —

Infrastructure, written for agents.

The post's claim: the next generation of platforms will not be "human-ops plus a Copilot." They will be designed, from the API boundary inwards, for an agent to drive — self-describing, self-healing, and forgiving of retries. The manual console is a rounding error.

"If the primitive isn't safe to call a thousand times with a stochastic caller, it isn't really an agent primitive."

— Vercel · Engineering
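What "safe to call a thousand times" can look like, as a rough sketch: the caller supplies an idempotency key, and repeats of the same key return the stored result instead of running the side effect again. `IdempotentPrimitive` and `provision` are hypothetical names for illustration, not anything from Vercel's platform, and the in-memory dict stands in for whatever shared store a real service would use.

```python
from typing import Any, Callable


class IdempotentPrimitive:
    """Wrap a side-effecting action so repeated calls with one key run it once."""

    def __init__(self, action: Callable[..., Any]) -> None:
        self._action = action
        self._results: dict[str, Any] = {}  # stand-in for a shared, durable store
        self.executions = 0  # how many times the real action actually ran

    def call(self, idempotency_key: str, *args: Any) -> Any:
        if idempotency_key not in self._results:
            self.executions += 1
            self._results[idempotency_key] = self._action(*args)
        return self._results[idempotency_key]


if __name__ == "__main__":
    def provision(name: str) -> str:
        return f"provisioned:{name}"  # hypothetical side effect

    primitive = IdempotentPrimitive(provision)
    # A retry-happy caller repeats the same request a thousand times.
    for _ in range(1000):
        primitive.call("req-42", "db-eu-1")
    print(primitive.executions)
```

The sketch ignores concurrency and failed actions; a real primitive would need a lock or an atomic write around the check, and a policy for retrying after errors.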

Read the essay →

09 / 10 · Jack Clark · Newsletter · № 454

Postcard from the alignment desk

Alignment research, automated.

Clark's #454 leads with a startling result: Anthropic's automated alignment agents outperformed human researchers on weak-to-strong supervision tasks. Pair that with Huawei's HiFloat4 outperforming MXFP4 on chip-efficiency tests, and a safety deep-dive on a Chinese model, and you have one of the denser issues this year.

Read issue 454 →

In this issue

Automated alignment research beats humans on weak-to-strong
HiFloat4: Huawei's new chip-number format
Safety audit of a Chinese frontier model
Follow-ups on last week's MirrorCode piece

10 / 10 · via @seeallochnaya · Andon Labs · Field Report

VIA @seeallochnaya · SOURCE · ANDON LABS

An agent rented a shopfront in SF.

Luna, a Claude Sonnet 4.6-based agent, was given $100k and a three-year lease in San Francisco, then left to it. She hired staff, negotiated with suppliers, applied for a business loan, and picked the merchandise. No human authorization loop. The store is open.

Read the thread →

End of issue 003

That's today.

Ten picks from Anthropic, OpenAI, Vercel, GitHub, Sentry, PostHog, Import AI, and the Telegram desk of @seeallochnaya. Back tomorrow at 08:00 Zürich.

Sources

anthropic.com/news
anthropic.com/engineering
openai.com
vercel.com/blog
github.blog/engineering
sentry.engineering
posthog.com
jack-clark.net
t.me/s/seeallochnaya

Rubric

AI tools · creative software · dev tools · privacy · science · the practical. If we can't imagine you using it tomorrow, it doesn't run.

Colophon

Issue 003 assembled 08:00 CEST, Tue 21 Apr 2026.