Tue 21·Apr 26
Today's issue turns on a new Opus, durable workflows without an orchestrator, and an agent that quietly rented a shopfront.
Opus 4.7 lands with sharper hands.
The new Opus ships with notably better multi-step reasoning, larger image inputs (~3.75 MP), and more consistent tool use across long runs. The short version: the coding delta is real, and so is the lift in everything adjacent (vision, agents, planning), which means the gap between "runs a demo" and "runs in prod" just got narrower again.
- +13% · Coding benchmarks
- 3.75 MP · Max image resolution
- 4.7 · Version this month
// release-notes / codex.app / 2026-04-16
Codex eats the desktop.
The macOS and Windows Codex apps just folded in browser control, computer use, image generation, and plugins. It is less "IDE assistant" and more "the harness you run your workflow in" — the kind of ship note you print out and pin above the monitor so everyone stops asking what Codex is for.
| # | Feature | What it unlocks |
|---|---|---|
| 01 | Computer use | Driving GUI apps from inside an agent loop — clicks, keystrokes, file pickers. |
| 02 | In-app browser | Codex can now read the page you just shoved at it without a screenshot dance. |
| 03 | Image generation | Inline renders, not a separate tool round trip. |
| 04 | Memory | Preferences, project context, prior runs — persistent across sessions. |
| 05 | Plugins | Third-party surfaces on the same harness, no SDK dance to integrate. |
SHEET 03 · Rev. A · Workflows (GA)
Durable execution, without the orchestrator.
Vercel Workflows is generally available: write long-running TypeScript or Python functions that survive crashes, retries, and regional failures — no Temporal, no BullMQ, no separate control plane to babysit. For anyone running AI agents that step out for hours, this is the primitive.
// fig 1 — a step that sleeps 30 minutes now costs nothing & resumes on the right instance.
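For a feel of what "survives crashes" means mechanically, here is a minimal sketch of journal-and-replay, the general idea behind durable execution. It is not the Vercel Workflows API: `StepJournal`, its `step` method, and the JSON file on disk are all hypothetical stand-ins, assumed only for illustration.

```python
import json
from pathlib import Path


class StepJournal:
    """Hypothetical journal: records each finished step so a restart can skip it."""

    def __init__(self, path):
        self._path = Path(path)
        # Load results of steps that finished before a crash or restart.
        self._done = json.loads(self._path.read_text()) if self._path.exists() else {}

    def step(self, name, fn):
        # Replay: a step that already finished returns its recorded result.
        if name in self._done:
            return self._done[name]
        result = fn()
        self._done[name] = result
        # Persist after every step so progress is not lost on the next crash.
        self._path.write_text(json.dumps(self._done))
        return result
```

A restarted process that builds a new `StepJournal` on the same path picks up where the last one stopped, as long as step results are JSON-serializable.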
Read the blueprint →
How GitHub uses eBPF to catch the circular dep.
A kernel-level tripwire now detects circular dependencies during deploys — long before the pipeline notices, and long before an on-call does.
Deploy-time safety tends to live in CI — sequenced tests, policy checks, gates. GitHub's team argues that by the time a dependency cycle shows up in CI, the graph has already drifted: the problem is lurking in the runtime behaviour of the deploy tooling, not the YAML.
So they put the check where the runtime is: in the kernel, via eBPF. The probe watches the syscalls that the deploy orchestrator actually makes, and trips the second a cycle closes, before any downstream job gets the chance to start misbehaving.
The post is a good read on when to stop reaching for more CI policy and start reaching for an observer that lives closer to the metal. It also lands with a frankness you rarely see from infra teams: the motivation was an outage, not a conference talk.
Net: if you've got a control plane that talks to a lot of small services and orchestration gremlins, consider whether eBPF is the missing tracer — not just for performance, but for correctness.
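The "trips the second a cycle closes" check can be sketched in user space. This is not GitHub's eBPF probe, just the graph logic such a tripwire would need; `DependencyTripwire`, `observe`, and the service names in the usage note are hypothetical.

```python
class CycleDetected(Exception):
    """Raised when a newly observed dependency would close a cycle."""


class DependencyTripwire:
    """Hypothetical illustration: tracks observed 'caller depends on callee' edges."""

    def __init__(self):
        self._edges = {}

    def _reaches(self, start, target):
        # Iterative depth-first search over the edges seen so far.
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == target:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self._edges.get(node, ()))
        return False

    def observe(self, caller, callee):
        # Adding caller -> callee closes a cycle iff callee already reaches caller.
        if self._reaches(callee, caller):
            raise CycleDetected(f"{caller} -> {callee} closes a cycle")
        self._edges.setdefault(caller, set()).add(callee)
```

Feed it `observe("deploy", "artifact-store")`, then `observe("artifact-store", "auth")`, and a later `observe("auth", "deploy")` raises at the moment the loop would close, rather than after a downstream job has started.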
Gallery of Node.js Observability · Panel vii
A better way than monkey-patching.
Sentry walks through Node's diagnostics-channel API — how a library can voluntarily emit telemetry from inside its own call sites instead of being hot-swapped from outside. Less fragile, less magic, and a path to observability that does not break the next minor.
Read the field notes →
TAKE 01 · P99 · FEATURE FLAGS SVC
Rayon meets Tokio, and p99 blinks.
PostHog's feature-flag service was bleeding tail latency. The fix wasn't a magic allocator — it was separating the Tokio async runtime from the Rayon CPU pool, adding real backpressure, and rewriting dependency resolution with Kahn's algorithm. Twenty-six-fold wins rarely come from one trick; this one is a masterclass in where to look.
p99 · feature-flag evaluation · before / after
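For readers who have not met Kahn's algorithm: it orders a dependency graph by repeatedly emitting nodes with no unresolved dependencies. The sketch below is a generic Python version, not PostHog's Rust code; the function name `kahn_order` and the input shape (flag mapped to the flags it depends on) are assumptions for illustration.

```python
from collections import deque


def kahn_order(dependencies):
    """Return an order where every flag comes after the flags it depends on.

    `dependencies` maps each flag to the flags it depends on.
    Raises ValueError if the graph contains a cycle.
    """
    # Collect every node, including ones that only appear as a dependency.
    nodes = set(dependencies)
    for deps in dependencies.values():
        nodes.update(deps)

    indegree = {n: 0 for n in nodes}
    dependents = {n: [] for n in nodes}
    for flag, deps in dependencies.items():
        for dep in set(deps):
            indegree[flag] += 1
            dependents[dep].append(flag)

    # Start with flags that depend on nothing; sorting keeps the output stable.
    ready = deque(sorted(n for n in nodes if indegree[n] == 0))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for nxt in sorted(dependents[node]):
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    # Any node never emitted is stuck behind a cycle.
    if len(order) != len(nodes):
        raise ValueError("dependency cycle detected")
    return order
```

The appeal for a hot path is that it runs in time linear in nodes plus edges and reports cycles as a by-product, with no recursion.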
Read the postmortem →
Your eval noise may be bigger than the model gap.
Quantifying infrastructure noise in agentic coding evals: the team ran the same benchmark under varying shell, container, and filesystem configurations — and the spread between setups, on a single model, was large enough to swallow the advertised delta between top models.
The practical takeaway: before you ship a blog post comparing Opus 4.7 to last month's run, pin your harness: shell version, sandbox mode, network policy, temp-file cleanup. If you don't, you are publishing noise.
Then measure again.
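One way to make "pin your harness" checkable is to fingerprint the settings and refuse to compare runs whose fingerprints differ. This is a sketch under our own assumptions, not anything from the Anthropic post: `HarnessConfig`, its field names, and `comparable` are hypothetical, modelled on the four items listed above.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class HarnessConfig:
    # Hypothetical fields mirroring the four items named in the takeaway.
    shell_version: str
    sandbox_mode: str
    network_policy: str
    temp_file_cleanup: bool

    def fingerprint(self) -> str:
        # Serialize with sorted keys so the same settings always hash the same.
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()[:12]


def comparable(run_a: HarnessConfig, run_b: HarnessConfig) -> bool:
    """Two eval runs are only worth comparing if their harness settings match."""
    return run_a.fingerprint() == run_b.fingerprint()
```

Store the fingerprint next to every score; a mismatch between two runs is a prompt to rerun, not a result.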
Signed, Anthropic Engineering
Read the memo →
a quiet argument, from Vercel:
Infrastructure, written for agents.
The post's claim: the next generation of platforms will not be "human-ops plus a Copilot." They will be designed, from the API boundary inwards, for an agent to drive — self-describing, self-healing, and forgiving of retries. The manual console is a rounding error.
"If the primitive isn't safe to call a thousand times with a stochastic caller, it isn't really an agent primitive."
— Vercel · Engineering
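What "safe to call a thousand times" can look like in the small: an idempotency key, so that retries from a stochastic caller collapse into one execution. This is our illustration, not code from the Vercel essay; `IdempotentStore` and its `call` method are hypothetical, and the store is in-memory and single-process only.

```python
class IdempotentStore:
    """Hypothetical primitive: repeated calls with the same key run the action once."""

    def __init__(self):
        self._results = {}

    def call(self, idempotency_key, action):
        # A retry with a known key returns the stored result without re-running.
        if idempotency_key not in self._results:
            self._results[idempotency_key] = action()
        return self._results[idempotency_key]
```

A real platform would need a durable, shared store and a policy for failed actions; the sketch only shows the contract the caller gets to rely on.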
Read the essay →
Postcard from the alignment desk
Alignment research, automated.
Jack Clark's Import AI #454 leads with a startling result: Anthropic's automated alignment agents outperformed human researchers on weak-to-strong supervision tasks. Pair that with Huawei's HiFloat4 outperforming MXFP4 on chip-efficiency tests, and a safety deep-dive on a Chinese model, and you have one of the denser issues this year.
Read issue 454 →
- Automated alignment research beats humans on weak-to-strong
- HiFloat4: Huawei's new chip-number format
- Safety audit of a Chinese frontier model
- Follow-ups on last week's MirrorCode piece
VIA @seeallochnaya · SOURCE · ANDON LABS
An agent rented a shopfront in SF.
Luna — a Claude Sonnet 4.6-based agent — was given $100k and a three-year lease in San Francisco, then left to it. She hired staff, negotiated suppliers, applied for a business loan, and picked the merchandise. No human authorization loop. The store is open.
Read the thread →
That's today.
Ten picks from Anthropic, OpenAI, Vercel, GitHub, Sentry, PostHog, Import AI, and the Telegram desk of @seeallochnaya. Back tomorrow at 08:00 Zürich.
Sources
anthropic.com/news
anthropic.com/engineering
openai.com
vercel.com/blog
github.blog/engineering
sentry.engineering
posthog.com
jack-clark.net
t.me/s/seeallochnaya
Rubric
AI tools · creative software · dev tools · privacy · science · the practical. If we can't imagine you using it tomorrow, it doesn't run.
Colophon
Issue 003 assembled 08:00 CEST, Tue 21 Apr 2026.