EPHEMERIS No. 002 Mon · 20 Apr 2026 · Zürich

Ten picks from eight corners of the web, heavy on dev primitives today because the platforms all shipped at once.

01 Codex Becomes an Operating Layer · OpenAI
02 Auto Mode Gets a Safer Kill-Switch · Anthropic
03 22% Smaller LLMs, Zero Loss · Cloudflare
04 SQLite Goes Read-Write Over S3 · Fly.io
05 The Infra Is the Benchmark · Anthropic
06 Breaking AI Agents: Six Classes · Import AI
07 eBPF Catches Bad Deploys · GitHub
08 One Photo, One 3D World · via @denissexy
09 Durable Execution Goes GA · Vercel
10 Inside a Sandbox Cold Start · Vercel

01 / 10 OpenAI · Dev Tools · Agents

The Cover Story

Codex is now the whole desk.

OpenAI's Codex app ships a bundle that used to be five separate products: computer use, in-app browsing, image generation, memory, and a plugin surface. The framing is modest ("developer workflows"), but the direction is not. Codex is being positioned as the layer where your other tools become callable, on both macOS and Windows. If you've been waiting for the moment to move serious work out of the browser and into an assistant, this is it.

AI · SDK · OpenAI, product note · Posted 16 Apr

Read on OpenAI →

02 / 10 Anthropic Engineering · Agentic Coding

Engineering

A safer way to skip permissions.

Claude Code's new auto mode sits between "approve every tool call" and "YOLO --dangerously-skip-permissions." Anthropic's engineers describe a sandbox-plus-policy layer that lets an agent run unattended while still refusing the operations you never want it to try. The post is a recipe for how to wire one up in your own harness.

Don't remove the prompts. Remove the need for prompts.
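
For a feel of what a policy layer between those two extremes might look like, here is a minimal sketch. Everything in it is an assumption for illustration: the tool names, the rule shapes, and the `decide` function are hypothetical and are not Claude Code's API or Anthropic's implementation. The idea is only that a short ordered rule list can allow the routine calls, deny the ones you never want, and fall back to asking for the rest.

```typescript
// Hypothetical policy gate for agent tool calls. The tool names and rules are
// illustrative assumptions, not taken from any real harness.
type ToolCall = { tool: string; args: string[] };
type Decision = "allow" | "deny" | "ask";

interface Rule {
  decision: Decision;
  matches: (call: ToolCall) => boolean;
}

// True for flags such as -r, -rf, -fR, or --recursive.
function isRecursiveFlag(arg: string): boolean {
  return arg === "--recursive" || /^-[a-zA-Z]*[rR]/.test(arg);
}

// Rules are checked in order; the first match wins.
const rules: Rule[] = [
  // Never allow recursive deletes, whatever the target.
  {
    decision: "deny",
    matches: (c) =>
      c.tool === "shell" && c.args[0] === "rm" && c.args.some(isRecursiveFlag),
  },
  // Reads are treated as safe to run unattended.
  { decision: "allow", matches: (c) => c.tool === "read_file" },
  // Writes are allowed inside the workspace only. A real gate would
  // normalise the path first so "/workspace/../etc" cannot slip through.
  {
    decision: "allow",
    matches: (c) =>
      c.tool === "write_file" && (c.args[0] ?? "").startsWith("/workspace/"),
  },
];

// Anything no rule covers falls back to asking a human.
function decide(call: ToolCall): Decision {
  for (const rule of rules) {
    if (rule.matches(call)) return rule.decision;
  }
  return "ask";
}
```

Putting the deny rules first is the design choice that matters here: a broad allow rule added later cannot accidentally shadow them.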

Read the engineering note →

03 / 10 Cloudflare · Research · Inference

22%

Research

Lossless LLM compression, inference-time.

Cloudflare's research team describes Unweight, a compression scheme that shaves up to 22% off a model's footprint with no measurable quality loss. It's applied at inference, which means no retraining, no quantization knobs — just smaller weights flowing through the same pipeline. They're deploying it across the network.

22%: Footprint cut
0: Quality loss
∀: Provider-agnostic

Read the paper →

04 / 10 Fly.io · Storage · SQLite

~/app $ sqlite3 notes.db "insert into ..." && sync→s3

SQLite that writes through S3.

Ben Johnson's new Litestream Writable VFS turns the pattern that powered half the indie stack — SQLite + replicated streams — into something that writes. The database file is virtual; the pages live in S3-compatible storage; you keep the `sqlite3` binary.

The practical upshot is that you can put a SQLite-shaped workload on serverless compute without reaching for Postgres or worrying about which pod owns the disk.
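
To make "the file is virtual, the pages live in object storage" concrete, here is a rough sketch of the general idea. It is not Litestream's format or code: the `PagedFile` class, the key layout, and the 4096-byte page size are assumptions, and an in-memory `Map` stands in for an S3-compatible bucket.

```typescript
// Sketch: a "file" whose fixed-size pages live in an object store.
// The Map stands in for a bucket; the key layout is a made-up convention.
const PAGE_SIZE = 4096;

class PagedFile {
  private store = new Map<string, Uint8Array>();
  private name: string;

  constructor(name: string) {
    this.name = name;
  }

  private key(pageNo: number): string {
    return `${this.name}/pages/${pageNo.toString().padStart(8, "0")}`;
  }

  // Read `length` bytes at `offset`; pages never written read back as zeros.
  read(offset: number, length: number): Uint8Array {
    const out = new Uint8Array(length);
    let done = 0;
    while (done < length) {
      const pos = offset + done;
      const pageNo = Math.floor(pos / PAGE_SIZE);
      const inPage = pos % PAGE_SIZE;
      const n = Math.min(PAGE_SIZE - inPage, length - done);
      const page = this.store.get(this.key(pageNo));
      if (page) out.set(page.subarray(inPage, inPage + n), done);
      done += n;
    }
    return out;
  }

  // Write bytes at `offset`, doing read-modify-write on each touched page.
  write(offset: number, data: Uint8Array): void {
    let done = 0;
    while (done < data.length) {
      const pos = offset + done;
      const pageNo = Math.floor(pos / PAGE_SIZE);
      const inPage = pos % PAGE_SIZE;
      const n = Math.min(PAGE_SIZE - inPage, data.length - done);
      const k = this.key(pageNo);
      const page = this.store.get(k) ?? new Uint8Array(PAGE_SIZE);
      page.set(data.subarray(done, done + n), inPage);
      this.store.set(k, page); // against a real bucket this would be an object PUT
      done += n;
    }
  }

  // Number of pages that have been written so far.
  pageCount(): number {
    return this.store.size;
  }
}
```

A ten-byte write at offset 4090 touches two pages, which is the kind of detail a real VFS has to batch and order carefully.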

  ╭──────────────────────────╮
  │  litestream vfs          │
  │    db  →  pages  →  s3   │
  │    ▓▓▓▓▓▓▓▓▓▓▓▓▓▓░  91%  │
  ╰──────────────────────────╯

$ open fly.io/blog →

05 / 10 Anthropic Engineering · Evals

Deep dive

The infra is the benchmark.

Anthropic's engineers ran the same agentic coding eval on the same model, over and over, varying only the surrounding harness — CPU, disk, network jitter, container image. The spread in scores was large enough to reorder the leaderboard. If you're reading a benchmark, you're reading a benchmark of the pipeline.

The practical advice is to run every new model on your own harness before you believe anyone's numbers, including your own from last week.

The broader point is that "X% on SWE-bench" is a claim about a test rig, and test rigs do not stand still. Shared rigs, with logged environments, are the only version that scales.

Useful nerd-snipe: write down your harness version next to every eval you run.
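
One way to act on that nerd-snipe, as a minimal sketch: stamp every result with a fingerprint of the rig it ran on, and refuse to compare scores across different fingerprints. The field names (`image`, `cpus`, `memoryGb`, `harnessVersion`) and the helper functions are assumptions for illustration, not anything taken from the Anthropic study.

```typescript
// Sketch: attach a harness fingerprint to every eval result.
// All field names here are illustrative assumptions.
interface Harness {
  image: string;          // container image tag or digest
  cpus: number;
  memoryGb: number;
  harnessVersion: string;
}

interface EvalRecord {
  model: string;
  benchmark: string;
  score: number;
  harness: Harness;
  harnessId: string;      // short fingerprint for quick comparison
}

// FNV-1a (32-bit) over a canonical string. Enough to tell rigs apart;
// not a cryptographic hash.
function fingerprint(h: Harness): string {
  const canonical = [h.image, h.cpus, h.memoryGb, h.harnessVersion].join("|");
  let hash = 0x811c9dc5;
  for (let i = 0; i < canonical.length; i++) {
    hash ^= canonical.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

function recordEval(
  model: string,
  benchmark: string,
  score: number,
  harness: Harness,
): EvalRecord {
  return { model, benchmark, score, harness, harnessId: fingerprint(harness) };
}

// Two scores are only comparable when they came from the same rig.
function comparable(a: EvalRecord, b: EvalRecord): boolean {
  return a.benchmark === b.benchmark && a.harnessId === b.harnessId;
}
```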

Read the full study →

06 / 10 Import AI · Research & Safety

Issue · 453

Six ways to break an AI agent.

Jack Clark's weekly rounds up a new paper taxonomising six classes of attack that land against agents operating in the real world — prompt smuggling, environment poisoning, tool-call hijacks, and three quieter ones. Read it as a defender's checklist. The companion item on "gradual disempowerment" is more uncomfortable and harder to action.

6 Attack classes · against live agents

Read Import AI 453 →

07 / 10 GitHub Engineering · Infra & Security

Field report · deployment

Catch bad deploys in the kernel.

Lawrence Gripper and Aleksey Levenstein walk through how GitHub uses eBPF probes to detect circular dependencies in their deployment tooling before they ship. The win: the check runs everywhere, in the kernel, without a sidecar. The broader win: once you see syscalls as first-class data, a class of "works on my machine" bugs disappears.

Chart: bad-deploy rate, weekly, after rolling the probe across the fleet. Source: GitHub Engineering.

Read the postmortem →

08 / 10 via @denissexy · Creative AI

Four at once

One photo, a whole 3D world.

Tencent's world model takes a single 2D image and generates an editable 3D scene around it.
The output is a scene graph, not a mesh blob — which means you can actually move things.
Runway-style video generation is now the boring baseline; this is the next frontier.
The toolchain for creators is shifting from "prompt a clip" to "prompt a world and film inside it."

See the channel →

09 / 10 Vercel · Durable Execution

Platform update

Long-running functions, finally boring.

Vercel Workflows went GA this week — TypeScript or Python functions that survive process restarts, retries, and hours-long external waits without a separate orchestrator. The pitch is simple: write your code as if it runs for 40 minutes; the platform handles the durable part. If you've been queueing jobs just to get checkpointing, delete that queue.

$ cat workflow.ts
export default workflow(async (step) => {
  const job = await step.call(startJob)
  await step.sleep("1h")  // survives restarts
  return step.call(finalize, job)
})
$ _
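
The snippet above shows the shape of the API as the launch note presents it. For a sense of how "survives restarts" can work underneath, here is a minimal sketch of replay-based durability: each step's result is journaled, and a restarted run replays the journal instead of redoing finished work. This is a generic illustration, not Vercel's implementation; the `Run` class, the journal, and the simulated crash are all hypothetical, and it is synchronous for brevity where a real system would be async and persist the journal outside the process.

```typescript
// Sketch of replay-based durability. Not Vercel's code: names and structure
// are assumptions, and the journal lives in memory instead of durable storage.
type Journal = Map<string, unknown>;

class Run {
  private journal: Journal;

  constructor(journal: Journal) {
    this.journal = journal;
  }

  // Run `fn` once per step name; on replay, return the journaled result.
  step<T>(name: string, fn: () => T): T {
    if (this.journal.has(name)) return this.journal.get(name) as T;
    const result = fn();
    this.journal.set(name, result); // a real system would persist this write
    return result;
  }
}

// Hypothetical workflow whose second step dies on the first attempt.
let startCalls = 0;
let crashOnce = true;

function workflow(run: Run): string {
  const jobId = run.step("start", () => {
    startCalls++;
    return "job-1";
  });
  return run.step("finalize", () => {
    if (crashOnce) {
      crashOnce = false;
      throw new Error("process died");
    }
    return `${jobId}:done`;
  });
}

// Re-run the whole function against the same journal until it completes.
function runToCompletion(journal: Journal, maxAttempts = 3): string {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return workflow(new Run(journal));
    } catch (err) {
      lastError = err; // restart and replay from the journal
    }
  }
  throw lastError;
}
```

The point to notice is that the workflow function is re-entered from the top after the crash, yet the "start" step body runs only once.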

Read the launch note →

10 / 10 Vercel · Engineering Deep-Dive

Cold-start recipe

How to shave a sandbox.

Vercel's sandbox team walks through the five knobs they turned to cut snapshot-restore time on the hot path. None is novel on its own; the payoff is in stacking them.

01 Parallel fetch: snapshot chunks
02 Stream decompress: no staging file
03 Local NVMe: hot cache
04 Page-in on demand: lazy restore
05 Warm pool: zero cold
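
Of the five knobs, page-in on demand is the easiest to show in a few lines. The sketch below is a generic illustration under stated assumptions, not Vercel's code: `LazyMemory` and its `loader` callback are hypothetical stand-ins for restoring memory pages from a snapshot. Restore returns immediately, and each page is loaded the first time it is touched.

```typescript
// Sketch of lazy restore: pages are loaded from the snapshot on first access.
// `loader` is a hypothetical stand-in for reading one page out of a snapshot.
class LazyMemory {
  private pages = new Map<number, Uint8Array>();
  private pageSize: number;
  private loader: (pageNo: number) => Uint8Array;
  loads = 0; // how many pages have actually been paged in

  constructor(pageSize: number, loader: (pageNo: number) => Uint8Array) {
    this.pageSize = pageSize;
    this.loader = loader;
  }

  readByte(addr: number): number {
    const pageNo = Math.floor(addr / this.pageSize);
    let page = this.pages.get(pageNo);
    if (!page) {
      page = this.loader(pageNo); // first touch pays the load cost
      this.pages.set(pageNo, page);
      this.loads++;
    }
    return page[addr % this.pageSize] ?? 0;
  }
}
```

A sandbox that only ever touches a small part of its memory never pays to restore the rest, which is why this knob stacks well with the other four.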

Read the deep-dive →

End of issue 002

That's all for today.

Ten picks from OpenAI, Anthropic Engineering, Cloudflare, Fly.io, Import AI, GitHub, Vercel, and one Telegram chaser. Back tomorrow at 08:00 Zürich.

Sources

openai.com/news
anthropic.com/engineering
blog.cloudflare.com
fly.io/blog
jack-clark.net
github.blog/engineering
vercel.com/blog
t.me/denissexy

Rubric

AI tools · creative software · dev tools · privacy · science · the practical. If we can't imagine you using it tomorrow, it doesn't run.

Colophon

Set in Fraunces & Inter (Google Fonts), with JetBrains Mono on the terminal pages. Hand-laid in HTML/CSS. Issue 002 assembled 08:00 CEST, Mon 20 Apr 2026. Sentry & PostHog blogs declined to render; noted and skipped.

Ephemeris.