Ephemeris.
Ten picks from seven corners of the web — heavy on dev primitives today, because the platforms all shipped at once.
Codex is now the whole desk.
OpenAI's Codex app ships a bundle of what used to be separate products: computer use, in-app browsing, image generation, memory, and a plugin surface. The framing is modest ("developer workflows"), but the direction is not. Codex is being positioned as the layer where your other tools become callable, on both macOS and Windows. If you've been waiting for the moment to move serious work out of the browser and into an assistant, this is it.
Read on OpenAI →
A safer way to skip permissions.
Claude Code's new auto mode sits between "approve every tool call" and "YOLO --dangerously-skip-permissions." Anthropic's engineers describe a sandbox-plus-policy layer that lets an agent run unattended while still refusing the operations you never want it to try. The post is a recipe for how to wire one up in your own harness.
Don't remove the prompts. Remove the need for prompts.
Read the engineering note →
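For a feel of the policy half of that idea, here is a minimal sketch of a gate that sorts tool calls into allow, deny, or ask. Everything in it (`ToolCall`, `Policy`, `decide`, the patterns) is a hypothetical shape invented for illustration, not Claude Code's API, and it leaves out the sandbox entirely.

```typescript
// Hypothetical shapes for illustration only; not Claude Code's API.
type Decision = "allow" | "deny" | "ask";

interface ToolCall {
  tool: string;     // e.g. "bash", "read_file"
  argument: string; // command line or target path
}

interface Policy {
  denyPatterns: RegExp[];  // operations the agent must never attempt
  allowPatterns: RegExp[]; // operations considered safe to run unattended
}

function decide(call: ToolCall, policy: Policy): Decision {
  const text = `${call.tool} ${call.argument}`;
  // Deny rules are checked first, so a broad allow rule cannot unlock a forbidden operation.
  if (policy.denyPatterns.some((p) => p.test(text))) return "deny";
  if (policy.allowPatterns.some((p) => p.test(text))) return "allow";
  // Anything the policy does not recognise falls back to a human prompt.
  return "ask";
}

// Example policy; the patterns are made up and far from complete.
const examplePolicy: Policy = {
  denyPatterns: [/\brm\s+-rf\b/, /\bgit\s+push\s+--force\b/],
  allowPatterns: [/^bash (ls|cat|git status)\b/, /^read_file /],
};

const safe = decide({ tool: "bash", argument: "git status" }, examplePolicy);
const forbidden = decide({ tool: "bash", argument: "ls && rm -rf ~" }, examplePolicy);
const unknown = decide({ tool: "bash", argument: "curl https://example.com" }, examplePolicy);
```

The ordering is the design choice worth copying: the second call matches an allow pattern too, but the deny check runs first, so it is refused rather than waved through.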
Lossless LLM compression, inference-time.
Cloudflare's research team describes Unweight, a compression scheme that shaves up to 22% off a model's footprint with no measurable quality loss. It's applied at inference, which means no retraining, no quantization knobs — just smaller weights flowing through the same pipeline. They're deploying it across the network.
- 22%: footprint cut
- 0: quality loss
- ∀: provider-agnostic
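To put the headline number in capacity terms, a back-of-envelope sketch. The 22% is the "up to" figure quoted above; the 140 GB model size and the function name are made-up inputs, not anything from Cloudflare's post.

```typescript
// Illustrative arithmetic only. `reduction` is a fraction in [0, 1).
function compressedFootprintGB(originalGB: number, reduction: number): number {
  if (reduction < 0 || reduction >= 1) {
    throw new RangeError("reduction must be in [0, 1)");
  }
  // Round to one decimal place so floating-point noise does not leak into the figure.
  return Math.round(originalGB * (1 - reduction) * 10) / 10;
}

// A hypothetical 140 GB set of weights at the best-case 22% reduction.
const bestCaseGB = compressedFootprintGB(140, 0.22);
const savedGB = Math.round((140 - bestCaseGB) * 10) / 10;
```

At the best case that is roughly 31 GB back per copy of the weights, which is the kind of margin that decides how many accelerators a model needs.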
SQLite that writes through S3.
Ben Johnson's new Litestream Writable VFS turns the pattern that powered half the indie stack — SQLite + replicated streams — into something that writes. The database file is virtual; the pages live in S3-compatible storage; you keep the `sqlite3` binary.
The practical upshot is that you can put a SQLite-shaped workload on serverless compute without reaching for Postgres or worrying about which pod owns the disk.
╭──────────────────────────╮
│ litestream vfs           │
│ db → pages → s3          │
│ ▓▓▓▓▓▓▓▓▓▓▓▓▓▓░ 91%      │
╰──────────────────────────╯
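The "db → pages → s3" idea can be sketched as a toy: a virtual file whose byte range is chopped into fixed-size pages, each stored as its own object. This is an assumption-heavy illustration, not Litestream's actual layout. The `Map` stands in for an S3-compatible bucket, and the 4096-byte page size, key format, and `PageStore` name are all invented here.

```typescript
// Toy model: an in-memory Map stands in for an S3-compatible bucket.
// Page size and key layout are assumptions, not Litestream's real format.
const PAGE_SIZE = 4096;

class PageStore {
  private objects = new Map<string, Uint8Array>();
  private dbName: string;

  constructor(dbName: string) {
    this.dbName = dbName;
  }

  private key(pageNo: number): string {
    return `${this.dbName}/page-${pageNo}`;
  }

  // Write bytes at a file offset by updating every page object the range touches.
  write(offset: number, data: Uint8Array): void {
    let written = 0;
    while (written < data.length) {
      const pos = offset + written;
      const pageNo = Math.floor(pos / PAGE_SIZE);
      const inPage = pos % PAGE_SIZE;
      const n = Math.min(PAGE_SIZE - inPage, data.length - written);
      const page = this.objects.get(this.key(pageNo)) || new Uint8Array(PAGE_SIZE);
      page.set(data.subarray(written, written + n), inPage);
      this.objects.set(this.key(pageNo), page);
      written += n;
    }
  }

  // Read bytes back; pages that were never written read as zeros.
  read(offset: number, length: number): Uint8Array {
    const out = new Uint8Array(length);
    let done = 0;
    while (done < length) {
      const pos = offset + done;
      const pageNo = Math.floor(pos / PAGE_SIZE);
      const inPage = pos % PAGE_SIZE;
      const n = Math.min(PAGE_SIZE - inPage, length - done);
      const page = this.objects.get(this.key(pageNo));
      if (page) out.set(page.subarray(inPage, inPage + n), done);
      done += n;
    }
    return out;
  }

  objectKeys(): string[] {
    return Array.from(this.objects.keys());
  }
}

// A 10-byte write that straddles the boundary between page 0 and page 1.
const store = new PageStore("app.db");
store.write(4090, new Uint8Array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]));
const roundTrip = store.read(4090, 10);
const keys = store.objectKeys();
```

No pod owns a disk in this picture: the only state is the set of page objects, which is why the pattern suits compute that comes and goes.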
The infra is the benchmark.
Anthropic's engineers ran the same agentic coding eval on the same model, over and over, varying only the surrounding harness — CPU, disk, network jitter, container image. The spread in scores was large enough to reorder the leaderboard. If you're reading a benchmark, you're reading a benchmark of the pipeline.
The practical advice is to run every new model on your own harness before you believe anyone's numbers, including your own from last week.
The broader point is that "X% on SWE-bench" is a claim about a test rig, and test rigs do not stand still. Shared rigs, with logged environments, are the only version that scales.
Useful nerd-snipe: write down your harness version next to every eval you run.
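One way to act on that, as a rough sketch: store the harness version on every eval record and look at the score spread per model before comparing models. The record shape, the version labels, and the scores below are all made up for illustration; none of them come from Anthropic's write-up.

```typescript
interface EvalRun {
  model: string;
  harnessVersion: string; // hypothetical label for the rig: image tag, CPU class, and so on
  score: number;          // fraction of tasks passed, 0..1
}

interface Spread {
  min: number;
  max: number;
  spread: number;
}

// Group runs by model and report min, max, and spread across harness versions.
function spreadByModel(runs: EvalRun[]): Map<string, Spread> {
  const out = new Map<string, Spread>();
  for (const run of runs) {
    const cur = out.get(run.model);
    const min = cur ? Math.min(cur.min, run.score) : run.score;
    const max = cur ? Math.max(cur.max, run.score) : run.score;
    out.set(run.model, { min, max, spread: max - min });
  }
  return out;
}

// Invented data: the same model scored on two rigs, a second model on one.
const evalRuns: EvalRun[] = [
  { model: "model-a", harnessVersion: "rig-2026-04-01", score: 0.5 },
  { model: "model-a", harnessVersion: "rig-2026-04-15", score: 0.75 },
  { model: "model-b", harnessVersion: "rig-2026-04-15", score: 0.625 },
];
const spreads = spreadByModel(evalRuns);
```

In this invented data, model-b lands inside model-a's own range, so which one "wins" depends on which rig you happened to run.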
Six ways to break an AI agent.
Jack Clark's weekly rounds up a new paper taxonomising six classes of attack that land against agents operating in the real world — prompt smuggling, environment poisoning, tool-call hijacks, and three quieter ones. Read it as a defender's checklist. The companion item on "gradual disempowerment" is more uncomfortable and harder to action.
Catch bad deploys in the kernel.
Lawrence Gripper and Aleksey Levenstein walk through how GitHub uses eBPF probes to detect circular dependencies in their deployment tooling before they ship. The win: the check runs everywhere, in the kernel, without a sidecar. The broader win: once you see syscalls as first-class data, a class of "works on my machine" bugs disappears.
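The kernel probes are the hard part, but the check on top of them is plain graph work. A minimal sketch, assuming the probes have already produced a list of observed "A depends on B" edges; the service names and the `findCycle` helper are hypothetical, and this is not GitHub's implementation.

```typescript
// Returns one dependency cycle as a list of nodes (first node repeated at the end), or null.
function findCycle(edges: Array<[string, string]>): string[] | null {
  const graph = new Map<string, string[]>();
  for (const [from, to] of edges) {
    if (!graph.has(from)) graph.set(from, []);
    if (!graph.has(to)) graph.set(to, []);
    graph.get(from)!.push(to);
  }

  const state = new Map<string, "visiting" | "done">();
  const stack: string[] = [];

  function visit(node: string): string[] | null {
    state.set(node, "visiting");
    stack.push(node);
    for (const next of graph.get(node)!) {
      if (state.get(next) === "visiting") {
        // Back edge: the cycle is the current stack from `next` onward.
        return stack.slice(stack.indexOf(next)).concat(next);
      }
      if (!state.has(next)) {
        const found = visit(next);
        if (found) return found;
      }
    }
    stack.pop();
    state.set(node, "done");
    return null;
  }

  for (const node of Array.from(graph.keys())) {
    if (!state.has(node)) {
      const found = visit(node);
      if (found) return found;
    }
  }
  return null;
}

// Hypothetical observed edges: the deploy tool ends up depending on a service it deploys.
const observedEdges: Array<[string, string]> = [
  ["deploy-tool", "artifact-store"],
  ["artifact-store", "auth-service"],
  ["auth-service", "deploy-tool"],
];
const cycle = findCycle(observedEdges);
```

Failing the deploy when `cycle` is non-null is the whole check; the value of observing syscalls is that the edge list reflects what the tooling actually did rather than what its config claims.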
One photo, a whole 3D world.
- Tencent's world model takes a single 2D image and generates an editable 3D scene around it.
- The output is a scene graph, not a mesh blob — which means you can actually move things.
- Runway-style video generation is now the boring baseline; this is the next frontier.
- The toolchain for creators is shifting from "prompt a clip" to "prompt a world and film inside it."
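Why a scene graph is easier to edit than a mesh blob, as a small sketch: each node stores its position relative to its parent, so moving one node carries its children along. The `SceneNode` shape here is hypothetical and translation-only (no rotation or scale); it is not the format Tencent's model emits.

```typescript
type Vec3 = [number, number, number];

interface SceneNode {
  name: string;
  offset: Vec3;          // translation relative to the parent node
  children: SceneNode[];
}

// Walk the graph and compute each node's world-space position.
function worldPositions(
  node: SceneNode,
  parent: Vec3 = [0, 0, 0],
  out: Map<string, Vec3> = new Map<string, Vec3>()
): Map<string, Vec3> {
  const pos: Vec3 = [
    parent[0] + node.offset[0],
    parent[1] + node.offset[1],
    parent[2] + node.offset[2],
  ];
  out.set(node.name, pos);
  for (const child of node.children) worldPositions(child, pos, out);
  return out;
}

// A lamp sits on a table, which sits in a room.
const lamp: SceneNode = { name: "lamp", offset: [0, 1, 0], children: [] };
const table: SceneNode = { name: "table", offset: [2, 0, 0], children: [lamp] };
const room: SceneNode = { name: "room", offset: [0, 0, 0], children: [table] };

const lampBefore = worldPositions(room).get("lamp");
table.offset = [5, 0, 3]; // move the table; the lamp's own offset is untouched
const lampAfter = worldPositions(room).get("lamp");
```

One edit to the table moves the lamp from (2, 1, 0) to (5, 1, 3). In a single fused mesh there is no "table" to grab in the first place.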
Long-running functions, finally boring.
Vercel Workflows went GA this week — TypeScript or Python functions that survive process restarts, retries, and hours-long external waits without a separate orchestrator. The pitch is simple: write your code as if it runs for 40 minutes; the platform handles the durable part. If you've been queueing jobs just to get checkpointing, delete that queue.
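// Illustrative shape; `workflow`, `startJob`, and `finalize` are assumed to be imported or defined elsewhere.
export default workflow(async (step) => {
  const job = await step.call(startJob)
  await step.sleep("1h") // survives restarts
  return step.call(finalize, job)
})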
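How can a function "survive" a restart? One common approach is to journal each step's result and replay the function from the top, skipping steps already recorded. The sketch below is a toy, synchronous version of that idea under loud assumptions: an in-memory `Map` plays the durable journal, and `ReplayingStep` and `runWorkflow` are names invented here. It is not a description of how Vercel implements Workflows.

```typescript
// Toy illustration of journal-based replay; not Vercel's implementation.
type Journal = Map<string, unknown>;

class ReplayingStep {
  private counter = 0;
  private journal: Journal;

  constructor(journal: Journal) {
    this.journal = journal;
  }

  // Run `fn` once per step position; on replay, return the recorded result instead.
  call<T>(name: string, fn: () => T): T {
    const key = `${this.counter++}:${name}`;
    if (this.journal.has(key)) {
      return this.journal.get(key) as T;
    }
    const result = fn();
    this.journal.set(key, result);
    return result;
  }
}

function runWorkflow<T>(journal: Journal, body: (step: ReplayingStep) => T): T {
  return body(new ReplayingStep(journal));
}

// The journal outlives the "process"; in a real system it would live in durable storage.
const journal: Journal = new Map();
let startCount = 0;

function jobBody(step: ReplayingStep, crashAfterStart: boolean): string {
  const jobId = step.call("startJob", () => {
    startCount += 1; // side effect we do not want to repeat
    return "job-1";
  });
  if (crashAfterStart) throw new Error("simulated process crash");
  return step.call("finalize", () => `done:${jobId}`);
}

// First attempt crashes after the first step; the second replays from the journal.
let firstAttemptFailed = false;
try {
  runWorkflow(journal, (s) => jobBody(s, true));
} catch {
  firstAttemptFailed = true;
}
const finalResult = runWorkflow(journal, (s) => jobBody(s, false));
```

After the simulated crash, the second run reaches `finalize` without calling `startJob` again, which is the property a hand-rolled checkpointing queue exists to provide.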
How to shave a sandbox.
Vercel's sandbox team walks through the five knobs they turned to cut snapshot-restore time on the hot path. None is novel on its own; the payoff is in stacking them.
That's all for today.
Ten picks from OpenAI, Anthropic Engineering, Cloudflare, Fly.io, Import AI, GitHub, Vercel, and one Telegram chaser. Back tomorrow at 08:00 Zürich.
Sources
openai.com/news
anthropic.com/engineering
blog.cloudflare.com
fly.io/blog
jack-clark.net
github.blog/engineering
vercel.com/blog
t.me/denissexy
Rubric
AI tools · creative software · dev tools · privacy · science · the practical. If we can't imagine you using it tomorrow, it doesn't run.
Colophon
Set in Fraunces & Inter (Google Fonts), with JetBrains Mono on the terminal pages. Hand-laid in HTML/CSS. Issue 002 assembled 08:00 CEST, Mon 20 Apr 2026. Sentry & PostHog blogs declined to render; noted and skipped.