Ephemeris.
The supply chain came for Codex.
An attacker compromised TanStack on npm and slipped a payload into a dependency Codex's macOS app pulled in. OpenAI rotated its code-signing certificate and is forcing every Codex desktop user to update by 12 June, or the binary stops launching. The post is a tight, useful read on how a 2026-era lab triages a third-party supply-chain incident in public.
Codex finally gets a Windows playpen.
For months, "Codex on Windows" meant trusting the agent with your whole filesystem. The new sandbox confines reads/writes to a working directory, blocks outbound network by default, and routes anything else through the existing approval flow. It is the Windows counterpart to the macOS isolation OpenAI shipped earlier this month — and the prerequisite for letting Codex run unattended on a developer box at all.
┌──────────────────────────────────────────────────────┐
│ codex-host.exe                                       │
│                                                      │
│   ┌────────────────────────────┐                     │
│   │ sandbox · job-object       │ ← fs scoped         │
│   │                            │                     │
│   │ agent · child process      │ ← net deny-list     │
│   │ shell · child process      │                     │
│   └─────────────┬──────────────┘                     │
│                 │ approval-prompt                    │
│                 ▼                                    │
│   operator: allow / deny / always-allow              │
└──────────────────────────────────────────────────────┘
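To make the policy concrete, here is a minimal sketch of the decision logic described above, not OpenAI's implementation. The names `SandboxPolicy`, `Decision`, `check_path`, `check_network` and `remember` are hypothetical. It also assumes that an operator's "always-allow" is stored as a simple remembered key and that network access stays denied until such a key exists.

```python
from dataclasses import dataclass, field
from enum import Enum
from pathlib import Path


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    ASK = "ask"  # hand off to the operator approval prompt


@dataclass
class SandboxPolicy:
    """Hypothetical policy object: fs scoped to workdir, network denied by default."""
    workdir: Path
    always_allowed: set = field(default_factory=set)

    def check_path(self, target: str) -> Decision:
        # Resolve ".." and absolute paths before comparing against the workdir.
        root = self.workdir.resolve()
        resolved = (root / target).resolve()
        if resolved.is_relative_to(root):
            return Decision.ALLOW
        if ("fs", str(resolved)) in self.always_allowed:
            return Decision.ALLOW
        return Decision.ASK  # outside the working directory: ask the operator

    def check_network(self, host: str) -> Decision:
        # Assumption: outbound traffic is denied unless previously always-allowed.
        if ("net", host) in self.always_allowed:
            return Decision.ALLOW
        return Decision.DENY

    def remember(self, kind: str, value: str) -> None:
        # Record an operator "always-allow" answer.
        self.always_allowed.add((kind, value))


policy = SandboxPolicy(Path("/tmp/proj"))
inside = policy.check_path("src/main.py")
outside = policy.check_path("../secrets.txt")
net_before = policy.check_network("example.com")
policy.remember("net", "example.com")
net_after = policy.check_network("example.com")
```

The point of the sketch is the ordering: path resolution happens before the scope check, so `../` tricks fall through to the approval prompt rather than slipping past as in-scope writes.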
Browser Run leaves the Worker.
Cloudflare's headless-Chromium service has moved off the Worker isolate model and onto Containers. The upshot is the boring kind that matters for agent pipelines: real persistent disk, longer-running pages, more concurrent tabs per session, and usage limits roughly an order of magnitude higher than before. If your agent's bottleneck was "we keep restarting the browser", read this.
"Pages that previously timed out at 30 seconds now run for as long as you keep them open." — Cloudflare engineering
What AI traffic actually looks like.
Vercel published seven months of AI Gateway data — the routing layer it offers in front of Anthropic, OpenAI, Google, and the rest. The headline is not which model "won"; it is the long-tail shape. Hundreds of distinct models in production, tool calls eating more tokens than chat completions, and a stubborn fraction of traffic that still pins to one vendor for latency reasons alone.
- Tokens processed: 5.1 trillion
- Distinct models: 340+
- Months of data: 7
DeepSeek v4, page by page.
An interactive walkthrough of the DeepSeek v4 paper, with roughly ninety inline notes covering the MoE router, attention shape, training regime, and the bits the paper is quiet about. If you have skimmed v4 twice and still feel hand-wavy on its routing changes versus v3, this is the cheapest way to actually understand the diff before your next architecture meeting.
Among the things the walkthrough flags: the v4 router is not just a wider v3, the gating is restructured to keep load balancing without explicit auxiliary loss, and the attention head count changes for a reason that the paper underplays. Read the notes attached to figures 3 and 7 first.
Best treated as the senior-engineer version of "read the model card." Skim the prose, but stop on the highlighted spans — those are where the paper is hiding something or eliding a design trade-off worth pulling on.
Your design system is a toolbox now.
A short course on building generative UI: instead of shipping a fixed dashboard, you expose your design system as a set of tools the agent calls to assemble the right view for the user's question. The interesting claim — and it is a claim, not a settled fact — is that the design system stops being human documentation and starts being a typed API the model invokes.
Tool 01
Components become callable: Chart, Table, Form, each with a schema.
Tool 02
The agent picks the layout based on intent, not on a screen name.
Tool 03
Edge cases get one-off UIs that can be promoted back into the system.
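The "design system as typed API" idea can be sketched in a few lines. This is an illustration under assumptions, not the course's code: the `Tool` dataclass, the `REGISTRY` mapping, and the `call_tool` helper are hypothetical names, and the schema is reduced to a flat argument-name-to-type map rather than full JSON Schema.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Tool:
    """A design-system component exposed as a callable tool (hypothetical shape)."""
    name: str
    schema: dict[str, type]  # argument name -> expected Python type
    render: Callable[..., dict[str, Any]]


def chart(title: str, series: list) -> dict[str, Any]:
    return {"component": "Chart", "title": title, "series": series}


def table(columns: list, rows: list) -> dict[str, Any]:
    return {"component": "Table", "columns": columns, "rows": rows}


REGISTRY = {
    tool.name: tool
    for tool in (
        Tool("Chart", {"title": str, "series": list}, chart),
        Tool("Table", {"columns": list, "rows": list}, table),
    )
}


def call_tool(name: str, args: dict[str, Any]) -> dict[str, Any]:
    """Validate a model-issued tool call against the schema, then render it."""
    tool = REGISTRY.get(name)
    if tool is None:
        raise KeyError(f"unknown component: {name}")
    if set(args) != set(tool.schema):
        raise ValueError(f"{name} expects arguments {sorted(tool.schema)}")
    for key, expected in tool.schema.items():
        if not isinstance(args[key], expected):
            raise TypeError(f"{name}.{key} must be {expected.__name__}")
    return tool.render(**args)


view = call_tool("Chart", {"title": "Revenue", "series": [1, 2, 3]})
```

Validation sits between the model and the renderer on purpose: a malformed call fails with a typed error the agent can react to, instead of producing a broken view.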
A bad week for both defenders.
Zvi's weekly round-up makes an awkward case: the recent run of npm, Linux-kernel and registry-level incidents is not separate from the AI-governance debate — it is the same problem dressed differently. When the same coding agents are simultaneously shipping fixes and writing the next round of malicious packages, "let the labs self-regulate" reads less like restraint and more like a category error.
"If we cannot keep the supply chain we already have safe from humans, we are not going to keep it safe from agents at the same prices, on the same staff, with the same incentives."
— Zvi Mowshowitz, 13 May
That's today.
Seven items, picked for the senior engineer or founder with ten minutes before the first meeting. A supply-chain incident, two flavours of sandboxing, one large-scale telemetry release, a paper walkthrough, a UI shift, and a policy read to close.