Today's question
"When agents orchestrate agents, who's watching?"
Codex inside a box with the door bolted shut.
OpenAI publishes how it actually runs Codex internally: sandboxed file roots, an approvals layer for risky ops, locked-down network policy, and telemetry on every shell call. The doc reads like a checklist any team shipping autonomous coding agents should steal.
Default sandbox · what Codex can / can't touch
Can:
- read & write under workdir
- run scoped shell commands with approval prompts
- emit structured telemetry on every action
Can't:
- arbitrary outbound network egress
- writes outside the agent root
- long-lived credentials in env
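The default-deny policy above can be sketched as a small decision function. This is a minimal TypeScript sketch under stated assumptions: `AgentAction`, `SandboxPolicy`, `decide` and `scrubEnv` are hypothetical names, not OpenAI's implementation, and the path check assumes POSIX paths. Reads and writes are allowed only below the workdir, shell commands always go to an approval prompt, egress is denied unless a host is explicitly allowed, and every decision is logged as one structured record.

```typescript
// Hypothetical action shapes; not Codex's real types.
type AgentAction =
  | { kind: "read"; path: string }
  | { kind: "write"; path: string }
  | { kind: "shell"; command: string }
  | { kind: "network"; host: string };

type Decision = "allow" | "ask" | "deny";

interface SandboxPolicy {
  workdir: string;           // agent root, absolute POSIX path
  allowedHosts: Set<string>; // empty set = no outbound egress
}

// Resolve `p` against `base`, collapsing "." and ".." (POSIX paths only).
function resolvePosix(base: string, p: string): string {
  const out: string[] = [];
  for (const part of (p.startsWith("/") ? p : `${base}/${p}`).split("/")) {
    if (part === "" || part === ".") continue;
    if (part === "..") out.pop();
    else out.push(part);
  }
  return "/" + out.join("/");
}

// True if `target` is the root itself or lies below it ("/work/repo-evil" is not under "/work/repo").
function isUnder(root: string, target: string): boolean {
  const r = resolvePosix("/", root);
  return target === r || target.startsWith(r === "/" ? "/" : r + "/");
}

function classify(policy: SandboxPolicy, action: AgentAction): Decision {
  switch (action.kind) {
    case "read":
    case "write":
      return isUnder(policy.workdir, resolvePosix(policy.workdir, action.path)) ? "allow" : "deny";
    case "shell":
      return "ask"; // scoped shell commands go through an approval prompt
    case "network":
      return policy.allowedHosts.has(action.host) ? "allow" : "deny";
    default:
      return "deny"; // fail closed on anything unrecognised
  }
}

// Decide, and emit one structured telemetry record per action.
function decide(policy: SandboxPolicy, action: AgentAction, telemetry: string[]): Decision {
  const decision = classify(policy, action);
  telemetry.push(JSON.stringify({ ts: Date.now(), action, decision }));
  return decision;
}

// Drop credential-looking variables before spawning commands (a crude name match).
function scrubEnv(env: Record<string, string>): Record<string, string> {
  const clean: Record<string, string> = {};
  for (const [name, value] of Object.entries(env)) {
    if (!/TOKEN|SECRET|KEY|PASSWORD/i.test(name)) clean[name] = value;
  }
  return clean;
}
```

Failing closed in the `default` branch means a new kind of action stays blocked until someone writes an explicit rule for it.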
A short list of MCP servers that actually pay rent.
PostHog's team waded through tens of thousands of MCP servers and surfaced the ones a startup engineer can plug in this afternoon — debugging in production, sprint planning, customer behaviour, and a few that quietly replace whole SaaS subscriptions.
4,063 errors closed before anyone opened the dashboard.
PostHog spent a month letting agents triage the error inbox: thousands of issues resolved without a human, noise suppressed, and the rest routed to the team that owns the code. The interesting bit isn't the headline number; it's the Suppressed row, showing how many were the same flaky thing repeated.
| Action | Count | Share | Notes |
|---|---|---|---|
| Closed (resolved) | 4,063 | 66% | verified fix or intended behaviour |
| Suppressed (noise) | 1,751 | 28% | repeated patterns, third-party flakes |
| Routed to humans | 310 | 5% | tagged owner, posted in channel |
| Escalated (severity) | 52 | 1% | paged on-call |
| Total handled | 6,176 | 100% | one month, zero humans opening it first |
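The Share column follows directly from the counts (for example, 1,751 / 6,176 ≈ 28%). The sketch below recomputes it and adds one hypothetical way to spot "the same flaky thing repeated": fingerprint each event by exception type plus top stack frame, then count repeats. The `TriageEvent` shape, the fingerprint rule and the threshold are assumptions for illustration, not PostHog's method.

```typescript
// Counts from the table above.
const triageRows: Array<[string, number]> = [
  ["Closed (resolved)", 4063],
  ["Suppressed (noise)", 1751],
  ["Routed to humans", 310],
  ["Escalated (severity)", 52],
];

// Percentage of the total for each row, rounded to a whole percent.
function sharePercents(rows: Array<[string, number]>): Array<[string, number]> {
  const total = rows.reduce((sum, [, n]) => sum + n, 0);
  return rows.map(([label, n]): [string, number] => [label, Math.round((100 * n) / total)]);
}

// Hypothetical event shape; the piece doesn't describe PostHog's schema.
interface TriageEvent {
  type: string;     // e.g. "TimeoutError"
  topFrame: string; // e.g. "cdn-loader.js:12"
  message: string;
}

// Hypothetical fingerprint: exception type + top stack frame. The message is
// ignored so the same failure with different IDs or timings groups together.
function fingerprint(e: TriageEvent): string {
  return `${e.type}@${e.topFrame}`;
}

// Split fingerprints into "repeated" (seen at least `threshold` times, i.e.
// suppression candidates) and "rare" (candidates for routing to an owner).
function splitRepeats(events: TriageEvent[], threshold: number): { repeated: string[]; rare: string[] } {
  const counts = new Map<string, number>();
  for (const e of events) {
    const fp = fingerprint(e);
    counts.set(fp, (counts.get(fp) ?? 0) + 1);
  }
  const repeated: string[] = [];
  const rare: string[] = [];
  counts.forEach((n, fp) => (n >= threshold ? repeated : rare).push(fp));
  return { repeated, rare };
}
```

`sharePercents(triageRows)` yields 66, 28, 5 and 1 for the four rows, matching the table.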
RE: stop monkey-patching every JS library you ship.
Sentry is upstreaming TracingChannel support into 44 JavaScript libraries, so each library emits its own telemetry through diagnostics_channel instead of being wrapped in brittle runtime patches. It ends a fifteen-year hack and offers a clean blueprint for any vendor still injecting code at runtime.
- From: Sentry Engineering
- To: The JavaScript ecosystem
- Subject: Native telemetry via diagnostics_channel
- Status: Patches landing in 44 libraries · Node, Bun, Deno
- Action: Drop your monkey-patch shims; subscribe to channels.
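A minimal sketch of the pattern, assuming a Node release that ships `diagnostics_channel.tracingChannel` (added in v19.9.0 and v18.19.0). The library `mylib` and its `query` function are hypothetical; the point is the split of roles: the library publishes trace events around its own work, and the observability side only subscribes, with no wrapper around `query`.

```typescript
import * as dc from "node:diagnostics_channel";

// Context object shared by every hook for one traced call.
interface QueryContext {
  sql: string;
  startedAt?: number;
  error?: unknown; // set by traceSync if the traced function throws
}

// ---- Library side ("mylib" is hypothetical) ----
// Publishes on tracing:mylib:query:start, :end, :error, etc.
const queryChannel = dc.tracingChannel<QueryContext>("mylib:query");

function runQuery(sql: string): string[] {
  if (sql.trim() === "") throw new Error("empty query");
  return sql.includes("users") ? ["alice", "bob"] : [];
}

function query(sql: string): string[] {
  // traceSync publishes "start", runs the function, then publishes "end"
  // (publishing "error" first if it throws, then rethrowing).
  let rows: string[] = [];
  queryChannel.traceSync(() => {
    rows = runQuery(sql);
  }, { sql });
  return rows;
}

// ---- Observability side ----
// Subscribing needs no monkey-patch; the library already emits the events.
const spans: Array<{ sql: string; ms: number; ok: boolean }> = [];
queryChannel.subscribe({
  start(ctx) {
    ctx.startedAt = performance.now();
  },
  end(ctx) {
    spans.push({
      sql: ctx.sql,
      ms: performance.now() - (ctx.startedAt ?? performance.now()),
      ok: ctx.error === undefined,
    });
  },
  asyncStart() {},
  asyncEnd() {},
  error() {},
});
```

Because the context object is shared across hooks, the subscriber can stash state in `start` (here, a timestamp) and read it back in `end` without any global bookkeeping.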
When agents orchestrate agents, who is watching?
A planner spawns a researcher, who spawns a coder, who calls another planner. Sentry's piece argues that the failure that bites you isn't inside any single agent; it's the unobserved handoff between two of them. It offers concrete patterns for tracing those gaps.
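One way to make those gaps visible, sketched under assumptions: every agent hop starts a span that carries its caller's span id, and a checker flags spans that never ended (an agent went silent) or whose parent was never recorded (trace context dropped at a handoff). `AgentTracer` and its methods are hypothetical and are not Sentry's SDK.

```typescript
// Hypothetical span model; not Sentry's SDK API.
interface AgentSpan {
  id: string;
  parentId?: string;
  agent: string; // "planner", "researcher", "coder", ...
  ended: boolean;
}

class AgentTracer {
  private spans = new Map<string, AgentSpan>();
  private nextId = 1;

  // Start a span for an agent; pass the caller's span id so the handoff is linked.
  start(agent: string, parentId?: string): string {
    const id = `s${this.nextId++}`;
    this.spans.set(id, { id, parentId, agent, ended: false });
    return id;
  }

  end(id: string): void {
    const span = this.spans.get(id);
    if (span) span.ended = true;
  }

  // Gaps: spans that never ended (a child went silent) and spans whose parent
  // was never recorded (context lost between two agents).
  gaps(): { unfinished: string[]; orphaned: string[] } {
    const unfinished: string[] = [];
    const orphaned: string[] = [];
    this.spans.forEach((s) => {
      if (!s.ended) unfinished.push(`${s.agent}(${s.id})`);
      if (s.parentId !== undefined && !this.spans.has(s.parentId)) orphaned.push(`${s.agent}(${s.id})`);
    });
    return { unfinished, orphaned };
  }
}
```

In the chain from the paragraph above, a researcher that never returns shows up as unfinished, and a second planner started with a stale or missing parent id shows up as orphaned, even though each agent's own logs may look clean.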
Seedance makes a splash, Nvidia lets a model design its own chips, robots learn not to forget.
DeepLearning.AI's weekly digest, The Batch, in a fat issue: a generative-video model that closes the gap on Sora-class incumbents, AI in the silicon-design loop at Nvidia, and continual learning landing in real robots.
Seedance's open release lands close enough to closed-source video models that the practical question — which one is cheapest per acceptable second — finally has a real answer.
Nvidia's chip team describes a workflow where AI proposes layouts and floorplans that humans then choose from, shaving wall-clock weeks off the back end of a tape-out.
Continual-learning research reports robots that retain skills across tasks instead of catastrophically forgetting them between deployments — a quiet but load-bearing result.
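The Seedance item's "cheapest per acceptable second" isn't defined in the digest. One reasonable reading, assumed here, is that every generated second is billed and rejected clips are regenerated. With acceptance rate p, a usable clip takes 1/p attempts on average, so the effective price is the per-second price divided by p. `VideoModelQuote` and both functions are hypothetical, and any prices you feed in are your own measurements, not figures from the piece.

```typescript
// Hypothetical pricing model: every generated second is billed, and a clip is
// accepted with probability `acceptRate`; rejected clips are regenerated.
interface VideoModelQuote {
  name: string;
  pricePerSecond: number; // billed cost per generated second (any currency)
  acceptRate: number;     // fraction of clips judged usable, in (0, 1]
}

// Expected attempts per usable clip is 1 / acceptRate (geometric retries),
// so the effective price of one usable second is pricePerSecond / acceptRate.
function costPerAcceptableSecond(q: VideoModelQuote): number {
  if (!(q.acceptRate > 0 && q.acceptRate <= 1)) {
    throw new RangeError("acceptRate must be in (0, 1]");
  }
  return q.pricePerSecond / q.acceptRate;
}

// Pick the quote with the lowest effective cost; `quotes` must be non-empty.
function cheapest(quotes: VideoModelQuote[]): VideoModelQuote {
  return quotes.reduce((best, q) =>
    costPerAcceptableSecond(q) < costPerAcceptableSecond(best) ? q : best,
  );
}
```

The design point is that a cheaper per-second price can still lose if its acceptance rate is low enough: halving the acceptance rate doubles the effective cost.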
Eight months of agentic coding, and the honeymoon is over.
Zvi's eighth round-up reads less like "look what's new" and more like a state-of-the-craft: where Claude Code and Codex genuinely move the needle, where they create new failure modes, and which workflows the steady users have actually settled into.
#8 in series
Skim the table of contents first — Zvi's pieces reward jumping to the section that matches your current pain.
That's today.
Issue 021 · 09 May 2026 · Zürich · daily
Today drew on OpenAI, PostHog, Sentry, The Batch, and Don't Worry About the Vase. Picks favour AI tools you can adopt this week, agentic-coding craft, and observability over the rest.