WED
13 · MAY
When idle isn't idle.
Cloudflare watched QUIC throughput plummet on certain machines and traced the cause to an unexpected place. A Linux kernel optimisation kept telling the stack the connection had gone idle while real traffic was still flowing; CUBIC dutifully collapsed the congestion window to its minimum and never let it grow back. The bug had nothing to do with QUIC itself.
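CUBIC's window curve shows why a spurious idle signal hurts so much. Below is a toy model, not Cloudflare's analysis and not code from quiche or the kernel. It assumes the RFC 9438 window function W(t) = C·(t − K)³ + W_max with C = 0.4 and β = 0.7. It also assumes a hypothetical bug: every "idle" signal resets the window to a floor (MIN_CWND) and restarts the epoch. ACK clocking, slow start and CUBIC's Reno-friendly region are ignored.

```python
import math

# Constants recommended by RFC 9438; MIN_CWND is a hypothetical floor for this toy.
C = 0.4          # cubic scaling constant (segments per second^3)
BETA = 0.7       # multiplicative-decrease factor applied on loss
MIN_CWND = 2.0   # hypothetical minimum window, in segments


def cbrt(x: float) -> float:
    """Real cube root that also handles negative inputs."""
    return math.copysign(abs(x) ** (1.0 / 3.0), x)


def cubic_window(t: float, w_max: float, cwnd_epoch: float) -> float:
    """W(t) = C*(t - K)^3 + W_max, with K = cbrt((W_max - cwnd_epoch) / C)."""
    k = cbrt((w_max - cwnd_epoch) / C)
    return C * (t - k) ** 3 + w_max


def mean_cwnd(steps: int, step_s: float, idle_every: int = 0) -> float:
    """Average window over `steps` samples taken every `step_s` seconds.

    If idle_every > 0, every idle_every-th sample models a spurious "idle"
    signal: the window drops to MIN_CWND and a new epoch starts from there.
    """
    w_max = 100.0                 # window before the last loss (segments)
    cwnd_epoch = BETA * w_max     # window right after that loss
    epoch_start = 0.0
    samples = []
    for i in range(1, steps + 1):
        now = i * step_s
        if idle_every and i % idle_every == 0:
            cwnd = MIN_CWND
            cwnd_epoch, epoch_start = MIN_CWND, now
        else:
            cwnd = max(MIN_CWND, cubic_window(now - epoch_start, w_max, cwnd_epoch))
        samples.append(cwnd)
    return sum(samples) / len(samples)


healthy = mean_cwnd(100, 0.1)               # 10 s, no spurious idles
buggy = mean_cwnd(100, 0.1, idle_every=5)   # a false "idle" every 0.5 s
print(f"healthy mean cwnd: {healthy:.1f}, buggy mean cwnd: {buggy:.1f}")
```

In the healthy run the window climbs back past W_max after K seconds. In the buggy run every restart begins again near the floor, so the average stays pinned low even though the curve itself is behaving correctly.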
Pure vibing ships nine times the vulns.
Six thousand real coding sessions, audited end to end. When developers let the agent drive without review, vulnerabilities multiply 9× compared to human–AI collaboration. Even with review, only 44% of AI-suggested code survives to a commit. The headline isn't "AI is dangerous" — it's that the loop wants a human in it.
A coding benchmark just got its first full solve.
ProgramBench measures agents on full software-engineering tasks, not toy snippets. This week, GPT-5.5 at xhigh reasoning closed its first complete task — and pushed 13.5% of programs past the 95%-tests-pass threshold. Claude Opus 4.7 trails materially. Treat it as a snapshot, not a winner — both labs ship fast.
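Both headline numbers come down to simple tallies over per-program test results. This is a minimal sketch under several assumptions: each program reports (tests passed, tests total), a "complete" task means every test passes, and the 95% boundary is inclusive. The function names and the sample data are hypothetical and are not ProgramBench's harness.

```python
from typing import Dict, List, Tuple

# Assumed inclusive bar: a program clears it when >= 95% of its tests pass.
THRESHOLD_PCT = 95


def share_past_threshold(results: Dict[str, Tuple[int, int]], pct: int = THRESHOLD_PCT) -> float:
    """Fraction of programs whose pass rate is at least `pct` percent.

    results maps a program name to (tests_passed, tests_total).
    Integer arithmetic avoids floating-point edge cases at exactly 95%.
    """
    if not results:
        return 0.0
    hits = sum(
        1 for passed, total in results.values()
        if total > 0 and passed * 100 >= pct * total
    )
    return hits / len(results)


def fully_solved(results: Dict[str, Tuple[int, int]]) -> List[str]:
    """Programs where every test passes (assumed meaning of a 'complete' task)."""
    return [name for name, (passed, total) in results.items() if total > 0 and passed == total]


# Hypothetical results for four programs.
runs = {"a": (100, 100), "b": (95, 100), "c": (94, 100), "d": (10, 20)}
print(share_past_threshold(runs))  # 0.5
print(fully_solved(runs))          # ['a']
```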
Slot machine. Or printer.
Two product shapes are emerging for generative AI. The casino: high variance, cheap pulls; the user runs the model ten times to get one keeper. The printer: one disciplined pull, expected to land on the first try. Each demands different unit economics, a different UI, and a different promise. Pick on purpose, not by accident.
The slot machine
Cheap inference, many tries, surprise as a feature. Image gen, "draft me ten taglines," generative play. Margin lives in volume; trust lives in the dice.
The printer
Expensive inference, one shot, no retry expected. Tax filings, code review, contract drafting. Margin lives in correctness; trust lives in the receipt.
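The two shapes can be put side by side with a back-of-envelope cost model. This sketch makes two assumptions. First, each slot-machine pull succeeds independently, so the number of pulls to the first keeper is geometric with mean 1/p. Second, a printer error is not retried but paid for downstream. Every price and rate below is hypothetical.

```python
def slot_machine_cost_per_keeper(cost_per_pull: float, keep_rate: float) -> float:
    """Expected spend per keeper: pulls until first success average 1/keep_rate."""
    if not 0 < keep_rate <= 1:
        raise ValueError("keep_rate must be in (0, 1]")
    return cost_per_pull / keep_rate


def printer_expected_cost(cost_per_pull: float, error_rate: float, cost_of_error: float) -> float:
    """One pull, no retry: a wrong answer costs cost_of_error downstream."""
    if not 0 <= error_rate <= 1:
        raise ValueError("error_rate must be in [0, 1]")
    return cost_per_pull + error_rate * cost_of_error


# Hypothetical numbers, for illustration only.
slot = slot_machine_cost_per_keeper(cost_per_pull=0.01, keep_rate=0.1)  # ten pulls per keeper
printer = printer_expected_cost(cost_per_pull=0.50, error_rate=0.02, cost_of_error=200.0)
print(f"slot machine: ${slot:.2f} per keeper; printer: ${printer:.2f} per job")
```

In the slot machine the only levers are pull price and keep rate, so margin comes from cheap volume. In the printer the error term dominates, contributing $4.00 of the $4.50 here, so margin comes from correctness.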
Claude makes the slides. This skill fixes them.
Bayram Annakov packaged the post-production pass for AI-generated decks into a Claude skill called Slide-Inspector. It hunts the usual failure modes — text overflowing the box, images stretched to nonsense, contrast too low for projection — and rewrites the offending slide in place. Drop it in, run it after generation, ship the deck.
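Two of those failure modes reduce to plain arithmetic. This is a minimal sketch of what such checks could look like, not Slide-Inspector's code. Contrast uses the WCAG 2.x relative-luminance formula, where 4.5:1 is the AA minimum for normal text. The function names and the 2% stretch tolerance are hypothetical. Text overflow needs font metrics and is left out.

```python
from typing import Tuple

RGB = Tuple[int, int, int]


def _linearize(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x definition)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: RGB) -> float:
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg: RGB, bg: RGB) -> float:
    """WCAG contrast ratio, from 1:1 (identical colours) up to 21:1."""
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


def is_stretched(native_w: int, native_h: int, shown_w: int, shown_h: int,
                 tolerance: float = 0.02) -> bool:
    """True if the displayed aspect ratio deviates from the native one by more than tolerance."""
    native = native_w / native_h
    shown = shown_w / shown_h
    return abs(shown / native - 1) > tolerance


# Light grey text on white fails the 4.5:1 bar; a 16:9 image squeezed into 4:3 is stretched.
print(contrast_ratio((200, 200, 200), (255, 255, 255)) >= 4.5)
print(is_stretched(1600, 900, 1600, 1200))
```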
Every Claude Code session, one window.
Anthropic introduced Agent View, a unified UI for managing parallel Claude Code sessions without dragging tmux into the conversation. It is a small change with an outsized effect: the developer can finally watch six agents at once and steer the one drifting off the rails.
For the past year, anyone running several agents at once has stitched together their own console — split panes, browser tabs, a notebook of session IDs. Agent View ends that. Each running session gets a card, each card surfaces the agent's last action and its current ask, and switching between them is one click.
The interesting part is not the UI; it is the implied workflow. Anthropic is conceding, in product form, that one developer routinely runs many agents and needs to triage them like patients in an ER. The skill it rewards is not coding, but choosing which agent to interrupt next.
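The ER analogy maps onto a plain priority sort. This is a hypothetical model of that triage, not Agent View's behaviour or API. Each card keeps only the two things the section says a card surfaces, reduced here to a boolean "open ask" and the seconds since the last action. The ordering rule is an assumption: open asks first, then the session that has been silent longest.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SessionCard:
    name: str
    open_ask: bool                 # hypothetical: the agent is blocked waiting on the developer
    secs_since_last_action: float  # hypothetical: long silence may mean stuck or drifting


def triage_key(card: SessionCard):
    # False sorts before True, so sessions with an open ask come first;
    # within each group, the longest-silent session comes first.
    return (not card.open_ask, -card.secs_since_last_action)


def triage(cards: List[SessionCard]) -> List[str]:
    """Names of sessions in the order a developer might interrupt them."""
    return [c.name for c in sorted(cards, key=triage_key)]


cards = [
    SessionCard("a", open_ask=False, secs_since_last_action=30),
    SessionCard("b", open_ask=True, secs_since_last_action=5),
    SessionCard("c", open_ask=True, secs_since_last_action=60),
    SessionCard("d", open_ask=False, secs_since_last_action=300),
]
print(triage(cards))
```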
For solo founders and small teams, this comes close to a force multiplier. For larger orgs it raises a different question: how do you review six agents' worth of commits, and who owns the rejected diffs? Worth a Monday morning of experiments.
That's Wednesday.
- issue: 025 · 13 May 2026 · Europe/Zürich
- today: Cloudflare · Stanford · ProgramBench · @ProductsAndStartups · GitHub · Anthropic
- rubric: AI tools, dev tools, security, benchmarks, creative software; actionable for engineers and founders this week.
- tomorrow: 08:00 Zürich.