Ephemeris · Issue 025 · Wednesday · 13 May 2026 · Zürich
Ephemeris · 025 01 / 06 · Cloudflare
Networking · Postmortem

When idle isn't idle.

Cloudflare watched QUIC throughput plummet on certain machines and traced it home. A Linux kernel optimisation kept telling the stack the connection had gone idle while real traffic flowed; CUBIC dutifully collapsed the congestion window to its minimum and refused to grow it back. The bug had nothing to do with QUIC.

symptom    cwnd pinned at floor
stack        quiche · linux 6.x · cubic
false cue    tcp_app_limited "idle"
real state   congested egress
fix          don't trust the kernel about your own flow
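
To see why a false idle signal is so costly, here is a toy model of the feedback loop. It is a sketch under loud assumptions: `MIN_CWND`, `step` and `simulate` are hypothetical names, the growth rule is plain slow-start doubling rather than real CUBIC, and none of it is taken from quiche or the Linux kernel.

```python
MIN_CWND = 2  # packets; hypothetical floor for this toy model


def step(cwnd: int, acked: int, idle_flag: bool) -> int:
    """One round trip of a toy window update (not real CUBIC)."""
    if idle_flag:
        # The sender believes the flow went idle, so it falls back to the floor.
        return MIN_CWND
    # Otherwise grow by the number of packets acknowledged this round.
    return cwnd + acked


def simulate(rounds: int, idle_flag: bool) -> int:
    """Run `rounds` round trips with the idle flag held constant."""
    cwnd = MIN_CWND
    for _ in range(rounds):
        cwnd = step(cwnd, acked=cwnd, idle_flag=idle_flag)
    return cwnd


print("honest signal:", simulate(5, idle_flag=False))
print("false idle:   ", simulate(5, idle_flag=True))
```

With an honest signal the window doubles every round; with the flag stuck on, it never leaves the floor no matter how much traffic is acknowledged.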
Ephemeris · 025 02 / 06 · Stanford · via @ProductsAndStartups
Security · Vibe coding

Pure vibing ships nine times the vulns.

Six thousand real coding sessions, audited end to end. When developers let the agent drive without review, vulnerabilities multiply 9× compared to human–AI collaboration. Even with review, only 44% of AI-suggested code survives to a commit. The headline isn't "AI is dangerous" — it's that the loop wants a human in it.

9×     more vulns, pure vibing
44%    AI code that ships
6k     real sessions audited
Ephemeris · 025 03 / 06 · ProgramBench · via @seeallochnaya
Benchmark · Models

A coding benchmark just solved itself.

ProgramBench measures agents on full software-engineering tasks, not toy snippets. This week, GPT-5.5 at xhigh reasoning closed its first complete task — and pushed 13.5% of programs past the 95%-tests-pass threshold. Claude Opus 4.7 trails materially. Treat it as a snapshot, not a winner — both labs ship fast.

ProgramBench · xhigh reasoning
GPT-5.5            13.5% ★ 1 full solve
Claude Opus 4.7    trails
prior leader       ~0% full
Ephemeris · 025 04 / 06 · via @ProductsAndStartups
Product · Strategy

Slot machine. Or printer.

Two product shapes are emerging for generative AI. The slot machine: high-variance, cheap pulls, the user runs the model ten times to get one keeper. The printer: one disciplined pull, expected to land on the first try. Each demands different unit economics, a different UI, a different promise. Pick on purpose, not by accident.

The slot machine

Cheap inference, many tries, surprise as feature. Image gen, "draft me ten taglines," generative play. Margin lives in volume; trust lives in the dice.

The printer

Expensive inference, one shot, no retry expected. Tax filings, code review, contract drafting. Margin lives in correctness; trust lives in the receipt.
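
The unit-economics gap can be put in numbers with one line of arithmetic. If each pull independently yields a keeper with probability p, the expected number of pulls per keeper is 1/p, so the expected cost per keeper is the cost per pull divided by p. The sketch below assumes that independence; `cost_per_keeper` is a hypothetical helper and the prices and hit rates are made up for illustration, not taken from any product.

```python
def cost_per_keeper(cost_per_pull: float, p_keep: float) -> float:
    """Expected spend to get one acceptable output.

    Assumes independent pulls, so the number of pulls until the first
    keeper is geometric with mean 1 / p_keep.
    """
    if not 0.0 < p_keep <= 1.0:
        raise ValueError("p_keep must be in (0, 1]")
    return cost_per_pull / p_keep


# Illustrative numbers only: a cheap, high-variance pull versus an
# expensive pull that is expected to land almost every time.
slot_machine = cost_per_keeper(cost_per_pull=0.01, p_keep=0.10)
printer = cost_per_keeper(cost_per_pull=0.50, p_keep=0.95)

print(f"slot machine: ${slot_machine:.2f} per keeper")
print(f"printer:      ${printer:.2f} per keeper")
```

The same function also shows where each shape breaks: the slot machine tolerates a low hit rate only while pulls stay cheap, and the printer's price is justified only while the hit rate stays near one.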

Ephemeris · 025 05 / 06 · GitHub · via @ProductsAndStartups
Skills · Creative tools

Claude makes the slides. This skill fixes them.

Bayram Annakov packaged the post-production pass for AI-generated decks into a Claude skill called Slide-Inspector. It hunts the usual failure modes — text overflowing the box, images stretched to nonsense, contrast too low for projection — and rewrites the offending slide in place. Drop it in, run it after generation, ship the deck.

i. generate the deck as usual
ii. point Slide-Inspector at the .pptx
iii. it audits every slide: overflow, sizing, contrast
iv. it patches the breakages: text reflowed, images recropped
v. review, present
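
For a feel of what the contrast part of such an audit might involve, here is a minimal sketch. The issue does not say how Slide-Inspector measures contrast; this swaps in the WCAG 2.x contrast-ratio formula as one common choice. The helper names and the 4.5 cutoff (WCAG's AA level for normal text) are assumptions, and a projected deck may well want a stricter bar.

```python
def _linear(channel: int) -> float:
    """Convert one 0-255 sRGB channel to linear light (WCAG 2.x definition)."""
    c = channel / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def luminance(rgb: tuple[int, int, int]) -> float:
    """Relative luminance of an sRGB colour, from 0.0 (black) to 1.0 (white)."""
    r, g, b = (_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg: tuple[int, int, int], bg: tuple[int, int, int]) -> float:
    """WCAG contrast ratio, from 1.0 (identical) up to 21.0 (black on white)."""
    lighter, darker = sorted((luminance(fg), luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


def too_low(fg, bg, minimum: float = 4.5) -> bool:
    """Flag a text/background pair below the assumed minimum ratio."""
    return contrast_ratio(fg, bg) < minimum


# Light grey text on a white slide is the classic offender.
print(too_low((200, 200, 200), (255, 255, 255)))
```

Overflow and image-stretch checks would need the slide geometry from the .pptx itself, which is outside this sketch.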
Ephemeris · 025 · The Daily Spread 06 / 06 · Anthropic · via @rvnikita_blog
Dev tools · Claude Code

Every Claude Code session, one window.

Anthropic introduced Agent View, a unified UI for managing parallel Claude Code sessions without dragging tmux into the conversation. It is a small change with an outsized effect: the developer can finally watch six agents at once and steer the one drifting off the rails.

For the past year, anyone running several agents at once has stitched together their own console — split panes, browser tabs, a notebook of session IDs. Agent View ends that. Each running session gets a card, each card surfaces the agent's last action and its current ask, and switching between them is one click.

The interesting part is not the UI; it is the implied workflow. Anthropic is conceding, in product form, that one developer routinely runs many agents and needs to triage them like patients in an ER. The skill it rewards is not coding, but choosing which agent to interrupt next.

For solo founders and small teams this lands close to a force multiplier. For larger orgs it raises a different question: how do you review six agents' worth of commits, and who owns the rejected diffs? Worth a Monday morning of experiments.

End of issue 025
colophon

That's Wednesday.

issue       025 · 13 May 2026 · Europe/Zürich
today       Cloudflare · Stanford · ProgramBench · @ProductsAndStartups · GitHub · Anthropic
rubric      AI tools, dev tools, security, benchmarks, creative software: actionable for engineers and founders this week.
tomorrow    08:00 Zürich.