Ephemeris · Issue 025 Wednesday · 13 May 2026 · Zürich

Ephemeris · 025 01 / 06 · Cloudflare

Networking · Postmortem

When idle isn't idle.

Cloudflare watched QUIC throughput plummet on certain machines and traced it home. A Linux kernel optimisation kept telling the stack the connection had gone idle while real traffic flowed; CUBIC dutifully collapsed the congestion window to its minimum and refused to grow it back. The bug had nothing to do with QUIC.

symptom: cwnd pinned at floor
stack: quiche · linux 6.x · cubic
false cue: tcp_app_limited → "idle"
real state: congested egress
fix: don't trust the kernel about your own flow
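
To make the failure mode concrete, here is a toy model of a sender that resets its window whenever it is told the connection went idle. It is a sketch under loud assumptions, not Cloudflare's code and not the kernel's: `MIN_CWND`, `simulate`, and the `idle_signal` hook are hypothetical names, and growth is one packet per round trip rather than CUBIC's real curve.

```python
MIN_CWND = 2  # packets; hypothetical floor for this toy model


def simulate(rounds, idle_signal, start_cwnd=MIN_CWND, max_cwnd=64):
    """Return the congestion window after each round trip.

    idle_signal(i) is a hypothetical hook: True means the sender was
    told the connection had been idle before round i.
    """
    cwnd = start_cwnd
    history = []
    for i in range(rounds):
        if idle_signal(i):
            # Believed idle: fall back to the floor and skip growth.
            cwnd = MIN_CWND
        else:
            # Simplified growth: one extra packet per round trip.
            cwnd = min(cwnd + 1, max_cwnd)
        history.append(cwnd)
    return history


# Honest signal: the flow is busy, so the window grows every round.
healthy = simulate(10, idle_signal=lambda i: False)

# False cue every round: the window never leaves the floor.
stuck = simulate(10, idle_signal=lambda i: True)

print("healthy:", healthy)
print("stuck:  ", stuck)
```

The point of the sketch is only the shape of the bug: as long as the idle cue keeps firing during real traffic, no amount of successful delivery lets the window recover.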

Read the postmortem →

Ephemeris · 025 02 / 06 · Stanford · via @ProductsAndStartups

Security · Vibe coding

Pure vibing ships nine times the vulns.

Six thousand real coding sessions, audited end to end. When developers let the agent drive without review, vulnerabilities multiply 9× compared to human–AI collaboration. Even with review, only 44% of AI-suggested code survives to a commit. The headline isn't "AI is dangerous" — it's that the loop wants a human in it.

9× · more vulns, pure vibing
44% · AI code that ships
6k · real sessions audited

Read the brief →

Ephemeris · 025 03 / 06 · ProgramBench · via @seeallochnaya

Benchmark · Models

A coding benchmark just solved itself.

ProgramBench measures agents on full software-engineering tasks, not toy snippets. This week, GPT-5.5 at xhigh reasoning recorded the benchmark's first complete solve of a task and pushed 13.5% of programs past the 95%-tests-pass threshold. Claude Opus 4.7 trails materially. Treat it as a snapshot, not a verdict: both labs ship fast.

ProgramBench · xhigh reasoning
GPT-5.5 · 13.5% · ★ 1 full solve
Claude Opus 4.7 · trails
prior leader · ~0% full
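
For readers who want to see how a threshold metric like this behaves, here is a minimal sketch. It assumes each program is scored by the fraction of its tests that pass and that "past the threshold" means at or above 95%; ProgramBench's actual scoring rules may differ. The function names and the sample pass rates are made up for illustration.

```python
def share_above_threshold(pass_rates, threshold=0.95):
    """Fraction of programs whose test pass rate is at or above threshold."""
    if not pass_rates:
        return 0.0
    hits = sum(1 for rate in pass_rates if rate >= threshold)
    return hits / len(pass_rates)


def full_solves(pass_rates):
    """Number of programs that pass every test."""
    return sum(1 for rate in pass_rates if rate == 1.0)


# Invented pass rates for eight programs, not benchmark data.
sample = [1.0, 0.97, 0.80, 0.50, 0.96, 0.10, 0.0, 0.60]

print(f"share above threshold: {share_above_threshold(sample):.1%}")
print(f"full solves: {full_solves(sample)}")
```

Note that the two numbers answer different questions: a model can clear the threshold on many programs and still have very few full solves.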

Read the benchmark →

Ephemeris · 025 04 / 06 · via @ProductsAndStartups

Product · Strategy

Slot machine. Or printer.

Two product shapes are emerging for generative AI. The slot machine: high-variance, cheap pulls, the user runs the model ten times to get one keeper. The printer: one disciplined pull, expected to land on the first try. Each demands different unit economics, a different UI, a different promise. Pick on purpose, not by accident.

The slot machine

Cheap inference, many tries, surprise as feature. Image gen, "draft me ten taglines," generative play. Margin lives in volume; trust lives in the dice.

The printer

Expensive inference, one shot, no retry expected. Tax filings, code review, contract drafting. Margin lives in correctness; trust lives in the receipt.
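
The unit-economics difference can be worked through with a little arithmetic. The sketch below assumes every pull is independent and has the same chance of producing a keeper, so the expected number of pulls per keeper is 1/p. The prices and success rates are invented for illustration, and `expected_pulls` and `cost_per_keeper` are hypothetical helper names.

```python
def expected_pulls(p_keeper):
    """Expected number of pulls until the first keeper (geometric model)."""
    if not 0 < p_keeper <= 1:
        raise ValueError("p_keeper must be in (0, 1]")
    return 1 / p_keeper


def cost_per_keeper(cost_per_pull, p_keeper):
    """Expected inference spend to obtain one output the user keeps."""
    return cost_per_pull * expected_pulls(p_keeper)


# Slot machine: cheap pulls, one keeper in ten (illustrative numbers).
slot = cost_per_keeper(cost_per_pull=0.01, p_keeper=0.10)

# Printer: expensive pull, expected to land almost every time.
printer = cost_per_keeper(cost_per_pull=0.50, p_keeper=0.95)

print(f"slot machine: {expected_pulls(0.10):.0f} pulls per keeper, cost {slot:.2f}")
print(f"printer: {expected_pulls(0.95):.2f} pulls per keeper, cost {printer:.2f}")
```

Under these made-up numbers the slot machine is cheaper per keeper, but only because the retries are nearly free; raise the per-pull price or the cost of a bad output and the comparison flips.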

Read the framing →

Ephemeris · 025 05 / 06 · GitHub · via @ProductsAndStartups

Skills · Creative tools

Claude makes the slides. This skill fixes them.

Bayram Annakov packaged the post-production pass for AI-generated decks into a Claude skill called Slide-Inspector. It hunts the usual failure modes — text overflowing the box, images stretched to nonsense, contrast too low for projection — and rewrites the offending slide in place. Drop it in, run it after generation, ship the deck.

i. generate the deck as usual
ii. point Slide-Inspector at the .pptx
iii. it audits every slide: overflow, sizing, contrast
iv. it patches the breakages: text reflowed, images recropped
v. review, present
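
As an illustration of what a contrast audit can look like, here is a small sketch using the WCAG 2.x contrast-ratio formula. This is not Slide-Inspector's code, and nothing in the source says which method the skill uses; the 4.5 cutoff is the WCAG AA value for normal text, borrowed here as a stand-in for "too low for projection", and `low_contrast` is a hypothetical helper.

```python
def _linear(channel):
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x definition)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (R, G, B) colour with 0-255 channels."""
    r, g, b = (_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg, bg):
    """WCAG contrast ratio between two colours, from 1 up to 21."""
    l1, l2 = relative_luminance(fg), relative_luminance(bg)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


def low_contrast(fg, bg, minimum=4.5):
    """True when text colour fg on background bg falls below the cutoff."""
    return contrast_ratio(fg, bg) < minimum


WHITE, BLACK, LIGHT_GREY = (255, 255, 255), (0, 0, 0), (200, 200, 200)

print(f"black on white: {contrast_ratio(BLACK, WHITE):.1f}")
print(f"light grey on white flagged: {low_contrast(LIGHT_GREY, WHITE)}")
```

A real audit would also need to read the colours out of the .pptx and account for images behind the text, which this sketch leaves out.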

Read the skill →

Ephemeris · 025 · The Daily Spread 06 / 06 · Anthropic · via @rvnikita_blog

Dev tools · Claude Code

Every Claude Code session, one window.

Anthropic introduced Agent View, a unified UI for managing parallel Claude Code sessions without dragging tmux into the conversation. It is a small change with an outsized effect: the developer can finally watch six agents at once and steer the one drifting off the rails.

For the past year, anyone running several agents at once has stitched together their own console — split panes, browser tabs, a notebook of session IDs. Agent View ends that. Each running session gets a card, each card surfaces the agent's last action and its current ask, and switching between them is one click.

The interesting part is not the UI; it is the implied workflow. Anthropic is conceding, in product form, that one developer routinely runs many agents and needs to triage them like patients in an ER. The skill it rewards is not coding, but choosing which agent to interrupt next.

For solo founders and small teams this lands close to a force multiplier. For larger orgs it raises a different question: how do you review six agents' worth of commits, and who owns the rejected diffs? Worth a Monday morning of experiments.

Read the announcement →

End of issue 025

colophon

That's Wednesday.

issue: 025 · 13 May 2026 · Europe/Zürich
today: Cloudflare · Stanford · ProgramBench · @ProductsAndStartups · GitHub · Anthropic
rubric: AI tools, dev tools, security, benchmarks, creative software — actionable for engineers and founders this week.

tomorrow: 08:00 Zürich.
