Ephemeris · Issue 017 · Tuesday · 05 May 2026 · Zürich

AUTOMATE.

A single word holds today's issue. Models train models, agents ship agents, and the loop you've been writing by hand quietly becomes the product.

01 / 07 · Import AI

Agents · Research

AI is starting to train the next AI.

Jack Clark's read of the trajectory: by the end of 2028, frontier systems may run their own research programmes, writing code, designing experiments and training successors. The skill ladder is not flat. Coding has come fastest; alignment work is the dark horse.

Once a lab can spend a month of compute on a question it would have spent a quarter of human time on, the speed of progress stops being a function of how many people you can hire. (Import AI 455)

CLASSIFIED · ML/RESEARCH


02 / 07 · Vercel

Security · Agents

A security harness, not a service.

deepsec runs on your infrastructure with your keys, scans the codebase for vulnerabilities, files patches as PRs. Vercel open-sourced the lot. Run it nightly and call it a junior security engineer that doesn't sleep.

$ deepsec scan ./src --provider anthropic
→ indexed 2,418 files in 11.4s
→ 6 hypotheses queued

[ HIGH  ] auth/session.ts:42
            session id is RNG-derived; not cryptographically random
[ HIGH  ] api/upload.ts:118
            path traversal — basename() not enforced before fs.write
[ MED   ] middleware.ts:7
            CORS reflects Origin without allowlist

$ deepsec patch --interactive
→ 3 PRs opened · run with --auto for unattended mode
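
The three findings above are well-known vulnerability classes, and the usual remedies are short. What follows is a minimal sketch in Python, not deepsec's actual patches: the flagged files are TypeScript, and the function names, the upload directory and the allowlisted origin here are all hypothetical.

```python
import os
import secrets
from typing import Optional

# Hypothetical allowlist; real origins would come from configuration.
ALLOWED_ORIGINS = {"https://app.example.com"}


def new_session_id() -> str:
    """Draw the session id from the OS CSPRNG, not a general-purpose RNG."""
    return secrets.token_urlsafe(32)


def safe_upload_path(upload_dir: str, client_filename: str) -> str:
    """Keep only the final path component so '../' cannot escape upload_dir."""
    # Treat Windows separators as separators too before taking the basename.
    name = os.path.basename(client_filename.replace("\\", "/"))
    if name in ("", ".", ".."):
        raise ValueError("invalid filename")
    return os.path.join(upload_dir, name)


def cors_origin(request_origin: Optional[str]) -> Optional[str]:
    """Echo the Origin header back only when it is on the allowlist."""
    if request_origin in ALLOWED_ORIGINS:
        return request_origin
    return None  # caller omits Access-Control-Allow-Origin entirely
```

Each function maps to one finding: random session ids, basename enforcement before a write, and an explicit CORS allowlist in place of reflection.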


03 / 07 · Vercel

Agents · Case Study

Agents shipped the agents.

Cofounder is a multi-tenant AI co-founder for solo builders. The General Intelligence team built it on top of Vercel by deploying their own coding agents — which then deployed everything else. The loop runs on Workflows; humans review.

1 human team, hundreds of agents

12k PRs in beta, mostly self-reviewed

7d from idea to multi-tenant launch


04 / 07 · OpenAI

Voice · Infrastructure

Real-time voice without the pause.

OpenAI rebuilt its WebRTC stack to do conversational turn-taking at sub-200ms globally — VAD on the edge, rolling jitter buffers, hand-tuned NACK loops. A rare piece of infrastructure writing from a model lab. Read it for the architecture, not the product.

The pause between turns is where conversation happens. Eliminate it and you have something that isn't conversation any more.


05 / 07 · PostHog

Data · Engineering

DuckDB and ClickHouse.

Not a death-match. PostHog runs both — DuckDB single-node for embedded, lightweight queries; ClickHouse for the big iron. The post is a rare apples-to-oranges comparison written by people who actually run them in production, with the failure modes named.

Dimension	DuckDB	ClickHouse
Topology	single-node, in-process	distributed cluster
Sweet spot	< 10 GB, ad-hoc	PB-scale, concurrent
Cold start	< 50 ms	seconds
Concurrency	1 writer, n readers	thousands
Used at PostHog for	local notebooks, CI	events, sessions, prod


06 / 07 · via @ProductsAndStartups

Agents · Playbook

The new prompt engineering is agentic engineering.

Karpathy on what separates teams that ship with AI from teams that just chat with it. Workflows, evals, tool composition. Founder takeaway — labs aren't going to fill the gaps where their data and incentives don't already point. Pick a blind spot.

Uneven intelligence is a feature. Models fail at trivial things and excel at hard ones. Build around the jagged frontier.
Eval loops over prompts. A prompt is one sample; an eval is a measurement. Treat the second as the unit of work.
Pick the blind spot. Verticals with little data and weak feedback are exactly where labs won't compete.
Compose, don't chat. The interface is workflow + tools + memory. Chat is the demo, not the product.
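
To make "a prompt is one sample; an eval is a measurement" concrete, here is a minimal sketch of an eval harness. It is an illustration under assumptions, not Karpathy's tooling: `toy_model` stands in for a real model call, and the cases and the prompt template are invented for the example.

```python
from typing import Callable, List, Tuple

# Each case pairs an input with a check on the output (hypothetical cases).
Case = Tuple[str, Callable[[str], bool]]


def run_eval(model: Callable[[str], str], template: str, cases: List[Case]) -> float:
    """Return the pass rate of one prompt template across every case."""
    if not cases:
        raise ValueError("an eval needs at least one case")
    passed = 0
    for text, check in cases:
        output = model(template.format(input=text))
        if check(output):
            passed += 1
    return passed / len(cases)


def toy_model(prompt: str) -> str:
    """Stand-in for a model call: upper-cases whatever follows the last colon."""
    return prompt.rsplit(":", 1)[-1].strip().upper()


cases: List[Case] = [
    ("hello", lambda out: out == "HELLO"),
    ("agents", lambda out: out == "AGENTS"),
    ("a:b", lambda out: out == "A:B"),  # trips the toy model's naive parsing
]

score = run_eval(toy_model, "Shout this: {input}", cases)
print(f"pass rate: {score:.2f}")  # 2 of 3 cases pass, so this prints 0.67
```

Trying the prompt once on "hello" would have looked like a success; the third case is what turns an anecdote into a number you can track.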


07 / 07 · via @ProductsAndStartups

Skills · Workflow

The optimisation loop, applied to everyone.

Karpathy's autoresearch loop — propose, evaluate, refine — works far beyond ML training. Lead qualification, hackathon judging, employee skill rubrics: the structure ports cleanly. The bottleneck isn't the loop; it's writing a fitness function that means anything.

The loop

Generate candidates. Score them against a fitness function. Refine the best. Repeat. The shape is the same whether you're tuning a transformer or grading sales prospects, and the failure modes rhyme: weak generators, noisy evaluators, fitness functions that proxy the wrong thing.
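
The four steps fit in a few lines. This is a minimal sketch assuming a simple greedy search, not the code from any repo mentioned here; `optimise`, `nudge` and `toy_fitness` are hypothetical names, and the toy problem (find the x closest to 3) stands in for whatever you are really tuning.

```python
import random
from typing import Callable, TypeVar

T = TypeVar("T")


def optimise(
    seed: T,
    propose: Callable[[T, random.Random], T],
    fitness: Callable[[T], float],
    rounds: int = 50,
    batch_size: int = 8,
    rng_seed: int = 0,
) -> T:
    """Generate candidates, score them, refine the best, repeat."""
    rng = random.Random(rng_seed)  # fixed seed keeps runs reproducible
    best, best_score = seed, fitness(seed)
    for _ in range(rounds):
        # Generate: propose variations on the current best.
        batch = [propose(best, rng) for _ in range(batch_size)]
        # Score: rank the batch with the fitness function.
        scored = [(fitness(c), c) for c in batch]
        top_score, top = max(scored, key=lambda pair: pair[0])
        # Refine: keep the winner only if it beats what we already have.
        if top_score > best_score:
            best, best_score = top, top_score
    return best


# Toy problem (hypothetical): find the x that maximises -(x - 3)^2.
def nudge(x: float, rng: random.Random) -> float:
    return x + rng.uniform(-1.0, 1.0)


def toy_fitness(x: float) -> float:
    return -((x - 3.0) ** 2)


result = optimise(0.0, nudge, toy_fitness)
print(f"best x = {result:.2f}")
```

Swap `nudge` for a function that rewrites a sales email and `toy_fitness` for a judge, and the loop body does not change. That is the sense in which the structure ports.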

Where it breaks

Eighty percent of the work happens before the loop ever runs: idea generation, the choice of fitness function, the calibration of the judge. Get those wrong and the loop will obediently optimise toward something useless. Get them right and the loop is almost mechanical.
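
One cheap way to calibrate a judge before the loop runs is to check it against a handful of human-labelled examples. The sketch below assumes a judge that returns a score between 0 and 1 and a pass threshold of 0.5; `length_judge` is a deliberately bad toy judge, and the labelled examples are invented.

```python
from typing import Callable, List, Tuple


def judge_agreement(
    judge: Callable[[str], float],
    labelled: List[Tuple[str, bool]],
    threshold: float = 0.5,
) -> float:
    """Fraction of human-labelled examples where the judge's verdict matches."""
    if not labelled:
        raise ValueError("need at least one labelled example")
    matches = sum((judge(text) >= threshold) == human_ok for text, human_ok in labelled)
    return matches / len(labelled)


def length_judge(text: str) -> float:
    """Toy judge that rewards longer answers: a proxy for the wrong thing."""
    return min(len(text) / 40.0, 1.0)


# Hypothetical human labels: True means a person judged the answer good.
labelled = [
    ("Yes.", True),
    ("No idea, but here is a long ramble anyway.", False),
    ("The fix is to validate the filename first.", True),
    ("It depends on many things, hard to say really.", False),
]

agreement = judge_agreement(length_judge, labelled)
print(f"judge agrees with humans on {agreement:.0%} of examples")  # prints 25%
```

A judge that agrees with people a quarter of the time will steer the loop toward long answers, not good ones, which is exactly the failure the paragraph above describes.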

The skill

Bayram Annakov's repo packages the pattern as a Claude skill — invoke it, hand it a problem, watch it iterate. Useful as a starting point, more useful as a forcing function: writing the fitness function makes you say what good actually looks like.


End of issue 017

That's today.

Seven picks across model labs, infrastructure blogs, and one Russian-language Telegram channel. Today's thread was automation — of research, of security, of the optimisation loop itself.

Sources today

jack-clark.net
vercel.com/blog
openai.com
posthog.com
via @ProductsAndStartups

Rubric

AI tools to adopt this week
Creative software
Dev tools & agentic coding
Privacy & security
Practical research
Actionable for engineers / founders
