Ephemeris · Issue 017 Tuesday · 05 May 2026 · Zürich
01 / 07 · Import AI
Agents · Research

AI is starting to train the next AI.

Jack Clark's read of the trajectory: by end of 2028, frontier systems may run their own research programmes — write code, design experiments, train successors. The skill ladder is not flat. Coding has come fastest; alignment work is the dark horse.

Once a lab can spend a month of compute on a question it would have spent a quarter of human time on, the speed of progress stops being a function of how many people you can hire.
Import AI 455
CLASSIFIED · ML/RESEARCH
02 / 07 · Vercel
Security · Agents

A security harness, not a service.

deepsec runs on your infrastructure with your keys, scans the codebase for vulnerabilities, files patches as PRs. Vercel open-sourced the lot. Run it nightly and call it a junior security engineer that doesn't sleep.

$ deepsec scan ./src --provider anthropic
 indexed 2,418 files in 11.4s
 6 hypotheses queued

[ HIGH  ] auth/session.ts:42
            session id is RNG-derived; not cryptographically random
[ HIGH  ] api/upload.ts:118
            path traversal — basename() not enforced before fs.write
[ MED   ] middleware.ts:7
            CORS reflects Origin without allowlist

$ deepsec patch --interactive
 3 PRs opened · run with --auto for unattended mode
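
For readers who want the three findings in concrete terms, here is a minimal sketch of the usual fix for each class of bug. It is written in Python for illustration (the flagged files are TypeScript), it is not deepsec output or the content of the opened PRs, and ALLOWED_ORIGINS, UPLOAD_DIR and all three function names are hypothetical.

```python
import os
import secrets
from typing import Optional

# Hypothetical values, for illustration only.
ALLOWED_ORIGINS = {"https://app.example.com"}
UPLOAD_DIR = "/var/uploads"


def new_session_id() -> str:
    """Session ids should come from a CSPRNG, not a general-purpose RNG."""
    return secrets.token_urlsafe(32)


def safe_upload_path(filename: str, upload_dir: str = UPLOAD_DIR) -> str:
    """Strip any directory components before building the write path."""
    # Normalise Windows-style separators so basename() sees them too.
    name = os.path.basename(filename.replace("\\", "/"))
    if name in ("", ".", ".."):
        raise ValueError("invalid upload filename")
    return os.path.join(upload_dir, name)


def cors_origin(request_origin: Optional[str]) -> Optional[str]:
    """Return the Origin to echo back only if it is on the allowlist."""
    if request_origin in ALLOWED_ORIGINS:
        return request_origin
    return None
```

The common thread: each fix replaces an implicit trust (in the RNG, in the filename, in the Origin header) with an explicit check.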
03 / 07 · Vercel
Agents · Case Study

Agents shipped the agents.

Cofounder is a multi-tenant AI co-founder for solo builders. The General Intelligence team built it on top of Vercel by deploying their own coding agents — which then deployed everything else. The loop runs on Workflows; humans review.

1 human team, hundreds of agents
12k PRs in beta, mostly self-reviewed
7d from idea to multi-tenant launch
04 / 07 · OpenAI
Voice · Infrastructure

Real-time voice without the pause.

OpenAI rebuilt its WebRTC stack to do conversational turn-taking at sub-200ms globally — VAD on the edge, rolling jitter buffers, hand-tuned NACK loops. A rare piece of infrastructure writing from a model lab. Read it for the architecture, not the product.

The pause between turns is where conversation happens. Eliminate it and you have something that isn't conversation any more.
05 / 07 · PostHog
Data · Engineering

DuckDB and ClickHouse.

Not a death-match. PostHog runs both — DuckDB single-node for embedded, lightweight queries; ClickHouse for the big iron. The post is a rare apples-to-oranges comparison written by people who actually run them in production, with the failure modes named.

Dimension             DuckDB                    ClickHouse
Topology              single-node, in-process   distributed cluster
Sweet spot            < 10 GB, ad-hoc           PB-scale, concurrent
Cold start            < 50 ms                   seconds
Concurrency           1 writer, n readers       thousands
Used at PostHog for   local notebooks, CI       events, sessions, prod
06 / 07 · via @ProductsAndStartups
Agents · Playbook

The new prompt engineering is agentic engineering.

Karpathy on what separates teams that ship with AI from teams that just chat with it. Workflows, evals, tool composition. Founder takeaway — labs aren't going to fill the gaps where their data and incentives don't already point. Pick a blind spot.

  1. Uneven intelligence is a feature. Models fail at trivial things and excel at hard ones. Build around the jagged frontier.
  2. Eval loops over prompts. A prompt is one sample; an eval is a measurement. Treat the second as the unit of work.
  3. Pick the blind spot. Verticals with little data and weak feedback are exactly where labs won't compete.
  4. Compose, don't chat. The interface is workflow + tools + memory. Chat is the demo, not the product.
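
Point 2 is easier to see in code. The sketch below is one minimal way to turn "does this prompt work?" into a number; it is not taken from Karpathy's talk. CASES, stub_model and run_eval are hypothetical names, and stub_model stands in for a real model call.

```python
from typing import Callable, List, Tuple

# Hypothetical eval set: (input, expected answer) pairs.
CASES: List[Tuple[str, str]] = [
    ("2 + 2", "4"),
    ("3 * 3", "9"),
    ("10 - 7", "3"),
]


def run_eval(model: Callable[[str], str], cases: List[Tuple[str, str]]) -> float:
    """Return the fraction of cases where the model's answer matches exactly."""
    if not cases:
        raise ValueError("eval set is empty")
    passed = sum(1 for prompt, expected in cases if model(prompt).strip() == expected)
    return passed / len(cases)


def stub_model(prompt: str) -> str:
    """Stand-in for a real model call; deliberately wrong on the subtraction case."""
    answers = {"2 + 2": "4", "3 * 3": "9", "10 - 7": "4"}
    return answers.get(prompt, "")


score = run_eval(stub_model, CASES)
print(f"pass rate: {score:.2f}")
```

Swap stub_model for a function that calls your model and the same harness compares two prompts, two models, or two tool configurations on equal terms.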
07 / 07 · via @ProductsAndStartups
Skills · Workflow

The optimisation loop, applied to everyone.

Karpathy's autoresearch loop — propose, evaluate, refine — works far beyond ML training. Lead qualification, hackathon judging, employee skill rubrics: the structure ports cleanly. The bottleneck isn't the loop; it's writing a fitness function that means anything.

The loop

Generate candidates. Score them against a fitness function. Refine the best. Repeat. The shape is the same whether you're tuning a transformer or grading sales prospects, and the failure modes rhyme: weak generators, noisy evaluators, fitness functions that proxy the wrong thing.
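
The four steps above can be sketched in a few lines. This is a generic illustration under stated assumptions, not the code from Karpathy's loop or from the repo mentioned below: optimise and its parameters are hypothetical, and the toy problem (find a number close to 3) stands in for whatever you are actually scoring.

```python
import random


def optimise(generate, fitness, refine, rounds=50, population=8, keep=2, seed=0):
    """Generate candidates, score them, refine the best, repeat."""
    rng = random.Random(seed)
    candidates = [generate(rng) for _ in range(population)]
    for _ in range(rounds):
        # Score every candidate and keep the top few unchanged.
        best = sorted(candidates, key=fitness, reverse=True)[:keep]
        # Fill the rest of the population with refinements of the survivors.
        refined = [refine(rng.choice(best), rng) for _ in range(population - keep)]
        candidates = best + refined
    return max(candidates, key=fitness)


# Toy problem: the "right answer" is 3.0 and fitness is negative squared error.
def generate(rng):
    return rng.uniform(-10.0, 10.0)


def fitness(x):
    return -((x - 3.0) ** 2)


def refine(x, rng):
    return x + rng.gauss(0.0, 0.5)


best = optimise(generate, fitness, refine)
print(f"best candidate: {best:.3f}")
```

Everything interesting lives in the three functions passed in. The loop itself never changes, which is why the pattern ports so easily.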

Where it breaks

Eighty percent of the work happens before the loop ever runs. Idea generation, the choice of fitness function, the calibration of the judge. Get those wrong and the loop will obediently optimise toward something useless. Get them right and the loop is almost mechanical.
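
One cheap way to check the judge before trusting the loop is to have it score a handful of items that humans have already ranked, then count how often the two agree on which of a pair is better. The sketch below assumes that setup; pairwise_agreement and the sample scores are hypothetical, and this is only one of several reasonable calibration checks.

```python
from itertools import combinations


def pairwise_agreement(judge_scores, human_scores):
    """Fraction of item pairs that judge and humans rank in the same order."""
    if len(judge_scores) != len(human_scores):
        raise ValueError("score lists must be the same length")
    agree = 0
    total = 0
    for i, j in combinations(range(len(human_scores)), 2):
        human_diff = human_scores[i] - human_scores[j]
        if human_diff == 0:
            continue  # humans expressed no preference, so skip the pair
        judge_diff = judge_scores[i] - judge_scores[j]
        total += 1
        if judge_diff * human_diff > 0:
            agree += 1
    if total == 0:
        raise ValueError("no pairs with a human preference")
    return agree / total


# Made-up scores for five items, for illustration only.
human = [1, 2, 3, 4, 5]
judge = [0.2, 0.1, 0.5, 0.9, 0.7]
print(f"agreement: {pairwise_agreement(judge, human):.2f}")
```

If agreement is close to a coin flip, the loop will optimise toward the judge's quirks rather than toward what people actually want.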

The skill

Bayram Annakov's repo packages the pattern as a Claude skill — invoke it, hand it a problem, watch it iterate. Useful as a starting point, more useful as a forcing function: writing the fitness function makes you say what good actually looks like.

End of issue 017

That's today.

Seven picks across model labs, infrastructure blogs, and one Russian-language Telegram channel. Today's thread was automation — of research, of security, of the optimisation loop itself.

Sources today

  • jack-clark.net
  • vercel.com/blog
  • openai.com
  • posthog.com
  • via @ProductsAndStartups

Rubric

  • AI tools to adopt this week
  • Creative software
  • Dev tools & agentic coding
  • Privacy & security
  • Practical research
  • Actionable for engineers / founders