AUTOMATE.
A single word holds today's issue. Models train models, agents ship agents, and the loop you've been writing by hand quietly becomes the product.
AI is starting to train the next AI.
Jack Clark's read of the trajectory: by the end of 2028, frontier systems may run their own research programmes, writing code, designing experiments, and training successors. The skill ladder is not flat: coding has come fastest, and alignment work is the dark horse.
Once a lab can spend a month of compute on a question it would have spent a quarter of human time on, the speed of progress stops being a function of how many people you can hire.
Import AI 455
A security harness, not a service.
deepsec runs on your infrastructure with your keys, scans the codebase for vulnerabilities, and files patches as PRs. Vercel open-sourced the lot. Run it nightly and call it a junior security engineer that doesn't sleep.
    $ deepsec scan ./src --provider anthropic
    → indexed 2,418 files in 11.4s
    → 6 hypotheses queued
    [ HIGH ] auth/session.ts:42   session id is RNG-derived; not cryptographically random
    [ HIGH ] api/upload.ts:118    path traversal — basename() not enforced before fs.write
    [ MED ]  middleware.ts:7      CORS reflects Origin without allowlist
    $ deepsec patch --interactive
    → 3 PRs opened · run with --auto for unattended mode
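The three findings in that output map to well-known fix patterns. Here is a rough sketch of each, written in Python rather than the TypeScript of the scanned files. It does not reproduce the patches deepsec opens, and `UPLOAD_DIR`, `ALLOWED_ORIGINS` and the function names are hypothetical.

```python
import os
import secrets
from typing import Optional

UPLOAD_DIR = "/srv/uploads"                      # hypothetical upload root
ALLOWED_ORIGINS = {"https://app.example.com"}    # hypothetical allowlist


def new_session_id() -> str:
    """Draw the session id from the OS CSPRNG, not a general-purpose RNG."""
    return secrets.token_urlsafe(32)


def safe_upload_path(client_filename: str) -> str:
    """Keep only the final path component so '../' cannot escape UPLOAD_DIR."""
    name = os.path.basename(client_filename.replace("\\", "/"))
    if name in ("", ".", ".."):
        raise ValueError("invalid upload filename")
    return os.path.join(UPLOAD_DIR, name)


def cors_origin_header(request_origin: str) -> Optional[str]:
    """Echo the Origin back only when it is on the allowlist."""
    return request_origin if request_origin in ALLOWED_ORIGINS else None
```

With these, `safe_upload_path("../../etc/passwd")` resolves to a file named `passwd` inside the upload root, and `cors_origin_header` returns `None` for any origin not listed.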
Agents shipped the agents.
Cofounder is a multi-tenant AI co-founder for solo builders. The General Intelligence team built it on top of Vercel by deploying their own coding agents — which then deployed everything else. The loop runs on Workflows; humans review.
Real-time voice without the pause.
OpenAI rebuilt its WebRTC stack to do conversational turn-taking at sub-200ms globally — VAD on the edge, rolling jitter buffers, hand-tuned NACK loops. A rare piece of infrastructure writing from a model lab. Read it for the architecture, not the product.
DuckDB and ClickHouse.
Not a death-match. PostHog runs both — DuckDB single-node for embedded, lightweight queries; ClickHouse for the big iron. The post is a rare apples-to-oranges comparison written by people who actually run them in production, with the failure modes named.
| Dimension | DuckDB | ClickHouse |
|---|---|---|
| Topology | single-node, in-process | distributed cluster |
| Sweet spot | < 10 GB, ad-hoc | PB-scale, concurrent |
| Cold start | < 50 ms | seconds |
| Concurrency | 1 writer, n readers | thousands |
| Used at PostHog for | local notebooks, CI | events, sessions, prod |
The new prompt engineering is agentic engineering.
Karpathy on what separates teams that ship with AI from teams that just chat with it. Workflows, evals, tool composition. Founder takeaway — labs aren't going to fill the gaps where their data and incentives don't already point. Pick a blind spot.
- Uneven intelligence is a feature. Models fail at trivial things and excel at hard ones. Build around the jagged frontier.
- Eval loops over prompts. A prompt is one sample; an eval is a measurement. Treat the second as the unit of work.
- Pick the blind spot. Verticals with little data and weak feedback are exactly where labs won't compete.
- Compose, don't chat. The interface is workflow + tools + memory. Chat is the demo, not the product.
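To make "an eval is a measurement" concrete, here is a minimal sketch: a fixed set of cases, a system under test, and one number out. The cases, labels and the `keyword_baseline` stand-in are all invented for illustration. In practice the callable would wrap a prompted model.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical eval set: (input text, expected label) pairs.
CASES = [
    ("refund not received", "billing"),
    ("app crashes on login", "bug"),
    ("how do I export data", "how-to"),
    ("charged twice", "billing"),
    ("payment page is blank", "bug"),
]


def run_eval(system: Callable[[str], str],
             cases: Sequence[Tuple[str, str]]) -> float:
    """Return the fraction of cases where the system matches the expected label."""
    passed = sum(1 for text, expected in cases if system(text) == expected)
    return passed / len(cases)


def keyword_baseline(text: str) -> str:
    """Stand-in for a prompted model call; swap in a real client here."""
    if "refund" in text or "charged" in text:
        return "billing"
    if "crash" in text:
        return "bug"
    return "how-to"


score = run_eval(keyword_baseline, CASES)  # 4 of the 5 cases pass
```

Changing the prompt, or here the baseline, and rerunning `run_eval` on the same cases gives a comparison that a single chat transcript cannot.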
The optimisation loop, applied to everyone.
Karpathy's autoresearch loop — propose, evaluate, refine — works far beyond ML training. Lead qualification, hackathon judging, employee skill rubrics: the structure ports cleanly. The bottleneck isn't the loop; it's writing a fitness function that means anything.
The loop
Generate candidates. Score them against a fitness function. Refine the best. Repeat. The shape is the same whether you're tuning a transformer or grading sales prospects, and the failure modes rhyme: weak generators, noisy evaluators, fitness functions that proxy the wrong thing.
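The shape described above fits in a few lines. This is a toy sketch under stated assumptions, not Karpathy's or anyone else's implementation: candidates are plain floats, and `toy_fitness` and `toy_mutate` are hypothetical stand-ins for a real evaluator and generator.

```python
import random
from typing import Callable, List


def optimise(seed_candidates: List[float],
             fitness: Callable[[float], float],
             mutate: Callable[[float, random.Random], float],
             rounds: int = 50,
             keep: int = 3,
             children_per_parent: int = 4,
             seed: int = 0) -> float:
    """Generate, score, refine, repeat; return the best candidate found."""
    rng = random.Random(seed)
    population = list(seed_candidates)
    for _ in range(rounds):
        # Score every candidate and keep the top few.
        survivors = sorted(population, key=fitness, reverse=True)[:keep]
        # Refine: each survivor spawns variations. Survivors stay in the
        # pool, so the best score never regresses between rounds.
        children = [mutate(parent, rng)
                    for parent in survivors
                    for _ in range(children_per_parent)]
        population = survivors + children
    return max(population, key=fitness)


def toy_fitness(x: float) -> float:
    """Hypothetical objective with its peak at x = 3."""
    return -(x - 3.0) ** 2


def toy_mutate(x: float, rng: random.Random) -> float:
    """Hypothetical refinement step: a small random nudge."""
    return x + rng.gauss(0.0, 0.5)


best = optimise([0.0, 10.0], toy_fitness, toy_mutate)
```

Swapping the floats for prompts, rubrics or lead-scoring rules changes `mutate` and `fitness` but leaves the loop untouched, which is why the hard part sits in those two functions.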
Where it breaks
Eighty percent of the work happens before the loop ever runs. Idea generation, the choice of fitness function, the calibration of the judge. Get those wrong and the loop will obediently optimise toward something useless. Get them right and the loop is almost mechanical.
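One cheap way to check a judge before the loop runs is to compare its calls against a handful of human-labelled items. The sketch below assumes the judge emits a score between 0 and 1 and that a threshold turns it into pass or fail. The scores and labels are invented for illustration.

```python
from typing import Sequence


def agreement_rate(judge_scores: Sequence[float],
                   human_labels: Sequence[bool],
                   threshold: float = 0.5) -> float:
    """Fraction of items where the judge's pass/fail call matches the human's."""
    if len(judge_scores) != len(human_labels):
        raise ValueError("scores and labels must be the same length")
    matches = sum((score >= threshold) == label
                  for score, label in zip(judge_scores, human_labels))
    return matches / len(human_labels)


# Hypothetical calibration set.
judge = [0.9, 0.2, 0.7, 0.4, 0.8]
human = [True, False, False, False, True]

loose = agreement_rate(judge, human, threshold=0.5)    # 4 of 5 agree
strict = agreement_rate(judge, human, threshold=0.75)  # 5 of 5 agree
```

If agreement stays low at every threshold, the judge is measuring something other than what the humans meant, and no amount of iteration downstream will fix that.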
The skill
Bayram Annakov's repo packages the pattern as a Claude skill — invoke it, hand it a problem, watch it iterate. Useful as a starting point, more useful as a forcing function: writing the fitness function makes you say what good actually looks like.
That's today.
Seven picks across model labs, infrastructure blogs, and one Russian-language Telegram channel. Today's thread was automation — of research, of security, of the optimisation loop itself.