// Tuesday Edition · Vol. 031
Audits
Today's issue is about the machinery that watches the machine: open-sourced alignment tools, government cyber benchmarks, security LLMs pointed at production code, and a 2028 forecast neither side wants to ignore.
A new optimizer kills neurons. Aurora explains why.
Jack Clark's Import AI 457 packs three uncomfortable signals into one issue: the Aurora optimizer that finally fixes Muon's silent neuron-death problem, a twenty-year-old precision-calculation virus that reads like a template for an AI-targeted Stuxnet, and a serious attempt to define positive alignment — research aimed at human flourishing, not only at not-harming.
# muon.step() → some neurons drift to zero variance and stop learning for p in params: g = grad(p) m = momentum_update(m, g) # muon: spectral norm u = newton_schulz(m) # cursed: explodes a tail of units # aurora: scale per-unit, then constrain — neurons stay alive for p in params: g = grad(p) m = momentum_update(m, g) s = per_unit_scale(m) # <- the fix u = newton_schulz(m / s) * s
We pointed a security LLM at our own infrastructure code.
Project Glasswing is Cloudflare's pre-flight check on the new generation of security-focused frontier models. They ran Mythos and peers against real production code — auth, ingress, rate-limiting — and logged what it caught, what it missed, and how often it confidently invented bugs that did not exist.
- Target
- Live Cloudflare infrastructure source — not a benchmark fork.
- Models
- Mythos (Anthropic) · [redacted] · [redacted]
- Hits
- Confirmed previously-unknown classes of unsafe library use; missed a sizeable share of business-logic flaws.
- Misses
- Hallucinated vulnerabilities that diffed clean and reviewer-rejected at high rates.
- Verdict
- Useful as a force-multiplier under human triage. Not yet trustworthy unsupervised.
Anthropic donates its alignment audit to everyone.
Petri is the internal instrument Anthropic has used to interrogate Claude for agentic misalignment. They've now open-sourced it — donating the same tool that scrutinizes their own production models to anyone shipping an agent.
The pitch is unusually direct: alignment audits should not be a club good. Petri runs Claude — or any model wired to its harness — through structured probes designed to expose the specific failure modes Anthropic worries about. Reward hacking, deceptive compliance, capability sandbagging. The probes are the catalog Anthropic uses on itself.
The donation matters more than the tool. The signalling shifts: a frontier lab is saying out loud that "we use our own evals" is not a moat. If your agent stack does not yet have a permanent audit pipeline, this is the cheapest possible Tuesday-afternoon adoption.
Government auditors give GPT-5.5 a barely-passing grade on cyber.
The UK AI Security Institute's evaluation of GPT-5.5 finds only marginal improvement on multi-stage offensive cyber tasks over the prior generation. Useful as a public capability marker — and as a quiet correction to the "every release is a jump" narrative procurement teams keep buying.
Two ways the next two years end. Both are plausible.
Anthropic's policy team sketches a pair of 2028 scenarios for global AI leadership — and the investments and decisions that push toward each. A reading list more than a manifesto, but the more useful document for the founders and engineers who keep being asked "where is this all going."
Scenario A · Bounded
Frontier scaling slows; capability gains come from harnesses, evals, and tooling. The moat moves to integration depth and to the model's permission to act. Auditors and standards bodies become load-bearing.
Scenario B · Steep
Scaling continues unbroken; capability jumps outpace governance. Concentration risk hardens around the labs with the most compute. The marginal year of policy work matters more than any single benchmark.
SEVEN IDEAS FROM CODE WITH CLAUDE.
Bayram Annakov's debrief from Code with Claude 2026, distilled. Pick one, ship it this week. The conference was full of opinions; the through-line is that the infrastructure around the model is the lever, not the model.
- 01Use a stronger model as advisor to a weaker model in your agent loop.
- 02Plan around an agent half-life of 6–12 months. Don't tightly couple.
- 03The moat is the infrastructure around the model, not the model.
- 04Skills > prompts. Skills are versionable, testable, transferable.
- 05Verification before completion is the single highest-leverage habit.
- 06Build evals before you build features. Treat them like CI, not QA.
- 07Make agents grabbable: one task = one issue = one PR.
An open-source rival to Claude Design.
Open Design ships with nineteen Skills, seventy-one design-system templates, multi-format export, and a model-agnostic harness that runs against Claude, GPT, and Gemini APIs. The "designer agent" category has its first credible commons.
The slop tide rises. Stop sounding like the tide.
AI-generated content has finally saturated the feeds — and the marginal post now competes against the median noise. Bullas's argument is simple, slightly dull, and correct: in a flood, the only thing that floats is a voice.
The Diagnosis
The tax on attention has flipped. Five years ago a polished generic post got a fair hearing. Today the same post reads as a tell that nobody on the other end is real — and readers route around it the way they route around captcha noise.What Still Works
First-person experience with a named author. Numbers tied to the author's own work, not lifted from a dashboard screenshot. An opinion the author would still hold without an audience. Specifics — model names, dates, the exact moment something broke.The Cheaper Way
Stop drafting in the model's voice and editing toward yours; draft in yours and let the model fact-check. Reverse the polarity. The output is shorter, less symmetric, more linkable. It is also slower to make, which is now the point.That's today.
Eight stories on the apparatus that watches the model: an open-sourced alignment harness, a government cyber benchmark, a security LLM pointed at real infrastructure, a forecast that prices in two futures, plus a conference debrief, a designer commons, and a content reset.
| Issue | 031 · Tuesday 19 May 2026 · Zürich |
|---|---|
| Sources fed | Anthropic Research · Cloudflare · Jack Clark / Import AI · AISI · Jeff Bullas · GitHub · YouTube |
| Via channels | @seeallochnaya · @ProductsAndStartups · @TochkiNadAI |
| Rubric | AI tools you could adopt this week · Creative software · Dev tools & agentic coding · Privacy & security · Science & research with a practical kernel · Anything immediately actionable |
| Next issue | Wednesday 20 May 2026 · 08:00 Zürich |