Ephemeris · Issue 031 Tuesday · 19 May 2026 · Zürich
Ephemeris · Issue 031 01 / 8 · Import AI
AI Research · Optimizers

A new optimizer kills neurons. Aurora explains why.

Jack Clark's Import AI 457 packs three uncomfortable signals into one issue: the Aurora optimizer that finally fixes Muon's silent neuron-death problem, a twenty-year-old precision-calculation virus that reads like a template for an AI-targeted Stuxnet, and a serious attempt to define positive alignment — research aimed at human flourishing, not only at not-harming.

# muon.step()  →  some neurons drift to zero variance and stop learning
for p in params:
    g  = grad(p)
    m  = momentum_update(m, g)             # muon: spectral norm
    u  = newton_schulz(m)               # cursed: explodes a tail of units

# aurora: scale per-unit, then constrain — neurons stay alive
for p in params:
    g  = grad(p)
    m  = momentum_update(m, g)
    s  = per_unit_scale(m)              # <- the fix
    u  = newton_schulz(m / s) * s
Ephemeris · Issue 031 02 / 8 · Cloudflare
Security · Frontier Models

We pointed a security LLM at our own infrastructure code.

Project Glasswing is Cloudflare's pre-flight check on the new generation of security-focused frontier models. They ran Mythos and peers against real production code — auth, ingress, rate-limiting — and logged what it caught, what it missed, and how often it confidently invented bugs that did not exist.

Target
Live Cloudflare infrastructure source — not a benchmark fork.
Models
Mythos (Anthropic) · [redacted] · [redacted]
Hits
Confirmed previously-unknown classes of unsafe library use; missed a sizeable share of business-logic flaws.
Misses
Hallucinated vulnerabilities that diffed clean and reviewer-rejected at high rates.
Verdict
Useful as a force-multiplier under human triage. Not yet trustworthy unsupervised.
Ephemeris · Issue 031 03 / 8 · Anthropic Research
Alignment · Open Source

Anthropic donates its alignment audit to everyone.

Petri is the internal instrument Anthropic has used to interrogate Claude for agentic misalignment. They've now open-sourced it — donating the same tool that scrutinizes their own production models to anyone shipping an agent.

The pitch is unusually direct: alignment audits should not be a club good. Petri runs Claude — or any model wired to its harness — through structured probes designed to expose the specific failure modes Anthropic worries about. Reward hacking, deceptive compliance, capability sandbagging. The probes are the catalog Anthropic uses on itself.

The donation matters more than the tool. The signalling shifts: a frontier lab is saying out loud that "we use our own evals" is not a moat. If your agent stack does not yet have a permanent audit pipeline, this is the cheapest possible Tuesday-afternoon adoption.

Ephemeris · Issue 031 04 / 8 · AISI · via @seeallochnaya
Cyber · Government Benchmark

Government auditors give GPT-5.5 a barely-passing grade on cyber.

The UK AI Security Institute's evaluation of GPT-5.5 finds only marginal improvement on multi-stage offensive cyber tasks over the prior generation. Useful as a public capability marker — and as a quiet correction to the "every release is a jump" narrative procurement teams keep buying.

Evaluator
UK AI Security Institute (AISI)
.gov.uk
Subject
OpenAI GPT-5.5
May 2026
Domain
Multi-stage network exploitation, code-vuln analysis
cyber
Δ vs prior
Minimal — within noise on most task classes
marginal
Reading
A public capability marker, not a marketing chart
audit
Ephemeris · Issue 031 05 / 8 · Anthropic Institute
Policy · Forecast

Two ways the next two years end. Both are plausible.

Anthropic's policy team sketches a pair of 2028 scenarios for global AI leadership — and the investments and decisions that push toward each. A reading list more than a manifesto, but the more useful document for the founders and engineers who keep being asked "where is this all going."

Scenario A · Bounded

Frontier scaling slows; capability gains come from harnesses, evals, and tooling. The moat moves to integration depth and to the model's permission to act. Auditors and standards bodies become load-bearing.

Scenario B · Steep

Scaling continues unbroken; capability jumps outpace governance. Concentration risk hardens around the labs with the most compute. The marginal year of policy work matters more than any single benchmark.

Ephemeris · Issue 031 06 / 8 · via @ProductsAndStartups
Builder Notes · Conf Recap

SEVEN IDEAS FROM CODE WITH CLAUDE.

Bayram Annakov's debrief from Code with Claude 2026, distilled. Pick one, ship it this week. The conference was full of opinions; the through-line is that the infrastructure around the model is the lever, not the model.

  • 01Use a stronger model as advisor to a weaker model in your agent loop.
  • 02Plan around an agent half-life of 6–12 months. Don't tightly couple.
  • 03The moat is the infrastructure around the model, not the model.
  • 04Skills > prompts. Skills are versionable, testable, transferable.
  • 05Verification before completion is the single highest-leverage habit.
  • 06Build evals before you build features. Treat them like CI, not QA.
  • 07Make agents grabbable: one task = one issue = one PR.
Ephemeris · Issue 031 07 / 8 · GitHub · via @TochkiNadAI
Creative Tools · Open Source

An open-source rival to Claude Design.

Open Design ships with nineteen Skills, seventy-one design-system templates, multi-format export, and a model-agnostic harness that runs against Claude, GPT, and Gemini APIs. The "designer agent" category has its first credible commons.

01
02
03
04
05
06
07
08
09
10
11
12
Ephemeris · Issue 031 08 / 8 · Jeff Bullas
Content · Authenticity

The slop tide rises. Stop sounding like the tide.

AI-generated content has finally saturated the feeds — and the marginal post now competes against the median noise. Bullas's argument is simple, slightly dull, and correct: in a flood, the only thing that floats is a voice.

The Diagnosis

The tax on attention has flipped. Five years ago a polished generic post got a fair hearing. Today the same post reads as a tell that nobody on the other end is real — and readers route around it the way they route around captcha noise.

What Still Works

First-person experience with a named author. Numbers tied to the author's own work, not lifted from a dashboard screenshot. An opinion the author would still hold without an audience. Specifics — model names, dates, the exact moment something broke.

The Cheaper Way

Stop drafting in the model's voice and editing toward yours; draft in yours and let the model fact-check. Reverse the polarity. The output is shorter, less symmetric, more linkable. It is also slower to make, which is now the point.

End of issue 031 · 19 May 2026 Back to top ↑

That's today.

Eight stories on the apparatus that watches the model: an open-sourced alignment harness, a government cyber benchmark, a security LLM pointed at real infrastructure, a forecast that prices in two futures, plus a conference debrief, a designer commons, and a content reset.

Issue 031 · Tuesday 19 May 2026 · Zürich
Sources fed Anthropic Research · Cloudflare · Jack Clark / Import AI · AISI · Jeff Bullas · GitHub · YouTube
Via channels @seeallochnaya · @ProductsAndStartups · @TochkiNadAI
Rubric AI tools you could adopt this week · Creative software · Dev tools & agentic coding · Privacy & security · Science & research with a practical kernel · Anything immediately actionable
Next issue Wednesday 20 May 2026 · 08:00 Zürich