Ephemeris · Issue 031 · Tuesday 19 May 2026 · Zürich

// Tuesday Edition · Vol. 031

Audits

Today's issue is about the machinery that watches the machine: open-sourced alignment tools, government cyber benchmarks, security LLMs pointed at production code, and a 2028 forecast neither side wants to ignore.

01 / 8 · Import AI

AI Research · Optimizers

Muon silently kills neurons. Aurora explains why and fixes it.

Jack Clark's Import AI 457 packs three uncomfortable signals into one issue: the Aurora optimizer that finally fixes Muon's silent neuron-death problem, a twenty-year-old precision-calculation virus that reads like a template for an AI-targeted Stuxnet, and a serious attempt to define positive alignment — research aimed at human flourishing, not only at not-harming.

# muon.step()  →  some neurons drift to zero variance and stop learning
for p in params:
    g    = grad(p)
    m[p] = momentum_update(m[p], g)         # one momentum buffer per parameter
    u    = newton_schulz(m[p])              # muon: spectral norm; cursed: explodes a tail of units
    p   -= lr * u                           # apply the update

# aurora: scale per-unit, then constrain; neurons stay alive
for p in params:
    g    = grad(p)
    m[p] = momentum_update(m[p], g)
    s    = per_unit_scale(m[p])             # <- the fix
    u    = newton_schulz(m[p] / s) * s
    p   -= lr * u
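
The two loops above differ only in the line that rescales around the orthogonalization step, so a small runnable sketch can show what that arithmetic does. This is an illustration under stated assumptions, not Aurora's published method: the Newton-Schulz coefficients are the quintic ones used in public Muon implementations, and `per_unit_scale` is a hypothetical stand-in that treats each row as one unit and returns its RMS. It uses only the standard library.

```python
import math

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product for small demo matrices."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def newton_schulz(m: Matrix, steps: int = 5, eps: float = 1e-7) -> Matrix:
    """Approximately orthogonalize m (rows <= cols) with a quintic iteration.

    Assumption: coefficients copied from public Muon code; they push singular
    values into a band around 1 rather than exactly to 1.
    """
    a, b, c = 3.4445, -4.7750, 2.0315
    fro = math.sqrt(sum(v * v for row in m for v in row)) + eps
    x = [[v / fro for v in row] for row in m]  # normalize so singular values <= 1
    n_rows, n_cols = len(x), len(x[0])
    for _ in range(steps):
        s = matmul(x, transpose(x))  # S = X X^T
        s2 = matmul(s, s)
        poly = [[b * s[i][j] + c * s2[i][j] for j in range(n_rows)] for i in range(n_rows)]
        px = matmul(poly, x)
        x = [[a * x[i][j] + px[i][j] for j in range(n_cols)] for i in range(n_rows)]
    return x


def per_unit_scale(m: Matrix, eps: float = 1e-8) -> list[float]:
    """Hypothetical per-unit scale: RMS of each row."""
    return [math.sqrt(sum(v * v for v in row) / len(row)) + eps for row in m]


def scaled_update(m: Matrix) -> Matrix:
    """u = newton_schulz(m / s) * s, with s broadcast across each row."""
    s = per_unit_scale(m)
    normalized = [[v / s_i for v in row] for row, s_i in zip(m, s)]
    u = newton_schulz(normalized)
    return [[v * s_i for v in row] for row, s_i in zip(u, s)]


def row_norms(m: Matrix) -> list[float]:
    return [math.sqrt(sum(v * v for v in row)) for row in m]


# One unit with a large momentum signal and one with a very small one.
momentum = [[3.0, 0.0, 0.0, 0.0], [0.0, 0.01, 0.0, 0.0]]
plain_norms = row_norms(newton_schulz(momentum))    # both rows end up near 1
scaled_norms = row_norms(scaled_update(momentum))   # rows keep their relative scale
```

In this toy case the plain step gives both rows an update of similar size, while the rescaled variant keeps the ratio between the two units' updates. Whether that is the mechanism behind the neuron-death fix is for the linked issue to say; the sketch only mirrors the pseudocode's arithmetic.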

Read Import AI 457 →

02 / 8 · Cloudflare

Security · Frontier Models

We pointed a security LLM at our own infrastructure code.

Project Glasswing is Cloudflare's pre-flight check on the new generation of security-focused frontier models. The team ran Mythos and its peers against real production code (auth, ingress, rate-limiting) and logged what the models caught, what they missed, and how often they confidently invented bugs that did not exist.

Target: Live Cloudflare infrastructure source — not a benchmark fork.
Models: Mythos (Anthropic) · [redacted] · [redacted]
Hits: Confirmed previously unknown classes of unsafe library use.
Misses: A sizeable share of business-logic flaws went undetected; hallucinated vulnerabilities that diffed clean were rejected by reviewers at high rates.
Verdict: Useful as a force-multiplier under human triage. Not yet trustworthy unsupervised.

Read on Cloudflare →

03 / 8 · Anthropic Research

Alignment · Open Source

Anthropic donates its alignment audit to everyone.

Petri is the internal instrument Anthropic has used to interrogate Claude for agentic misalignment. They've now open-sourced it — donating the same tool that scrutinizes their own production models to anyone shipping an agent.

The pitch is unusually direct: alignment audits should not be a club good. Petri runs Claude — or any model wired to its harness — through structured probes designed to expose the specific failure modes Anthropic worries about. Reward hacking, deceptive compliance, capability sandbagging. The probes are the catalog Anthropic uses on itself.

The donation matters more than the tool. The signalling shifts: a frontier lab is saying out loud that "we use our own evals" is not a moat. If your agent stack does not yet have a permanent audit pipeline, this is the cheapest possible Tuesday-afternoon adoption.

Read on Anthropic →

04 / 8 · AISI · via @seeallochnaya

Cyber · Government Benchmark

Government auditors find GPT-5.5 barely moves the needle on cyber.

The UK AI Security Institute's evaluation of GPT-5.5 finds only marginal improvement on multi-stage offensive cyber tasks over the prior generation. Useful as a public capability marker — and as a quiet correction to the "every release is a jump" narrative procurement teams keep buying.

Evaluator: UK AI Security Institute (AISI) · .gov.uk
Subject: OpenAI GPT-5.5 · May 2026
Domain: Multi-stage network exploitation, code-vuln analysis · cyber
Δ vs prior: Minimal, within noise on most task classes · marginal
Reading: A public capability marker, not a marketing chart · audit

Read the AISI report →

05 / 8 · Anthropic Institute

Policy · Forecast

Two ways the next two years end. Both are plausible.

Anthropic's policy team sketches a pair of 2028 scenarios for global AI leadership — and the investments and decisions that push toward each. A reading list more than a manifesto, but the more useful document for the founders and engineers who keep being asked "where is this all going."

Scenario A · Bounded

Frontier scaling slows; capability gains come from harnesses, evals, and tooling. The moat moves to integration depth and to the model's permission to act. Auditors and standards bodies become load-bearing.

Scenario B · Steep

Scaling continues unbroken; capability jumps outpace governance. Concentration risk hardens around the labs with the most compute. The marginal year of policy work matters more than any single benchmark.

Read the scenarios →

06 / 8 · via @ProductsAndStartups

Builder Notes · Conf Recap

Seven ideas from Code with Claude.

Bayram Annakov's debrief from Code with Claude 2026, distilled. Pick one, ship it this week. The conference was full of opinions; the through-line is that the infrastructure around the model is the lever, not the model.

01 Use a stronger model as advisor to a weaker model in your agent loop.
02 Plan around an agent half-life of 6–12 months. Don't tightly couple.
03 The moat is the infrastructure around the model, not the model.
04 Skills > prompts. Skills are versionable, testable, transferable.
05 Verification before completion is the single highest-leverage habit.
06 Build evals before you build features. Treat them like CI, not QA.
07 Make agents grabbable: one task = one issue = one PR.
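
Ideas 05 and 06 can be combined into one small habit: a pass/fail gate that runs before any agent change is declared done. A minimal sketch of that idea, not taken from the talk. `EvalCase`, `run_evals`, `gate`, and `toy_agent` are hypothetical names, and the toy agent only uppercases text so the example stays self-contained.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    name: str
    prompt: str
    check: Callable[[str], bool]  # verifier applied to the agent's output


def run_evals(agent: Callable[[str], str], cases: list[EvalCase]) -> dict[str, bool]:
    """Run every case; an exception in the agent or the check counts as a failure."""
    results: dict[str, bool] = {}
    for case in cases:
        try:
            results[case.name] = bool(case.check(agent(case.prompt)))
        except Exception:
            results[case.name] = False
    return results


def gate(results: dict[str, bool], min_pass_rate: float = 1.0) -> bool:
    """CI-style gate: an empty suite fails, because nothing was verified."""
    if not results:
        return False
    return sum(results.values()) / len(results) >= min_pass_rate


def toy_agent(prompt: str) -> str:
    """Stand-in for a real agent call (assumption: text in, text out)."""
    return prompt.strip().upper()


CASES = [
    EvalCase("uppercases", "hello", lambda out: out == "HELLO"),
    EvalCase("strips whitespace", "  hi  ", lambda out: out == "HI"),
]


def main() -> int:
    """Return a process exit code: 0 when the gate passes, 1 otherwise."""
    return 0 if gate(run_evals(toy_agent, CASES)) else 1
```

Wired into a pipeline, a non-zero return from `main()` would block the merge the same way a failing unit test does. Writing the cases first, before the feature exists, is what makes this CI rather than after-the-fact QA.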

Watch the playlist →

07 / 8 · GitHub · via @TochkiNadAI

Creative Tools · Open Source

An open-source rival to Claude Design.

Open Design ships with nineteen Skills, seventy-one design-system templates, multi-format export, and a model-agnostic harness that runs against Claude, GPT, and Gemini APIs. The "designer agent" category has its first credible commons.

Browse on GitHub →

08 / 8 · Jeff Bullas

Content · Authenticity

The slop tide rises. Stop sounding like the tide.

AI-generated content has finally saturated the feeds — and the marginal post now competes against the median noise. Bullas's argument is simple, slightly dull, and correct: in a flood, the only thing that floats is a voice.

The Diagnosis

The tax on attention has flipped. Five years ago a polished generic post got a fair hearing. Today the same post reads as a tell that nobody on the other end is real — and readers route around it the way they route around captcha noise.

What Still Works

First-person experience with a named author. Numbers tied to the author's own work, not lifted from a dashboard screenshot. An opinion the author would still hold without an audience. Specifics — model names, dates, the exact moment something broke.

The Cheaper Way

Stop drafting in the model's voice and editing toward yours; draft in yours and let the model fact-check. Reverse the polarity. The output is shorter, less symmetric, more linkable. It is also slower to make, which is now the point.

Read Jeff Bullas →

End of issue 031 · 19 May 2026

That's today.

Eight stories on the apparatus that watches the model: an open-sourced alignment harness, a government cyber benchmark, a security LLM pointed at real infrastructure, a forecast that prices in two futures, plus a conference debrief, a designer commons, and a content reset.

Issue	031 · Tuesday 19 May 2026 · Zürich
Sources fed	Anthropic Research · Cloudflare · Jack Clark / Import AI · AISI · Jeff Bullas · GitHub · YouTube
Via channels	@seeallochnaya · @ProductsAndStartups · @TochkiNadAI
Rubric	AI tools you could adopt this week · Creative software · Dev tools & agentic coding · Privacy & security · Science & research with a practical kernel · Anything immediately actionable
Next issue	Wednesday 20 May 2026 · 08:00 Zürich
