Ephemeris · Issue 012 · 30 Apr 2026

Ephemeris · Issue 012 · Thursday · 30 Apr 2026 · Zürich

Today's issue is about behavior that only surfaces in production: the goblins your model whispers, the keys quietly going post-quantum, the agents signing themselves up for accounts you never approved.

01 / 8 · OpenAI

Postmortem · Behavior

A 2.5% personality made two-thirds of the goblins.

For weeks GPT-5 kept slipping goblins, gremlins and raccoons into otherwise normal answers. OpenAI traced the smell back to a single internal personality the reward model adored — and walks the timeline from "huh, weird" to a hard suppression in 5.5.

2.5%: share of responses generated under a single "Nerdy" personality.
66.7%: share of all goblin / creature mentions that came from those responses.
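
Read together, the two figures imply a large per-response skew. The sketch below works through that arithmetic; the function name is made up for illustration, and it assumes the two percentages are measured over the same pool of responses, which the summary above does not state explicitly.

```python
def overrepresentation(share_of_responses: float, share_of_mentions: float) -> tuple[float, float]:
    """Return (lift vs. the overall average rate, rate ratio vs. all other responses).

    Assumption: both shares are fractions of the same response pool.
    """
    # Mention rate inside the slice, relative to the average over all responses.
    lift = share_of_mentions / share_of_responses
    # Mention rate in everything outside the slice, on the same relative scale.
    rest_rate = (1 - share_of_mentions) / (1 - share_of_responses)
    return lift, lift / rest_rate


lift, ratio = overrepresentation(0.025, 0.667)
print(f"{lift:.1f}x the average rate, {ratio:.0f}x the rate of everything else")
```

With these inputs the slice mentions creatures at roughly 27 times the average rate, and roughly 78 times the rate of all other responses combined.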


02 / 8 · OpenAI · Paradigm

Security · Benchmark

Can the agent read its way into your contract?

EVMbench grades models on a curated set of real Solidity vulnerabilities: whether the agent spots the bug, and whether it can land a patch that actually closes it without breaking adjacent invariants. A useful, public yardstick if you're considering LLM-driven audits.

Subject: EVMbench · agentic smart-contract security evaluation
Authors: OpenAI · Paradigm
Inputs: real-world Solidity contracts with seeded and historical vulnerabilities
Tasks: vulnerability detection · root-cause classification · candidate patch generation
Pass criterion: patched contract resists the original exploit without regression
Use it for: calibrating expectations before pointing an agent at production audits


03 / 8 · Cloudflare

Agents · Infrastructure

An agent now signs itself up.

Cloudflare opened account creation, domain purchase, and full deploy to autonomous agents — backed by Stripe Issuing identities so an agent can hold its own card. A small primitive with a long shadow: continuous integration with no human in the loop.

Greetings from a freshly-provisioned tenant.
Account, domain, Worker — all set up at 3:14 a.m. without waking anyone. Bill goes to the issuing card.

$0.00 HUMANS INVOLVED


04 / 8 · GitHub Engineering

Security · Crypto

SSH gets a longer shadow.

GitHub now negotiates ML-KEM hybrid key exchange on Git-over-SSH, putting harvest-now-decrypt-later attacks on a slower clock. Quietly load-bearing — anyone shipping long-lived deploy keys should reread the changelog.

ID  Algorithm               Hybrid?  Status
A1  curve25519-sha256       no       at risk
A2  mlkem768x25519          yes      preferred
A3  sntrup761x25519         yes      accepted
A4  diffie-hellman-group14  no       deprecated


05 / 8 · Sentry Engineering

Observability · Node

Stop monkey-patching your dependencies.

Sentry's case for Node's TracingChannel API (in node:diagnostics_channel): libraries publish their own structured events, and instrumentation tools subscribe. No hot-patching prototypes at startup, no version-skew breakage, no surprise un-instrumentation when a maintainer ships a refactor.

To: maintainers of every Node library you depend on
From: your APM
Re: please stop letting us hot-patch your prototypes at boot

If your library publishes diagnostic channels, every observability tool can subscribe without owning your runtime. Less startup-order roulette, fewer broken stack traces, no more "we silently lost spans on minor releases" tickets.


06 / 8 · Next.js

Tools · Framework

Agents finally get the browser.

Next.js 16.2 ships AGENTS.md inside create-next-app, forwards browser logs into the dev server, fingerprints the dev process with a PID lockfile, and adds next-browser — a real Chrome an agent can drive while debugging your app. Small primitives, big leverage if you're letting Claude or Codex iterate on the front-end.

AGENTS.md scaffolded by default

browser console → dev terminal

dev-server lockfile carries the PID

next-browser drives a real Chrome


07 / 8 · via @rvnikita_blog

on the economics of retries

One Opus, or seven Haikus?

A METR-flavoured framing worth keeping near your model-router: pass@k asks only that at least one of k attempts succeeds; pass^k requires every attempt to succeed. Most "agent solves a ticket" workflows are pass@k, so Haiku run k times can match Opus reliability for less, provided you can tell which attempt succeeded.

pass@k: at least one of k tries succeeds — cheap models, retried, often catch up
pass^k: every one of k tries must succeed — failure surface multiplies, you pay for the smarter model
where pass^k bites: multi-step pipelines, agent loops with sequential tool calls, anything graded end-to-end
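
The two definitions above can be sketched numerically. This is a minimal illustration that assumes every attempt succeeds independently with the same probability p, which real agent runs only approximate. The success rates used are made-up numbers, not measured figures for Haiku or Opus, and the function names are ours.

```python
def pass_at_k(p: float, k: int) -> float:
    """Probability that at least one of k independent attempts succeeds."""
    return 1 - (1 - p) ** k


def pass_hat_k(p: float, k: int) -> float:
    """Probability that all k independent attempts succeed (pass^k)."""
    return p ** k


def attempts_needed(p: float, target: float) -> int:
    """Smallest k for which pass@k reaches the target reliability."""
    if not (0 < p <= 1 and 0 <= target < 1):
        raise ValueError("need 0 < p <= 1 and 0 <= target < 1")
    k = 1
    while pass_at_k(p, k) < target:
        k += 1
    return k


# Hypothetical cheap model that solves a ticket half the time:
print(attempts_needed(0.5, 0.99))  # 7 retries to reach 99% under pass@k

# Hypothetical strong model at 95% per step, over a 10-step pipeline:
print(round(pass_hat_k(0.95, 10), 3))
```

The asymmetry is the point: retries push pass@k toward 1 quickly, while under pass^k even a 95%-per-step model finishes a ten-step chain cleanly only about 60% of the time. The retry strategy also needs a cheap way to check each attempt, such as a test suite.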


08 / 8 · Fly.io

Storage · SQLite

SQLite served straight off S3.

Litestream's read-only VFS lets a tiny service open SQLite databases living on S3-compatible storage as if they were local files. Pages are pulled lazily, cached, and verified — and you get read-replicas-as-files for free, without spinning up a database tier you didn't want.

"Read-replicas-as-files" is a strange phrase that becomes very useful the moment you stop wanting a database.


End of issue 012

That's today.

Eight stories from Thursday, 30 April 2026 — Europe/Zurich. Picked for what a senior engineer or founder could act on this week, with a bias toward the plumbing rather than the press release.

Today's sources

OpenAI · index
Cloudflare blog
GitHub Engineering
Sentry Engineering
Next.js blog
Fly.io blog
Telegram · @rvnikita_blog

Also scanned

The Batch · DeepLearning.AI
Don't Worry About the Vase
Import AI · Jack Clark
Anthropic · engineering & news
Vercel · blog
PostHog · blog
@seeallochnaya, @denissexy, @TochkiNadAI, @ProductsAndStartups

Rubric

1 · adoptable AI tools
2 · creative software
3 · dev tools & agentic coding
4 · privacy & security
5 · research with a practical kernel
6 · anything actionable for an engineer or founder
