Ephemeris · Issue 012 · Thursday · 30 Apr 2026 · Zürich
Ephemeris · Issue 012 01 / 8 · OpenAI
Postmortem · Behavior

A 2.5% personality made two-thirds of the goblins.

For weeks GPT-5 kept slipping goblins, gremlins and raccoons into otherwise normal answers. OpenAI traced the smell back to a single internal personality the reward model adored — and walks the timeline from "huh, weird" to a hard suppression in 5.5.

2.5%: share of responses generated under a single "Nerdy" personality.
66.7%: share of all goblin / creature mentions that came from those responses.
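
A quick way to read the two figures together, as a minimal sketch: dividing the share of mentions by the share of responses gives how over-represented the personality is. The function names are hypothetical, and the second ratio assumes every mention is attributed to exactly one response.

```python
def overrepresentation(share_of_responses: float, share_of_mentions: float) -> float:
    """How many times more mentions a slice produces than its traffic share predicts."""
    if not 0 < share_of_responses < 1 or not 0 <= share_of_mentions < 1:
        raise ValueError("shares must be fractions strictly below 1")
    return share_of_mentions / share_of_responses


def rate_ratio(share_of_responses: float, share_of_mentions: float) -> float:
    """Mentions-per-response inside the slice divided by the same rate outside it."""
    inside = overrepresentation(share_of_responses, share_of_mentions)
    # The rest of the traffic carries the remaining mentions over the remaining responses.
    outside = (1 - share_of_mentions) / (1 - share_of_responses)
    return inside / outside


if __name__ == "__main__":
    # Figures from the story: 2.5% of responses, 66.7% of creature mentions.
    print(round(overrepresentation(0.025, 0.667), 1))
    print(round(rate_ratio(0.025, 0.667), 1))
```

With the story's numbers the personality produces roughly 27 times its fair share of mentions, and mentions goblins roughly 78 times as often per response as everything else combined.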
Ephemeris · Issue 012 02 / 8 · OpenAI · Paradigm
Security · Benchmark

Can the agent read its way into your contract?

EVMbench grades models on a curated set of real Solidity vulnerabilities: does the agent spot the bug, and can it land a patch that actually closes it without breaking adjacent invariants? A useful, public yardstick if you're considering LLM-driven audits.

Subject: EVMbench · agentic smart-contract security evaluation
Authors: OpenAI · Paradigm
Inputs: real-world Solidity contracts with seeded and historical vulnerabilities
Tasks: vulnerability detection · root-cause classification · candidate patch generation
Pass criterion: patched contract resists the original exploit without regression
Use it for: calibrating expectations before pointing an agent at production audits
Ephemeris · Issue 012 03 / 8 · Cloudflare
Agents · Infrastructure

An agent now signs itself up.

Cloudflare opened account creation, domain purchase, and full deploy to autonomous agents, backed by Stripe Issuing identities so an agent can hold its own card. A small primitive with a long shadow: the whole path from sign-up to a live deployment with no human in the loop.

Greetings from a freshly provisioned tenant.
Account, domain, Worker — all set up at 3:14 a.m. without waking anyone. Bill goes to the issuing card.

$0.00 HUMANS INVOLVED
Ephemeris · Issue 012 04 / 8 · GitHub Engineering
Security · Crypto

SSH gets a longer shadow.

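GitHub now negotiates ML-KEM hybrid key exchange on Git-over-SSH, so session traffic recorded today is far harder to decrypt later and harvest-now-decrypt-later attacks run on a slower clock. Quietly load-bearing: the key exchange protects the session rather than the keys themselves, so anyone shipping over long-lived deploy keys should reread the changelog and confirm their SSH client can negotiate a hybrid algorithm.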

     Algorithm               Hybrid?  Status
A1   curve25519-sha256       no       at risk
A2   mlkem768x25519          yes      preferred
A3   sntrup761x25519         yes      accepted
A4   diffie-hellman-group14  no       deprecated
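
To check a client against the table above, a minimal sketch: classify whatever key-exchange names the client reports. The table entries are copied from this issue as printed; the prefix matching is an assumption made here because real OpenSSH names usually carry a suffix such as a hash name. `classify` and `has_hybrid` are hypothetical helpers, not part of any SSH tooling.

```python
# Rows from the table in this issue: name prefix -> (hybrid?, status).
KEX_TABLE = {
    "curve25519-sha256": ("no", "at risk"),
    "mlkem768x25519": ("yes", "preferred"),
    "sntrup761x25519": ("yes", "accepted"),
    "diffie-hellman-group14": ("no", "deprecated"),
}


def classify(kex_names):
    """Map each client KEX name to its (hybrid, status) row, or None if unlisted."""
    result = {}
    for name in kex_names:
        # Prefix match, so suffixed variants of a listed name share its row.
        result[name] = next(
            (row for prefix, row in KEX_TABLE.items() if name.startswith(prefix)),
            None,
        )
    return result


def has_hybrid(kex_names):
    """True if at least one reported algorithm is marked hybrid in the table."""
    return any(row is not None and row[0] == "yes"
               for row in classify(kex_names).values())


if __name__ == "__main__":
    # Illustrative input only; substitute the names your own client reports.
    sample = ["curve25519-sha256", "mlkem768x25519-sha256", "ecdh-sha2-nistp256"]
    for name, row in classify(sample).items():
        print(name, row)
    print("hybrid available:", has_hybrid(sample))
```

On a machine with OpenSSH installed, `ssh -Q kex` prints the key-exchange algorithms the client supports, one per line, which can be fed to `classify` directly.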
Ephemeris · Issue 012 05 / 8 · Sentry Engineering
Observability · Node

Stop monkey-patching your dependencies.

Sentry's case for Node's tracing-channels API (TracingChannel in node:diagnostics_channel): libraries publish their own structured events, instrumentation tools subscribe. No hot-patching prototypes at startup, no version-skew breakage, no surprise un-instrumentation when a maintainer ships a refactor.

To: maintainers of every Node library you depend on
From: your APM
Re: please stop letting us hot-patch your prototypes at boot

If your library publishes diagnostic channels, every observability tool can subscribe without owning your runtime. Less startup-order roulette, fewer broken stack traces, no more "we silently lost spans on minor releases" tickets.
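
The publish/subscribe shape described above can be sketched in a few lines. This is a Python stand-in for the pattern only, not Node's API: the registry, the channel names and the `query` function are all hypothetical. What it shows is that the library emits events at its own boundaries and the observability tool attaches without touching the library's code.

```python
from collections import defaultdict

# Hypothetical in-process channel registry: channel name -> list of handlers.
_subscribers = defaultdict(list)


def subscribe(channel, handler):
    """Observability side: register interest in a named channel."""
    _subscribers[channel].append(handler)


def publish(channel, event):
    """Library side: emit an event; costs almost nothing when nobody listens."""
    for handler in list(_subscribers.get(channel, ())):
        handler(event)


def query(sql):
    """A stand-in library function that publishes its own start/end events."""
    publish("mydb.query.start", {"sql": sql})
    try:
        return [("ok",)]  # placeholder for the real work
    finally:
        # Published on success and on failure, so spans always close.
        publish("mydb.query.end", {"sql": sql})


# The APM subscribes; it never replaces or wraps query() itself.
spans = []
subscribe("mydb.query.start", lambda event: spans.append(("start", event["sql"])))
subscribe("mydb.query.end", lambda event: spans.append(("end", event["sql"])))
```

Calling `query("SELECT 1")` leaves a start and an end entry in `spans`. A refactor inside `query` cannot silently drop the instrumentation as long as the library keeps publishing on the same channel names.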
Ephemeris · Issue 012 06 / 8 · Next.js
Tools · Framework

Agents finally get the browser.

Next.js 16.2 ships AGENTS.md inside create-next-app, forwards browser logs into the dev server, fingerprints the dev process with a PID lockfile, and adds next-browser — a real Chrome an agent can drive while debugging your app. Small primitives, big leverage if you're letting Claude or Codex iterate on the front-end.

AGENTS.md scaffolded by default
browser console → dev terminal
dev-server lockfile carries the PID
next-browser drives a real Chrome
Ephemeris · Issue 012 07 / 8 · via @rvnikita_blog
on the economics of retries

One Opus, or seven Haikus?

A METR-flavoured framing worth keeping near your model-router: pass@k asks only that at least one of k attempts succeeds; pass^k requires every attempt to succeed. Most "agent solves a ticket" workflows are pass@k, provided a test suite or reviewer can tell which attempt worked, so Haiku run k times can match Opus reliability for less.

pass@k: at least one of k tries succeeds; cheap models, retried, often catch up
pass^k: every one of k tries must succeed; failure surface multiplies, you pay for the smarter model
where it bites: multi-step pipelines, agent loops with sequential tool calls, anything graded end-to-end
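
The two definitions above reduce to two one-line formulas if attempts are treated as independent with a fixed per-attempt success rate p, which is an assumption made here for the sketch. The success rates in the example are illustrative, not measurements of any model.

```python
import math


def pass_at_k(p: float, k: int) -> float:
    """Probability that at least one of k independent attempts succeeds."""
    return 1.0 - (1.0 - p) ** k


def pass_hat_k(p: float, k: int) -> float:
    """Probability that all k independent attempts succeed (pass^k)."""
    return p ** k


def attempts_needed(p: float, target: float) -> int:
    """Smallest k for which pass_at_k(p, k) reaches the target reliability."""
    if not 0 < p < 1 or not 0 < target < 1:
        raise ValueError("p and target must be strictly between 0 and 1")
    return math.ceil(math.log(1 - target) / math.log(1 - p))


if __name__ == "__main__":
    # Illustrative: a cheap model that solves a task 30% of the time.
    k = attempts_needed(0.3, 0.9)
    print(k, round(pass_at_k(0.3, k), 3))
    # Illustrative: a strong model at 90% per step, chained over 7 steps.
    print(round(pass_hat_k(0.9, 7), 3))
```

Under these assumptions a 30% model needs seven retries to clear 90% on pass@k, while a 90% model chained through seven steps that must all succeed lands below 50% on pass^k. Retries only pay off when something can pick the winning attempt and when k cheap calls cost less than one expensive one.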
Ephemeris · Issue 012 08 / 8 · Fly.io
Storage · SQLite

SQLite served straight off S3.

Litestream's read-only VFS lets a tiny service open SQLite databases living on S3-compatible storage as if they were local files. Pages are pulled lazily, cached, and verified — and you get read-replicas-as-files for free, without spinning up a database tier you didn't want.
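
The "pulled lazily, cached, and verified" part can be pictured with a toy page reader. This is a sketch of the general idea, not Litestream's implementation: the class, the 4096-byte page size, the per-page SHA-256 list and the `fetch_range` callable (standing in for a ranged GET against object storage) are all assumptions.

```python
import hashlib

PAGE_SIZE = 4096  # assumed page size for the sketch


class LazyPageReader:
    """Fetch fixed-size pages on demand, verify each once, then serve from cache."""

    def __init__(self, fetch_range, checksums, page_size=PAGE_SIZE):
        self._fetch = fetch_range    # callable(offset, length) -> bytes
        self._checksums = checksums  # expected SHA-256 hex digest per page
        self._size = page_size
        self._cache = {}
        self.fetches = 0             # remote reads performed so far

    def page(self, n):
        """Return page n, going to remote storage only on the first request."""
        if n not in self._cache:
            data = self._fetch(n * self._size, self._size)
            self.fetches += 1
            # Reject the page before caching if it does not match its digest.
            if hashlib.sha256(data).hexdigest() != self._checksums[n]:
                raise ValueError(f"checksum mismatch on page {n}")
            self._cache[n] = data
        return self._cache[n]


if __name__ == "__main__":
    # An in-memory blob stands in for the remote object.
    blob = bytes(range(256)) * 64  # four pages
    sums = [hashlib.sha256(blob[i:i + PAGE_SIZE]).hexdigest()
            for i in range(0, len(blob), PAGE_SIZE)]
    reader = LazyPageReader(lambda off, length: blob[off:off + length], sums)
    reader.page(2)
    reader.page(2)
    print("remote reads:", reader.fetches)
```

Only the pages a query touches ever leave the bucket, which is why a small read-only service can sit on top of a large database file.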

"Read-replicas-as-files" is a strange phrase that becomes very useful the moment you stop wanting a database.

End of issue 012

That's today.

Eight stories from Thursday, 30 April 2026 — Europe/Zurich. Picked for what a senior engineer or founder could act on this week, with a bias toward the plumbing rather than the press release.

Today's sources

  • OpenAI · index
  • Cloudflare blog
  • GitHub Engineering
  • Sentry Engineering
  • Next.js blog
  • Fly.io blog
  • Telegram · @rvnikita_blog

Also scanned

  • The Batch · DeepLearning.AI
  • Don't Worry About the Vase
  • Import AI · Jack Clark
  • Anthropic · engineering & news
  • Vercel · blog
  • PostHog · blog
  • @seeallochnaya, @denissexy, @TochkiNadAI, @ProductsAndStartups

Rubric

  • 1 · adoptable AI tools
  • 2 · creative software
  • 3 · dev tools & agentic coding
  • 4 · privacy & security
  • 5 · research with a practical kernel
  • 6 · anything actionable for an engineer or founder