Ghosts.
Today's issue is about behavior that only surfaces in production: the goblins your model whispers, the keys quietly going post-quantum, the agents signing themselves up for accounts you never approved.
A 2.5% personality made two-thirds of the goblins.
For weeks GPT-5 kept slipping goblins, gremlins and raccoons into otherwise normal answers. OpenAI traced the smell back to a single internal personality the reward model adored, and its post walks the timeline from "huh, weird" to a hard suppression in GPT-5.5.
Can the agent read its way into your contract?
EVMbench grades models on a curated set of real Solidity vulnerabilities: does the agent spot the bug, and can it land a patch that actually closes it without breaking adjacent invariants? It's a useful, public yardstick if you're considering LLM-driven audits.
- Subject: EVMbench · agentic smart-contract security evaluation
- Authors: OpenAI · Paradigm
- Inputs: real-world Solidity contracts with seeded and historical vulnerabilities
- Tasks: vulnerability detection · root-cause classification · candidate patch generation
- Pass criterion: patched contract resists the original exploit without regression
- Use it for: calibrating expectations before pointing an agent at production audits
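The pass criterion boils down to a conjunction: the exploit must fail and nothing that used to pass may break. A minimal TypeScript sketch, assuming each task reports whether the original exploit still lands and which pre-existing tests now fail; the `PatchOutcome` shape and both functions are hypothetical, not EVMbench's actual harness.

```typescript
// Hypothetical result shape for one benchmark task; not EVMbench's real schema.
interface PatchOutcome {
  exploitStillWorks: boolean; // did replaying the original exploit against the patch succeed?
  brokenTests: string[];      // previously passing tests that now fail (regressions)
}

// A patch passes only if it closes the exploit AND introduces no regressions.
function passesCriterion(o: PatchOutcome): boolean {
  return !o.exploitStillWorks && o.brokenTests.length === 0;
}

// Fraction of tasks that pass, e.g. for comparing two models on the same task set.
function passRate(outcomes: PatchOutcome[]): number {
  if (outcomes.length === 0) return 0;
  return outcomes.filter(passesCriterion).length / outcomes.length;
}

const results: PatchOutcome[] = [
  { exploitStillWorks: false, brokenTests: [] },               // fixed cleanly
  { exploitStillWorks: false, brokenTests: ["testWithdraw"] }, // fixed, but broke an invariant
  { exploitStillWorks: true, brokenTests: [] },                // exploit still lands
];
console.log(passRate(results)); // 1/3: only the first patch counts
```

The second case is the one a naive "does the exploit still work?" check would wave through.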
An agent now signs itself up.
Cloudflare has opened account creation, domain purchase, and full deployment to autonomous agents, backed by Stripe Issuing identities so an agent can hold its own card. A small primitive with a long shadow: continuous integration with no human in the loop.
Greetings from a freshly provisioned tenant.
Account, domain, Worker: all set up at 3:14 a.m. without waking anyone. The bill goes to the agent's Issuing card.
SSH gets a longer shadow.
GitHub now negotiates ML-KEM hybrid key exchange on Git-over-SSH, blunting harvest-now-decrypt-later attacks against recorded sessions. Quietly load-bearing: anyone shipping long-lived deploy keys should reread the changelog.
Stop monkey-patching your dependencies.
Sentry's case for Node's tracing-channels API: libraries publish their own structured events, instrumentation tools subscribe. No hot-patching prototypes at startup, no version-skew breakage, no surprise un-instrumentation when a maintainer ships a refactor.
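A minimal sketch of that publish/subscribe split, using the `tracingChannel` helper from Node's built-in `node:diagnostics_channel` module (available in recent Node releases). The `mini-db` library, its channel name and the `query` function are hypothetical; the point is that the instrumentation side knows only the channel name and never patches the library's code.

```typescript
import * as dc from "node:diagnostics_channel";

// ---- Library side: a hypothetical "mini-db" package ----
// The library owns a named TracingChannel and wraps its own work in it.
const queryChannel = dc.tracingChannel("mini-db:query");

interface QueryContext {
  sql: string;
  result?: unknown; // set by traceSync when the callback returns
  error?: unknown;  // set by traceSync when the callback throws
}

function runQuery(sql: string): string[] {
  if (sql.trim() === "") throw new Error("empty query");
  return [`row for: ${sql}`];
}

function query(sql: string): string[] {
  const ctx: QueryContext = { sql };
  let rows: string[] = [];
  // traceSync publishes "start", runs the callback, records result/error on ctx,
  // publishes "error" if the callback threw, and always publishes "end".
  queryChannel.traceSync(() => {
    rows = runQuery(sql);
    return rows;
  }, ctx);
  return rows;
}

// ---- Instrumentation side: e.g. an APM or tracing SDK ----
// It looks the channel up by name only: no prototype patching, no import of mini-db.
const events: string[] = [];
dc.tracingChannel("mini-db:query").subscribe({
  start: (msg) => events.push(`start ${(msg as QueryContext).sql}`),
  end: (msg) => events.push(`end ${(msg as QueryContext).sql}`),
  asyncStart: () => {},
  asyncEnd: () => {},
  error: (msg) => events.push(`error ${String((msg as QueryContext).error)}`),
});

query("SELECT 1");
console.log(events); // [ 'start SELECT 1', 'end SELECT 1' ]
```

Since nothing is published while a channel has no subscribers, an uninstrumented app pays very little, and the library can refactor `runQuery` freely as long as it keeps publishing on the same channel name.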
Agents finally get the browser.
Next.js 16.2 ships AGENTS.md inside create-next-app, forwards browser logs into the dev server, fingerprints the dev process with a PID lockfile, and adds next-browser, a real Chrome instance an agent can drive while debugging your app. Small primitives, big leverage if you're letting Claude or Codex iterate on the front end.
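Next.js's own implementation isn't reproduced here; this is a generic sketch of the PID-lockfile technique for Node, assuming a hypothetical lockfile path. It creates the file with an exclusive-create flag, records the current PID, and treats a lock whose owner process no longer exists as stale.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Signal 0 only checks that the process exists; nothing is delivered.
function isAlive(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM: the process exists but we are not allowed to signal it.
    return (err as NodeJS.ErrnoException).code === "EPERM";
  }
}

interface LockResult {
  acquired: boolean;
  ownerPid: number;
}

function acquireLock(lockPath: string): LockResult {
  try {
    // Flag "wx" fails with EEXIST if the file already exists, so only one process creates it.
    fs.writeFileSync(lockPath, String(process.pid), { flag: "wx" });
    return { acquired: true, ownerPid: process.pid };
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code !== "EEXIST") throw err;
  }
  const ownerPid = Number(fs.readFileSync(lockPath, "utf8").trim());
  if (Number.isInteger(ownerPid) && ownerPid > 0 && isAlive(ownerPid)) {
    return { acquired: false, ownerPid }; // another dev process holds the lock
  }
  // Stale or unreadable lock from a process that is gone: remove it and try again.
  fs.rmSync(lockPath, { force: true });
  return acquireLock(lockPath);
}

function releaseLock(lockPath: string): void {
  fs.rmSync(lockPath, { force: true });
}

// Hypothetical lockfile location, for the demo only.
const lockPath = path.join(os.tmpdir(), "hypothetical-dev-server.lock");
releaseLock(lockPath);                // start from a clean slate
const first = acquireLock(lockPath);  // { acquired: true, ownerPid: process.pid }
const second = acquireLock(lockPath); // { acquired: false, ownerPid: process.pid }
releaseLock(lockPath);
```

With a check like this, a second invocation can report the running process and its PID instead of starting a duplicate. The sketch ignores the brief window between creating the file and writing the PID, which a production lock would need to handle.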
One Opus, or seven Haikus?
A METR-flavoured framing worth keeping near your model router: pass@k cares only that at least one of k attempts succeeds; pass^k requires every attempt to succeed. Most "agent solves a ticket" workflows are pass@k, provided something (tests, CI, a reviewer) can tell which attempt worked, so Haiku run k times can match Opus's success rate for less.
- pass@k: at least one of k tries succeeds; cheap models, retried, often catch up
- pass^k: every one of k tries must succeed; failure surface multiplies, you pay for the smarter model
- where it bites: multi-step pipelines, agent loops with sequential tool calls, anything graded end-to-end
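Under the simplifying assumption that attempts are independent and each succeeds with the same probability p, the two metrics are pass@k = 1 - (1 - p)^k and pass^k = p^k. A small TypeScript sketch; the per-attempt rates below are made up for illustration, not measured Haiku or Opus numbers.

```typescript
// Assumes attempts are independent and each succeeds with the same probability p.

// pass@k: at least one of k attempts succeeds.
function passAtK(p: number, k: number): number {
  return 1 - Math.pow(1 - p, k);
}

// pass^k: all k attempts (or all k chained steps) succeed.
function passHatK(p: number, k: number): number {
  return Math.pow(p, k);
}

// Smallest k for which pass@k reaches the target, or null if maxK isn't enough.
function retriesToMatch(p: number, target: number, maxK = 100): number | null {
  for (let k = 1; k <= maxK; k++) {
    if (passAtK(p, k) >= target) return k;
  }
  return null;
}

// Illustrative per-attempt rates only: a cheap model at 0.5, a strong one at 0.9.
console.log(passAtK(0.5, 7));          // 0.9921875: seven cheap tries beat one strong try
console.log(passAtK(0.9, 1));          // ≈ 0.9
console.log(retriesToMatch(0.5, 0.9)); // 4
console.log(passHatK(0.5, 7));         // 0.0078125: seven chained cheap steps rarely all succeed
console.log(passHatK(0.9, 7));         // ≈ 0.478
```

The same exponent that rescues retries under pass@k punishes chains under pass^k: seven steps at 0.9 each already drop below a coin flip.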
SQLite served straight off S3.
Litestream's read-only VFS lets a tiny service open SQLite databases living on S3-compatible storage as if they were local files. Pages are pulled lazily, cached, and verified, and you get read-replicas-as-files for free, without spinning up a database tier you didn't want.
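How lazy, cached, verified page reads can fit together, as a minimal TypeScript sketch rather than Litestream's code (its own verification scheme is not reproduced here). `fetchRange` stands in for a ranged GET against object storage, the per-page checksums are assumed to come from a trusted source, and a toy 32-bit FNV-1a hash is swapped in for whatever integrity check the real VFS uses. Pages are addressed the way SQLite lays them out: page n starts at byte (n - 1) * pageSize.

```typescript
// Hypothetical stand-in for a ranged GET against S3-compatible storage.
// Real object-store reads are asynchronous; kept synchronous here for brevity.
type FetchRange = (offset: number, length: number) => Uint8Array;

// Toy 32-bit FNV-1a hash, standing in for the real VFS's integrity check.
function fnv1a(bytes: Uint8Array): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < bytes.length; i++) {
    h ^= bytes[i];
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

class LazyPageReader {
  private readonly cache = new Map<number, Uint8Array>();
  private readonly fetchRange: FetchRange;
  private readonly pageSize: number;
  private readonly checksums: Map<number, number>;
  private readonly maxPages: number;

  constructor(fetchRange: FetchRange, pageSize: number, checksums: Map<number, number>, maxPages: number) {
    this.fetchRange = fetchRange;
    this.pageSize = pageSize;
    this.checksums = checksums;
    this.maxPages = maxPages;
  }

  readPage(pgno: number): Uint8Array {
    const cached = this.cache.get(pgno);
    if (cached !== undefined) {
      // Map keeps insertion order: delete + set marks this page most recently used.
      this.cache.delete(pgno);
      this.cache.set(pgno, cached);
      return cached;
    }
    // SQLite numbers pages from 1, so page n starts at byte (n - 1) * pageSize.
    const bytes = this.fetchRange((pgno - 1) * this.pageSize, this.pageSize);
    const expected = this.checksums.get(pgno);
    if (expected === undefined || fnv1a(bytes) !== expected) {
      throw new Error(`page ${pgno} failed verification`); // fail closed
    }
    this.cache.set(pgno, bytes);
    if (this.cache.size > this.maxPages) {
      // Evict the least recently used page: the first key in iteration order.
      const oldest = this.cache.keys().next().value as number;
      this.cache.delete(oldest);
    }
    return bytes;
  }
}

const PAGE_SIZE = 4; // tiny for the demo; real SQLite pages are commonly 4096 bytes
const remote = new Uint8Array(3 * PAGE_SIZE);
for (let i = 0; i < remote.length; i++) remote[i] = i;

let remoteReads = 0;
const fetchFromRemote: FetchRange = (offset, length) => {
  remoteReads++;
  return remote.slice(offset, offset + length);
};

const checksums = new Map<number, number>();
for (let p = 1; p <= 3; p++) {
  checksums.set(p, fnv1a(remote.slice((p - 1) * PAGE_SIZE, p * PAGE_SIZE)));
}

const reader = new LazyPageReader(fetchFromRemote, PAGE_SIZE, checksums, 2);
reader.readPage(2); // fetched from "S3"
reader.readPage(2); // served from cache, no second fetch
```

A real reader would also want async fetches, coalescing of adjacent page reads, and a way to notice when a newer version of the database lands; those are left out to keep the shape visible.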
"Read-replicas-as-files" is a strange phrase that becomes very useful the moment you stop wanting a database.
That's today.
Eight stories from Thursday, 30 April 2026 — Europe/Zurich. Picked for what a senior engineer or founder could act on this week, with a bias toward the plumbing rather than the press release.
Today's sources
- OpenAI · index
- Cloudflare blog
- GitHub Engineering
- Sentry Engineering
- Next.js blog
- Fly.io blog
- Telegram · @rvnikita_blog
Also scanned
- The Batch · DeepLearning.AI
- Don't Worry About the Vase
- Import AI · Jack Clark
- Anthropic · engineering & news
- Vercel · blog
- PostHog · blog
- @seeallochnaya, @denissexy, @TochkiNadAI, @ProductsAndStartups
Rubric
- 1 · adoptable AI tools
- 2 · creative software
- 3 · dev tools & agentic coding
- 4 · privacy & security
- 5 · research with a practical kernel
- 6 · anything actionable for an engineer or founder