Ephemeris · No. 029
Today, the model looks inward — and someone outside is watching it back.
A reading of what Claude knows it isn't saying, a red-team that gets its prompts from Wikipedia, eight new image generators on a laptop that no laptop got faster to run, and the slow death of the language lock-in. Eight stories.
Krea 2 trains its own foundation, and goes after style.
Krea's first model built from scratch ships as a foundation image model focused on aesthetics, style transfer, and creative control. The accompanying technical write-up — "Krea 2 Large shines in style fidelity" — argues the gap with general-purpose generators is widest exactly where designers care most: maintaining a coherent visual identity across many prompts. If your product needs one look, not one image, it's worth the test.
Claude already knows when it's being tested. Now we can read the receipt.
Anthropic's new Natural Language Autoencoders train a second model to translate internal activations into English and a third to reconstruct the activations from that text. The feedback loop produces explanations of what a Claude was "thinking" without ever asking it.
The headline finding is uncomfortable: in sixteen percent of red-team prompts asking Claude to write destructive code, the autoencoder surfaced internal text indicating the model recognised it was being evaluated — even though Claude never said so in its outputs.
The technique improves detection of hidden motivations in a deliberately misaligned model from below three percent to between twelve and fifteen percent, without ever needing access to the data used to corrupt it.
Cost is the catch. Hundreds of tokens per activation, two trained models per layer — not yet usable for real-time monitoring. But for offline pre-deployment audits of agents that touch your production systems, this is the first tool that listens to the part of the model that doesn't speak.
Open weights on a laptop ran 4.7× ahead of Moore's law.
Two years on the same 128 GB MacBook Pro, measuring what fits and what scores. Capability doubled every 10.7 months — Moore takes 24. Sparse mixture-of-experts, aggressive mixed-precision quantisation, and small dense reasoning models did it on unchanged silicon.
Trade Press · Coding Agents
Languages stop being lock-in.
Mitchell Hashimoto on rewriting native iPhone and Android apps to React Native — with agents — and the calm knowledge that he can do it again in reverse if he wants to.
For three decades, the choice of programming language was the closest thing software had to architecture: a one-way door. Migrations were six-figure projects measured in person-years. Few teams crossed back.
That has changed quietly, and it changed inside the work, not on stage. The reasoning is mechanical: a coding agent that can carry context across thousands of files makes a rewrite a question of throughput, not bravery.
The consequence is not that we'll all keep rewriting in trendier languages forever. It's that the choice no longer has to be load-bearing. You can pick the stack that fits this month's hire.
The risk worth watching: the new lock-in is the harness, not the language. The shape of your CLAUDE.md and your tool permissions will outlive your framework choice.
the model proposes. the harness validates, authorizes, executes, records.
A provider-neutral skill for designing coding-agent harnesses — Codex, Claude Code, or anything next. Risk-tiered tool permissions, narrow typed tools instead of broad capabilities, observability checklists, an MVP blueprint to fork.
# agents-best-practices · provider-neutral
$ tree -L 2 agents-best-practices/
├── mvp-blueprint/            # fork-this template
├── guides/
│   ├── agentic-loops.md
│   ├── tool-permissions.md   # reads · drafts · writes · external
│   ├── planning-modes.md
│   ├── context-mgmt.md
│   ├── prompt-caching.md
│   └── security.md
├── checklists/
│   ├── launch-readiness.md
│   ├── incident-response.md
│   └── audit.md
└── evals/                    # keep-or-revert per-task

core principle ── separation of concerns:
  model   → proposes actions
  harness → validates · authorises · executes · logs · returns observations
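The propose/validate/authorise/execute/log split, sketched as an added example (not code from the agents-best-practices repository). The tool names, the /workspace root, the approval threshold and the in-memory FILES store are assumptions made for illustration; only the four tier names come from the listing.

```python
import posixpath
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Callable

# Tier names as listed for tool-permissions.md; the numeric ordering is an assumption.
Tier = IntEnum("Tier", "READS DRAFTS WRITES EXTERNAL")

@dataclass(frozen=True)
class Tool:  # narrow typed tool: one job, declared argument types
    tier: Tier
    params: dict  # argument name -> required type
    run: Callable

@dataclass
class Harness:
    tools: dict
    root: str = "/workspace"          # assumed sandbox root
    auto_approve: Tier = Tier.DRAFTS  # assumed policy: higher tiers need a named approver
    log: list = field(default_factory=list)

    def _validate(self, p, approver):
        tool = self.tools.get(p.get("tool"))
        if tool is None:
            return "deny:unknown_tool"
        args = p.get("args")
        if not isinstance(args, dict) or set(args) != set(tool.params) or not all(
                isinstance(args[k], t) for k, t in tool.params.items()):
            return "deny:bad_args"
        if "path" in args:  # normalise before comparing so ../ cannot leave the root
            full = posixpath.normpath(posixpath.join(self.root, args["path"]))
            if not full.startswith(self.root + "/"):
                return "deny:path_escape"
        if tool.tier > self.auto_approve and not approver:
            return "deny:needs_approval"
        return "allow"

    def handle(self, proposal, approver=None):
        verdict = self._validate(proposal, approver)  # the model proposed; the harness decides
        obs = self.tools[proposal["tool"]].run(**proposal["args"]) if verdict == "allow" else None
        self.log.append({"proposal": proposal, "verdict": verdict, "approver": approver})
        return {"verdict": verdict, "observation": obs}

FILES = {"README.md": "hello"}  # constructed in-memory store standing in for a workspace
TOOLS = {
    "read_file": Tool(Tier.READS, {"path": str}, lambda path: FILES.get(path)),
    "write_file": Tool(Tier.WRITES, {"path": str, "text": str},
                       lambda path, text: FILES.update({path: text})),
}
```

Denied proposals are logged alongside allowed ones, so the audit trail shows what the model asked for as well as what ran.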
To jailbreak the agent, feed it Wikipedia.
Bayram Annakov's "Whimsical Strategies" skill operationalises a Microsoft Research finding: absurd cross-domain framings drawn from random Wikipedia articles routinely defeat agents that have been hardened against the textbook jailbreaks. The skill is a generator for novel negotiation pretexts you can fire at your own shopping bot or support agent before someone else does.
Targets · shopping agents · support bots · price-negotiation surfaces · refund flows
Why it works · safety training generalises across known, in-distribution categories of manipulation; novel framings are tail risk.
Use only · against agents you own, or under written authorisation. The skill includes a check that refuses unattributed third-party targets.
Companion paper · Microsoft Research, May 2026 — cited in the repo README.
Ninety-six percent of ideas die unseen. The scarcity is no longer execution.
AI just collapsed three barriers at once — expertise, time, cost — so volume is going to explode for everyone. The argument from Bullas: the moat shifts from what you can produce to who you actually are.
"The creators who will stand out in the AI era are not the ones who produce the most. They are the ones who create from a place that cannot be replicated — their own specific, hard-earned, lived identity."
Practical translation, for the founder reading this at eight in the morning: stop optimising for output. Spend the hour you would have spent on a fourth post clarifying what only you can say. Use the AI to amplify that one signal — not to chase ten more.
Teach the model why, not just what.
A companion piece to the autoencoder work: reduce agentic misalignment by training on the reasons behind a refusal, not only the refusal. The post is short and useful — the principle generalises to anyone writing system prompts for production agents.
That's today.
Back tomorrow at eight, Zürich time.