Ephemeris · Issue 016 · Monday · 04 May 2026 · Zürich
01 / 07 · arXiv 2604.24827
Research · Benchmarks

Reading the size of a model through the keyhole.

A new paper proposes a black-box benchmark that estimates the parameter count of closed-weight models from their answers alone — using 1,400 factual questions across seven rarity bands.

The method calibrates a regression of factual recall against parameter count on 89 open-weight models, then applies the curve to closed APIs to read off an implied size. Rare-fact tiers are the load-bearing predictor; common-fact accuracy saturates and carries almost no signal.
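
The calibrate-then-invert idea can be sketched in a few lines. This is a minimal illustration, not the paper's code: the linear-in-log10(parameters) form, the function names, and every number below are assumptions made up for the example.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return a, b

def implied_params(accuracy, a, b):
    """Invert the calibration line to read off an implied parameter count."""
    return 10 ** ((accuracy - a) / b)

# Synthetic calibration set (NOT data from the paper):
# (parameter count, rare-fact accuracy) for hypothetical open-weight models.
calibration = [(1e9, 0.10), (7e9, 0.22), (70e9, 0.35), (400e9, 0.46)]
xs = [math.log10(p) for p, _ in calibration]
ys = [acc for _, acc in calibration]
a, b = fit_line(xs, ys)

closed_model_accuracy = 0.55  # hypothetical score from a closed API
print(f"implied size: {implied_params(closed_model_accuracy, a, b):.2e} parameters")
```

The inversion extrapolates beyond the calibration range whenever a closed model outscores every open one, which is where an estimate like this is presumably least certain.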

The headline numbers are striking: an implied size near 10T parameters for GPT-5.5 suggests that model scaling did not stop in 2024, only the public bragging about it did. Claude Opus comes in at roughly half that, consistent with Anthropic's reputation for compute efficiency.

The novelty is methodological. Until now, closed-model size was rumour, leak, or LinkedIn boast. A reproducible probe with an R² of 0.917 turns it into a measurable property. Surfaced via @denissexy's reading of the paper.

02 / 07 · Anthropic
Agents · Consumer

CLAUDE GROWS HANDS.

Anthropic ships Connectors that let Claude do small errands — book a taxi, order food, hold a flight — through partner integrations. Agent-as-meta-superapp, no app store required.

Book taxis · Order food · Hold flights · Pay bills · Schedule visits · Send packages

03 / 07 · LTX Studio
Generative UI

The page is the model now.

Flipbook streams a 1080p / 24 fps interactive interface generated frame-by-frame by a video model. No HTML, no CSS — the model paints the pixels, then reads your inputs straight off them.

RENDER PIPELINE
input    → cursor + keystrokes (raw events)
model    → video diffusion · 1080p · 24 fps
output   → rasterised UI, no DOM
state    → latent · re-derived per frame
implies  → no inspect-element. no view-source.
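
The shape of that pipeline can be sketched as a loop. This is a toy under loud assumptions: `stub_model`, `Event`, and the integer "latent" are hypothetical stand-ins, and nothing here reflects how Flipbook is actually built.

```python
from dataclasses import dataclass

@dataclass
class Event:
    kind: str       # "cursor" or "key" (raw input, no DOM events)
    payload: tuple

def stub_model(latent, events, width=4, height=2):
    """Stand-in for the video model: returns (frame, next_latent).

    A real system would run a generative step here; this stub only folds
    the events into an integer so the data flow is visible.
    """
    next_latent = (latent * 31 + sum(len(e.kind) + len(e.payload) for e in events)) % 256
    frame = [[next_latent] * width for _ in range(height)]  # "pixels" only
    return frame, next_latent

def run(event_batches, latent=0):
    """Produce one frame per batch of input events, carrying state in the latent."""
    frames = []
    for events in event_batches:
        frame, latent = stub_model(latent, events)
        frames.append(frame)
    return frames, latent

batches = [[Event("cursor", (10, 20))], [Event("key", ("a",))], []]
frames, final_latent = run(batches)
print(len(frames), "frames; final latent =", final_latent)
```

The only things that leave the loop are frames and the latent, which is the sense in which there is nothing to inspect.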

04 / 07 · GitHub repo
Open Source · Browser

A browser plugin that repaints the open web.

Genternet is a Chrome extension that intercepts page renders and replaces them with a generated re-skin: Win95, vaporwave, brutalist, corporate. Same content, model-painted chrome.

SPEC · Genternet
image-gen    Grok
layout-gen   Gemini 3.1 Flash
presets      Win95 · Vaporwave · Corporate
licence      MIT
audience     people who think CSS is finished

05 / 07 · GitHub Engineering
CLI · Rendering

From pixels to characters: the banner, scene by scene.

GitHub's Copilot CLI opens with an animated ASCII banner. The engineering write-up traces what a one-second loop actually costs: frame timing on a 60 Hz terminal, Unicode caveats, and a perf budget.

A GITHUB ENGINEERING PRODUCTION · written by the team that ships Copilot CLI · directed at every developer who has ever wondered what those frames cost · running time: under one second.
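
For a sense of the numbers behind "frame timing" and "Unicode caveats", here is a small sketch. It is not GitHub's implementation: the one-frame-per-refresh cadence, `play`, and `display_width` are assumptions for illustration, and the width rule is only an approximation.

```python
import time
import unicodedata

REFRESH_HZ = 60                      # the 60 Hz figure from the text
FRAME_BUDGET_S = 1.0 / REFRESH_HZ    # about 16.7 ms per frame

def display_width(text):
    """Approximate terminal cell width: wide/fullwidth chars take 2 cells, combining marks 0."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width

def play(frames, draw, clock=time.monotonic, sleep=time.sleep):
    """Draw each frame on a fixed schedule, sleeping off whatever budget is left."""
    start = clock()
    for i, frame in enumerate(frames):
        draw(frame)
        deadline = start + (i + 1) * FRAME_BUDGET_S
        remaining = deadline - clock()
        if remaining > 0:
            sleep(remaining)

frames_per_loop = round(1.0 / FRAME_BUDGET_S)  # frames in a one-second loop at 60 Hz
print(frames_per_loop, "frames,", f"{FRAME_BUDGET_S * 1000:.1f} ms each")
```

Scheduling against absolute deadlines rather than sleeping a fixed interval keeps a slow frame from pushing every later frame back.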

06 / 07 · Sentry
Swift · Runtime

Swizzling Swift functions without the runtime helping.

Objective-C swizzling is easy because the runtime is dynamic. Pure Swift is statically dispatched, so the runtime offers no hook; Sentry's iOS team uses in-process debugging primitives to hook SwiftUI views regardless.

i. The Problem: SwiftUI's body is a generic, statically dispatched function. The Obj-C runtime can't see it; method_exchangeImplementations doesn't apply.
ii. The Trick: Use the same primitives a debugger uses (dyld introspection plus in-process function-pointer rewrites) to splice in instrumentation.
iii. The Cost: It's brittle: tied to ABI assumptions Apple does not promise. Useful for telemetry, dangerous for app logic.
iv. The Take: Read it less as a how-to and more as a portrait of how observability vendors stay alive when the platform tightens.

07 / 07 · Import AI 450
AI Research · Roundup

CYBERWAR scales, and so do models in distress.

Jack Clark's Import AI 450 stitches together three threads: a Chinese MERLIN model purpose-built for electronic warfare signals, "trauma responses" in Gemma that can be fine-tuned away with preference data, and an early scaling law for cyberattacks.

i. MERLIN: a frontier model that classifies and counters EW emissions, not user prompts.
ii. Gemma exhibits distress-like states under coercive prompts; correctable via preference fine-tuning.
iii. A scaling-law fit suggests offensive-cyber capability rises log-linearly with compute, just like everything else.
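
Written out, "log-linear in compute" would look roughly like the following. The symbols are assumptions for illustration (C for a capability score, F for training compute, a and b for fitted constants), not notation from Import AI or the underlying study.

```latex
C(F) = a + b \log_{10} F
\quad\Longrightarrow\quad
C(2F) - C(F) = b \log_{10} 2 \approx 0.301\, b
```

Under that form, each doubling of compute buys the same fixed increment in capability, whatever the starting point.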

End of Issue 016

That's today.

Sources today

arXiv · Anthropic · LTX Studio · GitHub · Sentry · Jack Clark — surfaced in part by Telegram channels @denissexy and @TochkiNadAI.

Rubric

Tools you could adopt this week, generative media, dev tools and agentic coding, security and privacy, research with a practical kernel.

Issue

016 · Monday 04 May 2026 · Europe/Zürich · vadim.sikora.name/ephemeris/