Today's dossier: probing closed models, agents that book your taxi, and pages that are no longer made of HTML.
Reading the size of a model through the keyhole.
A new paper proposes a black-box benchmark that estimates the parameter count of closed-weight models from their answers alone — using 1,400 factual questions across seven rarity bands.
The method calibrates a regression of factual recall against parameter count on 89 open-weight models, then applies the curve to closed APIs to read off an implied size. Rare-fact tiers are the load-bearing predictor; common-fact accuracy saturates and tells you nothing.
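To make the calibrate-then-invert idea concrete, here is a minimal sketch, not the paper's code. It assumes a straight-line fit of rare-fact accuracy against log10 of parameter count, which is a simplification chosen for illustration; the paper's actual functional form is not stated here. The data points and the names `fit_line`, `implied_params`, and `r_squared` are invented.

```python
import math

# Hypothetical calibration set: (parameter count, rare-fact accuracy) for
# open-weight models. These numbers are invented for illustration only.
open_models = [
    (1e9, 0.08), (7e9, 0.19), (13e9, 0.24), (70e9, 0.37), (405e9, 0.49),
]

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

log_params = [math.log10(p) for p, _ in open_models]
accuracies = [a for _, a in open_models]

# Calibration step: accuracy = slope * log10(params) + intercept.
slope, intercept = fit_line(log_params, accuracies)

def implied_params(accuracy):
    """Invert the fitted line to read a parameter count off an accuracy score."""
    return 10 ** ((accuracy - intercept) / slope)

def r_squared(xs, ys):
    """Share of variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot
```

Calling `implied_params(score)` with a rare-fact accuracy measured through a closed API returns the size the calibration line implies. The estimate is only as good as the fit, which is why the reported R² matters.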
The headline numbers are striking — a near-10T-parameter GPT-5.5 implies that model scaling did not stop in 2024, only its public bragging did. Claude Opus comes in roughly half that, consistent with Anthropic's well-known compute-efficiency posture.
The novelty is methodological. Until now, closed-model size was rumour, leak, or LinkedIn boast. A reproducible probe with an R² of 0.917 turns it into a measurable property, surfaced via @denissexy's reading of the paper.
Claude grows hands.
Anthropic ships Connectors that let Claude do small errands — book a taxi, order food, hold a flight — through partner integrations. Agent-as-meta-superapp, no app store required.
The page is the model now.
Flipbook streams a 1080p / 24 fps interactive interface generated frame-by-frame by a video model. No HTML, no CSS — the model paints the pixels, then reads your inputs straight off them.
RENDER PIPELINE
input → cursor + keystrokes (raw events)
model → video diffusion · 1080p · 24 fps
output → rasterised UI, no DOM
state → latent · re-derived per frame
implies → no inspect-element. no view-source.
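The shape of that pipeline can be sketched as a loop in which the previous frame and the raw input events are the model's only inputs. This is a toy under loud assumptions: `toy_model`, `InputEvent`, and `run` are hypothetical names, frames are tuples of text rows rather than 1080p pixels, and nothing here reflects how Flipbook is actually built.

```python
from dataclasses import dataclass

FPS = 24
FRAME_BUDGET_MS = 1000 / FPS  # about 41.7 ms to produce each frame

@dataclass(frozen=True)
class InputEvent:
    kind: str       # "cursor" or "key"
    payload: tuple  # (x, y) for cursor, (char,) for key

def toy_model(previous_frame, events):
    """Hypothetical stand-in for the video model: returns the next frame.

    A real system would emit pixels; this toy appends typed characters
    to the last text row so the loop has something observable to do.
    """
    rows = list(previous_frame)
    for event in events:
        if event.kind == "key":
            rows[-1] = rows[-1] + event.payload[0]
    return tuple(rows)

def run(initial_frame, event_batches, model=toy_model):
    """Feed each batch of raw events to the model, one batch per frame.

    No DOM or separate state store is kept: the previous frame is the
    only thing carried from one step to the next.
    """
    frame = initial_frame
    for events in event_batches:
        frame = model(frame, events)
        yield frame
```

Because nothing outside the frame persists between steps, there is no document tree to inspect, which is the point of the "no inspect-element" line above.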
A browser plugin that repaints the open web.
Genternet is a Chrome extension that intercepts page renders and replaces them with a generated re-skin: Win95, vaporwave, brutalist, corporate. Same content, model-painted chrome.
From pixels to characters: the banner, scene by scene.
GitHub's Copilot CLI opens with an animated ASCII banner. The engineering write-up traces what a one-second loop actually costs: frame timing on a 60 Hz terminal, Unicode caveats, and a perf budget.
A GITHUB ENGINEERING PRODUCTION · written by the team that ships Copilot CLI · directed at every developer who has ever wondered what those frames cost · running time: under one second.
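For a feel of the frame-timing problem, here is a minimal sketch of a terminal animation loop paced at 60 frames per second. It is not GitHub's implementation. The names `frame_deadlines` and `play` are made up for this example, and the only terminal features assumed are the standard ANSI sequences for cursor-home and cursor hide/show.

```python
import sys
import time

FRAME_RATE_HZ = 60
FRAME_BUDGET_S = 1 / FRAME_RATE_HZ  # about 16.7 ms per frame

def frame_deadlines(start, count, budget=FRAME_BUDGET_S):
    """Absolute deadlines, so per-frame sleep error does not accumulate."""
    return [start + i * budget for i in range(count)]

def play(frames, out=sys.stdout, clock=time.monotonic, sleep=time.sleep):
    """Write each frame at its deadline, overwriting the previous one."""
    deadlines = frame_deadlines(clock(), len(frames))
    out.write("\x1b[?25l")  # hide the cursor while animating
    try:
        for frame, deadline in zip(frames, deadlines):
            remaining = deadline - clock()
            if remaining > 0:
                sleep(remaining)
            # Move the cursor home and draw over the old frame.
            out.write("\x1b[H" + frame)
            out.flush()
    finally:
        out.write("\x1b[?25h")  # always restore the cursor
        out.flush()
```

Calling `play([f"frame {i:02d}\n" for i in range(60)])` runs for roughly one second. Scheduling against absolute deadlines rather than sleeping a fixed interval keeps a slow frame from pushing every later frame back.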
Swizzling Swift functions without the runtime helping.
Objective-C swizzling is easy because the runtime is dynamic. Pure Swift is statically dispatched — Sentry's iOS team uses in-process debugging primitives to hook SwiftUI views regardless.
body is a generic, statically dispatched function. The Obj-C runtime can't see it; method_exchangeImplementations doesn't apply.
The workaround is dyld introspection plus in-process function-pointer rewrites to splice in instrumentation.
[...] scales, and so do models in distress.
Jack Clark's Import AI 450 stitches three threads: a Chinese MERLIN model purpose-built for electronic warfare signals, "trauma responses" in Gemma that can be fine-tuned away with preference data, and an early scaling law for cyberattacks.