Decoupling the brain from the hands.
Anthropic's engineering team posted a surprisingly humble deep-dive on scaling Managed Agents, less a benchmark parade than an architectural confession. The brain (the model) and the hands (the execution plane) want different things, operate on different timescales, and fail in different ways. Treat them as one, and you rediscover the difference with every outage.
The post's spine is simple: most agent reliability problems come from confusing a model's capability with a runtime's guarantee. A smarter model doesn't help a sandbox that can't bound a shell process. A stronger sandbox doesn't save a model that loses track of tool state. The post walks through where each layer's responsibilities begin and end.
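To make "bound a shell process" concrete, here is a minimal Python sketch of a runtime-side guarantee that holds no matter how capable the model is. The helper name `run_bounded` and its return shape are my own illustration, not something taken from Anthropic's write-up, and a real sandbox would also need to bound memory, output size, and child processes.

```python
import subprocess

def run_bounded(argv, timeout_s):
    """Run a command and stop it if it exceeds timeout_s seconds.

    Hypothetical helper for illustration only.
    Returns (status, stdout) where status is "ok", "failed", or "timeout".
    """
    try:
        done = subprocess.run(
            argv,
            capture_output=True,  # collect stdout/stderr instead of inheriting them
            text=True,            # decode output as str
            timeout=timeout_s,    # subprocess.run kills the child when this expires
        )
    except subprocess.TimeoutExpired:
        return ("timeout", "")
    status = "ok" if done.returncode == 0 else "failed"
    return (status, done.stdout)
```

The limit is enforced by the runtime, so the model cannot talk its way past it. Note that this only kills the direct child; processes the child spawns need a process group or container boundary, which is outside this sketch.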
Notable are the constraints they chose to make explicit: isolation per session, bounded concurrency, deterministic tool exposure, and a narrow path for escalation when the agent's plan outgrows the sandbox's budget. The shape of the contract, not the cleverness of the code, does most of the work.
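One way to picture "the shape of the contract" is to write the four constraints down as data and check a plan against them before anything runs. The sketch below is an assumption-heavy illustration: `ExecutionContract`, `make_contract`, `admit`, and `EscalationRequired` are hypothetical names, the budget is counted in plan steps for simplicity, and `max_concurrent_calls` is only declared here, with enforcement left to whatever scheduler runs the plan.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ExecutionContract:
    """Hypothetical contract; field names are illustrative assumptions."""
    session_id: str            # isolation: one contract per session
    max_concurrent_calls: int  # bounded concurrency (declared, not enforced here)
    tools: tuple               # deterministic tool exposure: fixed and ordered
    budget_steps: int          # largest plan the sandbox will accept

class EscalationRequired(Exception):
    """Raised when a plan outgrows the contract's budget."""

def make_contract(session_id, tool_names, max_concurrent_calls=2, budget_steps=10):
    # Deduplicate and sort so the same inputs always expose the same tool list.
    tools = tuple(sorted(set(tool_names)))
    return ExecutionContract(session_id, max_concurrent_calls, tools, budget_steps)

def admit(contract, plan):
    """Check a plan, given as a list of (tool_name, args) steps, against the contract."""
    for tool_name, _args in plan:
        if tool_name not in contract.tools:
            raise ValueError(f"tool not exposed in this session: {tool_name}")
    if len(plan) > contract.budget_steps:
        # The narrow escalation path: refuse and hand the decision upward.
        raise EscalationRequired(
            f"plan has {len(plan)} steps, budget is {contract.budget_steps}"
        )
    return plan
```

Nothing here is clever, which is the point: the checks are cheap, explicit, and independent of how good the model's plan is.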
For teams building their own stacks, the prescriptions read like boring advice because they are. Write the execution contract first. Measure the hands, not only the brain. Assume your model will get better faster than your runtime does, and architect accordingly.
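"Measure the hands" can start as small as counting calls, errors, and time spent per tool on the execution side. The following is a rough sketch under my own assumptions; `HandMetrics` and its fields are hypothetical, and a production system would export these numbers to a proper metrics backend rather than keep them in a dict.

```python
import time
from collections import defaultdict

class HandMetrics:
    """Hypothetical per-tool counters for the execution plane."""

    def __init__(self):
        self.calls = defaultdict(
            lambda: {"count": 0, "errors": 0, "total_s": 0.0}
        )

    def record(self, tool_name, fn, *args, **kwargs):
        """Call fn, recording latency and failures under tool_name."""
        stats = self.calls[tool_name]
        stats["count"] += 1
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        except Exception:
            stats["errors"] += 1
            raise  # the caller still sees the original failure
        finally:
            stats["total_s"] += time.perf_counter() - start
```

Wrapping each tool invocation as `metrics.record("shell", run_tool, ...)` gives a per-tool error rate and average latency that can be tracked separately from any model-quality score.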