Written by Max Zeshut
Founder at Agentmelt
The amount of computational work an AI model performs at inference time to produce a response, beyond what is needed to generate an answer directly. Reasoning models (OpenAI o-series, Claude with extended thinking, DeepSeek R1) use test-time compute to deliberate through complex problems, generating intermediate reasoning steps, self-correcting, and exploring alternatives before producing a final answer. More test-time compute generally improves quality on hard reasoning tasks but increases latency and cost. The shift toward test-time compute has fundamentally changed AI agent economics: capable agents now spend significant compute thinking, not just generating.
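One common way to spend extra test-time compute is to sample several independent reasoning traces and keep the answer they most often agree on (best-of-n with self-consistency voting). The sketch below is illustrative only; the function names (`solve_with_test_time_compute`, `generate_answer`) are hypothetical stand-ins, not any specific provider's API.

```python
import collections

def solve_with_test_time_compute(generate_answer, prompt, n_samples=5):
    """Hypothetical best-of-n sketch: spend more compute by sampling
    several independent reasoning traces, then return the answer the
    traces agree on most often (self-consistency voting).

    `generate_answer` stands in for any call that returns a model's
    final answer after intermediate reasoning steps.
    """
    answers = [generate_answer(prompt) for _ in range(n_samples)]
    votes = collections.Counter(answers)
    best_answer, _ = votes.most_common(1)[0]
    return best_answer

# Cost and latency grow roughly linearly with n_samples, while accuracy
# on hard reasoning tasks tends to improve sublinearly.
```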
A coding agent given a complex algorithm problem with low test-time compute produces a solution that fails on edge cases 40% of the time. With 5x more test-time compute, the agent reasons through edge cases, self-corrects, and produces a solution that fails only 8% of the time. Compute cost increases 5x, but the agent saves human debugging time worth 10x the additional compute cost.
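To make that trade-off concrete, here is a back-of-the-envelope calculation using the figures from the example; the one-unit base cost is an assumed normalization, not a real price.

```python
# Cost-benefit check of the example above, in units of the low-compute run's cost.
base_compute_cost = 1.0
high_compute_cost = 5 * base_compute_cost
extra_compute = high_compute_cost - base_compute_cost   # 4.0 units

failure_rate_low = 0.40   # low test-time compute
failure_rate_high = 0.08  # 5x test-time compute

# The example states the debugging time saved is worth 10x the extra compute.
debugging_value = 10 * extra_compute                    # 40.0 units
net_benefit = debugging_value - extra_compute           # 36.0 units

print(f"Failure rate drops from {failure_rate_low:.0%} to {failure_rate_high:.0%}")
print(f"Extra compute spent: {extra_compute} units")
print(f"Debugging value recovered: {debugging_value} units")
print(f"Net benefit per task: {net_benefit} units")
```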