Context Rot
Written by Max Zeshut
Founder at Agentmelt
The gradual degradation of an AI agent's response quality as conversation context grows longer. Even when a model technically supports a million-token context window, response quality typically peaks at much shorter context lengths. As context grows, the model's attention spreads thinner across more information, instruction following weakens, and earlier instructions get diluted by later content. Context rot is why long-running agent conversations need periodic context compression, not just larger context windows.
A coding agent works on a long debugging session, accumulating 200K tokens of context. Quality is excellent at 20K tokens, good at 60K, mediocre at 120K, and unreliable at 200K—even though the model 'supports' 200K. The team implements context compression that summarizes earlier turns when context exceeds 80K, restoring high-quality outputs.
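The compression step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a production implementation: `count_tokens` is a rough characters-per-token heuristic standing in for a real tokenizer, and `summarize` is a stub where a real agent would call the model to condense the older turns.

```python
def count_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token (assumption, not a real tokenizer).
    return max(1, len(text) // 4)

def summarize(messages: list[str]) -> str:
    # Placeholder: a real agent would ask the model to summarize these turns.
    return f"[summary of {len(messages)} earlier turns]"

def compress_context(messages: list[str], limit: int = 80_000) -> list[str]:
    """If total tokens exceed `limit`, replace the oldest turns with a summary."""
    total = sum(count_tokens(m) for m in messages)
    if total <= limit:
        return messages  # still under budget, nothing to do
    kept: list[str] = []
    running = 0
    # Walk backwards, keeping the most recent turns that fit within half the
    # budget, so the summary plus recent turns sit well under the limit.
    for msg in reversed(messages):
        running += count_tokens(msg)
        if running > limit // 2:
            break
        kept.append(msg)
    kept.reverse()
    older = messages[: len(messages) - len(kept)]
    return [summarize(older)] + kept
```

Triggering compression at a threshold below the model's hard limit (80K here, mirroring the scenario above) keeps the agent operating in the context range where its quality is still high, rather than waiting until the window is full.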