Written by Max Zeshut
Founder at Agentmelt
Small language model (SLM)
A compact language model (typically 1B–15B parameters) designed to run cheaply and with low latency, often on-device or on modest GPUs. SLMs such as Llama 3.1 8B, Phi-3, and Gemma handle narrow, well-defined agent tasks—classification, extraction, routing—at roughly 10–50× lower cost than frontier models. A common production pattern uses an SLM as a first-pass router and escalates only the hard cases to a large reasoning model.
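The router-and-escalate pattern can be sketched as follows. This is a minimal illustration, not a specific library's API: `call_slm`, `call_frontier`, and the confidence threshold are all hypothetical stand-ins for real model calls and tuning.

```python
# Hypothetical sketch of the SLM-first routing pattern described above.
# call_slm and call_frontier are placeholders for real model API calls.

def call_slm(query: str) -> tuple[str, float]:
    """Placeholder small model: returns (intent label, confidence)."""
    known_intents = {"refund": 0.95, "hours": 0.92}
    for intent, confidence in known_intents.items():
        if intent in query.lower():
            return intent, confidence
    return "unknown", 0.40  # low confidence on anything unfamiliar

def call_frontier(query: str) -> str:
    """Placeholder frontier model, invoked only for hard cases."""
    return "escalated:" + query

CONFIDENCE_THRESHOLD = 0.8  # illustrative cutoff; tune per workload

def route(query: str) -> str:
    label, confidence = call_slm(query)
    if confidence >= CONFIDENCE_THRESHOLD:
        return label  # cheap path: accept the SLM's answer
    return call_frontier(query)  # hard case: pay for the large model

print(route("What are your hours?"))      # handled by the SLM
print(route("Explain clause 4b of ..."))  # escalated to the frontier model
```

In practice the escalation signal might be the SLM's token-level log-probabilities, a calibrated classifier head, or an explicit "I don't know" output, rather than a single scalar threshold.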