Written by Max Zeshut
Founder at Agentmelt
A model architecture in which the network contains multiple specialized sub-networks (experts) and a routing mechanism that activates only a subset of experts for each input. A 400B-parameter MoE model might activate only 50B parameters per token, achieving near-frontier quality at the inference cost of a much smaller model. MoE architectures (used in models like Mixtral, and reportedly in GPT-4) are why some AI agents can deliver high-quality responses at surprisingly low latency and cost: the model is large in total but efficient per query.
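To make the routing idea concrete, here is a minimal sketch of a top-k MoE layer in PyTorch. The hyperparameters (hidden_dim, num_experts, top_k) and the expert shape are illustrative assumptions, not the configuration of Mixtral, GPT-4, or any specific model; the point is only that the router scores every expert but runs just k of them per token.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, hidden_dim=512, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Router: a linear layer that scores each expert for each token.
        self.router = nn.Linear(hidden_dim, num_experts)
        # Experts: independent feed-forward networks; all parameters exist,
        # but only the selected experts run for a given token.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_dim, 4 * hidden_dim),
                nn.GELU(),
                nn.Linear(4 * hidden_dim, hidden_dim),
            )
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (num_tokens, hidden_dim)
        scores = self.router(x)                           # (tokens, experts)
        top_w, top_idx = scores.topk(self.top_k, dim=-1)  # best k experts per token
        top_w = F.softmax(top_w, dim=-1)                  # normalize the k weights
        out = torch.zeros_like(x)
        # Dispatch each token only to its chosen experts; unchosen experts
        # contribute nothing (and in a real system are never computed).
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = top_idx[:, slot] == e
                if mask.any():
                    out[mask] += top_w[mask, slot, None] * expert(x[mask])
        return out

layer = MoELayer()
tokens = torch.randn(16, 512)
print(layer(tokens).shape)  # torch.Size([16, 512])
```

With top_k=2 of 8 experts, each token touches only a quarter of the expert parameters per forward pass, which is the source of the cost-versus-capacity trade-off described above; production systems add load-balancing losses and batched expert dispatch that this sketch omits.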