Written by Max Zeshut
Founder at Agentmelt
LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning technique that trains a small set of adapter weights (typically 0.1–1% of the full model's parameters) rather than updating all model parameters. It enables domain-specific customization of large language models at a fraction of the cost and time of full fine-tuning: hours instead of days, $50–$500 instead of $10,000+. Multiple LoRA adapters can be swapped on a single base model, letting one deployment serve different agent behaviors (support tone, sales style, legal precision) by loading the appropriate adapter.
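The mechanism can be sketched in a few lines: instead of updating a frozen weight matrix W, LoRA learns a low-rank correction A·B and adds it to the base output. The sketch below uses NumPy with illustrative dimensions and a rank of 8 (none of these numbers come from the article); real implementations typically use a library such as Hugging Face PEFT on top of PyTorch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not from the article).
d_in, d_out, rank, alpha = 1024, 1024, 8, 16

# Frozen pretrained weight: never updated during LoRA training.
W = rng.standard_normal((d_in, d_out)).astype(np.float32)

# LoRA adapter: low-rank update scaled by alpha / rank.
# A starts small-random, B starts at zero, so the adapter is a
# no-op before any training happens.
A = rng.standard_normal((d_in, rank)).astype(np.float32) * 0.01
B = np.zeros((rank, d_out), dtype=np.float32)

def forward(x, A, B):
    """Adapted forward pass: base output plus low-rank correction."""
    return x @ W + (x @ A @ B) * (alpha / rank)

# Only A and B are trainable; compare their size to the full matrix.
full_params = W.size
adapter_params = A.size + B.size
print(f"adapter fraction: {adapter_params / full_params:.2%}")
# → adapter fraction: 1.56%

# With B zero-initialized, the adapted model matches the base model.
x = rng.standard_normal((2, d_in)).astype(np.float32)
assert np.allclose(forward(x, A, B), x @ W)
```

Swapping agent behaviors amounts to loading a different (A, B) pair over the same frozen W, which is why one deployment can serve several adapters.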
A support team fine-tunes a base model with LoRA using 5,000 examples of their ideal ticket responses. Training takes 3 hours on a single GPU and costs $80. The resulting adapter is 50MB—easily versioned, deployed, and rolled back if quality regresses.