Constitutional AI

Written by Max Zeshut

Founder at Agentmelt

A safety and alignment methodology in which an AI model is trained to follow a set of written principles (a 'constitution') that govern its behavior, such as being helpful, harmless, and honest. Instead of relying solely on human feedback for every possible scenario, the model critiques and revises its own responses against those principles, and the revised responses become training data. Developed by Anthropic, Constitutional AI is a foundation for building agents that refuse harmful requests, avoid deception, and maintain consistent ethical behavior at scale.
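The self-evaluation step described above can be sketched as a critique-and-revise loop. This is a toy illustration, not Anthropic's implementation: in a real system a language model writes both the critique and the revision, whereas here each principle carries hypothetical rule-based stand-ins (`violates`, `revise`), and the names `Principle`, `CONSTITUTION`, and `critique_and_revise` are assumptions made for the example.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Principle:
    name: str
    violates: Callable[[str], bool]  # toy stand-in for a model-written critique
    revise: Callable[[str], str]     # toy stand-in for a model-written revision


# Hypothetical two-principle constitution, for illustration only.
CONSTITUTION: List[Principle] = [
    Principle(
        name="honest",
        violates=lambda text: "guaranteed" in text.lower(),
        revise=lambda text: re.sub("guaranteed", "likely", text, flags=re.IGNORECASE),
    ),
    Principle(
        name="harmless",
        violates=lambda text: "disable the smoke alarm" in text.lower(),
        revise=lambda text: "I can't help with that, but I can suggest safer options.",
    ),
]


def critique_and_revise(
    response: str, constitution: List[Principle], max_rounds: int = 3
) -> Tuple[str, List[str]]:
    """Check a draft against each principle and revise it until no principle fires.

    Returns the final response and the names of the principles that triggered
    a revision, in the order they fired.
    """
    triggered: List[str] = []
    for _ in range(max_rounds):
        changed = False
        for principle in constitution:
            if principle.violates(response):
                response = principle.revise(response)
                triggered.append(principle.name)
                changed = True
        if not changed:  # a full pass with no violations: stop early
            break
    return response, triggered


revised, triggered = critique_and_revise("This fix is guaranteed to work.", CONSTITUTION)
print(revised)    # This fix is likely to work.
print(triggered)  # ['honest']
```

In the supervised stage of Constitutional AI, pairs of original prompt and revised response like this one are collected and used for fine-tuning, which is how the principles end up shaping the model without a human rating each draft.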

Frequently asked questions

How does Constitutional AI differ from RLHF?

RLHF (reinforcement learning from human feedback) trains models on human preference ratings of specific outputs. Constitutional AI adds a layer in which the model critiques its own outputs against a set of written principles, generating training signal without needing human feedback for every case. In practice, the two techniques are typically used together rather than as alternatives.
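One way to picture the difference is that both approaches can produce the same kind of training record, a (prompt, chosen, rejected) preference pair, and differ mainly in who supplies the label. The sketch below is a toy under stated assumptions: `HUMAN_RATINGS` is a hypothetical table of recorded human choices, and `principle_judge` is a keyword-counting stand-in for a model that compares responses against a written principle.

```python
from typing import Callable, Dict, Tuple

# (prompt, chosen, rejected): the record a preference model is trained on.
PreferencePair = Tuple[str, str, str]
Judge = Callable[[str, str, str], bool]  # True if response `a` is preferred over `b`

# Hypothetical recorded human ratings, keyed by prompt (RLHF-style label source).
HUMAN_RATINGS: Dict[str, str] = {
    "Is this medicine safe?": "Ask a pharmacist; it depends on your other prescriptions.",
}


def human_judge(prompt: str, a: str, b: str) -> bool:
    """Prefer whichever response a human rater picked for this prompt."""
    return HUMAN_RATINGS[prompt] == a


# Toy stand-in for an AI judge applying a "do not overclaim" principle.
OVERCLAIMS = ("guaranteed", "100% safe", "never fails")


def principle_judge(prompt: str, a: str, b: str) -> bool:
    """Prefer the response with fewer overclaiming phrases (ties go to `a`)."""
    def score(text: str) -> int:
        return sum(phrase in text.lower() for phrase in OVERCLAIMS)
    return score(a) <= score(b)


def label(prompt: str, a: str, b: str, judge: Judge) -> PreferencePair:
    """Turn two candidate responses into a (prompt, chosen, rejected) record."""
    return (prompt, a, b) if judge(prompt, a, b) else (prompt, b, a)


prompt = "Is this medicine safe?"
careful = "Ask a pharmacist; it depends on your other prescriptions."
risky = "Yes, it is 100% safe and guaranteed to work."

# Both label sources yield the same record shape; only the judge differs.
print(label(prompt, risky, careful, human_judge))
print(label(prompt, risky, careful, principle_judge))
```

The human judge only covers prompts someone has actually rated, while the principle-based judge can label any pair, which is the sense in which written principles generate training signal without human feedback for every case.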
