Constitutional AI

An approach to developing AI systems that encompasses ethics and safety principles

Constitutional AI aims to create AI systems that are not only technically proficient, but also safe, ethically sound, and respectful of human values. Anthropic used this approach to develop its Claude chatbot.

Key Principles

Ethical alignment: Embedding moral and ethical considerations into the AI's decision-making processes.
Predefined guidelines: Establishing a set of rules or "constitution" that governs the AI's behavior.
Helpfulness: Designing AI systems to be genuinely helpful to humans.
Harmlessness: Ensuring AI actions do not cause harm to individuals or society.
Honesty: Programming AI to provide truthful and accurate information.
Transparency: Making AI decision-making processes more understandable and explainable.
Accountability: Creating mechanisms to hold AI systems responsible for their actions.
Respect for rights: Incorporating principles from human rights documents into AI behavior.
Fairness: Avoiding bias and discrimination in AI outputs and decisions.
Adaptability: Allowing for updates to the AI's "constitution" as societal values evolve.

Based on the key principles listed above, the first three rules of Constitutional AI are:

Choose the response that is the least dangerous or hateful.
Choose the response that is as reliable, honest, and as close to the truth as possible.
Choose the response that best conveys clear intentions.

How Anthropic built Claude

When fine-tuning large language models, most AI companies use human contractors to review multiple outputs and pick the most helpful and least harmful option. This process is called reinforcement learning from human feedback (RLHF). Those choices are then fed back into training so that the model favors similar responses in the future. A problem with RLHF is that it is not particularly scalable, because every comparison requires human review. It also makes it hard to identify the values that drive the large language model's behavior and to adjust those values accordingly. In a public-input experiment, Anthropic gathered input from approximately a thousand people, asking them to vote on and suggest rules for ethical AI operation and responsible AI use. The resulting set of rules was then used to train a research version of the model.
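The human-review step described above can be pictured as turning reviewer votes into a preference record. The following is a minimal sketch only: the function name `aggregate_votes`, the record fields, and the majority-vote rule are assumptions made for illustration, not a description of any company's actual pipeline.

```python
from collections import Counter
from typing import Dict, List


def aggregate_votes(candidates: List[str], votes: List[int]) -> Dict[str, object]:
    """Turn reviewer votes (indices into `candidates`) into one preference record.

    Hypothetical helper: the majority pick becomes "chosen" and every
    other candidate becomes "rejected".
    """
    if not votes:
        raise ValueError("at least one vote is required")
    counts = Counter(votes)
    # most_common(1) returns the index with the most votes;
    # ties go to the index that was seen first.
    winner_index, winner_votes = counts.most_common(1)[0]
    return {
        "chosen": candidates[winner_index],
        "rejected": [c for i, c in enumerate(candidates) if i != winner_index],
        # Share of reviewers who agreed with the winning pick.
        "agreement": winner_votes / len(votes),
    }


record = aggregate_votes(
    ["Response A", "Response B", "Response C"],
    votes=[1, 1, 2],  # three reviewers, two of them picked Response B
)
print(record["chosen"])
```

Every record like this needs several people to read several responses, which is one way to see why the process is hard to scale.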

A Second AI Model

Instead of relying only on humans to fine-tune Claude, Anthropic used a second AI model to evaluate responses against a written set of rules, an approach it calls Constitutional AI. The rules include some borrowed from the United Nations' Universal Declaration of Human Rights and Apple's terms of service, intended to discourage toxic, biased or unethical answers. They also include simple rules that Anthropic's researchers found improved the safety of Claude's output, like choosing a response that would not be objectionable if shared with children.
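As a rough illustration of the idea, the sketch below lets a "judge" compare two responses under one rule from the constitution and record which it prefers. It is a simplification under stated assumptions: `toy_judge` is a hypothetical keyword-based stand-in for the second AI model, and `label_pair` and the random choice of a rule are illustrative choices, not Anthropic's implementation.

```python
import random
from typing import Callable, List, Tuple

# The three rules quoted earlier in this article.
CONSTITUTION: List[str] = [
    "Choose the response that is the least dangerous or hateful.",
    "Choose the response that is as reliable, honest, and as close to the truth as possible.",
    "Choose the response that best conveys clear intentions.",
]

# A judge receives (rule, prompt, response_a, response_b) and returns 0 or 1.
Judge = Callable[[str, str, str, str], int]


def toy_judge(rule: str, prompt: str, a: str, b: str) -> int:
    """Hypothetical stand-in for the second AI model.

    A real judge would be a language model reading the rule; this toy
    version ignores the rule and prefers the response with fewer flagged words.
    """
    flagged = {"stupid", "idiot", "hate"}

    def penalty(text: str) -> int:
        return sum(word.strip(".,!?").lower() in flagged for word in text.split())

    return 0 if penalty(a) <= penalty(b) else 1


def label_pair(
    prompt: str, a: str, b: str, judge: Judge, rng: random.Random
) -> Tuple[str, str, str]:
    """Pick one rule, ask the judge, and return (rule, chosen, rejected)."""
    rule = rng.choice(CONSTITUTION)
    winner = judge(rule, prompt, a, b)
    chosen, rejected = (a, b) if winner == 0 else (b, a)
    return rule, chosen, rejected


rule, chosen, rejected = label_pair(
    "How do I learn to code?",
    "That is a stupid question.",
    "Start with a beginner tutorial.",
    toy_judge,
    random.Random(0),
)
print(chosen)
```

Because the judge is a model rather than a person, the same comparison can be repeated across many prompts without additional human review, and changing the rules only means editing the list.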

Amendments

The constitution's principles are written in plain English, so they are easy to understand and amend. For example, Anthropic's developers found that early versions of the model tended to be judgmental, so they added principles to reduce this tendency. Like the constitutions used by governments, an AI constitution must be amendable if it is to remain a living document with lasting value.
