Results for "human values"
Model optimizes objectives misaligned with human values.
Requiring human review for high-risk decisions.
System design where humans validate or guide model outputs, especially for high-stakes decisions.
Control shared between human and agent.
Humans assist or override autonomous behavior.
Ensuring AI systems pursue intended human goals.
Inferring and aligning with human preferences.
Willingness of system to accept correction or shutdown.
Reinforcement learning from human feedback: uses preference data to train a reward model and optimize the policy.
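A minimal sketch of those two stages in a toy discrete setting; the bandit setup, the hyperparameters, and every name below are illustrative assumptions, not any production pipeline's API:

```python
import math

import torch
import torch.nn.functional as F

torch.manual_seed(0)
n_actions = 8

# Stage 1: fit a reward model on pairwise preferences using the
# Bradley-Terry loss, -log sigmoid(r(chosen) - r(rejected)).
reward_logits = torch.zeros(n_actions, requires_grad=True)  # learned r(a)
true_utility = torch.linspace(0.0, 1.0, n_actions)          # hidden "human" preference

opt_r = torch.optim.Adam([reward_logits], lr=0.1)
for _ in range(200):
    a, b = torch.randint(n_actions, (2,))
    chosen, rejected = (a, b) if true_utility[a] >= true_utility[b] else (b, a)
    loss = -F.logsigmoid(reward_logits[chosen] - reward_logits[rejected])
    opt_r.zero_grad()
    loss.backward()
    opt_r.step()

# Stage 2: optimize a softmax policy against the learned reward, with a
# KL penalty toward a fixed reference policy (uniform here).
policy_logits = torch.zeros(n_actions, requires_grad=True)
ref_log_probs = torch.full((n_actions,), -math.log(n_actions))
beta = 0.1  # KL coefficient

opt_p = torch.optim.Adam([policy_logits], lr=0.1)
for _ in range(200):
    log_probs = F.log_softmax(policy_logits, dim=0)
    probs = log_probs.exp()
    expected_reward = (probs * reward_logits.detach()).sum()
    kl = (probs * (log_probs - ref_log_probs)).sum()
    loss = -(expected_reward - beta * kl)
    opt_p.zero_grad()
    loss.backward()
    opt_p.step()

print("most-preferred action:", policy_logits.argmax().item())
```

The KL penalty keeps the tuned policy close to the reference policy, so it cannot drift arbitrarily far just to exploit errors in the learned reward.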
Ensuring model behavior matches human goals, norms, and constraints, including reducing harmful or deceptive outputs.
Using limited human feedback to guide large models.
Correctly specifying goals.
Research aimed at ensuring AI systems remain safe.
Model trained to predict human preferences (or utility) for candidate outputs; used in RLHF-style pipelines.
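A compact sketch of one such reward model, assuming random feature vectors stand in for a real encoder's representations; the class name and dimensions are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    def __init__(self, dim=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, x):               # x: (batch, dim) candidate-output features
        return self.net(x).squeeze(-1)  # one scalar preference score per candidate

torch.manual_seed(0)
model = RewardModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# One training step on a batch of (chosen, rejected) feature pairs:
# the Bradley-Terry loss raises the chosen score above the rejected one.
chosen, rejected = torch.randn(8, 16), torch.randn(8, 16)
loss = -F.logsigmoid(model(chosen) - model(rejected)).mean()
opt.zero_grad()
loss.backward()
opt.step()
print(f"preference loss: {loss.item():.3f}")
```

In practice the scores from a model like this rank candidate outputs and supply the optimization signal in the pipeline described above.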
Risk threatening humanity’s survival.
Existential risk from AI systems.
The field of building systems that perform tasks associated with human intelligence—perception, reasoning, language, planning, and decision-making—via algorithms.
Interpreting human gestures.
AI subfield dealing with understanding and generating human language, including syntax, semantics, and pragmatics.
A system's level of intelligence and its final goals can vary independently.
Model behaves well during training but not in deployment.
Variable whose values depend on chance.
A preference-based training method optimizing policies directly from pairwise comparisons without explicit RL loops.
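One well-known method with this shape is the DPO loss; a minimal sketch, assuming per-response log-probabilities under the policy and a frozen reference model have already been computed (all names are illustrative):

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    # Each response's implicit reward is beta * (log pi - log pi_ref);
    # the loss pushes the chosen response's implicit reward above the
    # rejected response's, with no reward model or sampling loop.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()

# Toy usage with fabricated log-probabilities for a batch of 4 comparisons.
torch.manual_seed(0)
batch = [torch.randn(4) for _ in range(4)]
print(dpo_loss(*batch).item())
```

The pairwise margin acts as an implicit reward, which is why no separate reward model or sampling loop is needed.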
Human or automated process of assigning targets; quality, consistency, and guidelines matter heavily.
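One standard consistency check for such labels is inter-annotator agreement; a small sketch computing Cohen's kappa on two fabricated annotator label lists:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    n = len(labels_a)
    # Observed agreement: fraction of items where the annotators match.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's label marginals.
    ca, cb = Counter(labels_a), Counter(labels_b)
    expected = sum(ca[k] * cb[k] for k in ca) / (n * n)
    return (observed - expected) / (1 - expected)

a = ["pos", "neg", "pos", "pos", "neg", "pos"]
b = ["pos", "neg", "neg", "pos", "neg", "pos"]
print(f"kappa = {cohens_kappa(a, b):.2f}")
```

Values near 1 indicate strong agreement; values near 0 mean the annotators agree no more often than chance.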
Generating human-like speech from text.
European regulation classifying AI systems by risk.
Human-like understanding of physical behavior.
A human controlling a robot remotely.
Robots learning via exploration and growth.
AI capable of performing most intellectual tasks humans can.