Results for "human values"
Model optimizes objectives misaligned with human values.
Requiring human review for high-risk decisions.
System design where humans validate or guide model outputs, especially for high-stakes decisions.
Control shared between human and agent.
Humans assist or override autonomous behavior.
Ensuring AI systems pursue intended human goals.
Inferring and aligning with human preferences.
Willingness of system to accept correction or shutdown.
Reinforcement learning from human feedback: uses preference data to train a reward model and optimize the policy.
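A minimal sketch of those two stages in a toy discrete setting; the bandit setup, the hyperparameters, and every name below are illustrative assumptions, not any production pipeline's API:

```python
import math

import torch
import torch.nn.functional as F

torch.manual_seed(0)
n_actions = 8

# Stage 1: fit a reward model on pairwise preferences using the
# Bradley-Terry loss, -log sigmoid(r(chosen) - r(rejected)).
reward_logits = torch.zeros(n_actions, requires_grad=True)  # learned r(a)
true_utility = torch.linspace(0.0, 1.0, n_actions)          # hidden "human" preference

opt_r = torch.optim.Adam([reward_logits], lr=0.1)
for _ in range(200):
    a, b = torch.randint(n_actions, (2,))
    chosen, rejected = (a, b) if true_utility[a] >= true_utility[b] else (b, a)
    loss = -F.logsigmoid(reward_logits[chosen] - reward_logits[rejected])
    opt_r.zero_grad()
    loss.backward()
    opt_r.step()

# Stage 2: optimize a softmax policy against the learned reward, with a
# KL penalty toward a fixed reference policy (uniform here).
policy_logits = torch.zeros(n_actions, requires_grad=True)
ref_log_probs = torch.full((n_actions,), -math.log(n_actions))
beta = 0.1  # KL coefficient

opt_p = torch.optim.Adam([policy_logits], lr=0.1)
for _ in range(200):
    log_probs = F.log_softmax(policy_logits, dim=0)
    probs = log_probs.exp()
    expected_reward = (probs * reward_logits.detach()).sum()
    kl = (probs * (log_probs - ref_log_probs)).sum()
    loss = -(expected_reward - beta * kl)
    opt_p.zero_grad()
    loss.backward()
    opt_p.step()

print("most-preferred action:", policy_logits.argmax().item())
```

The KL penalty keeps the tuned policy close to the reference policy, so it cannot drift arbitrarily far just to exploit errors in the learned reward.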
Ensuring model behavior matches human goals, norms, and constraints, including reducing harmful or deceptive outputs.
Using limited human feedback to guide large models.
Correctly specifying goals.
Research aimed at ensuring AI systems remain safe.
Model trained to predict human preferences (or utility) for candidate outputs; used in RLHF-style pipelines.
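A compact sketch of one such reward model, assuming random feature vectors stand in for a real encoder's representations; the class name and dimensions are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    def __init__(self, dim=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, x):               # x: (batch, dim) candidate-output features
        return self.net(x).squeeze(-1)  # one scalar preference score per candidate

torch.manual_seed(0)
model = RewardModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# One training step on a batch of (chosen, rejected) feature pairs:
# the Bradley-Terry loss raises the chosen score above the rejected one.
chosen, rejected = torch.randn(8, 16), torch.randn(8, 16)
loss = -F.logsigmoid(model(chosen) - model(rejected)).mean()
opt.zero_grad()
loss.backward()
opt.step()
print(f"preference loss: {loss.item():.3f}")
```

In practice the scores from a model like this rank candidate outputs and supply the optimization signal in the pipeline described above.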
Risk threatening humanity’s survival.
Existential risk from AI systems.
The field of building systems that perform tasks associated with human intelligence—perception, reasoning, language, planning, and decision-making—via algorithms.
Interpreting human gestures.
AI subfield dealing with understanding and generating human language, including syntax, semantics, and pragmatics.
A system's level of intelligence and its final goals can vary independently.
Model behaves well during training but not in deployment.
Variable whose values depend on chance.
A preference-based training method optimizing policies directly from pairwise comparisons without explicit RL loops.
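One well-known method with this shape is the DPO loss; a minimal sketch, assuming per-response log-probabilities under the policy and a frozen reference model have already been computed (all names are illustrative):

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    # Each response's implicit reward is beta * (log pi - log pi_ref);
    # the loss pushes the chosen response's implicit reward above the
    # rejected response's, with no reward model or sampling loop.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()

# Toy usage with fabricated log-probabilities for a batch of 4 comparisons.
torch.manual_seed(0)
batch = [torch.randn(4) for _ in range(4)]
print(dpo_loss(*batch).item())
```

The pairwise margin acts as an implicit reward, which is why no separate reward model or sampling loop is needed.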
Human or automated process of assigning targets; quality, consistency, and guidelines matter heavily.
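One standard consistency check for such labels is inter-annotator agreement; a small sketch computing Cohen's kappa on two fabricated annotator label lists:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    n = len(labels_a)
    # Observed agreement: fraction of items where the annotators match.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's label marginals.
    ca, cb = Counter(labels_a), Counter(labels_b)
    expected = sum(ca[k] * cb[k] for k in ca) / (n * n)
    return (observed - expected) / (1 - expected)

a = ["pos", "neg", "pos", "pos", "neg", "pos"]
b = ["pos", "neg", "neg", "pos", "neg", "pos"]
print(f"kappa = {cohens_kappa(a, b):.2f}")
```

Values near 1 indicate strong agreement; values near 0 mean the annotators agree no more often than chance.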
Generating human-like speech from text.
European regulation classifying AI systems by risk.
Human-like understanding of physical behavior.
A human controlling a robot remotely.
Robots learning via exploration and growth.
AI capable of performing most intellectual tasks humans can.