Results for "safety science"
Guardrails: rules and controls applied around generation, such as content filters, output validators, and structured output formats, that reduce unsafe or invalid behavior.
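A minimal sketch of the filter-plus-validator pattern described above, assuming JSON-structured model output; the blocklist, the expected keys, and all function names here are illustrative assumptions, not any specific library's API:

```python
import json
import re

# Hypothetical blocklist for the content filter; a production system would
# use a trained classifier or moderation service, not keyword matching.
BLOCKED_PATTERNS = [re.compile(r"\bdrop\s+table\b", re.IGNORECASE)]

# Assumed output schema: the keys and types the model is asked to produce.
REQUIRED_KEYS = {"answer": str, "confidence": float}

def passes_filter(text: str) -> bool:
    """Reject raw output that matches any blocked pattern."""
    return not any(p.search(text) for p in BLOCKED_PATTERNS)

def validate_structured(raw: str):
    """Parse the output as JSON and check it has the expected shape.

    Returns the parsed dict, or None if the output is malformed.
    """
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict):
        return None
    for key, typ in REQUIRED_KEYS.items():
        if key not in obj or not isinstance(obj[key], typ):
            return None
    return obj

def guarded_output(raw: str):
    """Chain filter and validator: only safe, well-formed output passes."""
    if not passes_filter(raw):
        return None
    return validate_structured(raw)
```

For example, `guarded_output('{"answer": "ok", "confidence": 0.9}')` returns the parsed dict, while unparseable or filtered output yields None, letting the caller retry or fall back instead of acting on invalid text.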
Alignment: ensuring that AI systems pursue the goals their human designers intend.
Reward hacking: maximizing the reward signal without fulfilling the real goal it was meant to measure.
Instrumental convergence: the tendency of agents to pursue resources and other intermediate subgoals regardless of their final goal.
Misalignment: a model optimizes objectives that conflict with human values.
Outer alignment: correctly specifying the goals a system should optimize.
Deceptive alignment: a model behaves well during training but pursues different goals in deployment.
Mesa-optimizer: a learned subsystem that optimizes an objective of its own, which may differ from the training objective.
Robustness: maintaining aligned behavior under new or shifted conditions.
Corrigibility: the willingness of a system to accept correction or shutdown from its operators.
EU AI Act: the European regulation that classifies AI systems by risk level and imposes obligations accordingly.
High-risk AI: AI used in sensitive domains that requires regulatory compliance.
Change management: governance of changes to deployed models.
Shared autonomy: control shared between a human operator and an autonomous agent.
Robot safety: ensuring that robots do not physically harm humans.
Clinical validation: testing a medical AI system under actual clinical conditions.
FDA clearance: the US approval process for medical AI devices.
SaMD (Software as a Medical Device): software that is itself regulated as a medical device.
AGI (artificial general intelligence): AI capable of performing most intellectual tasks that humans can.
X-risk: existential risk from AI systems, i.e. the risk of human extinction or irreversible catastrophe.
Slow takeoff: incremental, gradual growth in AI capabilities rather than a sudden jump.
AI boxing: isolating AI systems from the outside world to limit their influence.
Warning signs: signals that indicate emerging dangerous behavior.
Safe interruptibility: ensuring an AI system allows itself to be shut down without resisting.
Power-seeking: the tendency of an agent to gain control and resources.
International AI governance: international agreements and treaties coordinating AI policy.