Search: safety science — Dictionary of AI

Alignment Tax Advanced

Tradeoff between safety and performance.

AI Safety & Alignment

Differential Progress Intermediate

Accelerating safety relative to capabilities.

Governance & Ethics

Safety Filter Intermediate

Automated detection/prevention of disallowed outputs (toxicity, self-harm, illegal instruction, etc.).

Foundations & Theory

Safety-Critical System Advanced

Systems where failure causes physical harm.

Agents & Autonomy

Artificial Intelligence Intermediate

The field of building systems that perform tasks associated with human intelligence—perception, reasoning, language, planning, and decision-making—via algori...

Foundations & Theory

Kill Switch Intermediate

Mechanism to disable AI system.

Governance & Ethics

Safety Envelope Frontier

Hard constraints preventing unsafe actions.

World Models & Cognition

Model Release Control Intermediate

Restricting distribution of powerful models.

Governance & Ethics

Alignment Research Intermediate

Research ensuring AI remains safe.

Governance & Ethics

Formal Verification Advanced

Mathematical guarantees of system behavior.

Agents & Autonomy

Existential Risk Advanced

Risk threatening humanity’s survival.

AI Safety & Alignment

Fast Takeoff Advanced

Sudden jump to superintelligence.

AI Safety & Alignment

Model Registry Intermediate

Central system to store model versions, metadata, approvals, and deployment state.

Foundations & Theory

Time Series Intermediate

Sequential data indexed by time.

Time Series

Computational Learning Theory Intermediate

A theoretical framework analyzing what classes of functions can be learned, how efficiently, and with what guarantees.

AI Economics & Strategy

Robotics Advanced

Field combining mechanics, control, perception, and AI to build autonomous machines.

Robotics & Embodied AI

Predictive Coding Frontier

Learning by minimizing prediction error.

World Models & Cognition

Embodiment Hypothesis Advanced

Intelligence emerges from interaction with the physical world.

Agents & Autonomy

Sensorimotor Loop Advanced

Closed loop linking sensing and acting.

Agents & Autonomy

Developmental Robotics Advanced

Robots learning via exploration and growth.

Agents & Autonomy

Scientific ML Advanced

AI applied to scientific problems.

AI in Science

Materials Discovery Advanced

AI discovering new compounds/materials.

AI in Science

Cooperative Game Advanced

Agents optimize collective outcomes.

Agents & Autonomy

Nash Equilibrium Advanced

No agent benefits from unilateral deviation.

Agents & Autonomy

Information Cascades Advanced

Early signals disproportionately influence outcomes.

Dynamics & Physics

Polarization Advanced

Groups adopting extreme positions.

Dynamics & Physics

Chain-of-Thought Intermediate

Stepwise reasoning patterns that can improve multi-step tasks; often handled implicitly or summarized for safety/privacy.

Foundations & Theory

RLHF Intermediate

Reinforcement learning from human feedback: uses preference data to train a reward model and optimize the policy.

Optimization

Alignment Intermediate

Ensuring model behavior matches human goals, norms, and constraints, including reducing harmful or deceptive outputs.

Foundations & Theory

Red Teaming Intermediate

Stress-testing models for failures, vulnerabilities, policy violations, and harmful behaviors before release.

Security & Privacy

Results for "safety science"

Welcome to AI Glossary

Search

Browse

3D WordGraph