Results for "safety science"
Tradeoff between safety and performance.
Accelerating safety relative to capabilities.
Automated detection/prevention of disallowed outputs (toxicity, self-harm, illegal instructions, etc.).
Systems where failure causes physical harm.
The field of building systems that perform tasks associated with human intelligence—perception, reasoning, language, planning, and decision-making—via algori...
Mechanism to disable an AI system.
Hard constraints preventing unsafe actions.
Restricting distribution of powerful models.
Research ensuring AI remains safe.
Mathematical guarantees of system behavior.
Risk threatening humanity’s survival.
Sudden jump to superintelligence.
Central system to store model versions, metadata, approvals, and deployment state.
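The registry entry above can be illustrated with a minimal in-memory sketch (hypothetical names and fields, not a specific product's API): each model name maps to a list of versions carrying metadata, an approval flag, and a deployment flag, and deployment is gated on approval.

```python
# Hypothetical in-memory model registry: versions, metadata,
# approvals, and deployment state per model name.
from dataclasses import dataclass


@dataclass
class ModelVersion:
    version: int
    metadata: dict
    approved: bool = False
    deployed: bool = False


class ModelRegistry:
    def __init__(self):
        self._models = {}  # name -> list[ModelVersion]

    def register(self, name, metadata):
        # Versions are assigned sequentially per model name.
        versions = self._models.setdefault(name, [])
        mv = ModelVersion(version=len(versions) + 1, metadata=metadata)
        versions.append(mv)
        return mv.version

    def approve(self, name, version):
        self._models[name][version - 1].approved = True

    def deploy(self, name, version):
        # Deployment is gated on prior approval.
        mv = self._models[name][version - 1]
        if not mv.approved:
            raise ValueError("cannot deploy an unapproved version")
        mv.deployed = True


reg = ModelRegistry()
v = reg.register("toxicity-filter", {"framework": "pytorch"})
reg.approve("toxicity-filter", v)
reg.deploy("toxicity-filter", v)
```

Real registries add persistence, audit logs, and stage transitions, but the core state machine is the same: register, approve, then deploy.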
Sequential data indexed by time.
A theoretical framework analyzing what classes of functions can be learned, how efficiently, and with what guarantees.
Field combining mechanics, control, perception, and AI to build autonomous machines.
Learning by minimizing prediction error.
Intelligence emerges from interaction with the physical world.
Closed loop linking sensing and acting.
Robots learning via exploration and growth.
AI applied to scientific problems.
AI discovering new compounds/materials.
Agents optimize collective outcomes.
No agent benefits from unilateral deviation.
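The "no unilateral deviation" condition can be checked directly for a pure strategy profile in a two-player game. A minimal sketch (payoff matrices and function name are illustrative), using the prisoner's dilemma as the example:

```python
# Check whether a pure strategy profile (row, col) is a Nash equilibrium:
# neither player can improve their own payoff by deviating alone.

def is_nash(payoff_a, payoff_b, row, col):
    # Player A deviates across rows while B stays at `col`.
    if any(payoff_a[r][col] > payoff_a[row][col] for r in range(len(payoff_a))):
        return False
    # Player B deviates across columns while A stays at `row`.
    if any(payoff_b[row][c] > payoff_b[row][col] for c in range(len(payoff_b[0]))):
        return False
    return True


# Prisoner's dilemma payoffs; strategy 0 = cooperate, 1 = defect.
A = [[3, 0], [5, 1]]  # row player's payoffs
B = [[3, 5], [0, 1]]  # column player's payoffs

print(is_nash(A, B, 1, 1))  # True: mutual defection, no unilateral gain
print(is_nash(A, B, 0, 0))  # False: either player gains by defecting
```

Mutual defection is the equilibrium even though mutual cooperation pays both players more, which is why this condition (L24 vs. L25) matters for multi-agent safety: equilibria need not optimize collective outcomes.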
Early signals disproportionately influence outcomes.
Groups adopting extreme positions.
Stepwise reasoning patterns that can improve multi-step tasks; often handled implicitly or summarized for safety/privacy.
Reinforcement learning from human feedback: uses preference data to train a reward model and optimize the policy.
Ensuring model behavior matches human goals, norms, and constraints, including reducing harmful or deceptive outputs.
Stress-testing models for failures, vulnerabilities, policy violations, and harmful behaviors before release.