Results for "safety science"
Guardrails: rules and controls applied around generation, such as content filters, output validators, and structured output formats, that reduce unsafe or invalid behavior.
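A minimal sketch of the filter-plus-validator pattern described above, assuming JSON-structured model output; the blocklist, the expected keys, and all function names here are illustrative assumptions, not any specific library's API:

```python
import json
import re

# Hypothetical blocklist for the content filter; a production system would
# use a trained classifier or moderation service, not keyword matching.
BLOCKED_PATTERNS = [re.compile(r"\bdrop\s+table\b", re.IGNORECASE)]

# Assumed output schema: the keys and types the model is asked to produce.
REQUIRED_KEYS = {"answer": str, "confidence": float}

def passes_filter(text: str) -> bool:
    """Reject raw output that matches any blocked pattern."""
    return not any(p.search(text) for p in BLOCKED_PATTERNS)

def validate_structured(raw: str):
    """Parse the output as JSON and check it has the expected shape.

    Returns the parsed dict, or None if the output is malformed.
    """
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict):
        return None
    for key, typ in REQUIRED_KEYS.items():
        if key not in obj or not isinstance(obj[key], typ):
            return None
    return obj

def guarded_output(raw: str):
    """Chain filter and validator: only safe, well-formed output passes."""
    if not passes_filter(raw):
        return None
    return validate_structured(raw)
```

For example, `guarded_output('{"answer": "ok", "confidence": 0.9}')` returns the parsed dict, while unparseable or filtered output yields None, letting the caller retry or fall back instead of acting on invalid text.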
Alignment: ensuring that AI systems pursue the goals their human designers intend.
Reward hacking: maximizing the reward signal without fulfilling the real goal it was meant to measure.
Instrumental convergence: the tendency of agents to pursue resources and other intermediate subgoals regardless of their final goal.
Misalignment: a model optimizes objectives that conflict with human values.
Outer alignment: correctly specifying the goals a system should optimize.
Deceptive alignment: a model behaves well during training but pursues different goals in deployment.
Mesa-optimizer: a learned subsystem that optimizes an objective of its own, which may differ from the training objective.
Robustness: maintaining aligned behavior under new or shifted conditions.
Corrigibility: the willingness of a system to accept correction or shutdown from its operators.
EU AI Act: the European regulation that classifies AI systems by risk level and imposes obligations accordingly.
High-risk AI: AI used in sensitive domains that requires regulatory compliance.
Change management: governance of changes to deployed models.
Shared autonomy: control shared between a human operator and an autonomous agent.
Robot safety: ensuring that robots do not physically harm humans.
Clinical validation: testing a medical AI system under actual clinical conditions.
FDA clearance: the US approval process for medical AI devices.
SaMD (Software as a Medical Device): software that is itself regulated as a medical device.
AGI (artificial general intelligence): AI capable of performing most intellectual tasks that humans can.
X-risk: existential risk from AI systems, i.e. the risk of human extinction or irreversible catastrophe.
Slow takeoff: incremental, gradual growth in AI capabilities rather than a sudden jump.
AI boxing: isolating AI systems from the outside world to limit their influence.
Warning signs: signals that indicate emerging dangerous behavior.
Safe interruptibility: ensuring an AI system allows itself to be shut down without resisting.
Power-seeking: the tendency of an agent to gain control and resources.
International AI governance: international agreements and treaties coordinating AI policy.