Results for "human values"
Detects trigger phrases in audio streams.
Model exploits poorly specified objectives.
Explicit output constraints (format, tone).
AI used in sensitive domains requiring compliance.
Mechanism to disable an AI system.
AI systems that perceive and act in the physical world through sensors and actuators.
Inferring human goals from behavior.
Controlling robots via language.
AI tacitly coordinating prices.
A measurable property or attribute used as model input (raw or engineered), such as age, pixel intensity, or token ID.
A scalar measure optimized during training, typically expected loss over data, sometimes with regularization terms.
Plots true positive rate vs false positive rate across thresholds; summarizes separability.
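As a rough illustration of the entry above, the sketch below computes ROC points by sweeping a threshold over the scores, then takes the trapezoidal area under them. The function names `roc_points` and `auc` and the toy data are assumptions made for this example, not part of any particular library.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) pairs, one per distinct threshold, highest threshold first."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        # An example is predicted positive when its score reaches the threshold.
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Trapezoidal area under a list of (FPR, TPR) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy data (hypothetical): 1 = positive class, 0 = negative class.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(labels, scores)
print(points)
print(auc(points))
```

The area equals the fraction of positive/negative pairs in which the positive example receives the higher score, which is one way to read "separability".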
Often more informative than ROC on imbalanced datasets; focuses on positive class performance.
A proper scoring rule measuring squared error of predicted probabilities for binary outcomes.
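A minimal sketch of the score described above, assuming binary outcomes coded as 0 or 1 and predicted probabilities for the positive outcome. The name `brier_score` is chosen for this example only.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and the same length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


# Perfectly confident and correct predictions score 0;
# always predicting 0.5 scores 0.25 regardless of the outcomes.
print(brier_score([1.0, 0.0], [1, 0]))
print(brier_score([0.5, 0.5], [1, 0]))
```

Lower is better, and because the rule is proper, a forecaster minimizes the expected score by reporting their true belief.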
Methods to set starting weights to preserve signal/gradient scales across layers.
Nonlinear functions enabling networks to approximate complex mappings; ReLU variants dominate modern DL.
Mechanism that computes context-aware weighted mixtures of representations; parallelizes well and captures long-range dependencies.
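One common form of this mechanism is scaled dot-product attention for a single query. The pure-Python sketch below assumes that form; the function names and the toy vectors are illustrative only.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Return (output, weights) for one query attending over keys/values."""
    d = len(query)
    # Similarity of the query to each key, scaled by sqrt(d).
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # The output is the weight-averaged mixture of the value vectors.
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return output, weights


# Hypothetical toy input: the query matches the first key far better than the second.
out, w = attention([1.0, 0.0], [[10.0, 0.0], [0.0, 10.0]], [[2.0, 0.0], [0.0, 2.0]])
print(w)
print(out)
```

Because the weights depend on the query, the same set of values yields a different mixture for each position, which is what makes the result context-aware.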
Architecture based on self-attention and feedforward layers; foundation of modern LLMs and many multimodal models.
Measure of consistency across labelers; low agreement indicates ambiguous tasks or poor guidelines.
A formal privacy framework ensuring outputs do not reveal much about any single individual’s data contribution.
Standardized documentation describing intended use, performance, limitations, data, and ethical considerations.
Measures a hypothesis class's ability to fit random noise (random labels); used to bound generalization error.
Allows a model to attend to information from different representation subspaces simultaneously.
Systematic error introduced by simplifying assumptions in a learning algorithm.
Stores the attention keys and values computed for past tokens to speed up autoregressive decoding.
Estimating parameters by maximizing likelihood of observed data.
Set of all actions available to the agent.
Expected cumulative reward from a state or state-action pair.
Fundamental recursive relationship defining optimal value functions.
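For reference, the recursion can be written out for a discounted Markov decision process. The notation here is an assumption, not taken from the entry: $P$ is the transition probability, $R$ the reward, $\gamma \in [0, 1)$ the discount factor, and $V^*$, $Q^*$ the optimal state and state-action value functions.

```latex
V^*(s) = \max_{a} \sum_{s'} P(s' \mid s, a)\,\bigl[ R(s, a, s') + \gamma\, V^*(s') \bigr]

Q^*(s, a) = \sum_{s'} P(s' \mid s, a)\,\bigl[ R(s, a, s') + \gamma \max_{a'} Q^*(s', a') \bigr]
```

The two forms are linked by $V^*(s) = \max_{a} Q^*(s, a)$.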
Learning from data generated by a different policy.