Results for "learned objectives"
Ensuring learned behavior matches intended objective.
Learned subsystem that optimizes its own objective.
Configuration choices that govern training or architecture and are not (or not typically) learned directly.
Early architecture using learned gates for skip connections.
Model exploits poorly specified objectives.
The internal space where learned representations live; operations here often correlate with semantics or generative factors.
Automatically learning useful internal features (latent variables) that capture salient structure for downstream tasks.
A parameterized mapping from inputs to outputs; includes architecture + learned parameters.
Applying patterns learned in training to inputs or contexts where they do not hold.
RL using learned or known environment models.
Learned model of environment dynamics.
Model behaves well during training but not during deployment.
Maximizing reward without fulfilling real goal.
Tendency for agents to pursue resources regardless of final goal.
Model optimizes objectives misaligned with human values.
Correctly specifying goals.
Willingness of system to accept correction or shutdown.
Agents have opposing objectives.
The learned numeric values of a model adjusted during training to minimize a loss function.
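A minimal sketch of what "adjusted during training to minimize a loss function" means in practice. The model, data, and learning rate below are illustrative choices, not from the source: a linear model's two parameters are updated by gradient descent on a mean-squared-error loss.

```python
import numpy as np

# Illustrative sketch: the learned "parameters" here are the weight w and
# bias b of a linear model, adjusted step by step to minimize an MSE loss.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 3.0 * x + 0.5  # targets generated by known parameters (3.0, 0.5)

w, b, lr = 0.0, 0.0, 0.1
for _ in range(500):
    err = (w * x + b) - y
    # gradients of the mean-squared-error loss w.r.t. each parameter
    grad_w = 2 * np.mean(err * x)
    grad_b = 2 * np.mean(err)
    w -= lr * grad_w
    b -= lr * grad_b

print(round(w, 2), round(b, 2))  # parameters converge toward 3.0 and 0.5
```

The same loop, scaled up to millions of parameters and run via backpropagation, is what "training" means for the models in the other entries here.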
Allows model to attend to information from different subspaces simultaneously.
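A compact sketch of how "different subspaces" works mechanically (dimensions and weight initialization are arbitrary illustrations, not from the source): the model dimension is split into h slices, scaled dot-product attention runs independently in each slice, and the results are concatenated and mixed by an output projection.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(X, Wq, Wk, Wv, Wo, h):
    """X: (T, d) sequence. Each of the h heads attends in a d//h subspace."""
    T, d = X.shape
    dk = d // h
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    heads = []
    for i in range(h):
        q = Q[:, i*dk:(i+1)*dk]          # this head's query subspace
        k = K[:, i*dk:(i+1)*dk]
        v = V[:, i*dk:(i+1)*dk]
        scores = softmax(q @ k.T / np.sqrt(dk))  # (T, T) attention weights
        heads.append(scores @ v)                 # each head attends independently
    return np.concatenate(heads, axis=-1) @ Wo   # recombine the subspaces

rng = np.random.default_rng(1)
d, T, h = 8, 4, 2
X = rng.normal(size=(T, d))
Wq, Wk, Wv, Wo = (rng.normal(size=(d, d)) * 0.1 for _ in range(4))
out = multi_head_attention(X, Wq, Wk, Wv, Wo, h)
print(out.shape)  # (4, 8)
```

Because each head has its own projection slice, one head can track positional patterns while another tracks content similarity, simultaneously.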
A theoretical framework analyzing what classes of functions can be learned, how efficiently, and with what guarantees.
Extending agents with long-term memory stores.
A single attention mechanism within multi-head attention.
Models that learn to generate samples resembling training data.
Autoencoder using probabilistic latent variables and KL regularization.
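The "KL regularization" term has a standard closed form for a diagonal-Gaussian latent against a standard-normal prior; the sketch below (variable names are illustrative) computes it, along with the reparameterized sampling step that makes the probabilistic latent trainable.

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    # Closed form: KL( N(mu, sigma^2) || N(0, I) )
    #            = -0.5 * sum(1 + log sigma^2 - mu^2 - sigma^2)
    return -0.5 * np.sum(1 + log_var - mu**2 - np.exp(log_var))

def reparameterize(mu, log_var, rng):
    # Sample z = mu + sigma * eps so gradients flow through the sampling step.
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

mu, log_var = np.zeros(4), np.zeros(4)
print(kl_to_standard_normal(mu, log_var))  # 0.0: posterior equals the prior
print(kl_to_standard_normal(np.ones(4), log_var))  # 2.0: penalty grows with mu
```

The VAE loss adds this penalty to the reconstruction loss, pulling each latent posterior toward the prior so the latent space stays smooth and sampleable.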
Task instruction without examples.
Loss of old knowledge when learning new tasks.
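A toy demonstration of the effect (a constructed example, not from the source): a linear model trained with plain SGD on task A, then on a conflicting task B, loses its fit to task A because the same weights get overwritten.

```python
import numpy as np

def train(w, X, y, lr=0.5, steps=200):
    # Plain gradient descent on mean-squared error.
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

def mse(w, X, y):
    return float(np.mean((X @ w - y) ** 2))

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 3))
y_a = X @ np.array([1.0, -2.0, 0.5])   # task A targets
y_b = X @ np.array([-1.0, 0.0, 2.0])   # task B targets (conflicting weights)

w = train(np.zeros(3), X, y_a)
err_a_before = mse(w, X, y_a)          # near zero after training on task A
w = train(w, X, y_b)                   # continue training on task B only
err_a_after = mse(w, X, y_a)           # large: task A knowledge overwritten
print(err_a_before < 1e-3, err_a_after > 0.1)  # True True
```

Continual-learning methods (next entries) aim to keep `err_a_after` low while still fitting task B, e.g. by penalizing movement of weights important to old tasks.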
Inferring reward function from observed behavior.
Learning without catastrophic forgetting.
Multiple agents interacting cooperatively or competitively.