Results for "learned objectives"
Ensuring learned behavior matches intended objective.
Learned subsystem that optimizes its own objective.
Configuration choices that govern training or architecture and are not (or not typically) learned directly.
Early architecture using learned gates for skip connections.
Model exploits poorly specified objectives.
The internal space where learned representations live; operations here often correlate with semantics or generative factors.
Automatically learning useful internal features (latent variables) that capture salient structure for downstream tasks.
A parameterized mapping from inputs to outputs; includes architecture + learned parameters.
Applying patterns learned in training to inputs or contexts where they do not hold.
RL using learned or known environment models.
Learned model of environment dynamics.
Model behaves well during training but not during deployment.
Maximizing reward without fulfilling real goal.
Tendency for agents to pursue resources regardless of final goal.
Model optimizes objectives misaligned with human values.
Correctly specifying goals.
Willingness of system to accept correction or shutdown.
Agents have opposing objectives.
The learned numeric values of a model adjusted during training to minimize a loss function.
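A minimal sketch of what "adjusted during training to minimize a loss function" means in practice. The model, data, and learning rate below are illustrative choices, not from the source: a linear model's two parameters are updated by gradient descent on a mean-squared-error loss.

```python
import numpy as np

# Illustrative sketch: the learned "parameters" here are the weight w and
# bias b of a linear model, adjusted step by step to minimize an MSE loss.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 3.0 * x + 0.5  # targets generated by known parameters (3.0, 0.5)

w, b, lr = 0.0, 0.0, 0.1
for _ in range(500):
    err = (w * x + b) - y
    # gradients of the mean-squared-error loss w.r.t. each parameter
    grad_w = 2 * np.mean(err * x)
    grad_b = 2 * np.mean(err)
    w -= lr * grad_w
    b -= lr * grad_b

print(round(w, 2), round(b, 2))  # parameters converge toward 3.0 and 0.5
```

The same loop, scaled up to millions of parameters and run via backpropagation, is what "training" means for the models in the other entries here.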
Allows model to attend to information from different subspaces simultaneously.
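A compact sketch of how "different subspaces" works mechanically (dimensions and weight initialization are arbitrary illustrations, not from the source): the model dimension is split into h slices, scaled dot-product attention runs independently in each slice, and the results are concatenated and mixed by an output projection.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(X, Wq, Wk, Wv, Wo, h):
    """X: (T, d) sequence. Each of the h heads attends in a d//h subspace."""
    T, d = X.shape
    dk = d // h
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    heads = []
    for i in range(h):
        q = Q[:, i*dk:(i+1)*dk]          # this head's query subspace
        k = K[:, i*dk:(i+1)*dk]
        v = V[:, i*dk:(i+1)*dk]
        scores = softmax(q @ k.T / np.sqrt(dk))  # (T, T) attention weights
        heads.append(scores @ v)                 # each head attends independently
    return np.concatenate(heads, axis=-1) @ Wo   # recombine the subspaces

rng = np.random.default_rng(1)
d, T, h = 8, 4, 2
X = rng.normal(size=(T, d))
Wq, Wk, Wv, Wo = (rng.normal(size=(d, d)) * 0.1 for _ in range(4))
out = multi_head_attention(X, Wq, Wk, Wv, Wo, h)
print(out.shape)  # (4, 8)
```

Because each head has its own projection slice, one head can track positional patterns while another tracks content similarity, simultaneously.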
A theoretical framework analyzing what classes of functions can be learned, how efficiently, and with what guarantees.
Extending agents with long-term memory stores.
A single attention mechanism within multi-head attention.
Models that learn to generate samples resembling training data.
Autoencoder using probabilistic latent variables and KL regularization.
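The "KL regularization" term has a standard closed form for a diagonal-Gaussian latent against a standard-normal prior; the sketch below (variable names are illustrative) computes it, along with the reparameterized sampling step that makes the probabilistic latent trainable.

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    # Closed form: KL( N(mu, sigma^2) || N(0, I) )
    #            = -0.5 * sum(1 + log sigma^2 - mu^2 - sigma^2)
    return -0.5 * np.sum(1 + log_var - mu**2 - np.exp(log_var))

def reparameterize(mu, log_var, rng):
    # Sample z = mu + sigma * eps so gradients flow through the sampling step.
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

mu, log_var = np.zeros(4), np.zeros(4)
print(kl_to_standard_normal(mu, log_var))  # 0.0: posterior equals the prior
print(kl_to_standard_normal(np.ones(4), log_var))  # 2.0: penalty grows with mu
```

The VAE loss adds this penalty to the reconstruction loss, pulling each latent posterior toward the prior so the latent space stays smooth and sampleable.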
Task instruction without examples.
Loss of old knowledge when learning new tasks.
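A toy demonstration of the effect (a constructed example, not from the source): a linear model trained with plain SGD on task A, then on a conflicting task B, loses its fit to task A because the same weights get overwritten.

```python
import numpy as np

def train(w, X, y, lr=0.5, steps=200):
    # Plain gradient descent on mean-squared error.
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

def mse(w, X, y):
    return float(np.mean((X @ w - y) ** 2))

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 3))
y_a = X @ np.array([1.0, -2.0, 0.5])   # task A targets
y_b = X @ np.array([-1.0, 0.0, 2.0])   # task B targets (conflicting weights)

w = train(np.zeros(3), X, y_a)
err_a_before = mse(w, X, y_a)          # near zero after training on task A
w = train(w, X, y_b)                   # continue training on task B only
err_a_after = mse(w, X, y_a)           # large: task A knowledge overwritten
print(err_a_before < 1e-3, err_a_after > 0.1)  # True True
```

Continual-learning methods (next entries) aim to keep `err_a_after` low while still fitting task B, e.g. by penalizing movement of weights important to old tasks.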
Inferring reward function from observed behavior.
Learning without catastrophic forgetting.
Multiple agents interacting cooperatively or competitively.