Results for "model-based"
Model-Based RL
RL using learned or known environment models.
Model-based reinforcement learning is like having a map while exploring a new city. Instead of wandering around aimlessly, you can look at the map to plan your route and make better decisions about where to go next. In this type of learning, an AI agent first learns how the environment works—like...
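The "map" idea can be made concrete with a minimal sketch (a toy setup of my own, not any specific library's API): when the transition model is known, the agent can plan entirely inside the model, here with value iteration on a 4-state corridor.

```python
# Planning with a known environment model: value iteration on a 4-state
# corridor. Moving right from state 2 into state 3 earns reward 1.
STATES = [0, 1, 2, 3]
ACTIONS = [-1, +1]          # move left / move right
GAMMA = 0.9

def model(state, action):
    """The known dynamics: next state and reward."""
    nxt = min(max(state + action, 0), 3)
    reward = 1.0 if nxt == 3 else 0.0
    return nxt, reward

# Value iteration: repeatedly back values up through the model.
V = {s: 0.0 for s in STATES}
for _ in range(100):
    V = {s: max(r + GAMMA * V[n]
                for n, r in (model(s, a) for a in ACTIONS))
         for s in STATES}

def plan(state):
    """Greedy action found by looking ahead through the model."""
    def backup(a):
        nxt, r = model(state, a)
        return r + GAMMA * V[nxt]
    return max(ACTIONS, key=backup)
```

The planner never interacts with a real environment here; all decisions come from simulated one-step lookaheads, which is exactly the map-reading metaphor above.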
Exact likelihood generative models using invertible transforms.
Learns the score (∇ log p(x)) for generative sampling.
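A hedged illustration of why the score alone suffices for sampling (toy setup, assumptions mine): for a standard normal target the score is simply -x, and unadjusted Langevin dynamics turns that score into approximate samples without ever evaluating the density.

```python
import math, random

def score(x):
    return -x  # score of N(0, 1): d/dx log p(x) = -x

def langevin_samples(n_steps=5000, step=0.01, seed=0):
    """Unadjusted Langevin dynamics driven only by the score."""
    rng = random.Random(seed)
    x = 5.0  # deliberately bad starting point
    xs = []
    for _ in range(n_steps):
        x = x + step * score(x) + math.sqrt(2 * step) * rng.gauss(0, 1)
        xs.append(x)
    return xs

xs = langevin_samples()
tail = xs[1000:]  # discard burn-in
mean = sum(tail) / len(tail)
var = sum((v - mean) ** 2 for v in tail) / len(tail)
```

In score-based generative models a neural network plays the role of `score`, learned from data; the sampling loop stays essentially this simple.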
Predicts next state given current state and action.
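A minimal sketch of learning such a dynamics model (the linear system and its coefficients are assumptions of mine): fit next-state weights by ordinary least squares on logged (state, action, next_state) transitions.

```python
import random

# True (hidden) system: s' = 0.8*s + 0.5*a + noise. We recover the two
# coefficients from data via the normal equations, solved by hand.
rng = random.Random(42)
data = []
s = 0.0
for _ in range(500):
    a = rng.uniform(-1, 1)
    s_next = 0.8 * s + 0.5 * a + rng.gauss(0, 0.01)
    data.append((s, a, s_next))
    s = s_next

Sss = sum(s * s for s, a, y in data)
Saa = sum(a * a for s, a, y in data)
Ssa = sum(s * a for s, a, y in data)
Ssy = sum(s * y for s, a, y in data)
Say = sum(a * y for s, a, y in data)
det = Sss * Saa - Ssa * Ssa
w_s = (Ssy * Saa - Say * Ssa) / det    # estimate of 0.8
w_a = (Say * Sss - Ssy * Ssa) / det    # estimate of 0.5

def predict(state, action):
    """Learned one-step dynamics model."""
    return w_s * state + w_a * action
```

Once fit, `predict` can stand in for the real environment during planning, which is the bridge from this entry back to model-based RL.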
Reinforcement learning from human feedback: uses preference data to train a reward model and optimize the policy.
Central system to store model versions, metadata, approvals, and deployment state.
Classifying models by impact level.
Reconstructing a model or its capabilities via API queries or leaked artifacts.
Combines value estimation (critic) with policy learning (actor).
Simple agent responding directly to inputs.
Dynamic resource allocation.
Continuous loop adjusting actions based on state feedback.
Algorithm computing control actions.
Internal representation of the agent itself.
Risk of incorrect financial models.
A preference-based training method optimizing policies directly from pairwise comparisons without explicit RL loops.
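The per-pair objective can be written down directly; the log-probabilities below are made-up numbers for illustration. Inputs are the total log-probs of the chosen and rejected responses under the trained policy and under a frozen reference policy.

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """-log sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r)))"""
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# If the policy already prefers the chosen response more strongly than
# the reference does, the margin is positive and the loss falls below
# log 2; with no preference either way, the loss is exactly log 2.
loss_good = dpo_loss(-10.0, -14.0, -12.0, -12.0)
loss_neutral = dpo_loss(-12.0, -12.0, -12.0, -12.0)
```

Minimizing this loss pushes the chosen response's relative log-probability up and the rejected one's down, with no reward model or RL loop in between.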
Feature attribution method grounded in cooperative game theory for explaining predictions in tabular settings.
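The game-theoretic idea can be shown exactly on a tiny model (brute-force enumeration; practical SHAP implementations approximate this). The linear model and the zero baseline for "absent" features are assumptions of mine.

```python
from itertools import combinations
from math import factorial

def f(x):
    """Toy model to explain: a weighted sum of three features."""
    return 3.0 * x[0] + 1.0 * x[1] + 0.0 * x[2]

def shapley(f, x, baseline=(0.0, 0.0, 0.0)):
    """Exact Shapley values: average marginal contribution of each
    feature over all coalitions of the other features."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for S in combinations(others, size):
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                with_i = [x[j] if j in S or j == i else baseline[j] for j in range(n)]
                without = [x[j] if j in S else baseline[j] for j in range(n)]
                phi[i] += weight * (f(with_i) - f(without))
    return phi

phi = shapley(f, (1.0, 1.0, 1.0))
```

The attributions sum to `f(x) - f(baseline)` (the efficiency axiom), which is what makes Shapley values a principled way to split a prediction among features.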
Detecting unauthorized model outputs or data leaks.
Model trained to predict human preferences (or utility) for candidate outputs; used in RLHF-style pipelines.
Diffusion model trained to remove noise step by step.
Assigning a role or identity to the model.
Models that define an energy landscape rather than explicit probabilities.
Removing weights or neurons to shrink models and improve efficiency; can be structured or unstructured.
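The unstructured variant is easy to sketch (the weight vector below is made up; structured pruning would instead drop whole rows or neurons): zero out the fraction of weights with the smallest magnitudes.

```python
def magnitude_prune(weights, sparsity=0.5):
    """Return a copy with the smallest-magnitude weights set to zero."""
    k = int(len(weights) * sparsity)
    drop = set(sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]

w = [0.9, -0.05, 0.4, 0.01, -0.7, 0.2]
pruned = magnitude_prune(w, sparsity=0.5)  # half the weights zeroed
```

Zeroed weights can then be stored sparsely or skipped at inference time, which is where the efficiency gain comes from.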
Probabilistic energy-based neural network with hidden variables.
Acting to minimize surprise or free energy.
RL without explicit dynamics model.
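The contrast with the model-based entry above can be shown with tabular Q-learning on the same kind of toy corridor (environment details are my assumption): the learner treats the environment as a black box and updates only from sampled transitions, never from a dynamics model.

```python
import random

def step(s, a):
    """Black-box environment: the update rule below never inspects it."""
    nxt = min(max(s + a, 0), 3)
    return nxt, (1.0 if nxt == 3 else 0.0)

rng = random.Random(0)
Q = {(s, a): 0.0 for s in range(4) for a in (-1, 1)}
alpha, gamma, eps = 0.5, 0.9, 0.2
s = 0
for _ in range(3000):
    # Epsilon-greedy action selection.
    if rng.random() < eps:
        a = rng.choice((-1, 1))
    else:
        a = max((-1, 1), key=lambda act: Q[(s, act)])
    nxt, r = step(s, a)
    # Model-free TD update: only the sampled (s, a, r, s') is used.
    Q[(s, a)] += alpha * (r + gamma * max(Q[(nxt, -1)], Q[(nxt, 1)]) - Q[(s, a)])
    s = 0 if nxt == 3 else nxt   # reset episode at the goal
```

The learned greedy policy moves right in every state, matching what a planner with the true model would compute, but obtained purely from experience.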
Architecture based on self-attention and feedforward layers; foundation of modern LLMs and many multimodal models.
Achieving task performance by providing a small number of examples inside the prompt without weight updates.
Generating speech audio from text, with control over prosody, speaker identity, and style.
Monte Carlo method for state estimation.
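A minimal bootstrap particle filter makes the idea concrete (the 1-D random-walk model, noise levels, and constant true state are toy assumptions of mine): particles are propagated through the motion model, weighted by the observation likelihood, and resampled.

```python
import math, random

def particle_filter(observations, n=2000, obs_std=0.5, proc_std=0.1, seed=1):
    rng = random.Random(seed)
    particles = [rng.gauss(0, 1) for _ in range(n)]
    estimates = []
    for z in observations:
        # 1. Predict: propagate each particle through the motion model.
        particles = [p + rng.gauss(0, proc_std) for p in particles]
        # 2. Weight: likelihood of the observation given each particle.
        weights = [math.exp(-0.5 * ((z - p) / obs_std) ** 2) for p in particles]
        total = sum(weights)
        weights = [w / total for w in weights]
        # 3. Estimate: posterior mean over the weighted particles.
        estimates.append(sum(w * p for w, p in zip(weights, particles)))
        # 4. Resample: draw particles in proportion to their weights.
        particles = rng.choices(particles, weights=weights, k=n)
    return estimates

# True state held at 2.0; observations are the state plus Gaussian noise.
rng = random.Random(7)
obs = [2.0 + rng.gauss(0, 0.5) for _ in range(30)]
est = particle_filter(obs)
```

Because the posterior is represented by samples rather than a closed form, the same loop handles nonlinear, non-Gaussian models where a Kalman filter does not apply.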