Results for "model-based"
Model-Based RL
Advanced · RL using learned or known environment models.
Model-based reinforcement learning is like having a map while exploring a new city. Instead of wandering around aimlessly, you can look at the map to plan your route and make better decisions about where to go next. In this type of learning, an AI agent first learns how the environment works—like...
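A minimal sketch of the idea in Python, assuming a hypothetical deterministic 5-state chain environment (the "map"): the agent first records a model of the dynamics, then plans by simulating rollouts inside that model instead of acting in the real environment.

```python
import itertools

# Hypothetical toy environment: states 0..4 on a chain, actions -1/+1,
# reward 1.0 whenever the agent is at state 4.
def env_step(state, action):
    next_state = max(0, min(4, state + action))
    return next_state, (1.0 if next_state == 4 else 0.0)

# 1) Learn the model: record the observed transition for every
#    state-action pair the agent tries (deterministic, so one visit each).
model = {}
for s, a in itertools.product(range(5), (-1, 1)):
    model[(s, a)] = env_step(s, a)

# 2) Plan inside the learned model: simulate a short rollout for each
#    candidate first action and pick the best simulated return.
def plan(state, horizon=5):
    def simulated_return(s, first_action):
        total, a = 0.0, first_action
        for _ in range(horizon):
            s, r = model[(s, a)]  # query the model, not the real env
            total += r
            a = 1                 # simple fixed continuation policy
        return total
    return max((-1, 1), key=lambda a: simulated_return(state, a))
```

From state 2, planning picks the +1 action, since simulated rollouts through the model reach the rewarding state sooner.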
Chooses which experts process each token.
Software simulating physical laws.
Methods like Adam that adjust learning rates dynamically.
Deep learning system for protein structure prediction.
Predicts masked tokens in a sequence, enabling bidirectional context; often used for embeddings rather than generation.
The text (and possibly other modalities) given to an LLM to condition its output behavior.
Multiple examples included in the prompt.
Asking the model to review and improve its output.
Learned model of environment dynamics.
Credit models with interpretable logic.
Running new model alongside production without user impact.
Retrieval based on embedding similarity rather than keyword overlap, capturing paraphrases and related concepts.
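The retrieval idea above can be illustrated with a toy example; the vectors and documents below are hypothetical stand-ins for real sentence embeddings.

```python
import math

def cosine_similarity(a, b):
    # Angle-based similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical tiny index: each document maps to its embedding.
index = {
    "how to reset a password": [0.9, 0.1, 0.0],
    "password recovery steps": [0.85, 0.2, 0.05],
    "weather forecast today":  [0.0, 0.1, 0.95],
}

def semantic_search(query_vec, index):
    # Rank by embedding similarity, not keyword overlap, so paraphrases
    # of the query still match.
    return max(index, key=lambda doc: cosine_similarity(query_vec, index[doc]))
```

A query embedded near the "password" documents retrieves them even if its wording shares no keywords with them.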
Samples from the smallest set of tokens whose probabilities sum to p, adapting set size by context.
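The sampling rule described above (commonly called nucleus or top-p sampling) can be sketched directly from the definition:

```python
import random

def top_p_sample(probs, p=0.9, rng=random):
    """Sample a token id from the smallest set of tokens whose
    probabilities sum to at least p (nucleus / top-p sampling)."""
    # Consider tokens in order of decreasing probability.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, total = [], 0.0
    for i in order:
        nucleus.append(i)
        total += probs[i]
        if total >= p:  # stop once the nucleus covers probability mass p
            break
    # Renormalize within the nucleus and sample from it.
    weights = [probs[i] / total for i in nucleus]
    return rng.choices(nucleus, weights=weights, k=1)[0]
```

With a peaked distribution the nucleus shrinks to one or two tokens; with a flat one it grows, which is the context-adaptive behavior the definition describes.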
Identifying and localizing objects in images, often with confidence scores and bounding rectangles.
Learning only from current policy’s data.
Continuous cycle of observation, reasoning, action, and feedback.
Separates planning from execution in agent architectures.
Simultaneous Localization and Mapping for robotics.
Distributed agents producing emergent intelligence.
Flat, high-dimensional regions of the loss surface that slow training.
Guaranteed response times.
Artificial environment for training/testing agents.
Directly optimizing control policies.
Space of all possible robot configurations.
Sampling-based motion planner.
Learning by minimizing prediction error.
Software regulated as a medical device.
A function measuring prediction error (and sometimes calibration), guiding gradient-based optimization.
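A minimal worked example of a loss guiding gradient-based optimization, assuming mean squared error and a hypothetical one-parameter model y = w * x:

```python
# Tiny dataset generated by the true parameter w = 2.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

def mse(w):
    # Mean squared error: average of (prediction - target)^2.
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def mse_grad(w):
    # d/dw of mean (w*x - y)^2  =  mean 2*x*(w*x - y)
    return sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)

# Gradient descent follows the loss downhill toward w = 2.
w = 0.0
for _ in range(200):
    w -= 0.05 * mse_grad(w)
```

After 200 steps w converges to the true value 2.0, and mse(w) approaches zero.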
Learning where data arrives sequentially and the model updates continuously, often under changing distributions.
Maximum number of tokens the model can attend to in one forward pass; constrains long-document reasoning.