Results for "model-based"
Model-Based RL
Advanced · RL using learned or known environment models.
Model-based reinforcement learning is like having a map while exploring a new city. Instead of wandering around aimlessly, you can look at the map to plan your route and make better decisions about where to go next. In this type of learning, an AI agent first learns how the environment works—like...
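The "map" analogy above can be sketched in code: the agent first fits a model of the environment from interaction, then plans against that model instead of acting blindly. The following is a minimal Dyna-style sketch on a hypothetical 5-state chain (all environment details here are illustrative, not from the glossary entry): transitions are recorded from random interaction, then value iteration plans entirely inside the learned model.

```python
import random

# Hypothetical toy chain: states 0..4, action 0 moves left, action 1 moves
# right; reaching state 4 (the goal) yields reward 1. Purely illustrative.
N_STATES, GOAL = 5, 4

def step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(GOAL, s + 1)
    return s2, (1.0 if s2 == GOAL else 0.0)

# 1) Learn a model of the environment from random interaction.
#    The environment is deterministic, so storing the last observed
#    (next_state, reward) per (state, action) pair is enough.
model = {}
random.seed(0)
for _ in range(500):
    s = random.randrange(N_STATES)
    a = random.randrange(2)
    model[(s, a)] = step(s, a)

# 2) Plan with the learned model: value iteration over imagined transitions,
#    never touching the real environment again.
gamma = 0.9
V = [0.0] * N_STATES
for _ in range(50):
    for s in range(N_STATES):
        V[s] = max(model[(s, a)][1] + gamma * V[model[(s, a)][0]]
                   for a in (0, 1))

# Greedy policy with respect to the planned values.
policy = [max((0, 1), key=lambda a: model[(s, a)][1] + gamma * V[model[(s, a)][0]])
          for s in range(N_STATES)]
```

Here planning recovers the obvious route (always move right toward the goal); in realistic settings the learned model is approximate, which is the central difficulty of model-based RL.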
A high-capacity language model trained on massive corpora, exhibiting broad generalization and emergent behaviors.
Built-in assumptions guiding learning efficiency and generalization.
Extracting system prompts or hidden instructions.
Model relies on irrelevant signals.
Using production outcomes to improve models.
Learning physical parameters from data.
Fast approximation of costly simulations.
Local surrogate explanation method approximating model behavior near a specific input.
Standardized documentation describing intended use, performance, limitations, data, and ethical considerations.
Inferring sensitive features of training data.
Required descriptions of model behavior and limits.
Architecture that retrieves relevant documents (e.g., from a vector DB) and conditions generation on them to reduce hallucinations.
Models that process or generate multiple modalities, enabling vision-language tasks, speech, video understanding, etc.
Routes inputs to subsets of parameters for scalable capacity.
A single attention mechanism within multi-head attention.
Joint vision-language model aligning images and text.
Models trained to decide when to call tools.
Temporary reasoning space (often hidden).
Probabilities do not reflect true correctness.
Restricting distribution of powerful models.
A parameterized mapping from inputs to outputs; includes architecture + learned parameters.
When a model cannot capture underlying structure, performing poorly on both training and test data.
Policies and practices for approving, monitoring, auditing, and documenting models in production.
Training a smaller “student” model to mimic a larger “teacher,” often improving efficiency while retaining performance.
Framework for identifying, measuring, and mitigating model risks.
Competitive advantage from proprietary models/data.
One complete traversal of the training dataset during training.
Injects sequence order into Transformers, since attention alone is permutation-invariant.
Letting an LLM call external functions/APIs to fetch data, compute, or take actions, improving reliability.
Raw model outputs before converting to probabilities; manipulated during decoding and calibration.
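The last snippet above notes that logits are manipulated during decoding and calibration. A common example of both is temperature scaling followed by a softmax: dividing logits by a temperature below 1 sharpens the distribution toward the top candidate, while a temperature above 1 flattens it. A minimal sketch (the example logit values are made up):

```python
import math

def softmax(logits, temperature=1.0):
    # Scale logits by temperature, then normalize into probabilities.
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]       # illustrative raw model outputs
probs = softmax(logits)        # default temperature 1.0
sharp = softmax(logits, temperature=0.5)  # lower T -> more peaked
```

The same transformation, with the temperature fit on held-out data instead of chosen by hand, is the standard post-hoc calibration method for miscalibrated confidence scores.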