Results for "model-based"
Model-Based RL
Advanced
RL using learned or known environment models.
Model-based reinforcement learning is like having a map while exploring a new city. Instead of wandering around aimlessly, you can look at the map to plan your route and make better decisions about where to go next. In this type of learning, an AI agent first learns how the environment works—like...
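The entry above describes the two phases of model-based RL: first learn how the environment works, then use that learned model, like a map, to plan. A minimal sketch in Python of this idea on a toy grid world (all names, the grid size, and the exploration/planning routines here are illustrative assumptions, not from this glossary):

```python
import random
from collections import deque

SIZE = 4  # toy 4x4 grid world
ACTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}

def step(state, action):
    """True environment dynamics (unknown to the planner)."""
    x, y = state
    dx, dy = ACTIONS[action]
    return (min(max(x + dx, 0), SIZE - 1), min(max(y + dy, 0), SIZE - 1))

# Phase 1 -- learn a model: explore randomly, record observed transitions.
model = {}  # (state, action) -> observed next_state
state = (0, 0)
random.seed(0)
for _ in range(2000):
    action = random.choice(list(ACTIONS))
    next_state = step(state, action)
    model[(state, action)] = next_state
    state = next_state

# Phase 2 -- plan with the learned model ("look at the map"):
# breadth-first search over recorded transitions to reach a goal.
def plan(start, goal):
    frontier = deque([(start, [])])
    visited = {start}
    while frontier:
        s, path = frontier.popleft()
        if s == goal:
            return path
        for a in ACTIONS:
            ns = model.get((s, a))
            if ns is not None and ns not in visited:
                visited.add(ns)
                frontier.append((ns, path + [a]))
    return None  # goal unreachable under the learned model

route = plan((0, 0), (3, 3))
print(route)
```

Because the planner only consults `model`, never `step`, it can rehearse many candidate routes without taking a single real action, which is the core appeal of model-based methods.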
- Constraining model outputs to a schema used to call external APIs/tools safely and deterministically.
- A measure of a model class's expressive capacity, based on its ability to shatter datasets.
- Prevents attention to future tokens during training/inference.
- Encodes token position explicitly, often via sinusoids.
- A GNN that uses attention to weight neighbor contributions dynamically.
- Assigning category labels to images.
- Explicit output constraints (format, tone).
- Using markers to isolate context segments.
- Fabrication of cases or statutes by LLMs.
- Finding mathematical equations from data.
- Emergence of conventions among agents.
- The learned numeric values of a model, adjusted during training to minimize a loss function.
- The set of tokens a model can represent; affects efficiency, multilinguality, and handling of rare strings.
- The end-to-end process for model training.
- Model-generated content that is fluent but unsupported by evidence or incorrect; mitigated by grounding and verification.
- Applying learned patterns incorrectly.
- A central catalog of deployed and experimental models.
- Degradation in quality when a model is trained on its own outputs.
- Training with a small labeled dataset plus a larger unlabeled dataset, leveraging assumptions such as smoothness or cluster structure.
- Methods for setting starting weights so that signal and gradient scales are preserved across layers.
- Ordering training samples from easier to harder to improve convergence or generalization.
- Pixel-level separation of individual object instances.
- Artificially created data used to train or test models; helpful for privacy and coverage, risky if unrealistic.
- Predicting future values from past observations.
- Extension of convolution to graph domains using adjacency structure.
- An agent reasoning about future outcomes.
- The optimal estimator for linear dynamic systems.
- Differences between training and inference conditions.
- Inferring the agent's internal state from noisy sensor data.
- Control without feedback after execution begins.