Results for "model-based"
Model-Based RL
Advanced
RL using learned or known environment models.
Model-based reinforcement learning is like having a map while exploring a new city. Instead of wandering around aimlessly, you can look at the map to plan your route and make better decisions about where to go next. In this type of learning, an AI agent first learns how the environment works—like...
A continuous vector encoding of an item (word, image, user) such that semantic similarity corresponds to geometric closeness.
Mechanism that computes context-aware mixtures of representations; scales well and captures long-range dependencies.
An RNN variant using gates to mitigate vanishing gradients and capture longer context.
A datastore optimized for similarity search over embeddings, enabling semantic retrieval at scale.
Controlled experiment comparing variants by random assignment to estimate causal effects of changes.
Search algorithm for generation that keeps top-k partial sequences; can improve likelihood but reduce diversity.
A system that perceives state, selects actions, and pursues goals—often combining LLM reasoning with tools and memory.
Limiting gradient magnitude to prevent exploding gradients.
Methods for breaking goals into steps; can be classical (A*, STRIPS) or LLM-driven with tool calls.
Strategy mapping states to actions.
AI subfield dealing with understanding and generating human language, including syntax, semantics, and pragmatics.
Extending agents with long-term memory stores.
Expected return of taking a given action in a given state.
Coordination arising without explicit programming.
Optimizing policies directly via gradient ascent on expected reward.
Categorizing AI applications by impact and regulatory risk.
Learning from data generated by a different policy.
Neural networks that operate on graph-structured data by propagating information along edges.
GNN framework where nodes iteratively exchange and aggregate messages from neighbors.
Pixel motion estimation between frames.
Repeating temporal patterns.
Maintaining two environments for instant rollback.
System that independently pursues goals over time.
Interleaving reasoning and tool use.
Normalized sum of independent, identically distributed variables with finite variance converges to a normal distribution.
Updated belief after observing data.
Belief before observing data.
Optimization under uncertainty.
European regulation classifying AI systems by risk.
AI used in sensitive domains requiring compliance.
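
To make the Model-Based RL entry at the top of these results more concrete, here is a minimal Python sketch that also touches the policy, planning, and action-value entries in the list. Everything in it is an illustrative assumption rather than anything defined by the glossary: the five-state chain environment, the names `true_step`, `learn_model`, and `plan`, the discount factor, and the choice of value iteration as the planner. The agent first estimates a transition and reward model from sampled experience, then plans against that learned model instead of the real environment.

```python
from collections import defaultdict

N_STATES = 5       # hypothetical chain: states 0..4, state 4 is terminal
ACTIONS = (0, 1)   # 0 = step left, 1 = step right
GAMMA = 0.9        # assumed discount factor


def true_step(state, action):
    """The real environment; the agent only sees it through samples."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(N_STATES - 1, state + 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def learn_model(transitions):
    """Estimate P(s' | s, a) and mean reward from (s, a, r, s') samples."""
    counts = defaultdict(lambda: defaultdict(int))
    reward_sum = defaultdict(float)
    for s, a, r, s2 in transitions:
        counts[(s, a)][s2] += 1
        reward_sum[(s, a, s2)] += r
    model = {}
    for (s, a), nexts in counts.items():
        total = sum(nexts.values())
        model[(s, a)] = [
            (s2, n / total, reward_sum[(s, a, s2)] / n)
            for s2, n in nexts.items()
        ]
    return model


def plan(model, sweeps=100):
    """Value iteration on the learned model: returns Q-values and a greedy policy."""
    values = [0.0] * N_STATES   # the terminal state keeps value 0
    q = {}
    for _ in range(sweeps):
        for s in range(N_STATES - 1):
            for a in ACTIONS:
                q[(s, a)] = sum(
                    p * (r + GAMMA * values[s2]) for s2, p, r in model[(s, a)]
                )
            values[s] = max(q[(s, a)] for a in ACTIONS)
    policy = {
        s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)
    }
    return q, policy


# Gather experience by trying every state-action pair once (kept deterministic
# here for clarity; a real agent would explore and see noisy samples).
experience = []
for s in range(N_STATES - 1):
    for a in ACTIONS:
        s2, r = true_step(s, a)
        experience.append((s, a, r, s2))

model = learn_model(experience)
q_values, policy = plan(model)
print(policy)  # {0: 1, 1: 1, 2: 1, 3: 1}: always step right toward the goal
```

The planner never calls `true_step`; it only queries the learned `model`, which is the defining trait of the model-based approach. In a stochastic environment the same counting scheme would yield estimated probabilities below 1, and the plan would be only as good as those estimates.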