Results for "state-action value"
Expected cumulative reward from a state or state-action pair.
Expected return of taking a given action in a given state.
Set of all actions available to the agent.
Fundamental recursive relationship defining optimal value functions.
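A minimal sketch of this recursion as value iteration on a hypothetical two-state, two-action MDP (all transition probabilities, rewards, and the discount factor are illustrative assumptions):

```python
import numpy as np

# Hypothetical MDP: P[s, a, s'] = transition probability, R[s, a] = reward.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.0, 1.0], [0.5, 0.5]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.9

# Repeatedly apply the Bellman optimality backup:
# V(s) <- max_a [ R(s,a) + gamma * sum_s' P(s,a,s') V(s') ]
V = np.zeros(2)
for _ in range(500):
    Q = R + gamma * (P @ V)   # Q[s, a], summing over s'
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new
```

At convergence, V is a fixed point of the backup, which is exactly the recursive relationship the entry describes.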
Continuous cycle of observation, reasoning, action, and feedback.
Formal framework for sequential decision-making under uncertainty.
Predicts next state given current state and action.
Models time evolution via hidden states.
Inferring the agent’s internal state from noisy sensor data.
Strategy mapping states to actions.
Continuous loop adjusting actions based on state feedback.
Optimizing policies directly via gradient ascent on expected reward.
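A minimal sketch of this idea on a toy three-armed bandit, using the score-function (REINFORCE-style) gradient of a softmax policy with a running-mean baseline; the reward values, learning rate, and step count are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

true_rewards = np.array([1.0, 2.0, 3.0])  # assumed bandit means
theta = np.zeros(3)                        # policy logits
lr = 0.1

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

baseline = 0.0
for t in range(2000):
    probs = softmax(theta)
    a = rng.choice(3, p=probs)
    r = true_rewards[a] + rng.normal(scale=0.1)
    baseline += (r - baseline) / (t + 1)   # variance-reducing baseline
    # grad of log pi(a) for a softmax policy: one-hot(a) - probs
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0
    theta += lr * (r - baseline) * grad_log_pi  # ascend expected reward
```

The update nudges the logits so that actions with above-baseline reward become more probable, which is gradient ascent on expected reward in expectation.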
All possible configurations an agent may encounter.
Combines value estimation (critic) with policy learning (actor).
Directly optimizing control policies.
Learning only from current policy’s data.
Simple agent responding directly to inputs.
Learning action mapping directly from demonstrations.
Stores past attention states to speed up autoregressive decoding.
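A minimal single-head sketch of this caching pattern (the dimension, weights, and token inputs are illustrative assumptions): each decode step computes keys and values only for the new token and reuses the cached ones for all past tokens.

```python
import numpy as np

d = 8
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

k_cache, v_cache = [], []

def decode_step(x):
    """Attend over all past tokens without recomputing their K/V."""
    q = x @ Wq
    k_cache.append(x @ Wk)   # only the new token's key is computed
    v_cache.append(x @ Wv)   # ... and its value
    K = np.stack(k_cache)    # (t, d)
    V = np.stack(v_cache)
    scores = K @ q / np.sqrt(d)
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V             # attention output for the new token

for _ in range(5):
    decode_step(rng.normal(size=d))
```

Per-step cost is linear in the sequence length rather than quadratic, which is the speedup the entry refers to.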
Maximum expected loss under normal conditions.
A learning paradigm where an agent interacts with an environment and learns to choose actions to maximize cumulative reward.
Optimal estimator for linear dynamic systems.
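A minimal 1-D sketch of the predict/update cycle, tracking a constant hidden value through noisy measurements (the true value and both noise variances are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

true_x = 5.0        # hidden value to estimate (assumed)
meas_var = 1.0      # R: measurement noise variance
proc_var = 1e-4     # Q: process noise variance

x_hat, P = 0.0, 1e3  # initial estimate and its variance
for _ in range(200):
    z = true_x + rng.normal(scale=np.sqrt(meas_var))
    # Predict (identity dynamics: the state is assumed constant)
    P = P + proc_var
    # Update: blend prediction and measurement by the Kalman gain
    K = P / (P + meas_var)
    x_hat = x_hat + K * (z - x_hat)
    P = (1 - K) * P
```

The gain K weights the measurement against the prediction in proportion to their uncertainties, which is what makes the filter the optimal linear estimator.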
Interleaving reasoning and tool use.
Monte Carlo method for state estimation.
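A minimal bootstrap-style sketch for a 1-D random-walk state with noisy observations (all noise scales and the particle count are illustrative assumptions): propagate particles, weight them by observation likelihood, then resample.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 1000
particles = rng.normal(0.0, 1.0, n)
true_x = 0.0

for _ in range(50):
    true_x += rng.normal(scale=0.1)                  # hidden state evolves
    z = true_x + rng.normal(scale=0.5)               # noisy observation
    particles += rng.normal(scale=0.1, size=n)       # propagate particles
    w = np.exp(-0.5 * ((z - particles) / 0.5) ** 2)  # Gaussian likelihood
    w /= w.sum()
    idx = rng.choice(n, size=n, p=w)                 # resample by weight
    particles = particles[idx]

estimate = particles.mean()
```

The particle cloud approximates the posterior over the hidden state, so its mean serves as the state estimate.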
Sample mean converges to expected value.
Approximating expectations via random sampling.
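The two entries above can be illustrated in a few lines: a sample mean over random draws approximates an expectation, here E[X^2] for X ~ N(0, 1), whose exact value is 1 (the sample size is an illustrative choice).

```python
import numpy as np

rng = np.random.default_rng(0)

# Monte Carlo estimate of E[X^2] for X ~ N(0, 1); exact value is 1.
samples = rng.normal(size=100_000)
estimate = np.mean(samples ** 2)
```

By the law of large numbers, the estimate converges to 1 as the sample size grows, with error shrinking on the order of one over the square root of the sample count.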
Model optimizes objectives misaligned with human values.
Inferring and aligning with human preferences.
RL using learned or known environment models.
Reward only given upon task completion.