Results for "shared reward"
Learning only from data generated by the current policy.
Balancing exploration of new behaviors against exploitation of known rewards.
Ensuring AI systems pursue intended human goals.
Ensuring the learned behavior matches the intended objective.
Model behaves well during training but not during deployment.
Learning policies from expert demonstrations.
Tendency to acquire control or resources.
Inferring and aligning with human preferences.
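The exploration/exploitation balance above can be made concrete with a small sketch. This is a minimal epsilon-greedy agent on a hypothetical multi-armed bandit (the arm reward means, epsilon, and step count are illustrative assumptions, not from the definitions): with probability epsilon the agent explores a random arm, otherwise it exploits the arm with the best current reward estimate.

```python
import random

def epsilon_greedy_bandit(true_means, epsilon=0.1, steps=1000, seed=0):
    """Epsilon-greedy on a multi-armed bandit (illustrative sketch).

    true_means: hypothetical expected reward per arm, an assumption
    chosen for this example.
    """
    rng = random.Random(seed)
    n = len(true_means)
    counts = [0] * n       # number of pulls per arm
    estimates = [0.0] * n  # running mean reward estimate per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random arm to learn about its reward.
            arm = rng.randrange(n)
        else:
            # Exploit: pick the arm with the highest estimated reward.
            arm = max(range(n), key=lambda a: estimates[a])
        # Noisy reward sampled around the arm's true mean.
        reward = true_means[arm] + rng.gauss(0, 0.1)
        counts[arm] += 1
        # Incremental update of the running mean estimate.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts

estimates, counts = epsilon_greedy_bandit([0.2, 0.5, 0.8])
```

With enough steps, the best arm ends up pulled most often, while the small epsilon keeps the agent sampling the other arms occasionally so it never stops learning about them.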