Results for "adaptive learning rates"
Methods such as Adam that adjust learning rates dynamically.
Popular optimizer combining momentum and per-parameter adaptive step sizes via first/second moment estimates.
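The update rule behind this result can be sketched in a few lines; the quadratic objective, learning rate, and loop below are illustrative assumptions, not from the source.

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: a momentum-like first-moment estimate plus a
    per-parameter step size scaled by the second-moment estimate."""
    m = b1 * m + (1 - b1) * grad          # first moment (running mean of grads)
    v = b2 * v + (1 - b2) * grad ** 2     # second moment (running mean of grad^2)
    m_hat = m / (1 - b1 ** t)             # bias correction for early steps
    v_hat = v / (1 - b2 ** t)
    theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

# Minimize f(theta) = theta**2, whose gradient is 2 * theta.
theta, m, v = 1.0, 0.0, 0.0
for t in range(1, 1001):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t)
```

After 1000 steps theta sits close to the minimum at 0; the bias correction matters mainly in the first few dozen steps, when m and v are still near their zero initialization.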
Learning where data arrives sequentially and the model updates continuously, often under changing distributions.
Controls the size of parameter updates; too high and training diverges, too low and it trains slowly or gets stuck.
System-level design for general intelligence.
Flat, high-dimensional regions of the loss surface that slow training.
Adjusting learning rate over training to improve convergence.
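A common concrete instance of such a schedule is cosine annealing; the base rate, floor, and step count below are illustrative choices.

```python
import math

def cosine_lr(step, total_steps, base_lr=0.1, min_lr=0.001):
    """Cosine annealing: decay the learning rate smoothly from
    base_lr at step 0 down to min_lr at total_steps."""
    progress = step / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))

schedule = [cosine_lr(s, 100) for s in range(101)]
```

The schedule starts at `base_lr`, falls fastest mid-training, and flattens out near `min_lr`, which is why it pairs well with a fixed training budget.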
Ordering training samples from easier to harder to improve convergence or generalization.
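The ordering itself is just a sort by a difficulty score; the samples and scores below are hypothetical, and in practice difficulty might come from a heuristic such as sequence length or the loss under a weaker model.

```python
# Hypothetical (sample, difficulty) pairs with difficulty in [0, 1].
samples = [
    ("a very long and convoluted sentence", 0.9),
    ("hi", 0.1),
    ("a medium sentence", 0.5),
]

# Curriculum order: present easier samples first.
curriculum = [text for text, difficulty in sorted(samples, key=lambda p: p[1])]
```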
Gradually increasing learning rate at training start to avoid divergence.
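A linear ramp is the simplest form of this; the warmup length and base rate below are illustrative defaults.

```python
def warmup_lr(step, warmup_steps=500, base_lr=0.1):
    """Linear warmup: ramp the learning rate from near zero up to
    base_lr over warmup_steps, then hold it constant."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr
```

In practice warmup is usually composed with a decay schedule (such as the cosine schedule above): ramp up first, then decay.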
Optimization under uncertainty.
The relationship between inputs and outputs changes over time, requiring monitoring and model updates.
Randomizing simulation parameters to improve real-world transfer.
Using production outcomes to improve models.
Coordination arising without explicit programming.
Shift in feature distribution over time.
Imagined future trajectories.
Closed loop linking sensing and acting.
Visualization of optimization landscape.
Interleaving reasoning and tool use.
Acting to minimize surprise or free energy.
Techniques that stabilize and speed training by normalizing activations; LayerNorm is common in Transformers.
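The core of LayerNorm fits in a few lines; this sketch normalizes a single feature vector with scalar scale and shift, whereas real implementations learn per-feature gamma and beta.

```python
def layer_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a feature vector to zero mean and unit variance,
    then apply a learnable scale (gamma) and shift (beta)."""
    mean = sum(x) / len(x)
    var = sum((xi - mean) ** 2 for xi in x) / len(x)
    return [gamma * (xi - mean) / (var + eps) ** 0.5 + beta for xi in x]

y = layer_norm([1.0, 2.0, 3.0, 4.0])
```

Unlike BatchNorm, the statistics are computed per example across features, so the result does not depend on the batch, which is part of why it suits Transformers.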
Matrix of first-order derivatives for vector-valued functions.
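A minimal numerical sketch: approximate the Jacobian entry J[i][j] = df_i/dx_j by forward differences. The test function and step size are illustrative choices.

```python
def jacobian(f, x, h=1e-6):
    """Forward-difference approximation of the Jacobian of a
    vector-valued function f at the point x."""
    fx = f(x)
    J = [[0.0] * len(x) for _ in fx]
    for j in range(len(x)):
        xh = list(x)
        xh[j] += h                       # perturb one input coordinate
        fxh = f(xh)
        for i in range(len(fx)):
            J[i][j] = (fxh[i] - fx[i]) / h
    return J

# f(x, y) = (x * y, x + y) has exact Jacobian [[y, x], [1, 1]].
J = jacobian(lambda v: [v[0] * v[1], v[0] + v[1]], [2.0, 3.0])
```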
Nonlinear functions enabling networks to approximate complex mappings; ReLU variants dominate modern DL.
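Two of the variants mentioned are one-liners; the leak coefficient below is a common default, chosen here for illustration.

```python
def relu(x):
    """Rectified linear unit: zero for negative inputs, identity otherwise."""
    return max(0.0, x)

def leaky_relu(x, alpha=0.01):
    """ReLU variant that keeps a small slope for negative inputs,
    avoiding completely dead units."""
    return x if x > 0 else alpha * x
```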
Centralized AI expertise group.
Predicting borrower default risk.
Controlled experiment comparing variants by random assignment to estimate causal effects of changes.
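A tiny simulation of the mechanism: users are randomly assigned to a variant and the difference in mean outcomes estimates the causal lift. The conversion rates, sample size, and seed below are all illustrative assumptions.

```python
import random

random.seed(0)

# Hypothetical true conversion rates: variant B is genuinely better.
def outcome(variant):
    return 1 if random.random() < (0.12 if variant == "B" else 0.10) else 0

# Random assignment is what licenses the causal interpretation.
results = {"A": [], "B": []}
for _ in range(20000):
    variant = random.choice("AB")
    results[variant].append(outcome(variant))

lift = (sum(results["B"]) / len(results["B"])
        - sum(results["A"]) / len(results["A"]))
```

A real analysis would add a significance test (e.g. a two-proportion z-test) rather than eyeballing the lift.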
Optimization using curvature information; often expensive at scale.
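Newton's method is the classic example: scale the gradient step by the inverse curvature instead of a fixed learning rate. The quadratic objective below is an illustrative choice, picked because one Newton step lands exactly on its minimum.

```python
def newton_minimize(grad, hess, x0, iters=10):
    """1-D Newton's method for optimization: each step divides the
    gradient by the local curvature (second derivative)."""
    x = x0
    for _ in range(iters):
        x -= grad(x) / hess(x)
    return x

# Minimize g(x) = (x - 3)**2: gradient 2*(x - 3), constant curvature 2.
x_min = newton_minimize(lambda x: 2.0 * (x - 3.0), lambda x: 2.0, x0=0.0)
```

In high dimensions the curvature becomes the Hessian matrix, and forming or inverting it is what makes these methods expensive at scale.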
Storing results to reduce compute.
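A standard way to do this in Python is memoization via `functools.lru_cache`; the Fibonacci example is illustrative.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    """Caching turns this exponential-time recursion into linear time:
    each fib(k) is computed once and then served from the cache."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)

value = fib(100)  # instant with the cache; infeasible without it
```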
Simulating adverse scenarios.
Selecting the most informative samples to label (e.g., uncertainty sampling) to reduce labeling cost.
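Uncertainty sampling reduces to ranking unlabeled samples by how close the model's prediction is to the decision boundary; the document names and probabilities below are hypothetical.

```python
# Hypothetical predicted positive-class probabilities for unlabeled docs.
probs = {"doc1": 0.98, "doc2": 0.52, "doc3": 0.07, "doc4": 0.61}

# Uncertainty sampling for binary classification: prioritize samples
# whose predictions are closest to 0.5, where the model is least sure.
ranked = sorted(probs, key=lambda k: abs(probs[k] - 0.5))
to_label = ranked[:2]  # send the top of the ranking to annotators
```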