Results for "adaptive learning rates"
A point where the gradient is zero but which is neither a maximum nor a minimum; common in deep networks.
The shape of the loss function over parameter space.
A wide basin in the loss landscape, often correlated with better generalization.
Limiting gradient magnitude to prevent exploding gradients.
Matrix of second derivatives describing local curvature of loss.
Allows gradients to bypass layers, enabling very deep networks.
The range of functions a model can represent.
Capabilities that appear only beyond certain model sizes.
Extending agents with long-term memory stores.
Optimizing policies directly via gradient ascent on expected reward.
Multiple agents interacting cooperatively or competitively.
Models trained to decide when to call tools.
Ensuring decisions can be explained and traced.
Legal or policy requirement to explain AI decisions.
Recovering training data from gradients.
Detecting unauthorized model outputs or data leaks.
Neural networks that operate on graph-structured data by propagating information along edges.
Extension of convolution to graph domains using adjacency structure.
Graphs containing multiple node or edge types with different semantics.
GNN using attention to weight neighbor contributions dynamically.
Probabilistic graphical model for structured prediction.
Graphical model expressing factorization of a probability distribution.
Diffusion performed in latent space for efficiency.
Model that compresses input into latent space and reconstructs it.
Generative models with exact likelihoods, built from invertible transforms.
Combining signals from multiple modalities.
Generating human-like speech from text.
Changing speaker characteristics while preserving content.
Maps audio signals to linguistic units.
Identifying speakers in audio.
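
Example for the entry on limiting gradient magnitude to prevent exploding gradients: a minimal sketch, assuming the gradients are a flat list of Python floats and that clipping is done by rescaling to a maximum global L2 norm (one common variant; per-value clipping is another). The function name `clip_by_global_norm` and the `max_norm` parameter are hypothetical, chosen for illustration only.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Rescale gradients so their global L2 norm does not exceed max_norm.

    Returns the (possibly rescaled) gradients and the norm before clipping.
    Illustrative helper, not taken from any particular library.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    # Gradients already within the limit (or all zero) are left unchanged.
    if total_norm <= max_norm or total_norm == 0.0:
        return list(grads), total_norm
    # Otherwise shrink every component by the same factor, which keeps
    # the gradient direction and only reduces its length.
    scale = max_norm / total_norm
    return [g * scale for g in grads], total_norm


# Usage: a gradient with norm 5 is scaled down to norm 1.
clipped, norm_before = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
norm_after = math.sqrt(sum(g * g for g in clipped))
```

Because every component is scaled by the same factor, the update direction is preserved and only the step length is bounded.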