Results for "text+image+audio"
Attacks that manipulate model instructions (especially via retrieved content) to override system goals or exfiltrate data.
AI subfield dealing with understanding and generating human language, including syntax, semantics, and pragmatics.
Attention mask that prevents each position from attending to future tokens during training and inference.
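As a minimal sketch of the masking described above, the following sets every score for a future position to negative infinity before the softmax, so those positions receive zero weight. The function name `causal_softmax` and the toy score matrix are illustrative assumptions, not taken from any particular library.

```python
import math

def causal_softmax(scores):
    """Mask future positions in a square score matrix, then softmax each row.

    scores[i][j] is the raw attention score of query position i for key position j.
    """
    n = len(scores)
    weights = []
    for i in range(n):
        # Positions j > i are "future" tokens: mask them with -inf.
        masked = [scores[i][j] if j <= i else -math.inf for j in range(n)]
        row_max = max(masked)  # always finite, because j == i is never masked
        exps = [math.exp(s - row_max) for s in masked]  # exp(-inf) evaluates to 0.0
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# Toy scores (made up for illustration).
weights = causal_softmax([[0.0, 1.0, 2.0],
                          [1.0, 1.0, 5.0],
                          [0.5, 0.5, 0.5]])
```

In the result, row `i` has non-zero weights only in columns `0..i`, and each row still sums to 1.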
Stores the attention keys and values of past tokens so they are not recomputed at each step, speeding up autoregressive decoding.
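The caching idea above can be sketched for a single attention head as follows. The class `KVCache` and helper `attend` are hypothetical names, and the key, value, and query vectors are passed in directly; a real model would compute them from each token with learned projections.

```python
import math

def attend(query, keys, values):
    """Softmax-weighted average of `values`, scored by dot(query, key)."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    dim = len(values[0])
    return [sum(e * v[d] for e, v in zip(exps, values)) / total for d in range(dim)]

class KVCache:
    """Hypothetical single-head cache: one key and one value vector per past token."""

    def __init__(self):
        self.keys = []
        self.values = []

    def step(self, query, key, value):
        # Append only the new token's key and value; earlier entries are reused as stored.
        self.keys.append(key)
        self.values.append(value)
        return attend(query, self.keys, self.values)

cache = KVCache()
first = cache.step([1.0, 0.0], [1.0, 0.0], [2.0, 0.0])
second = cache.step([0.0, 1.0], [0.0, 1.0], [0.0, 4.0])
```

Each call to `step` does work proportional to the number of cached tokens, rather than rebuilding keys and values for the whole prefix.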
AI supporting legal research, drafting, and analysis.
A continuous vector encoding of an item (word, image, user) such that semantic similarity corresponds to geometric closeness.
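One common way to measure the "geometric closeness" mentioned above is cosine similarity. The sketch below uses made-up three-dimensional vectors purely for illustration; real embeddings are learned and typically have hundreds of dimensions. The helper `nearest` is a hypothetical name.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy embeddings (invented values, not from a trained model).
embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

def nearest(name):
    """Return the other item whose embedding is most similar to `name`'s."""
    target = embeddings[name]
    others = [other for other in embeddings if other != name]
    return max(others, key=lambda other: cosine_similarity(target, embeddings[other]))

closest_to_cat = nearest("cat")
```

With these toy values, "cat" lands closer to "dog" than to "car", which is the behavior the definition describes.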
Networks using convolution operations with weight sharing and locality, effective for images and signals.
A dataset and metric suite for comparing models; can be gamed or misaligned with real-world goals.
AI focused on interpreting images/video: classification, detection, segmentation, tracking, and 3D understanding.
Identifying and localizing objects in images, often with confidence scores and bounding rectangles.
Early architecture using learned gates for skip connections.
Allows the model to attend to information from different representation subspaces simultaneously.
Routes each input to a subset of the parameters, so total capacity can scale without every parameter being used for every input.
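A minimal sketch of such routing, assuming a top-k gating scheme (one common choice, not necessarily the only one): each "expert" is a separate function, and only the k experts with the highest gate scores run for a given input. The names `route_top_k` and `moe_forward`, the toy experts, and the gate scores are all illustrative assumptions.

```python
def route_top_k(gate_scores, k):
    """Return the indices of the k highest-scoring experts for one input."""
    ranked = sorted(range(len(gate_scores)), key=lambda i: gate_scores[i], reverse=True)
    return ranked[:k]

def moe_forward(x, experts, gate_scores, k=2):
    """Run only the chosen experts and mix their outputs by renormalized gate score."""
    chosen = route_top_k(gate_scores, k)
    total = sum(gate_scores[i] for i in chosen)
    return sum(gate_scores[i] / total * experts[i](x) for i in chosen)

# Toy experts standing in for separate parameter subsets.
experts = [lambda x: x + 1.0, lambda x: 2.0 * x, lambda x: -x, lambda x: x * x]
# Assumed non-negative gate scores, e.g. the output of a softmax over experts.
gate_scores = [0.1, 0.6, 0.1, 0.2]

chosen = route_top_k(gate_scores, 2)
output = moe_forward(3.0, experts, gate_scores, k=2)
```

Adding more experts increases the total parameter count, while the work per input stays tied to `k`.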
Autoencoder using probabilistic latent variables and KL regularization.
Two-network setup in which a generator is trained to fool a discriminator, while the discriminator learns to tell generated samples from real ones.
Pixel-level separation of individual object instances.
Pixel motion estimation between frames.
Factorizes a matrix into orthogonal singular vectors scaled by non-negative singular values; used in embeddings and compression.
Software pipeline converting raw sensor data into structured representations.
AI applied to X-rays, CT, MRI, ultrasound, pathology slides.
Automated assistance in identifying disease indicators.