Vision Transformer

Level: Intermediate

Definition
A Transformer applied to image patches: the image is split into fixed-size patches, each patch is embedded as a token, and the resulting token sequence is processed with self-attention.

Keywords: patch embeddings

Domains: Computer Vision

Related Terms
CLIP: Joint vision-language model aligning images and text.
Multimodal Fusion: Combining signals from multiple modalities.
Cross-Attention: Attention in which queries from one sequence attend to keys and values from another, for example across modalities.
Optical Flow: Pixel motion estimation between frames.
Segmentation: Assigning labels per pixel (semantic) or per instance (instance segmentation) to map object boundaries.
Image Classification: Assigning category labels to images.
Instance Segmentation: Pixel-level separation of individual object instances.
Semantic Segmentation: Pixel-wise classification of image regions.
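
Example: patch embeddings
The definition above rests on the keyword "patch embeddings", so a small illustration may help. The following is a minimal sketch in plain Python, not a reference implementation: the helper names `patchify` and `embed_patches`, the toy 4 x 4 image, the patch size of 2, and the embedding width of 8 are all assumptions chosen for illustration. It cuts an image into non-overlapping patches, flattens each one, and applies a linear projection with random (untrained) weights. A full Vision Transformer would typically also add position embeddings and a class token before the Transformer layers; those steps are omitted here.

```python
import random


def patchify(image, patch_size):
    """Split an H x W x C image (nested lists) into flattened, non-overlapping patches."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten one patch_size x patch_size x C block in row-major order.
            patch = [
                value
                for row in image[top:top + patch_size]
                for pixel in row[left:left + patch_size]
                for value in pixel
            ]
            patches.append(patch)
    return patches


def embed_patches(patches, weights):
    """Project each flattened patch: token[j] = sum_i patch[i] * weights[i][j]."""
    return [
        [sum(p * w for p, w in zip(patch, column)) for column in zip(*weights)]
        for patch in patches
    ]


# Toy 4 x 4 RGB image (hypothetical data): every channel holds the pixel index.
image = [[[float(r * 4 + c)] * 3 for c in range(4)] for r in range(4)]

patches = patchify(image, patch_size=2)  # 4 patches, each 2 * 2 * 3 = 12 values

# Random projection weights stand in for learned parameters (assumption).
rng = random.Random(0)
embed_dim = 8
weights = [[rng.uniform(-0.1, 0.1) for _ in range(embed_dim)] for _ in range(12)]

tokens = embed_patches(patches, weights)
print(len(tokens), len(tokens[0]))  # prints "4 8": one 8-dimensional token per patch
```

Each row of `tokens` plays the role of one input token; the sequence of such tokens is what the Transformer's self-attention layers would consume.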