:: Library Catalog

Bibliographic Details
Main Authors:	Zhu, Chenchen; Suri, Saksham; Jose, Cijo; Oquab, Maxime; Szafraniec, Marc; Wen, Wei; Xiong, Yunyang; Labatut, Patrick; Bojanowski, Piotr; Krishnamoorthi, Raghuraman; Chandra, Vikas
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2603.22387

Similar Items

You Don't Need Domain-Specific Data Augmentations When Scaling Self-Supervised Learning
by: Moutakanni, Théo, et al.
Published: (2024)

Efficient Track Anything
by: Xiong, Yunyang, et al.
Published: (2024)

DINOv2 Meets Text: A Unified Framework for Image- and Pixel-Level Vision-Language Alignment
by: Jose, Cijo, et al.
Published: (2024)

EdgeTAM: On-Device Track Anything Model
by: Zhou, Chong, et al.
Published: (2025)

Vision Transformers Need Registers
by: Darcet, Timothée, et al.
Published: (2023)

SqueezeSAM: User friendly mobile interactive segmentation
by: Varadarajan, Balakrishnan, et al.
Published: (2023)

PathFusion: Path-consistent Lidar-Camera Deep Feature Fusion
by: Wu, Lemeng, et al.
Published: (2022)

Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach
by: Vo, Huy V., et al.
Published: (2024)

Small Vision-Language Models are Smart Compressors for Long Video Understanding
by: Fei, Junjie, et al.
Published: (2026)

Cluster and Predict Latent Patches for Improved Masked Image Modeling
by: Darcet, Timothée, et al.
Published: (2025)

Disentangling the Factors of Convergence between Brains and Computer Vision Models
by: Raugel, Joséphine, et al.
Published: (2025)

Back to the Features: DINO as a Foundation for Video World Models
by: Baldassarre, Federico, et al.
Published: (2025)

Misalignment Between Backpropagation and the Hierarchy of Brain Responses to Images
by: Raugel, Joséphine, et al.
Published: (2026)

DINOv3
by: Siméoni, Oriane, et al.
Published: (2025)

VideoAuto-R1: Video Auto Reasoning via Thinking Once, Answering Twice
by: Liu, Shuming, et al.
Published: (2026)

MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
by: Liu, Zechun, et al.
Published: (2024)

Diversify, Don't Fine-Tune: Scaling Up Visual Recognition Training with Synthetic Images
by: Yu, Zhuoran, et al.
Published: (2023)

Agent-as-a-Judge: Evaluate Agents with Agents
by: Zhuge, Mingchen, et al.
Published: (2024)

LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding
by: Shen, Xiaoqian, et al.
Published: (2024)

MobileMoE: Scaling On-Device Mixture of Experts
by: Chen, Yanbei, et al.
Published: (2026)

Mixture-of-Supernets: Improving Weight-Sharing Supernet Training with Architecture-Routed Mixture-of-Experts
by: Jawahar, Ganesh, et al.
Published: (2023)

DepthShrinker: A New Compression Paradigm Towards Boosting Real-Hardware Efficiency of Compact Neural Networks
by: Fu, Yonggan, et al.
Published: (2022)

SpinQuant: LLM quantization with learned rotations
by: Liu, Zechun, et al.
Published: (2024)

Advancing human-centric AI for robust X-ray analysis through holistic self-supervised learning
by: Moutakanni, Théo, et al.
Published: (2024)

DINOv2: Learning Robust Visual Features without Supervision
by: Oquab, Maxime, et al.
Published: (2023)

UPLiFT: Efficient Pixel-Dense Feature Upsampling with Local Attenders
by: Walmer, Matthew, et al.
Published: (2026)

ParetoQ: Improving Scaling Laws in Extremely Low-bit LLM Quantization
by: Liu, Zechun, et al.
Published: (2025)

MobileLLM-R1: Exploring the Limits of Sub-Billion Language Model Reasoners with Open Training Recipes
by: Zhao, Changsheng, et al.
Published: (2025)

VGGT-$Ω$
by: Wang, Jianyuan, et al.
Published: (2026)

Communication Efficient Distributed Training with Distributed Lion
by: Liu, Bo, et al.
Published: (2024)

LiFT: A Surprisingly Simple Lightweight Feature Transform for Dense ViT Descriptors
by: Suri, Saksham, et al.
Published: (2024)

Numerical analysis of a non-clamped dynamic thermoviscoelastic contact problem
by: Bartman, Piotr, et al.
Published: (2019)

dTRPO: Trajectory Reduction in Policy Optimization of Diffusion Large Language Models
by: Zhang, Wenxuan, et al.
Published: (2026)

AdaVFM: Adaptive Vision Foundation Models for Edge Intelligence via LLM-Guided Execution
by: Zhao, Yiwei, et al.
Published: (2026)

CHMv2: Improvements in Global Canopy Height Mapping using DINOv3
by: Brandt, John, et al.
Published: (2026)

LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior
by: Wang, Hanyu, et al.
Published: (2024)

Going Down Memory Lane: Scaling Tokens for Video Stream Understanding with Dynamic KV-Cache Memory
by: Agarwal, Vatsal, et al.
Published: (2026)

Convergence of a double step scheme for a class of second order Clarke subdifferential inclusions
by: Bartosz, Krzysztof, et al.
Published: (2023)

Better (pseudo-)labels for semi-supervised instance segmentation
by: Porcher, François, et al.
Published: (2024)

Microstructural Study and Wear Optimization of Squeeze Stir Casted Al6061/FA/CSA/Graphite Composite Material
by: Chandra, Vikas, et al.
Published: (2025)