| Main Authors: | Zhu, Chenchen, Suri, Saksham, Jose, Cijo, Oquab, Maxime, Szafraniec, Marc, Wen, Wei, Xiong, Yunyang, Labatut, Patrick, Bojanowski, Piotr, Krishnamoorthi, Raghuraman, Chandra, Vikas |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2603.22387 |
Similar Items
You Don't Need Domain-Specific Data Augmentations When Scaling Self-Supervised Learning
by: Moutakanni, Théo, et al.
Published: (2024)
Efficient Track Anything
by: Xiong, Yunyang, et al.
Published: (2024)
DINOv2 Meets Text: A Unified Framework for Image- and Pixel-Level Vision-Language Alignment
by: Jose, Cijo, et al.
Published: (2024)
EdgeTAM: On-Device Track Anything Model
by: Zhou, Chong, et al.
Published: (2025)
Vision Transformers Need Registers
by: Darcet, Timothée, et al.
Published: (2023)
SqueezeSAM: User friendly mobile interactive segmentation
by: Varadarajan, Balakrishnan, et al.
Published: (2023)
PathFusion: Path-consistent Lidar-Camera Deep Feature Fusion
by: Wu, Lemeng, et al.
Published: (2022)
Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach
by: Vo, Huy V., et al.
Published: (2024)
Small Vision-Language Models are Smart Compressors for Long Video Understanding
by: Fei, Junjie, et al.
Published: (2026)
Cluster and Predict Latent Patches for Improved Masked Image Modeling
by: Darcet, Timothée, et al.
Published: (2025)
Disentangling the Factors of Convergence between Brains and Computer Vision Models
by: Raugel, Joséphine, et al.
Published: (2025)
Back to the Features: DINO as a Foundation for Video World Models
by: Baldassarre, Federico, et al.
Published: (2025)
Misalignment Between Backpropagation and the Hierarchy of Brain Responses to Images
by: Raugel, Joséphine, et al.
Published: (2026)
DINOv3
by: Siméoni, Oriane, et al.
Published: (2025)
VideoAuto-R1: Video Auto Reasoning via Thinking Once, Answering Twice
by: Liu, Shuming, et al.
Published: (2026)
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
by: Liu, Zechun, et al.
Published: (2024)
Diversify, Don't Fine-Tune: Scaling Up Visual Recognition Training with Synthetic Images
by: Yu, Zhuoran, et al.
Published: (2023)
Agent-as-a-Judge: Evaluate Agents with Agents
by: Zhuge, Mingchen, et al.
Published: (2024)
LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding
by: Shen, Xiaoqian, et al.
Published: (2024)
MobileMoE: Scaling On-Device Mixture of Experts
by: Chen, Yanbei, et al.
Published: (2026)
Mixture-of-Supernets: Improving Weight-Sharing Supernet Training with Architecture-Routed Mixture-of-Experts
by: Jawahar, Ganesh, et al.
Published: (2023)
DepthShrinker: A New Compression Paradigm Towards Boosting Real-Hardware Efficiency of Compact Neural Networks
by: Fu, Yonggan, et al.
Published: (2022)
SpinQuant: LLM quantization with learned rotations
by: Liu, Zechun, et al.
Published: (2024)
Advancing human-centric AI for robust X-ray analysis through holistic self-supervised learning
by: Moutakanni, Théo, et al.
Published: (2024)
DINOv2: Learning Robust Visual Features without Supervision
by: Oquab, Maxime, et al.
Published: (2023)
UPLiFT: Efficient Pixel-Dense Feature Upsampling with Local Attenders
by: Walmer, Matthew, et al.
Published: (2026)
ParetoQ: Improving Scaling Laws in Extremely Low-bit LLM Quantization
by: Liu, Zechun, et al.
Published: (2025)
MobileLLM-R1: Exploring the Limits of Sub-Billion Language Model Reasoners with Open Training Recipes
by: Zhao, Changsheng, et al.
Published: (2025)
VGGT-$Ω$
by: Wang, Jianyuan, et al.
Published: (2026)
Communication Efficient Distributed Training with Distributed Lion
by: Liu, Bo, et al.
Published: (2024)
LiFT: A Surprisingly Simple Lightweight Feature Transform for Dense ViT Descriptors
by: Suri, Saksham, et al.
Published: (2024)
Numerical analysis of a non-clamped dynamic thermoviscoelastic contact problem
by: Bartman, Piotr, et al.
Published: (2019)
dTRPO: Trajectory Reduction in Policy Optimization of Diffusion Large Language Models
by: Zhang, Wenxuan, et al.
Published: (2026)
AdaVFM: Adaptive Vision Foundation Models for Edge Intelligence via LLM-Guided Execution
by: Zhao, Yiwei, et al.
Published: (2026)
CHMv2: Improvements in Global Canopy Height Mapping using DINOv3
by: Brandt, John, et al.
Published: (2026)
LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior
by: Wang, Hanyu, et al.
Published: (2024)
Going Down Memory Lane: Scaling Tokens for Video Stream Understanding with Dynamic KV-Cache Memory
by: Agarwal, Vatsal, et al.
Published: (2026)
Convergence of a double step scheme for a class of second order Clarke subdifferential inclusions
by: Bartosz, Krzysztof, et al.
Published: (2023)
Better (pseudo-)labels for semi-supervised instance segmentation
by: Porcher, François, et al.
Published: (2024)
Microstructural Study and Wear Optimization of Squeeze Stir Casted Al6061/FA/CSA/Graphite Composite Material
by: Vikas Chandra, et al.
Published: (2025)