| Main Authors: | Wang, Weitian, Meiner, Lukas, Shubham, Rai, De La Parra, Cecilia, Kumar, Akash |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2511.21317 |
Similar Items
MixA-Q: Revisiting Activation Sparsity for Vision Transformers from a Mixed-Precision Quantization Perspective
by: Wang, Weitian, et al.
Published: (2025)
LiteVGGT: Boosting Vanilla VGGT via Geometry-aware Cached Token Merging
by: Shu, Zhijian, et al.
Published: (2025)
VGGT-SLAM++
by: Mandal, Avilasha, et al.
Published: (2026)
HeSS: Head Sensitivity Score for Sparsity Redistribution in VGGT
by: Kim, Yongsung, et al.
Published: (2026)
PROM: Prioritize Reduction of Multiplications Over Lower Bit-Widths for Efficient CNNs
by: Meiner, Lukas, et al.
Published: (2025)
PaceVGGT: Pre-Alternating-Attention Token Pruning for Visual Geometry Transformers
by: Li, Haotang, et al.
Published: (2026)
Data-Free Dynamic Compression of CNNs for Tractable Efficiency
by: Meiner, Lukas, et al.
Published: (2023)
VGGT-HPE: Reframing Head Pose Estimation as Relative Pose Prediction
by: Vasileiou, Vasiliki, et al.
Published: (2026)
VGGT-$Ω$
by: Wang, Jianyuan, et al.
Published: (2026)
VGGT-World: Transforming VGGT into an Autoregressive Geometry World Model
by: Sun, Xiangyu, et al.
Published: (2026)
MergeTok: Unified Continuous and Discrete Visual Tokenization via Token Merging
by: Zhang, Luyuan, et al.
Published: (2026)
VGGT-X: When VGGT Meets Dense Novel View Synthesis
by: Liu, Yang, et al.
Published: (2025)
VGGT-MPR: VGGT-Enhanced Multimodal Place Recognition in Autonomous Driving Environments
by: Xu, Jingyi, et al.
Published: (2026)
FrameVGGT: Geometry-Aligned Frame-Level Memory for Bounded Streaming VGGT
by: Xu, Zhisong, et al.
Published: (2026)
Efficient Video Sampling: Pruning Temporally Redundant Tokens for Faster VLM Inference
by: Bagrov, Natan, et al.
Published: (2025)
SingingHead: A Large-scale 4D Dataset for Singing Head Animation
by: Wu, Sijing, et al.
Published: (2023)
VGGT: Visual Geometry Grounded Transformer
by: Wang, Jianyuan, et al.
Published: (2025)
STPro: Spatial and Temporal Progressive Learning for Weakly Supervised Spatio-Temporal Grounding
by: Garg, Aaryan, et al.
Published: (2025)
TempMe: Video Temporal Token Merging for Efficient Text-Video Retrieval
by: Shen, Leqi, et al.
Published: (2024)
VGGT-Long: Chunk it, Loop it, Align it -- Pushing VGGT's Limits on Kilometer-scale Long RGB Sequences
by: Deng, Kai, et al.
Published: (2025)
VGGT-360: Geometry-Consistent Zero-Shot Panoramic Depth Estimation
by: Yuan, Jiayi, et al.
Published: (2026)
4D-VGGT: A General Foundation Model with SpatioTemporal Awareness for Dynamic Scene Geometry Estimation
by: Wang, Haonan, et al.
Published: (2025)
AVGGT: Rethinking Global Attention for Accelerating VGGT
by: Sun, Xianbing, et al.
Published: (2025)
Faster Vision Mamba is Rebuilt in Minutes via Merged Token Re-training
by: Shi, Mingjia, et al.
Published: (2024)
TokenCLIP: Token-wise Prompt Learning for Zero-shot Anomaly Detection
by: Zhou, Qihang, et al.
Published: (2025)
Dense Semantic Matching with VGGT Prior
by: Yang, Songlin, et al.
Published: (2025)
Video Token Merging for Long-form Video Understanding
by: Lee, Seon-Ho, et al.
Published: (2024)
VGGT-Det: Mining VGGT Internal Priors for Sensor-Geometry-Free Multi-View Indoor 3D Object Detection
by: Cao, Yang, et al.
Published: (2026)
Reloc-VGGT: Visual Re-localization with Geometry Grounded Transformer
by: Deng, Tianchen, et al.
Published: (2025)
GPA-VGGT: Adapting VGGT to Large Scale Localization by Self-Supervised Learning with Geometry and Physics Aware Loss
by: Xu, Yangfan, et al.
Published: (2026)
ToSA: Token Merging with Spatial Awareness
by: Huang, Hsiang-Wei, et al.
Published: (2025)
Video, How Do Your Tokens Merge?
by: Pollard, Sam, et al.
Published: (2025)
Sequential Token Merging: Revisiting Hidden States
by: Wen, Yan, et al.
Published: (2025)
Contextual Self-paced Learning for Weakly Supervised Spatio-Temporal Video Grounding
by: Kumar, Akash, et al.
Published: (2025)
Similarity-Aware Token Pruning: Your VLM but Faster
by: Jeddi, Ahmadreza, et al.
Published: (2025)
VGGT-Segmentor: Geometry-Enhanced Cross-View Segmentation
by: Gao, Yulu, et al.
Published: (2026)
HD-VGGT: High-Resolution Visual Geometry Transformer
by: Chen, Tianrun, et al.
Published: (2026)
Efficient Visual Transformer by Learnable Token Merging
by: Wang, Yancheng, et al.
Published: (2024)
Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs
by: Hyun, Jeongseok, et al.
Published: (2025)
Head-wise Adaptive Rotary Positional Encoding for Fine-Grained Image Generation
by: Li, Jiaye, et al.
Published: (2025)