:: Library Catalog

Bibliographic Details
Main Authors:	Wang, Weitian; Meiner, Lukas; Shubham, Rai; De La Parra, Cecilia; Kumar, Akash
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2511.21317

Similar Items

MixA-Q: Revisiting Activation Sparsity for Vision Transformers from a Mixed-Precision Quantization Perspective
by: Wang, Weitian, et al.
Published: (2025)

LiteVGGT: Boosting Vanilla VGGT via Geometry-aware Cached Token Merging
by: Shu, Zhijian, et al.
Published: (2025)

VGGT-SLAM++
by: Mandal, Avilasha, et al.
Published: (2026)

HeSS: Head Sensitivity Score for Sparsity Redistribution in VGGT
by: Kim, Yongsung, et al.
Published: (2026)

PROM: Prioritize Reduction of Multiplications Over Lower Bit-Widths for Efficient CNNs
by: Meiner, Lukas, et al.
Published: (2025)

PaceVGGT: Pre-Alternating-Attention Token Pruning for Visual Geometry Transformers
by: Li, Haotang, et al.
Published: (2026)

Data-Free Dynamic Compression of CNNs for Tractable Efficiency
by: Meiner, Lukas, et al.
Published: (2023)

VGGT-HPE: Reframing Head Pose Estimation as Relative Pose Prediction
by: Vasileiou, Vasiliki, et al.
Published: (2026)

VGGT-$Ω$
by: Wang, Jianyuan, et al.
Published: (2026)

VGGT-World: Transforming VGGT into an Autoregressive Geometry World Model
by: Sun, Xiangyu, et al.
Published: (2026)

MergeTok: Unified Continuous and Discrete Visual Tokenization via Token Merging
by: Zhang, Luyuan, et al.
Published: (2026)

VGGT-X: When VGGT Meets Dense Novel View Synthesis
by: Liu, Yang, et al.
Published: (2025)

VGGT-MPR: VGGT-Enhanced Multimodal Place Recognition in Autonomous Driving Environments
by: Xu, Jingyi, et al.
Published: (2026)

FrameVGGT: Geometry-Aligned Frame-Level Memory for Bounded Streaming VGGT
by: Xu, Zhisong, et al.
Published: (2026)

Efficient Video Sampling: Pruning Temporally Redundant Tokens for Faster VLM Inference
by: Bagrov, Natan, et al.
Published: (2025)

SingingHead: A Large-scale 4D Dataset for Singing Head Animation
by: Wu, Sijing, et al.
Published: (2023)

VGGT: Visual Geometry Grounded Transformer
by: Wang, Jianyuan, et al.
Published: (2025)

STPro: Spatial and Temporal Progressive Learning for Weakly Supervised Spatio-Temporal Grounding
by: Garg, Aaryan, et al.
Published: (2025)

TempMe: Video Temporal Token Merging for Efficient Text-Video Retrieval
by: Shen, Leqi, et al.
Published: (2024)

VGGT-Long: Chunk it, Loop it, Align it -- Pushing VGGT's Limits on Kilometer-scale Long RGB Sequences
by: Deng, Kai, et al.
Published: (2025)

VGGT-360: Geometry-Consistent Zero-Shot Panoramic Depth Estimation
by: Yuan, Jiayi, et al.
Published: (2026)

4D-VGGT: A General Foundation Model with SpatioTemporal Awareness for Dynamic Scene Geometry Estimation
by: Wang, Haonan, et al.
Published: (2025)

AVGGT: Rethinking Global Attention for Accelerating VGGT
by: Sun, Xianbing, et al.
Published: (2025)

Faster Vision Mamba is Rebuilt in Minutes via Merged Token Re-training
by: Shi, Mingjia, et al.
Published: (2024)

TokenCLIP: Token-wise Prompt Learning for Zero-shot Anomaly Detection
by: Zhou, Qihang, et al.
Published: (2025)

Dense Semantic Matching with VGGT Prior
by: Yang, Songlin, et al.
Published: (2025)

Video Token Merging for Long-form Video Understanding
by: Lee, Seon-Ho, et al.
Published: (2024)

VGGT-Det: Mining VGGT Internal Priors for Sensor-Geometry-Free Multi-View Indoor 3D Object Detection
by: Cao, Yang, et al.
Published: (2026)

Reloc-VGGT: Visual Re-localization with Geometry Grounded Transformer
by: Deng, Tianchen, et al.
Published: (2025)

GPA-VGGT: Adapting VGGT to Large Scale Localization by Self-Supervised Learning with Geometry and Physics Aware Loss
by: Xu, Yangfan, et al.
Published: (2026)

ToSA: Token Merging with Spatial Awareness
by: Huang, Hsiang-Wei, et al.
Published: (2025)

Video, How Do Your Tokens Merge?
by: Pollard, Sam, et al.
Published: (2025)

Sequential Token Merging: Revisiting Hidden States
by: Wen, Yan, et al.
Published: (2025)

Contextual Self-paced Learning for Weakly Supervised Spatio-Temporal Video Grounding
by: Kumar, Akash, et al.
Published: (2025)

Similarity-Aware Token Pruning: Your VLM but Faster
by: Jeddi, Ahmadreza, et al.
Published: (2025)

VGGT-Segmentor: Geometry-Enhanced Cross-View Segmentation
by: Gao, Yulu, et al.
Published: (2026)

HD-VGGT: High-Resolution Visual Geometry Transformer
by: Chen, Tianrun, et al.
Published: (2026)

Efficient Visual Transformer by Learnable Token Merging
by: Wang, Yancheng, et al.
Published: (2024)

Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs
by: Hyun, Jeongseok, et al.
Published: (2025)

Head-wise Adaptive Rotary Positional Encoding for Fine-Grained Image Generation
by: Li, Jiaye, et al.
Published: (2025)