| Main Authors: | Liang, Yinan, Wang, Ziwei, Xu, Xiuwei, Zhou, Jie, Lu, Jiwen |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2503.15369 |
Similar Items
Q-VLM: Post-training Quantization for Large Vision-Language Models
by: Wang, Changyuan, et al.
Published: (2024)
3D Small Object Detection with Dynamic Spatial Pruning
by: Xu, Xiuwei, et al.
Published: (2023)
TSP3D: Text-guided Sparse Voxel Pruning for Efficient 3D Visual Grounding
by: Guo, Wenxuan, et al.
Published: (2025)
Towards Accurate Post-training Quantization for Diffusion Models
by: Wang, Changyuan, et al.
Published: (2023)
Anyview: Generalizable Indoor 3D Object Detection with Variable Frames
by: Wu, Zhenyu, et al.
Published: (2023)
UniGoal: Towards Universal Zero-shot Goal-oriented Navigation
by: Yin, Hang, et al.
Published: (2025)
EmbodiedSAM: Online Segment Any 3D Thing in Real Time
by: Xu, Xiuwei, et al.
Published: (2024)
MoManipVLA: Transferring Vision-language-action Models for General Mobile Manipulation
by: Wu, Zhenyu, et al.
Published: (2025)
Memory-based Adapters for Online 3D Scene Perception
by: Xu, Xiuwei, et al.
Published: (2024)
ATP-LLaVA: Adaptive Token Pruning for Large Vision Language Models
by: Ye, Xubing, et al.
Published: (2024)
IGL-Nav: Incremental 3D Gaussian Localization for Image-goal Navigation
by: Guo, Wenxuan, et al.
Published: (2025)
GC-VLN: Instruction as Graph Constraints for Training-free Vision-and-Language Navigation
by: Yin, Hang, et al.
Published: (2025)
SG-Nav: Online 3D Scene Graph Prompting for LLM-based Zero-shot Object Navigation
by: Yin, Hang, et al.
Published: (2024)
LLaVA-OneVision-1.5: Fully Open Framework for Democratized Multimodal Training
by: An, Xiang, et al.
Published: (2025)
HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models
by: Huang, Runhui, et al.
Published: (2024)
Continual LLaVA: Continual Instruction Tuning in Large Vision-Language Models
by: Cao, Meng, et al.
Published: (2024)
Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Context Sparsification
by: Huang, Wenxuan, et al.
Published: (2024)
iGaussian: Real-Time Camera Pose Estimation via Feed-Forward 3D Gaussian Splatting Inversion
by: Wang, Hao, et al.
Published: (2025)
LLaVA-OneVision-2: Towards Next-Generation Perceptual Intelligence
by: An, Xiang, et al.
Published: (2026)
AVG-LLaVA: An Efficient Large Multimodal Model with Adaptive Visual Granularity
by: Lan, Zhibin, et al.
Published: (2024)
OGGSplat: Open Gaussian Growing for Generalizable Reconstruction with Expanded Field-of-View
by: Wang, Yanbo, et al.
Published: (2025)
Chain-of-Spot: Interactive Reasoning Improves Large Vision-Language Models
by: Liu, Zuyan, et al.
Published: (2024)
LLaVA-MR: Large Language-and-Vision Assistant for Video Moment Retrieval
by: Lu, Weiheng, et al.
Published: (2024)
u-LLaVA: Unifying Multi-Modal Tasks via Large Language Model
by: Xu, Jinjin, et al.
Published: (2023)
R2RGEN: Real-to-Real 3D Data Generation for Spatially Generalized Manipulation
by: Xu, Xiuwei, et al.
Published: (2025)
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
by: Lin, Bin, et al.
Published: (2024)
AwareVLN: Reasoning with Self-awareness for Vision-Language Navigation
by: Guo, Wenxuan, et al.
Published: (2026)
GlobalMamba: Global Image Serialization for Vision Mamba
by: Wang, Chengkun, et al.
Published: (2024)
Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models
by: Zhang, Yi-Fan, et al.
Published: (2024)
CarLLaVA: Vision language models for camera-only closed-loop driving
by: Renz, Katrin, et al.
Published: (2024)
MC-LLaVA: Multi-Concept Personalized Vision-Language Model
by: An, Ruichuan, et al.
Published: (2025)
Delta-LLaVA: Base-then-Specialize Alignment for Token-Efficient Vision-Language Models
by: Zamini, Mohamad, et al.
Published: (2025)
LLaVA-LE: Large Language-and-Vision Assistant for Lunar Exploration
by: Inal, Gokce, et al.
Published: (2026)
MC-LLaVA: Multi-Concept Personalized Vision-Language Model
by: An, Ruichuan, et al.
Published: (2024)
LLaVA-OneVision: Easy Visual Task Transfer
by: Li, Bo, et al.
Published: (2024)
MCA-LLaVA: Manhattan Causal Attention for Reducing Hallucination in Large Vision-Language Models
by: Zhao, Qiyan, et al.
Published: (2025)
Efficient Token Pruning for LLaDA-V
by: Wan, Zhewen, et al.
Published: (2026)
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token
by: Zhang, Shaolei, et al.
Published: (2025)
LLaVA-CoT: Let Vision Language Models Reason Step-by-Step
by: Xu, Guowei, et al.
Published: (2024)
LLaVA-Ultra: Large Chinese Language and Vision Assistant for Ultrasound
by: Guo, Xuechen, et al.
Published: (2024)