| Main Authors: | Liu, Yi; Xu, Xiao; Xu, Zeyu; Zhang, Meng; Li, Yibo; Chen, Haoyu; Zhang, Junkang; Wang, Qiang; Sun, Jifa; Lin, Siling; Cheng, Shengxun; Zhang, Lingshu; Wang, Kang |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2508.01540 |
Similar Items
E-VRAG: Enhancing Long Video Understanding with Resource-Efficient Retrieval Augmented Generation
by: Xu, Zeyu, et al.
Published: (2025)
Vision-EKIPL: External Knowledge-Infused Policy Learning for Visual Reasoning
by: Wang, Chaoyang, et al.
Published: (2025)
Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders
by: Zhang, Boqiang, et al.
Published: (2026)
VL-Cogito: Progressive Curriculum Reinforcement Learning for Advanced Multimodal Reasoning
by: Yuan, Ruifeng, et al.
Published: (2025)
OpenVision 2: A Family of Generative Pretrained Visual Encoders for Multimodal Learning
by: Liu, Yanqing, et al.
Published: (2025)
OpenVision 3: A Family of Unified Visual Encoder for Both Understanding and Generation
by: Zhang, Letian, et al.
Published: (2026)
DeepSeek-VL: Towards Real-World Vision-Language Understanding
by: Lu, Haoyu, et al.
Published: (2024)
JW-VL: A Vision-Language Model for Solar Physics
by: Shao, Mingfu, et al.
Published: (2026)
A Cross-Hierarchical Difference Feature Fusion Network Based on Multiscale Encoder-Decoder for Hyperspectral Change Detection
by: Sheng, Mingshuai, et al.
Published: (2025)
Recall: Empowering Multimodal Embedding for Edge Devices
by: Cai, Dongqi, et al.
Published: (2024)
Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception
by: Wang, Junyang, et al.
Published: (2024)
VL-Explore: Zero-shot Vision-Language Exploration and Target Discovery by Mobile Robots
by: Zhang, Yuxuan, et al.
Published: (2025)
Multiple Information Prompt Learning for Cloth-Changing Person Re-Identification
by: Wei, Shengxun, et al.
Published: (2024)
Transformers in Pseudo-Random Number Generation: A Dual Perspective on Theory and Practice
by: Li, Ran, et al.
Published: (2025)
EdgeMoE: Empowering Sparse Large Language Models on Mobile Devices
by: Yi, Rongjie, et al.
Published: (2023)
CoME-VL: Scaling Complementary Multi-Encoder Vision-Language Learning
by: Deria, Ankan, et al.
Published: (2026)
VideoCap-R1: Enhancing MLLMs for Video Captioning via Structured Thinking
by: Meng, Desen, et al.
Published: (2025)
AgriGPT-VL: Agricultural Vision-Language Understanding Suite
by: Yang, Bo, et al.
Published: (2025)
HyperVL: An Efficient and Dynamic Multimodal Large Language Model for Edge Devices
by: HyperAI Team, et al.
Published: (2025)
ReDas: A Lightweight Architecture for Supporting Fine-Grained Reshaping and Multiple Dataflows on Systolic Array
by: Han, Meng, et al.
Published: (2023)
Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion
by: Chen, Jiuhai, et al.
Published: (2024)
Towards Lightweight Graph Neural Network Search with Curriculum Graph Sparsification
by: Xie, Beini, et al.
Published: (2024)
VL-GenRM: Enhancing Vision-Language Verification via Vision Experts and Iterative Training
by: Zhang, Jipeng, et al.
Published: (2025)
MobileVLM : A Fast, Strong and Open Vision Language Assistant for Mobile Devices
by: Chu, Xiangxiang, et al.
Published: (2023)
Uniform large deviations and metastability of random dynamical systems
by: Jiang, Jifa, et al.
Published: (2024)
Image Recognition with Online Lightweight Vision Transformer: A Survey
by: Zhang, Zherui, et al.
Published: (2025)
VL-DPO: Vision-Language-Guided Finetuning for Preference-Aligned Autonomous Driving
by: Xu, Zhefan, et al.
Published: (2026)
MobileMamba: Lightweight Multi-Receptive Visual Mamba Network
by: He, Haoyang, et al.
Published: (2024)
TinyChemVL: Advancing Chemical Vision-Language Models via Efficient Visual Token Reduction and Complex Reaction Tasks
by: Zhao, Xuanle, et al.
Published: (2025)
MobileIE: An Extremely Lightweight and Effective ConvNet for Real-Time Image Enhancement on Mobile Devices
by: Yan, Hailong, et al.
Published: (2025)
Stochastic Periodic Solutions for Newtonian Systems via Lyapunov Function
by: Duan, Junxia, et al.
Published: (2025)
A Novel ViDAR Device With Visual Inertial Encoder Odometry and Reinforcement Learning-Based Active SLAM Method
by: Xin, Zhanhua, et al.
Published: (2025)
LaMP-Val: Large Language Models Empower Personalized Valuation in Auction
by: Sun, Jie, et al.
Published: (2024)
Youtu-VL: Unleashing Visual Potential via Unified Vision-Language Supervision
by: Wei, Zhixiang, et al.
Published: (2026)
Explicit Semantic-Base-Empowered Communications for 6G Mobile Networks
by: Wang, Fengyu, et al.
Published: (2024)
Qianfan-VL: Domain-Enhanced Universal Vision-Language Models
by: Dong, Daxiang, et al.
Published: (2025)
ImageRef-VL: Enabling Contextual Image Referencing in Vision-Language Models
by: Yi, Jingwei, et al.
Published: (2025)
Code2Worlds: Empowering Coding LLMs for 4D World Generation
by: Zhang, Yi, et al.
Published: (2026)
Kimi-VL Technical Report
by: Kimi Team, et al.
Published: (2025)
InternVL-X: Advancing and Accelerating InternVL Series with Efficient Visual Token Compression
by: Lu, Dongchen, et al.
Published: (2025)