:: Library Catalog

Bibliographic Details
Main Authors:	Liu, Yi; Xu, Xiao; Xu, Zeyu; Zhang, Meng; Li, Yibo; Chen, Haoyu; Zhang, Junkang; Wang, Qiang; Sun, Jifa; Lin, Siling; Cheng, Shengxun; Zhang, Lingshu; Wang, Kang
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2508.01540

Similar Items

E-VRAG: Enhancing Long Video Understanding with Resource-Efficient Retrieval Augmented Generation
by: Xu, Zeyu, et al.
Published: (2025)

Vision-EKIPL: External Knowledge-Infused Policy Learning for Visual Reasoning
by: Wang, Chaoyang, et al.
Published: (2025)

Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders
by: Zhang, Boqiang, et al.
Published: (2026)

VL-Cogito: Progressive Curriculum Reinforcement Learning for Advanced Multimodal Reasoning
by: Yuan, Ruifeng, et al.
Published: (2025)

OpenVision 2: A Family of Generative Pretrained Visual Encoders for Multimodal Learning
by: Liu, Yanqing, et al.
Published: (2025)

OpenVision 3: A Family of Unified Visual Encoder for Both Understanding and Generation
by: Zhang, Letian, et al.
Published: (2026)

DeepSeek-VL: Towards Real-World Vision-Language Understanding
by: Lu, Haoyu, et al.
Published: (2024)

JW-VL: A Vision-Language Model for Solar Physics
by: Shao, Mingfu, et al.
Published: (2026)

A Cross-Hierarchical Difference Feature Fusion Network Based on Multiscale Encoder-Decoder for Hyperspectral Change Detection
by: Sheng, Mingshuai, et al.
Published: (2025)

Recall: Empowering Multimodal Embedding for Edge Devices
by: Cai, Dongqi, et al.
Published: (2024)

Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception
by: Wang, Junyang, et al.
Published: (2024)

VL-Explore: Zero-shot Vision-Language Exploration and Target Discovery by Mobile Robots
by: Zhang, Yuxuan, et al.
Published: (2025)

Multiple Information Prompt Learning for Cloth-Changing Person Re-Identification
by: Wei, Shengxun, et al.
Published: (2024)

Transformers in Pseudo-Random Number Generation: A Dual Perspective on Theory and Practice
by: Li, Ran, et al.
Published: (2025)

EdgeMoE: Empowering Sparse Large Language Models on Mobile Devices
by: Yi, Rongjie, et al.
Published: (2023)

CoME-VL: Scaling Complementary Multi-Encoder Vision-Language Learning
by: Deria, Ankan, et al.
Published: (2026)

VideoCap-R1: Enhancing MLLMs for Video Captioning via Structured Thinking
by: Meng, Desen, et al.
Published: (2025)

AgriGPT-VL: Agricultural Vision-Language Understanding Suite
by: Yang, Bo, et al.
Published: (2025)

HyperVL: An Efficient and Dynamic Multimodal Large Language Model for Edge Devices
by: HyperAI Team, et al.
Published: (2025)

ReDas: A Lightweight Architecture for Supporting Fine-Grained Reshaping and Multiple Dataflows on Systolic Array
by: Han, Meng, et al.
Published: (2023)

Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion
by: Chen, Jiuhai, et al.
Published: (2024)

Towards Lightweight Graph Neural Network Search with Curriculum Graph Sparsification
by: Xie, Beini, et al.
Published: (2024)

VL-GenRM: Enhancing Vision-Language Verification via Vision Experts and Iterative Training
by: Zhang, Jipeng, et al.
Published: (2025)

MobileVLM: A Fast, Strong and Open Vision Language Assistant for Mobile Devices
by: Chu, Xiangxiang, et al.
Published: (2023)

Uniform large deviations and metastability of random dynamical systems
by: Jiang, Jifa, et al.
Published: (2024)

Image Recognition with Online Lightweight Vision Transformer: A Survey
by: Zhang, Zherui, et al.
Published: (2025)

VL-DPO: Vision-Language-Guided Finetuning for Preference-Aligned Autonomous Driving
by: Xu, Zhefan, et al.
Published: (2026)

MobileMamba: Lightweight Multi-Receptive Visual Mamba Network
by: He, Haoyang, et al.
Published: (2024)

TinyChemVL: Advancing Chemical Vision-Language Models via Efficient Visual Token Reduction and Complex Reaction Tasks
by: Zhao, Xuanle, et al.
Published: (2025)

MobileIE: An Extremely Lightweight and Effective ConvNet for Real-Time Image Enhancement on Mobile Devices
by: Yan, Hailong, et al.
Published: (2025)

Stochastic Periodic Solutions for Newtonian Systems via Lyapunov Function
by: Duan, Junxia, et al.
Published: (2025)

A Novel ViDAR Device With Visual Inertial Encoder Odometry and Reinforcement Learning-Based Active SLAM Method
by: Xin, Zhanhua, et al.
Published: (2025)

LaMP-Val: Large Language Models Empower Personalized Valuation in Auction
by: Sun, Jie, et al.
Published: (2024)

Youtu-VL: Unleashing Visual Potential via Unified Vision-Language Supervision
by: Wei, Zhixiang, et al.
Published: (2026)

Explicit Semantic-Base-Empowered Communications for 6G Mobile Networks
by: Wang, Fengyu, et al.
Published: (2024)

Qianfan-VL: Domain-Enhanced Universal Vision-Language Models
by: Dong, Daxiang, et al.
Published: (2025)

ImageRef-VL: Enabling Contextual Image Referencing in Vision-Language Models
by: Yi, Jingwei, et al.
Published: (2025)

Code2Worlds: Empowering Coding LLMs for 4D World Generation
by: Zhang, Yi, et al.
Published: (2026)

Kimi-VL Technical Report
by: Kimi Team, et al.
Published: (2025)

InternVL-X: Advancing and Accelerating InternVL Series with Efficient Visual Token Compression
by: Lu, Dongchen, et al.
Published: (2025)