:: Library Catalog

Bibliographic Details
Main Authors:	Zhang, Yi; Ni, Bolin; Chen, Xin-Sheng; Zhang, Heng-Rui; Rao, Yongming; Peng, Houwen; Lu, Qinglin; Hu, Han; Guo, Meng-Hao; Hu, Shi-Min
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2510.13795

Similar Items

R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning
by: Yang, Qi, et al.
Published: (2025)

Xwin-LM: Strong and Scalable Alignment Practice for LLMs
by: Ni, Bolin, et al.
Published: (2024)

SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs
by: Wang, Jiahui, et al.
Published: (2025)

R-Bench: Graduate-level Multi-disciplinary Benchmarks for LLM & MLLM Complex Reasoning Evaluation
by: Guo, Meng-Hao, et al.
Published: (2025)

ScalingFilter: Assessing Data Quality through Inverse Utilization of Scaling Laws
by: Li, Ruihang, et al.
Published: (2024)

Unlocking the Forecasting Economy: A Suite of Datasets for the Full Lifecycle of Prediction Market: [Experiments & Analysis]
by: Jia, Huaiyu, et al.
Published: (2026)

FullStack Bench: Evaluating LLMs as Full Stack Coders
by: Bytedance-Seed-Foundation-Code-Team, et al.
Published: (2024)

Enhancing Visual Continual Learning with Language-Guided Supervision
by: Ni, Bolin, et al.
Published: (2024)

GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding
by: Hu, Rui, et al.
Published: (2025)

Beyond Sequential Distance: Inter-Modal Distance Invariant Position Encoding
by: Chen, Lin, et al.
Published: (2026)

BREEN: Bridge Data-Efficient Encoder-Free Multimodal Learning with Learnable Queries
by: Li, Tianle, et al.
Published: (2025)

Maxwell Demon and Einstein-Podolsky-Rosen Steering
by: Hu, Meng-Jun, et al.
Published: (2021)

Unlocking Gravity and Gravitational Waves with Radio Pulsars: Advances and Challenges
by: Hu, Huanchen
Published: (2025)

Towards Robust Evaluation of STEM Education: Leveraging MLLMs in Project-Based Learning
by: Wu, Xinyi, et al.
Published: (2025)

CAVALRY-V: A Large-Scale Generator Framework for Adversarial Attacks on Video MLLMs
by: Zhang, Jiaming, et al.
Published: (2025)

String Scattering and Evolution of Ryu-Takayanagi Surface
by: Jiang, Xin, et al.
Published: (2024)

HateXScore: A Metric Suite for Evaluating Reasoning Quality in Hate Speech Explanations
by: Hu, Yujia, et al.
Published: (2026)

CrossView Suite: Harnessing Cross-view Spatial Intelligence of MLLMs with Dataset, Model and Benchmark
by: Wang, Wei, et al.
Published: (2026)

LiteWebAgent: The Open-Source Suite for VLM-Based Web-Agent Applications
by: Zhang, Danqing, et al.
Published: (2025)

Fre-Res: Frequency-Residual Video Token Compression for Efficient Video MLLMs
by: Feng, Yigui, et al.
Published: (2026)

Unlocking the Forgery Detection Potential of Vanilla MLLMs: A Novel Training-Free Pipeline
by: Zuo, Rui, et al.
Published: (2025)

Channel Knowledge Map Construction: Recent Advances and Open Challenges
by: Ren, Zixiang, et al.
Published: (2025)

TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning
by: Zeng, Xiangyu, et al.
Published: (2024)

VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM
by: Yuan, Yuqian, et al.
Published: (2024)

UDTIRI: An Online Open-Source Intelligent Road Inspection Benchmark Suite
by: Guo, Sicen, et al.
Published: (2023)

Full‐Color and High‐Resolution Femtosecond Laser Patterning of Perovskite Quantum Dots in Polyacrylonitrile Matrix
by: Hu, Jinming, et al.
Published: (2024)

Towards Universal Video MLLMs with Attribute-Structured and Quality-Verified Instructions
by: Li, Yunheng, et al.
Published: (2026)

Turning Generators into Retrievers: Unlocking MLLMs for Natural Language-Guided Geo-Localization
by: Chen, Yuqi, et al.
Published: (2026)

When Eyes and Ears Disagree: Can MLLMs Discern Audio-Visual Confusion?
by: Ye, Qilang, et al.
Published: (2025)

RBench-V: A Primary Assessment for Visual Reasoning Models with Multi-modal Outputs
by: Guo, Meng-Hao, et al.
Published: (2025)

When Looking Is Not Enough: Visual Attention Structure Reveals Hallucination in MLLMs
by: Cao, Fanpu, et al.
Published: (2026)

Full‐Stack Architectures for Intelligent Brain‐Computer Interfaces
by: Lee, Hee Kyu, et al.
Published: (2026)

Common 7B Language Models Already Possess Strong Math Capabilities
by: Li, Chen, et al.
Published: (2024)

X-Omni: Reinforcement Learning Makes Discrete Autoregressive Image Generative Models Great Again
by: Geng, Zigang, et al.
Published: (2025)

Unveiling the Compositional Ability Gap in Vision-Language Reasoning Model
by: Li, Tianle, et al.
Published: (2025)

Taiyi: A high-performance CKKS accelerator for Practical Fully Homomorphic Encryption
by: Fan, Shengyu, et al.
Published: (2024)

Toward a worldsheet theory of entanglement entropy
by: Wu, Houwen, et al.
Published: (2025)

Insight-V++: Towards Advanced Long-Chain Visual Reasoning with Multimodal Large Language Models
by: Dong, Yuhao, et al.
Published: (2026)

FullAnno: A Data Engine for Enhancing Image Comprehension of MLLMs
by: Hao, Jing, et al.
Published: (2024)

OpenView: Empowering MLLMs with Out-of-view VQA
by: Chen, Qixiang, et al.
Published: (2025)