| Main Authors: | Zhang, Yi, Ni, Bolin, Chen, Xin-Sheng, Zhang, Heng-Rui, Rao, Yongming, Peng, Houwen, Lu, Qinglin, Hu, Han, Guo, Meng-Hao, Hu, Shi-Min |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2510.13795 |
Similar Items
R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning
by: Yang, Qi, et al.
Published: (2025)
Xwin-LM: Strong and Scalable Alignment Practice for LLMs
by: Ni, Bolin, et al.
Published: (2024)
SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs
by: Wang, Jiahui, et al.
Published: (2025)
R-Bench: Graduate-level Multi-disciplinary Benchmarks for LLM & MLLM Complex Reasoning Evaluation
by: Guo, Meng-Hao, et al.
Published: (2025)
ScalingFilter: Assessing Data Quality through Inverse Utilization of Scaling Laws
by: Li, Ruihang, et al.
Published: (2024)
Unlocking the Forecasting Economy: A Suite of Datasets for the Full Lifecycle of Prediction Market: [Experiments & Analysis]
by: Jia, Huaiyu, et al.
Published: (2026)
FullStack Bench: Evaluating LLMs as Full Stack Coders
by: Bytedance-Seed-Foundation-Code-Team, et al.
Published: (2024)
Enhancing Visual Continual Learning with Language-Guided Supervision
by: Ni, Bolin, et al.
Published: (2024)
GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding
by: Hu, Rui, et al.
Published: (2025)
Beyond Sequential Distance: Inter-Modal Distance Invariant Position Encoding
by: Chen, Lin, et al.
Published: (2026)
BREEN: Bridge Data-Efficient Encoder-Free Multimodal Learning with Learnable Queries
by: Li, Tianle, et al.
Published: (2025)
Maxwell Demon and Einstein-Podolsky-Rosen Steering
by: Hu, Meng-Jun, et al.
Published: (2021)
Unlocking Gravity and Gravitational Waves with Radio Pulsars: Advances and Challenges
by: Hu, Huanchen
Published: (2025)
Towards Robust Evaluation of STEM Education: Leveraging MLLMs in Project-Based Learning
by: Wu, Xinyi, et al.
Published: (2025)
CAVALRY-V: A Large-Scale Generator Framework for Adversarial Attacks on Video MLLMs
by: Zhang, Jiaming, et al.
Published: (2025)
String Scattering and Evolution of Ryu-Takayanagi Surface
by: Jiang, Xin, et al.
Published: (2024)
HateXScore: A Metric Suite for Evaluating Reasoning Quality in Hate Speech Explanations
by: Hu, Yujia, et al.
Published: (2026)
CrossView Suite: Harnessing Cross-view Spatial Intelligence of MLLMs with Dataset, Model and Benchmark
by: Wang, Wei, et al.
Published: (2026)
LiteWebAgent: The Open-Source Suite for VLM-Based Web-Agent Applications
by: Zhang, Danqing, et al.
Published: (2025)
Fre-Res: Frequency-Residual Video Token Compression for Efficient Video MLLMs
by: Feng, Yigui, et al.
Published: (2026)
Unlocking the Forgery Detection Potential of Vanilla MLLMs: A Novel Training-Free Pipeline
by: Zuo, Rui, et al.
Published: (2025)
Channel Knowledge Map Construction: Recent Advances and Open Challenges
by: Ren, Zixiang, et al.
Published: (2025)
TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning
by: Zeng, Xiangyu, et al.
Published: (2024)
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM
by: Yuan, Yuqian, et al.
Published: (2024)
UDTIRI: An Online Open-Source Intelligent Road Inspection Benchmark Suite
by: Guo, Sicen, et al.
Published: (2023)
Full‐Color and High‐Resolution Femtosecond Laser Patterning of Perovskite Quantum Dots in Polyacrylonitrile Matrix
by: Jinming Hu, et al.
Published: (2024)
Towards Universal Video MLLMs with Attribute-Structured and Quality-Verified Instructions
by: Li, Yunheng, et al.
Published: (2026)
Turning Generators into Retrievers: Unlocking MLLMs for Natural Language-Guided Geo-Localization
by: Chen, Yuqi, et al.
Published: (2026)
When Eyes and Ears Disagree: Can MLLMs Discern Audio-Visual Confusion?
by: Ye, Qilang, et al.
Published: (2025)
RBench-V: A Primary Assessment for Visual Reasoning Models with Multi-modal Outputs
by: Guo, Meng-Hao, et al.
Published: (2025)
When Looking Is Not Enough: Visual Attention Structure Reveals Hallucination in MLLMs
by: Cao, Fanpu, et al.
Published: (2026)
Full‐Stack Architectures for Intelligent Brain‐Computer Interfaces
by: Hee Kyu Lee, et al.
Published: (2026)
Common 7B Language Models Already Possess Strong Math Capabilities
by: Li, Chen, et al.
Published: (2024)
X-Omni: Reinforcement Learning Makes Discrete Autoregressive Image Generative Models Great Again
by: Geng, Zigang, et al.
Published: (2025)
Unveiling the Compositional Ability Gap in Vision-Language Reasoning Model
by: Li, Tianle, et al.
Published: (2025)
Taiyi: A high-performance CKKS accelerator for Practical Fully Homomorphic Encryption
by: Fan, Shengyu, et al.
Published: (2024)
Toward a worldsheet theory of entanglement entropy
by: Wu, Houwen, et al.
Published: (2025)
Insight-V++: Towards Advanced Long-Chain Visual Reasoning with Multimodal Large Language Models
by: Dong, Yuhao, et al.
Published: (2026)
FullAnno: A Data Engine for Enhancing Image Comprehension of MLLMs
by: Hao, Jing, et al.
Published: (2024)
OpenView: Empowering MLLMs with Out-of-view VQA
by: Chen, Qixiang, et al.
Published: (2025)