| Main Authors: | Chen, Chen, Hu, ZeYang, Chen, Fengjiao, Ma, Liya, Liu, Jiaxing, Li, Xiaoyu, Wang, Ziwen, Cao, Xuezhi, Cai, Xunliang |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2510.18915 |
Similar Items
UniHetero: Could Generation Enhance Understanding for Vision-Language-Model at Large Data Scale?
by: Chen, Fengjiao, et al.
Published: (2025)
Omni-SafetyBench: A Benchmark for Safety Evaluation of Audio-Visual Large Language Models
by: Pan, Leyi, et al.
Published: (2025)
OmniFusion Technical Report
by: Goncharova, Elizaveta, et al.
Published: (2024)
Empathy Omni: Enabling Empathetic Speech Response Generation through Large Language Models
by: Wang, Haoyu, et al.
Published: (2025)
METER: Multi-modal Evidence-based Thinking and Explainable Reasoning -- Algorithm and Benchmark
by: Yang, Xu, et al.
Published: (2025)
VideoMind: An Omni-Modal Video Dataset with Intent Grounding for Deep-Cognitive Video Understanding
by: Yang, Baoyao, et al.
Published: (2025)
EVM-QuestBench: An Execution-Grounded Benchmark for Natural-Language Transaction Code Generation
by: Yang, Pei, et al.
Published: (2026)
MedMemoryBench: Benchmarking Agent Memory in Personalized Healthcare
by: Wang, Yihao, et al.
Published: (2026)
MMSciBench: Benchmarking Language Models on Chinese Multimodal Scientific Problems
by: Ye, Xinwu, et al.
Published: (2025)
LLaMA-Omni: Seamless Speech Interaction with Large Language Models
by: Fang, Qingkai, et al.
Published: (2024)
XferBench: a Data-Driven Benchmark for Emergent Language
by: Boldt, Brendon, et al.
Published: (2024)
PathBench: Speech Intelligibility Benchmark for Automatic Pathological Speech Assessment
by: Halpern, Bence Mark, et al.
Published: (2026)
Beyond Localization: A Comprehensive Diagnosis of Perspective-Conditioned Spatial Reasoning in MLLMs from Omnidirectional Images
by: Chen, Yuangong, et al.
Published: (2026)
UniC-RAG: Universal Knowledge Corruption Attacks to Retrieval-Augmented Generation
by: Geng, Runpeng, et al.
Published: (2025)
BlasBench: An Open Benchmark for Irish Speech Recognition
by: Raj, Jyoutir, et al.
Published: (2026)
PersistBench: When Should Long-Term Memories Be Forgotten by LLMs?
by: Pulipaka, Sidharth, et al.
Published: (2026)
Exploration of Augmentation Strategies in Multi-modal Retrieval-Augmented Generation for the Biomedical Domain: A Case Study Evaluating Question Answering in Glycobiology
by: Kocbek, Primož, et al.
Published: (2025)
Merge-Bench: Resolve Merge Conflicts with Large Language Models
by: Schesch, Benedikt, et al.
Published: (2026)
ContractBench: Can LLM Agents Preserve Observation Contracts?
by: Wang, Jicheng, et al.
Published: (2026)
ContextBench: A Benchmark for Context Retrieval in Coding Agents
by: Li, Han, et al.
Published: (2026)
PaperAudit-Bench: Benchmarking Error Detection in Research Papers for Critical Automated Peer Review
by: Tu, Songjun, et al.
Published: (2026)
Evaluation Before Generation: A Paradigm for Robust Multimodal Sentiment Analysis with Missing Modalities
by: Chen, Rongfei, et al.
Published: (2026)
UrduBench: An Urdu Reasoning Benchmark using Contextually Ensembled Translations with Human-in-the-Loop
by: Shafique, Muhammad Ali, et al.
Published: (2026)
Towards Greater Leverage: Scaling Laws for Efficient Mixture-of-Experts Language Models
by: Tian, Changxin, et al.
Published: (2025)
Low-Resource Court Judgment Summarization for Common Law Systems
by: Liu, Shuaiqi, et al.
Published: (2024)
RMGAP: Benchmarking the Generalization of Reward Models across Diverse Preferences
by: Zhou, Yangyang, et al.
Published: (2026)
EQ-Bench: An Emotional Intelligence Benchmark for Large Language Models
by: Paech, Samuel J.
Published: (2023)
AsyncTLS: Efficient Generative LLM Inference with Asynchronous Two-level Sparse Attention
by: Hu, Yuxuan, et al.
Published: (2026)
UA-Legal-Bench: A Benchmark for Evaluating Large Language Models on Ukrainian Legal Reasoning
by: Ovcharov, Volodymyr
Published: (2026)
EduGuardBench: A Holistic Benchmark for Evaluating the Pedagogical Fidelity and Adversarial Safety of LLMs as Simulated Teachers
by: Jiang, Yilin, et al.
Published: (2025)
PrivacyBench: A Conversational Benchmark for Evaluating Privacy in Personalized AI
by: Mukhopadhyay, Srija, et al.
Published: (2025)
CVE-Bench: A Benchmark for AI Agents' Ability to Exploit Real-World Web Application Vulnerabilities
by: Zhu, Yuxuan, et al.
Published: (2025)
ConfProBench: A Confidence Evaluation Benchmark for MLLM-Based Process Judges
by: Zhou, Yue, et al.
Published: (2025)
Beyond Rating: A Comprehensive Evaluation and Benchmark for AI Reviews
by: Li, Bowen, et al.
Published: (2026)
PhysicsArena: The First Multimodal Physics Reasoning Benchmark Exploring Variable, Process, and Solution Dimensions
by: Dai, Song, et al.
Published: (2025)
DecisionBench: A Benchmark for Emergent Delegation in Long-Horizon Agentic Workflows
by: Gao, Yuxuan, et al.
Published: (2026)
HalalBench: A Multilingual OCR Benchmark for Food Packaging Ingredient Extraction
by: Arief, Hasan
Published: (2026)
Can AI Read Between The Lines? Benchmarking LLMs On Financial Nuance
by: Kubica, Dominick, et al.
Published: (2025)
MoodBench 1.0: An Evaluation Benchmark for Emotional Companionship Dialogue Systems
by: Jing, Haifeng, et al.
Published: (2025)
OmniNeuro: A Multimodal HCI Framework for Explainable BCI Feedback via Generative AI and Sonification
by: Nia, Ayda Aghaei
Published: (2025)