| Main Authors: | Huang, Shuanghong, Xu, Jinlei, Zhou, Youchao, Zhou, Yanghao, Zhao, Xuan, Feng, Chong, Zhang, Wenxuan |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.12973 |
Similar Items
SI-Bench: Benchmarking Social Intelligence of Large Language Models in Human-to-Human Conversations
by: Huang, Shuai, et al.
Published: (2025)
SeaLLMs-Audio: Large Audio-Language Models for Southeast Asia
by: Liu, Chaoqun, et al.
Published: (2025)
Evaluating Robustness of Large Audio Language Models to Audio Injection: An Empirical Study
by: Hou, Guanyu, et al.
Published: (2025)
Language of Thought Shapes Output Diversity in Large Language Models
by: Xu, Shaoyang, et al.
Published: (2026)
VCB Bench: An Evaluation Benchmark for Audio-Grounded Large Language Model Conversational Agents
by: Hu, Jiliang, et al.
Published: (2025)
Disentangling Language and Culture for Evaluating Multilingual Large Language Models
by: Ying, Jiahao, et al.
Published: (2025)
Do Retrieval Augmented Language Models Know When They Don't Know?
by: Zhou, Youchao, et al.
Published: (2025)
AttackEval: How to Evaluate the Effectiveness of Jailbreak Attacking on Large Language Models
by: Shu, Dong, et al.
Published: (2024)
Safety Alignment of Large Language Models via Contrasting Safe and Harmful Distributions
by: Zhang, Xiaoyun, et al.
Published: (2024)
Mitigating the Bias of Large Language Model Evaluation
by: Zhou, Hongli, et al.
Published: (2024)
Testing and Evaluation of Large Language Models: Correctness, Non-Toxicity, and Fairness
by: Wang, Wenxuan
Published: (2024)
Subtopic-aware View Sampling and Temporal Aggregation for Long-form Document Matching
by: Zhou, Youchao, et al.
Published: (2024)
Character as a Latent Variable in Large Language Models: A Mechanistic Account of Emergent Misalignment and Conditional Safety Failures
by: Su, Yanghao, et al.
Published: (2026)
UniversalNER: Targeted Distillation from Large Language Models for Open Named Entity Recognition
by: Zhou, Wenxuan, et al.
Published: (2023)
AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension
by: Yang, Qian, et al.
Published: (2024)
Offset Unlearning for Large Language Models
by: Huang, James Y., et al.
Published: (2024)
mDPO: Conditional Preference Optimization for Multimodal Large Language Models
by: Wang, Fei, et al.
Published: (2024)
MMRC: A Large-Scale Benchmark for Understanding Multimodal Large Language Model in Real-World Conversation
by: Xue, Haochen, et al.
Published: (2025)
ESC-Eval: Evaluating Emotion Support Conversations in Large Language Models
by: Zhao, Haiquan, et al.
Published: (2024)
Evaluating and Enhancing Large Language Models for Conversational Reasoning on Knowledge Graphs
by: Huang, Yuxuan
Published: (2023)
Audio Jailbreaks in Large Audio-Language Models: Taxonomy, Attack-Defense Analysis, and Cost-Aware Evaluation
by: Feng, Bo-Han, et al.
Published: (2026)
Equipping Retrieval-Augmented Large Language Models with Document Structure Awareness
by: Xu, Lingnan, et al.
Published: (2025)
SafetyBench: Evaluating the Safety of Large Language Models
by: Zhang, Zhexin, et al.
Published: (2023)
Calibrating the Confidence of Large Language Models by Eliciting Fidelity
by: Zhang, Mozhi, et al.
Published: (2024)
Design, Results and Industry Implications of the World's First Insurance Large Language Model Evaluation Benchmark
by: Zhou, Hua, et al.
Published: (2025)
Pruning General Large Language Models into Customized Expert Models
by: Zhao, Yirao, et al.
Published: (2025)
Words at Play: Benchmarking Audio Pun Understanding in Large Audio-Language Models
by: Su, Yuchen, et al.
Published: (2026)
Rethinking the Evaluation for Conversational Recommendation in the Era of Large Language Models
by: Wang, Xiaolei, et al.
Published: (2023)
Affect Recognition in Conversations Using Large Language Models
by: Feng, Shutong, et al.
Published: (2023)
How do Large Language Models Handle Multilingualism?
by: Zhao, Yiran, et al.
Published: (2024)
Causal Tracing of Audio-Text Fusion in Large Audio Language Models
by: Chen, Wei-Chih, et al.
Published: (2026)
ICLEval: Evaluating In-Context Learning Ability of Large Language Models
by: Chen, Wentong, et al.
Published: (2024)
Self-Debias: Self-correcting for Debiasing Large Language Models
by: Feng, Xuan, et al.
Published: (2026)
The Rise of Parameter Specialization for Knowledge Storage in Large Language Models
by: Hong, Yihuai, et al.
Published: (2025)
SFTMix: Elevating Language Model Instruction Tuning with Mixup Recipe
by: Xiao, Yuxin, et al.
Published: (2024)
Multilingual Jailbreak Challenges in Large Language Models
by: Deng, Yue, et al.
Published: (2023)
Evaluating Large Language Models for Radiology Natural Language Processing
by: Liu, Zhengliang, et al.
Published: (2023)
DIFFA-2: A Practical Diffusion Large Language Model for General Audio Understanding
by: Zhou, Jiaming, et al.
Published: (2026)
Evaluating Proactive Risk Awareness of Large Language Models
by: Luo, Xuan, et al.
Published: (2026)
AHELM: A Holistic Evaluation of Audio-Language Models
by: Lee, Tony, et al.
Published: (2025)