Library Catalog

Bibliographic Details
Main Authors:	Guo, Zichun; Shi, Yuling; Zeng, Wenhao; Hu, Chao; Lin, Haotian; Zhuo, Terry Yue; Chen, Jiawei; Gu, Xiaodong; Ma, Wenping
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition; Computation and Language
Online Access:	https://arxiv.org/abs/2604.23813

Similar Items

LastingBench: Defend Benchmarks Against Knowledge Leakage
by: Fang, Yixiong, et al.
Published: (2025)

ClassEval-Pro: A Cross-Domain Benchmark for Class-Level Code Generation
by: Chen, Yeheng, et al.
Published: (2026)

IV-Bench: A Benchmark for Image-Grounded Video Perception and Reasoning in Multimodal LLMs
by: Ma, David, et al.
Published: (2025)

EarthSpatialBench: Benchmarking Spatial Reasoning Capabilities of Multimodal LLMs on Earth Imagery
by: Xu, Zelin, et al.
Published: (2026)

MLLM-CompBench: A Comparative Reasoning Benchmark for Multimodal LLMs
by: Kil, Jihyung, et al.
Published: (2024)

Robust Preference Alignment via Directional Neighborhood Consensus
by: Mao, Ruochen, et al.
Published: (2025)

AttentionRAG: Attention-Guided Context Pruning in Retrieval-Augmented Generation
by: Fang, Yixiong, et al.
Published: (2025)

Analyzing the Mechanism of Attention Collapse in VGGT from a Dynamics Perspective
by: Li, Huan, et al.
Published: (2025)

HiBench: Benchmarking LLMs Capability on Hierarchical Structure Reasoning
by: Jiang, Zhuohang, et al.
Published: (2025)

DARL: Encouraging Diverse Answers for General Reasoning without Verifiers
by: Huang, Chongxuan, et al.
Published: (2026)

Olapa-MCoT: Enhancing the Chinese Mathematical Reasoning Capability of LLMs
by: Zhu, Shaojie, et al.
Published: (2023)

In Line with Context: Repository-Level Code Generation via Context Inlining
by: Hu, Chao, et al.
Published: (2026)

Between Lines of Code: Unraveling the Distinct Patterns of Machine and Human Programmers
by: Shi, Yuling, et al.
Published: (2024)

Unlocking Reasoning Capabilities in LLMs via Reinforcement Learning Exploration
by: Deng, Wenhao, et al.
Published: (2025)

HEART-Bench: Do LLM Agents Exhibit Human-like Psychology?
by: Peng, Weihan, et al.
Published: (2026)

Reasoning in Trees: Improving Retrieval-Augmented Generation for Multi-Hop Question Answering
by: Shi, Yuling, et al.
Published: (2026)

CrossWordBench: Evaluating the Reasoning Capabilities of LLMs and LVLMs with Controllable Puzzle Generation
by: Leng, Jixuan, et al.
Published: (2025)

Pruning the Unsurprising: Efficient LLM Reasoning via First-Token Surprisal
by: Zeng, Wenhao, et al.
Published: (2025)

ICE-Score: Instructing Large Language Models to Evaluate Code
by: Zhuo, Terry Yue
Published: (2023)

LongCodeZip: Compress Long Context for Code Language Models
by: Shi, Yuling, et al.
Published: (2025)

Attention as a Compass: Efficient Exploration for Process-Supervised RL in Reasoning Models
by: Liu, Runze, et al.
Published: (2025)

ISO-Bench: Benchmarking Multimodal Causal Reasoning in Visual-Language Models through Procedural Plans
by: Sadana, Ananya, et al.
Published: (2025)

LLM-KG-Bench 3.0: A Compass for Semantic Technology Capabilities in the Ocean of LLMs
by: Meyer, Lars-Peter, et al.
Published: (2025)

SpatialGeo: Boosting Spatial Reasoning in Multimodal LLMs via Geometry-Semantics Fusion
by: Guo, Jiajie, et al.
Published: (2025)

Digital Socrates: Evaluating LLMs through Explanation Critiques
by: Gu, Yuling, et al.
Published: (2023)

Tracking the Limits of Knowledge Propagation: How LLMs Fail at Multi-Step Reasoning with Conflicting Knowledge
by: Feng, Yiyang, et al.
Published: (2026)

From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging
by: Shi, Yuling, et al.
Published: (2024)

Semantic Human Mesh Reconstruction with Textures
by: Zhan, Xiaoyu, et al.
Published: (2024)

MME-VideoOCR: Evaluating OCR-Based Capabilities of Multimodal LLMs in Video Scenarios
by: Shi, Yang, et al.
Published: (2025)

Chain-of-Thought Degrades Visual Spatial Reasoning Capabilities of Multimodal LLMs
by: Kancheti, Sai Srinivas, et al.
Published: (2026)

WorldValuesBench: A Large-Scale Benchmark Dataset for Multi-Cultural Value Awareness of Language Models
by: Zhao, Wenlong, et al.
Published: (2024)

Enhancing Multimodal Large Language Models Complex Reason via Similarity Computation
by: Zhang, Xiaofeng, et al.
Published: (2024)

From Neurons to Semantics: Evaluating Cross-Linguistic Alignment Capabilities of Large Language Models via Neurons Alignment
by: Huang, Chongxuan, et al.
Published: (2025)

How Difficulty-Aware Staged Reinforcement Learning Enhances LLMs' Reasoning Capabilities: A Preliminary Experimental Study
by: Ji, Yunjie, et al.
Published: (2025)

XModBench: Benchmarking Cross-Modal Capabilities and Consistency in Omni-Language Models
by: Wang, Xingrui, et al.
Published: (2025)

GeoR-Bench: Evaluating Geoscience Visual Reasoning
by: Zheng, Yushuo, et al.
Published: (2026)

SWE-QA: Can Language Models Answer Repository-level Code Questions?
by: Peng, Weihan, et al.
Published: (2025)

Hidden in Plain Sight: Evaluation of the Deception Detection Capabilities of LLMs in Multimodal Settings
by: Miah, Md Messal Monem, et al.
Published: (2025)

InterveneBench: Benchmarking LLMs for Intervention Reasoning and Causal Study Design in Real Social Systems
by: Shi, Shaojie, et al.
Published: (2026)

XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts
by: Ding, Yifeng, et al.
Published: (2024)