| Main Authors: | Yang, Eddie, Wang, Dashun |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.11898 |
Similar Items
A Benchmark for the Detection of Metalinguistic Disagreements between LLMs and Knowledge Graphs
by: Allen, Bradley P., et al.
Published: (2025)
The Illusion of Stochasticity in LLMs
by: Gu, Xiangming, et al.
Published: (2026)
Sci2Pol: Evaluating and Fine-tuning LLMs on Scientific-to-Policy Brief Generation
by: Wu, Weimin, et al.
Published: (2025)
Papilusion at DAGPap24: Paper or Illusion? Detecting AI-generated Scientific Papers
by: Andreev, Nikita, et al.
Published: (2024)
Is LLM an Overconfident Judge? Unveiling the Capabilities of LLMs in Detecting Offensive Language with Annotation Disagreement
by: Lu, Junyu, et al.
Published: (2025)
Quantifying the Benefit of Artificial Intelligence for Scientific Research
by: Gao, Jian, et al.
Published: (2023)
When Disagreements Elicit Robustness: Investigating Self-Repair Capabilities under LLM Multi-Agent Disagreements
by: Ju, Tianjie, et al.
Published: (2025)
LabSafety Bench: Benchmarking LLMs on Safety Issues in Scientific Labs
by: Zhou, Yujun, et al.
Published: (2024)
Illusory VQA: Benchmarking and Enhancing Multimodal Models on Visual Illusions
by: Rostamkhani, Mohammadmostafa, et al.
Published: (2024)
The Illusion of Certainty: Uncertainty Quantification for LLMs Fails under Ambiguity
by: Tomov, Tim, et al.
Published: (2025)
The Illusion-Illusion: Vision Language Models See Illusions Where There are None
by: Ullman, Tomer
Published: (2024)
The Illusion of Progress: Re-evaluating Hallucination Detection in LLMs
by: Janiak, Denis, et al.
Published: (2025)
SciRerankBench: Benchmarking Rerankers Towards Scientific Retrieval-Augmented Generated LLMs
by: Chen, Haotian, et al.
Published: (2025)
Leveraging Annotator Disagreement for Text Classification
by: Xu, Jin, et al.
Published: (2024)
Aligning LLM Uncertainty with Human Disagreement in Subjectivity Analysis
by: Lu, Junyu, et al.
Published: (2026)
SciDA: Scientific Dynamic Assessor of LLMs
by: Zhou, Junting, et al.
Published: (2025)
Pun Unintended: LLMs and the Illusion of Humor Understanding
by: Zangari, Alessandro, et al.
Published: (2025)
ResearchBench: Benchmarking LLMs in Scientific Discovery via Inspiration-Based Task Decomposition
by: Liu, Yujie, et al.
Published: (2025)
EarthSE: A Benchmark for Evaluating Earth Scientific Exploration Capability of LLMs
by: Xu, Wanghan, et al.
Published: (2025)
Quantifying and Predicting Disagreement in Graded Human Ratings
by: Zhang, Leixin, et al.
Published: (2026)
The Personality Illusion: Revealing Dissociation Between Self-Reports & Behavior in LLMs
by: Han, Pengrui, et al.
Published: (2025)
MMCR: Benchmarking Cross-Source Reasoning in Scientific Papers
by: Tian, Yang, et al.
Published: (2025)
MSA at BEA 2025 Shared Task: Disagreement-Aware Instruction Tuning for Multi-Dimensional Evaluation of LLMs as Math Tutors
by: Hikal, Baraa, et al.
Published: (2025)
IllusionVQA: A Challenging Optical Illusion Dataset for Vision Language Models
by: Shahgir, Haz Sameen, et al.
Published: (2024)
The Leaderboard Illusion
by: Singh, Shivalika, et al.
Published: (2025)
NUTMEG: Separating Signal From Noise in Annotator Disagreement
by: Ivey, Jonathan, et al.
Published: (2025)
Do Differences in Values Influence Disagreements in Online Discussions?
by: van der Meer, Michiel, et al.
Published: (2023)
Bridging the Gap: In-Context Learning for Modeling Human Disagreement
by: Muscato, Benedetta, et al.
Published: (2025)
From Disagreement to Understanding: The Case for Ambiguity Detection in NLI
by: Jayaweera, Chathuri, et al.
Published: (2025)
The Illusion of State in State-Space Models
by: Merrill, William, et al.
Published: (2024)
LEGOBench: Scientific Leaderboard Generation Benchmark
by: Singh, Shruti, et al.
Published: (2024)
SciAssess: Benchmarking LLM Proficiency in Scientific Literature Analysis
by: Cai, Hengxing, et al.
Published: (2024)
The Gray Area: Characterizing Moderator Disagreement on Reddit
by: Alipour, Shayan, et al.
Published: (2026)
Learning-to-Context Slope: Evaluating In-Context Learning Effectiveness Beyond Performance Illusions
by: Wang, Dingzriui, et al.
Published: (2025)
Small Changes, Large Consequences: Analyzing the Allocational Fairness of LLMs in Hiring Contexts
by: Seshadri, Preethi, et al.
Published: (2025)
Extreme Miscalibration and the Illusion of Adversarial Robustness
by: Raina, Vyas, et al.
Published: (2024)
Can AI Validate Science? Benchmarking LLMs for Accurate Scientific Claim $\rightarrow$ Evidence Reasoning
by: Javaji, Shashidhar Reddy, et al.
Published: (2025)
Benchmarking LLMs via Uncertainty Quantification
by: Ye, Fanghua, et al.
Published: (2024)
Beyond Consensus: Perspectivist Modeling and Evaluation of Annotator Disagreement in NLP
by: Xu, Yinuo, et al.
Published: (2026)
Disagreement as Data: Reasoning Trace Analytics in Multi-Agent Systems
by: Tajik, Elham, et al.
Published: (2026)