| Main Authors: | Zhou, Xiaolin, Luo, Zheng, Gao, Yicheng, Chen, Qixuan, Hu, Xiyang, Zhao, Yue, Liu, Ruishan |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.13649 |
Similar Items
MedExAgent: Training LLM Agents to Ask, Examine, and Diagnose in Noisy Clinical Environments
by: Gao, Yicheng, et al.
Published: (2026)
Quantifying and Mitigating Self-Preference Bias of LLM Judges
by: Yang, Jinming, et al.
Published: (2026)
Judging the Judges: A Systematic Study of Position Bias in LLM-as-a-Judge
by: Shi, Lin, et al.
Published: (2024)
Towards More Accurate US Presidential Election via Multi-step Reasoning with Large Language Models
by: Yu, Chenxiao, et al.
Published: (2024)
BiasScope: Towards Automated Detection of Bias in LLM-as-a-Judge Evaluation
by: Lai, Peng, et al.
Published: (2026)
Decoding Biases: Automated Methods and LLM Judges for Gender Bias Detection in Language Models
by: Kumar, Shachi H, et al.
Published: (2024)
AD-LLM: Benchmarking Large Language Models for Anomaly Detection
by: Yang, Tiankai, et al.
Published: (2024)
MLLM-as-a-Judge: Assessing Multimodal LLM-as-a-Judge with Vision-Language Benchmark
by: Chen, Dongping, et al.
Published: (2024)
Contrastive Decoding Mitigates Score Range Bias in LLM-as-a-Judge
by: Fujinuma, Yoshinari
Published: (2025)
Multimodal Generative Engine Optimization: Rank Manipulation for Vision-Language Model Rankers
by: Du, Yixuan, et al.
Published: (2026)
Enhancing Large Language Models for Mobility Analytics with Semantic Location Tokenization
by: Chen, Yile, et al.
Published: (2025)
Mitigating Translationese Bias in Multilingual LLM-as-a-Judge via Disentangled Information Bottleneck
by: Zhang, Hongbin, et al.
Published: (2026)
StealthRank: LLM Ranking Manipulation via Stealthy Prompt Optimization
by: Tang, Yiming, et al.
Published: (2025)
BadJudge: Backdoor Vulnerabilities of LLM-as-a-Judge
by: Tong, Terry, et al.
Published: (2025)
Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge
by: Ye, Jiayi, et al.
Published: (2024)
Dynamics of Adversarial Attacks on Large Language Model-Based Search Engines
by: Hu, Xiyang
Published: (2025)
Lost in Execution: On the Multilingual Robustness of Tool Calling in Large Language Models
by: Luo, Zheng, et al.
Published: (2026)
Investigating Non-Transitivity in LLM-as-a-Judge
by: Xu, Yi, et al.
Published: (2025)
Benchmarking Adversarial Robustness to Bias Elicitation in Large Language Models: Scalable Automated Assessment with LLM-as-a-Judge
by: Cantini, Riccardo, et al.
Published: (2025)
TrustJudge: Inconsistencies of LLM-as-a-Judge and How to Alleviate Them
by: Wang, Yidong, et al.
Published: (2025)
HarmMetric Eval: Benchmarking Metrics and Judges for LLM Harmfulness Assessment
by: Yang, Langqi, et al.
Published: (2025)
Persuasiveness and Bias in LLM: Investigating the Impact of Persuasiveness and Reinforcement of Bias in Language Models
by: Roy, Saumya
Published: (2025)
Counterfactual Trace Auditing of LLM Agent Skills
by: Zhou, Xiaolin, et al.
Published: (2026)
RLearner-LLM: Balancing Logical Grounding and Fluency in Large Language Models via Hybrid Direct Preference Optimization
by: Bao, Qiming, et al.
Published: (2026)
Play Favorites: A Statistical Method to Measure Self-Bias in LLM-as-a-Judge
by: Spiliopoulou, Evangelia, et al.
Published: (2025)
Investigating Thinking Behaviours of Reasoning-Based Language Models for Social Bias Mitigation
by: Luo, Guoqing, et al.
Published: (2025)
Value-Action Alignment in Large Language Models under Privacy-Prosocial Conflict
by: Chen, Guanyu, et al.
Published: (2026)
A Survey on LLM-as-a-Judge
by: Gu, Jiawei, et al.
Published: (2024)
Digital Gatekeepers: Exploring Large Language Model's Role in Immigration Decisions
by: Mao, Yicheng, et al.
Published: (2025)
Judge's Verdict: A Comprehensive Analysis of LLM Judge Capability Through Human Agreement
by: Han, Steve, et al.
Published: (2025)
Mitigating Hallucinations in Large Language Models via Causal Reasoning
by: Li, Yuangang, et al.
Published: (2025)
R-Judge: Benchmarking Safety Risk Awareness for LLM Agents
by: Yuan, Tongxin, et al.
Published: (2024)
Think-J: Learning to Think for Generative LLM-as-a-Judge
by: Huang, Hui, et al.
Published: (2025)
Bayesian Calibration of Win Rate Estimation with LLM Evaluators
by: Gao, Yicheng, et al.
Published: (2024)
Detect, Investigate, Judge and Determine: A Knowledge-guided Framework for Few-shot Fake News Detection
by: Liu, Ye, et al.
Published: (2024)
Criterion Validity of LLM-as-Judge for Business Outcomes in Conversational Commerce
by: Chen, Liang, et al.
Published: (2026)
Where is the answer? Investigating Positional Bias in Language Model Knowledge Extraction
by: Saito, Kuniaki, et al.
Published: (2024)
Do LLM Decoders Listen Fairly? Benchmarking How Language Model Priors Shape Bias in Speech Recognition
by: Ginjala, Srishti, et al.
Published: (2026)
Full-ECE: A Metric For Token-level Calibration on Large Language Models
by: Liu, Han, et al.
Published: (2024)
LFTF: Locating First and Then Fine-Tuning for Mitigating Gender Bias in Large Language Models
by: Qin, Zhanyue, et al.
Published: (2025)