:: Library Catalog

Bibliographic Details
Main Authors:	Zhou, Xiaolin; Luo, Zheng; Gao, Yicheng; Chen, Qixuan; Hu, Xiyang; Zhao, Yue; Liu, Ruishan
Format:	Preprint
Published:	2026
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2601.13649

Similar Items

MedExAgent: Training LLM Agents to Ask, Examine, and Diagnose in Noisy Clinical Environments
by: Gao, Yicheng, et al.
Published: (2026)

Quantifying and Mitigating Self-Preference Bias of LLM Judges
by: Yang, Jinming, et al.
Published: (2026)

Judging the Judges: A Systematic Study of Position Bias in LLM-as-a-Judge
by: Shi, Lin, et al.
Published: (2024)

Towards More Accurate US Presidential Election via Multi-step Reasoning with Large Language Models
by: Yu, Chenxiao, et al.
Published: (2024)

BiasScope: Towards Automated Detection of Bias in LLM-as-a-Judge Evaluation
by: Lai, Peng, et al.
Published: (2026)

Decoding Biases: Automated Methods and LLM Judges for Gender Bias Detection in Language Models
by: Kumar, Shachi H, et al.
Published: (2024)

AD-LLM: Benchmarking Large Language Models for Anomaly Detection
by: Yang, Tiankai, et al.
Published: (2024)

MLLM-as-a-Judge: Assessing Multimodal LLM-as-a-Judge with Vision-Language Benchmark
by: Chen, Dongping, et al.
Published: (2024)

Contrastive Decoding Mitigates Score Range Bias in LLM-as-a-Judge
by: Fujinuma, Yoshinari
Published: (2025)

Multimodal Generative Engine Optimization: Rank Manipulation for Vision-Language Model Rankers
by: Du, Yixuan, et al.
Published: (2026)

Enhancing Large Language Models for Mobility Analytics with Semantic Location Tokenization
by: Chen, Yile, et al.
Published: (2025)

Mitigating Translationese Bias in Multilingual LLM-as-a-Judge via Disentangled Information Bottleneck
by: Zhang, Hongbin, et al.
Published: (2026)

StealthRank: LLM Ranking Manipulation via Stealthy Prompt Optimization
by: Tang, Yiming, et al.
Published: (2025)

BadJudge: Backdoor Vulnerabilities of LLM-as-a-Judge
by: Tong, Terry, et al.
Published: (2025)

Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge
by: Ye, Jiayi, et al.
Published: (2024)

Dynamics of Adversarial Attacks on Large Language Model-Based Search Engines
by: Hu, Xiyang
Published: (2025)

Lost in Execution: On the Multilingual Robustness of Tool Calling in Large Language Models
by: Luo, Zheng, et al.
Published: (2026)

Investigating Non-Transitivity in LLM-as-a-Judge
by: Xu, Yi, et al.
Published: (2025)

Benchmarking Adversarial Robustness to Bias Elicitation in Large Language Models: Scalable Automated Assessment with LLM-as-a-Judge
by: Cantini, Riccardo, et al.
Published: (2025)

TrustJudge: Inconsistencies of LLM-as-a-Judge and How to Alleviate Them
by: Wang, Yidong, et al.
Published: (2025)

HarmMetric Eval: Benchmarking Metrics and Judges for LLM Harmfulness Assessment
by: Yang, Langqi, et al.
Published: (2025)

Persuasiveness and Bias in LLM: Investigating the Impact of Persuasiveness and Reinforcement of Bias in Language Models
by: Roy, Saumya
Published: (2025)

Counterfactual Trace Auditing of LLM Agent Skills
by: Zhou, Xiaolin, et al.
Published: (2026)

RLearner-LLM: Balancing Logical Grounding and Fluency in Large Language Models via Hybrid Direct Preference Optimization
by: Bao, Qiming, et al.
Published: (2026)

Play Favorites: A Statistical Method to Measure Self-Bias in LLM-as-a-Judge
by: Spiliopoulou, Evangelia, et al.
Published: (2025)

Investigating Thinking Behaviours of Reasoning-Based Language Models for Social Bias Mitigation
by: Luo, Guoqing, et al.
Published: (2025)

Value-Action Alignment in Large Language Models under Privacy-Prosocial Conflict
by: Chen, Guanyu, et al.
Published: (2026)

A Survey on LLM-as-a-Judge
by: Gu, Jiawei, et al.
Published: (2024)

Digital Gatekeepers: Exploring Large Language Model's Role in Immigration Decisions
by: Mao, Yicheng, et al.
Published: (2025)

Judge's Verdict: A Comprehensive Analysis of LLM Judge Capability Through Human Agreement
by: Han, Steve, et al.
Published: (2025)

Mitigating Hallucinations in Large Language Models via Causal Reasoning
by: Li, Yuangang, et al.
Published: (2025)

R-Judge: Benchmarking Safety Risk Awareness for LLM Agents
by: Yuan, Tongxin, et al.
Published: (2024)

Think-J: Learning to Think for Generative LLM-as-a-Judge
by: Huang, Hui, et al.
Published: (2025)

Bayesian Calibration of Win Rate Estimation with LLM Evaluators
by: Gao, Yicheng, et al.
Published: (2024)

Detect, Investigate, Judge and Determine: A Knowledge-guided Framework for Few-shot Fake News Detection
by: Liu, Ye, et al.
Published: (2024)

Criterion Validity of LLM-as-Judge for Business Outcomes in Conversational Commerce
by: Chen, Liang, et al.
Published: (2026)

Where is the answer? Investigating Positional Bias in Language Model Knowledge Extraction
by: Saito, Kuniaki, et al.
Published: (2024)

Do LLM Decoders Listen Fairly? Benchmarking How Language Model Priors Shape Bias in Speech Recognition
by: Ginjala, Srishti, et al.
Published: (2026)

Full-ECE: A Metric For Token-level Calibration on Large Language Models
by: Liu, Han, et al.
Published: (2024)

LFTF: Locating First and Then Fine-Tuning for Mitigating Gender Bias in Large Language Models
by: Qin, Zhanyue, et al.
Published: (2025)