| Main Authors: | Miao, Yongliang, Liang, Yangyang, Du, Mengnan |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.08097 |
Similar Items
NeuronScope: A Multi-Agent Framework for Explaining Polysemantic Neurons in Language Models
by: Liu, Weiqi, et al.
Published: (2026)
Debate Helps Weak Judges Reward Stronger Models
by: Elasky, Ethan, et al.
Published: (2026)
JudgeRLVR: Judge First, Generate Second for Efficient Reasoning
by: Duo, Jiangshan, et al.
Published: (2026)
AutoJudge: Judge Decoding Without Manual Annotation
by: Garipov, Roman, et al.
Published: (2025)
Judge Circuits
by: Feldhus, Nils, et al.
Published: (2026)
Beyond Scalar Reward Model: Learning Generative Judge from Preference Data
by: Ye, Ziyi, et al.
Published: (2024)
Quantitative LLM Judges
by: Sahoo, Aishwarya, et al.
Published: (2025)
MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?
by: Chen, Zhaorun, et al.
Published: (2024)
Evaluating Judges as Evaluators: The JETTS Benchmark of LLM-as-Judges as Test-Time Scaling Evaluators
by: Zhou, Yilun, et al.
Published: (2025)
JudgeBench: A Benchmark for Evaluating LLM-based Judges
by: Tan, Sijun, et al.
Published: (2024)
Judge Decoding: Faster Speculative Sampling Requires Going Beyond Model Alignment
by: Bachmann, Gregor, et al.
Published: (2025)
Rethinking LLM-as-a-Judge: Representation-as-a-Judge with Small Language Models via Semantic Capacity Asymmetry
by: Li, Zhuochun, et al.
Published: (2026)
One Token to Fool LLM-as-a-Judge
by: Zhao, Yulai, et al.
Published: (2025)
JAF: Judge Agent Forest
by: Garg, Sahil, et al.
Published: (2026)
Does Context Matter? ContextualJudgeBench for Evaluating LLM-based Judges in Contextual Settings
by: Xu, Austin, et al.
Published: (2025)
AdaLRS: Loss-Guided Adaptive Learning Rate Search for Efficient Foundation Model Pretraining
by: Dong, Hongyuan, et al.
Published: (2025)
SAE-FiRE: Enhancing Earnings Surprise Predictions Through Sparse Autoencoder Feature Selection
by: Zhang, Huopu, et al.
Published: (2025)
The Judge Variable: Challenging Judge-Agnostic Legal Judgment Prediction
by: Zambrano, Guillaume
Published: (2025)
Reasoning Is Not Free: Robust Adaptive Cost-Efficient Routing for LLM-as-a-Judge
by: Zhang, Wenbo, et al.
Published: (2026)
AdvJudge-Zero: Binary Decision Flips in LLM-as-a-Judge via Adversarial Control Tokens
by: Li, Tung-Ling, et al.
Published: (2025)
Learning to Judge: LLMs Designing and Applying Evaluation Rubrics
by: Siro, Clemencia, et al.
Published: (2026)
Comparative Analysis of Demonstration Selection Algorithms for LLM In-Context Learning
by: Shu, Dong, et al.
Published: (2024)
AdaSplash: Adaptive Sparse Flash Attention
by: Gonçalves, Nuno, et al.
Published: (2025)
Approximating Human Preferences Using a Multi-Judge Learned System
by: Sprejer, Eitán, et al.
Published: (2025)
Benchmarks Saturate When The Model Gets Smarter Than The Judge
by: Ballon, Marthe, et al.
Published: (2026)
CodeJudge: Evaluating Code Generation with Large Language Models
by: Tong, Weixi, et al.
Published: (2024)
Reinforcement Learning-based Knowledge Distillation with LLM-as-a-Judge
by: Shen, Yiyang, et al.
Published: (2026)
Gained in Translation: Privileged Pairwise Judges Enhance Multilingual Reasoning
by: Sutawika, Lintang, et al.
Published: (2026)
Trust or Escalate: LLM Judges with Provable Guarantees for Human Agreement
by: Jung, Jaehun, et al.
Published: (2024)
How to Correctly Report LLM-as-a-Judge Evaluations
by: Lee, Chungpa, et al.
Published: (2025)
ProfBench: Multi-Domain Rubrics requiring Professional Knowledge to Answer and Judge
by: Wang, Zhilin, et al.
Published: (2025)
On Evaluating LLM Alignment by Evaluating LLMs as Judges
by: Liu, Yixin, et al.
Published: (2025)
The Perfect Blend: Redefining RLHF with Mixture of Judges
by: Xu, Tengyu, et al.
Published: (2024)
Investigating Non-Transitivity in LLM-as-a-Judge
by: Xu, Yi, et al.
Published: (2025)
Strategic Demonstration Selection for Improved Fairness in LLM In-Context Learning
by: Hu, Jingyu, et al.
Published: (2024)
Emoji Attack: Enhancing Jailbreak Attacks Against Judge LLM Detection
by: Wei, Zhipeng, et al.
Published: (2024)
Training an LLM-as-a-Judge Model: Pipeline, Insights, and Practical Lessons
by: Hu, Renjun, et al.
Published: (2025)
AdaLomo: Low-memory Optimization with Adaptive Learning Rate
by: Lv, Kai, et al.
Published: (2023)
Build, Judge, Optimize: A Blueprint for Continuous Improvement of Multi-Agent Consumer Assistants
by: Herrera, Alejandro Breen, et al.
Published: (2026)
Quantifying and Mitigating Self-Preference Bias of LLM Judges
by: Yang, Jinming, et al.
Published: (2026)