| Main Authors: | Lee, Dongryeol, Hwang, Yerin, Kim, Yongil, Park, Joonsuk, Jung, Kyomin |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2410.20774 |
Similar Items
Can You Trick the Grader? Adversarial Persuasion of LLM Judges
by: Hwang, Yerin, et al.
Published: (2025)
Don't Judge Code by Its Cover: Exploring Biases in LLM Judges for Code Evaluation
by: Moon, Jiwon, et al.
Published: (2025)
Judging Against the Reference: Uncovering Knowledge-Driven Failures in LLM-Judges on QA Evaluation
by: Lee, Dongryeol, et al.
Published: (2026)
When Wording Steers the Evaluation: Framing Bias in LLM judges
by: Hwang, Yerin, et al.
Published: (2026)
Fooling the LVLM Judges: Visual Biases in LVLM-Based Evaluation
by: Hwang, Yerin, et al.
Published: (2025)
Return of EM: Entity-driven Answer Set Expansion for QA Evaluation
by: Lee, Dongryeol, et al.
Published: (2024)
SWITCH: Studying with Teacher for Knowledge Distillation of Large Language Models
by: Koo, Jahyun, et al.
Published: (2024)
MP2D: An Automated Topic Shift Dialogue Generation Framework Leveraging Knowledge Graphs
by: Hwang, Yerin, et al.
Published: (2024)
LLMs can be easily Confused by Instructional Distractions
by: Hwang, Yerin, et al.
Published: (2025)
AdvisorQA: Towards Helpful and Harmless Advice-seeking Question Answering with Collective Intelligence
by: Kim, Minbeom, et al.
Published: (2024)
LifeTox: Unveiling Implicit Toxicity in Life Advice
by: Kim, Minbeom, et al.
Published: (2023)
Mitigating Hallucinations in Large Vision-Language Models via Summary-Guided Decoding
by: Min, Kyungmin, et al.
Published: (2024)
VLind-Bench: Measuring Language Priors in Large Vision-Language Models
by: Lee, Kang-il, et al.
Published: (2024)
Program Synthesis via Test-Time Transduction
by: Lee, Kang-il, et al.
Published: (2025)
Reliability-Aware Adaptive Self-Consistency for Efficient Sampling in LLM Reasoning
by: Kim, Junseok, et al.
Published: (2026)
A Character-Centric Creative Story Generation via Imagination
by: Park, Kyeongman, et al.
Published: (2024)
Can LLMs Recognize Toxicity? A Structured Investigation Framework and Toxicity Metric
by: Koh, Hyukhun, et al.
Published: (2024)
Knowledge Beyond Language: Bridging the Gap in Multilingual Machine Unlearning Evaluation
by: Hwang, Kyomin, et al.
Published: (2026)
Hierarchical Deconstruction of LLM Reasoning: A Graph-Based Framework for Analyzing Knowledge Utilization
by: Ko, Miyoung, et al.
Published: (2024)
ReflAct: World-Grounded Decision Making in LLM Agents via Goal-State Reflection
by: Kim, Jeonghye, et al.
Published: (2025)
Retrieval-Augmented Generation Based Nurse Observation Extraction
by: Hwang, Kyomin, et al.
Published: (2026)
Is LLM-as-a-Judge Robust? Investigating Universal Adversarial Attacks on Zero-shot LLM Assessment
by: Raina, Vyas, et al.
Published: (2024)
Analyzing Uncertainty of LLM-as-a-Judge: Interval Evaluations with Conformal Prediction
by: Sheng, Huanxin, et al.
Published: (2025)
Beyond Gold Standards: Epistemic Ensemble of LLM Judges for Formal Mathematical Reasoning
by: Zhang, Lan, et al.
Published: (2025)
CyclicJudge: Mitigating Judge Bias Efficiently in LLM-based Evaluation
by: Zhu, Ziyi, et al.
Published: (2026)
Fine-grained Gender Control in Machine Translation with Large Language Models
by: Lee, Minwoo, et al.
Published: (2024)
Black-Box Hallucination Detection via Consistency Under the Uncertain Expression
by: Joo, Seongho, et al.
Published: (2025)
LongStory: Coherent, Complete and Length Controlled Long story Generation
by: Park, Kyeongman, et al.
Published: (2023)
Avoidance Decoding for Diverse Multi-Branch Story Generation
by: Park, Kyeongman, et al.
Published: (2025)
A Universal Avoidance Method for Diverse Multi-branch Generation
by: Park, Kyeongman, et al.
Published: (2026)
Persona is a Double-edged Sword: Mitigating the Negative Impact of Role-playing Prompts in Zero-shot Reasoning Tasks
by: Kim, Junseok, et al.
Published: (2024)
Persona Switch: Mixing Distinct Perspectives in Decoding Time
by: Kim, Junseok, et al.
Published: (2026)
Do not think about pink elephant!
by: Hwang, Kyomin, et al.
Published: (2024)
JudgeBench: A Benchmark for Evaluating LLM-based Judges
by: Tan, Sijun, et al.
Published: (2024)
FaithUn: Toward Faithful Forgetting in Language Models by Investigating the Interconnectedness of Knowledge
by: Yang, Nakyeong, et al.
Published: (2025)
Revisiting Epistemic Markers in Confidence Estimation: Can Markers Accurately Reflect Large Language Models' Uncertainty?
by: Liu, Jiayu, et al.
Published: (2025)
How to Correctly Report LLM-as-a-Judge Evaluations
by: Lee, Chungpa, et al.
Published: (2025)
How You Ask Matters! Adaptive RAG Robustness to Query Variations
by: Jang, Yunah, et al.
Published: (2026)
Toward Robust LLM-Based Judges: Taxonomic Bias Evaluation and Debiasing Optimization
by: Zhou, Hongli, et al.
Published: (2026)
Confidence-Guided Stepwise Model Routing for Cost-Efficient Reasoning
by: Lee, Sangmook, et al.
Published: (2025)