Library Catalog

Bibliographic Details
Main Authors:	Lee, Dongryeol; Hwang, Yerin; Kim, Yongil; Park, Joonsuk; Jung, Kyomin
Format:	Preprint
Published:	2024
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2410.20774

Similar Items

Can You Trick the Grader? Adversarial Persuasion of LLM Judges
by: Hwang, Yerin, et al.
Published: (2025)

Don't Judge Code by Its Cover: Exploring Biases in LLM Judges for Code Evaluation
by: Moon, Jiwon, et al.
Published: (2025)

Judging Against the Reference: Uncovering Knowledge-Driven Failures in LLM-Judges on QA Evaluation
by: Lee, Dongryeol, et al.
Published: (2026)

When Wording Steers the Evaluation: Framing Bias in LLM judges
by: Hwang, Yerin, et al.
Published: (2026)

Fooling the LVLM Judges: Visual Biases in LVLM-Based Evaluation
by: Hwang, Yerin, et al.
Published: (2025)

Return of EM: Entity-driven Answer Set Expansion for QA Evaluation
by: Lee, Dongryeol, et al.
Published: (2024)

SWITCH: Studying with Teacher for Knowledge Distillation of Large Language Models
by: Koo, Jahyun, et al.
Published: (2024)

MP2D: An Automated Topic Shift Dialogue Generation Framework Leveraging Knowledge Graphs
by: Hwang, Yerin, et al.
Published: (2024)

LLMs can be easily Confused by Instructional Distractions
by: Hwang, Yerin, et al.
Published: (2025)

AdvisorQA: Towards Helpful and Harmless Advice-seeking Question Answering with Collective Intelligence
by: Kim, Minbeom, et al.
Published: (2024)

LifeTox: Unveiling Implicit Toxicity in Life Advice
by: Kim, Minbeom, et al.
Published: (2023)

Mitigating Hallucinations in Large Vision-Language Models via Summary-Guided Decoding
by: Min, Kyungmin, et al.
Published: (2024)

VLind-Bench: Measuring Language Priors in Large Vision-Language Models
by: Lee, Kang-il, et al.
Published: (2024)

Program Synthesis via Test-Time Transduction
by: Lee, Kang-il, et al.
Published: (2025)

Reliability-Aware Adaptive Self-Consistency for Efficient Sampling in LLM Reasoning
by: Kim, Junseok, et al.
Published: (2026)

A Character-Centric Creative Story Generation via Imagination
by: Park, Kyeongman, et al.
Published: (2024)

Can LLMs Recognize Toxicity? A Structured Investigation Framework and Toxicity Metric
by: Koh, Hyukhun, et al.
Published: (2024)

Knowledge Beyond Language: Bridging the Gap in Multilingual Machine Unlearning Evaluation
by: Hwang, Kyomin, et al.
Published: (2026)

Hierarchical Deconstruction of LLM Reasoning: A Graph-Based Framework for Analyzing Knowledge Utilization
by: Ko, Miyoung, et al.
Published: (2024)

ReflAct: World-Grounded Decision Making in LLM Agents via Goal-State Reflection
by: Kim, Jeonghye, et al.
Published: (2025)

Retrieval-Augmented Generation Based Nurse Observation Extraction
by: Hwang, Kyomin, et al.
Published: (2026)

Is LLM-as-a-Judge Robust? Investigating Universal Adversarial Attacks on Zero-shot LLM Assessment
by: Raina, Vyas, et al.
Published: (2024)

Analyzing Uncertainty of LLM-as-a-Judge: Interval Evaluations with Conformal Prediction
by: Sheng, Huanxin, et al.
Published: (2025)

Beyond Gold Standards: Epistemic Ensemble of LLM Judges for Formal Mathematical Reasoning
by: Zhang, Lan, et al.
Published: (2025)

CyclicJudge: Mitigating Judge Bias Efficiently in LLM-based Evaluation
by: Zhu, Ziyi, et al.
Published: (2026)

Fine-grained Gender Control in Machine Translation with Large Language Models
by: Lee, Minwoo, et al.
Published: (2024)

Black-Box Hallucination Detection via Consistency Under the Uncertain Expression
by: Joo, Seongho, et al.
Published: (2025)

LongStory: Coherent, Complete and Length Controlled Long story Generation
by: Park, Kyeongman, et al.
Published: (2023)

Avoidance Decoding for Diverse Multi-Branch Story Generation
by: Park, Kyeongman, et al.
Published: (2025)

A Universal Avoidance Method for Diverse Multi-branch Generation
by: Park, Kyeongman, et al.
Published: (2026)

Persona is a Double-edged Sword: Mitigating the Negative Impact of Role-playing Prompts in Zero-shot Reasoning Tasks
by: Kim, Junseok, et al.
Published: (2024)

Persona Switch: Mixing Distinct Perspectives in Decoding Time
by: Kim, Junseok, et al.
Published: (2026)

Do not think about pink elephant!
by: Hwang, Kyomin, et al.
Published: (2024)

JudgeBench: A Benchmark for Evaluating LLM-based Judges
by: Tan, Sijun, et al.
Published: (2024)

FaithUn: Toward Faithful Forgetting in Language Models by Investigating the Interconnectedness of Knowledge
by: Yang, Nakyeong, et al.
Published: (2025)

Revisiting Epistemic Markers in Confidence Estimation: Can Markers Accurately Reflect Large Language Models' Uncertainty?
by: Liu, Jiayu, et al.
Published: (2025)

How to Correctly Report LLM-as-a-Judge Evaluations
by: Lee, Chungpa, et al.
Published: (2025)

How You Ask Matters! Adaptive RAG Robustness to Query Variations
by: Jang, Yunah, et al.
Published: (2026)

Toward Robust LLM-Based Judges: Taxonomic Bias Evaluation and Debiasing Optimization
by: Zhou, Hongli, et al.
Published: (2026)

Confidence-Guided Stepwise Model Routing for Cost-Efficient Reasoning
by: Lee, Sangmook, et al.
Published: (2025)