:: Library Catalog

Bibliographic Details
Main Authors:	Oh, Myeong Seok, Kim, Dong-Yun, Oh, Hanseok, Kang, Chaean, Kang, Joeun, Wang, Xiaonan, Park, Hyunjung, Jung, Young Cheol, Kim, Hansaem
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2604.21211

Similar Items

Block-wise Codeword Embedding for Reliable Multi-bit Text Watermarking
by: Kim, Joeun, et al.
Published: (2026)

KULTURE Bench: A Benchmark for Assessing Language Model in Korean Cultural Context
by: Wang, Xiaonan, et al.
Published: (2024)

Toward Reliable VLM: A Fine-Grained Benchmark and Framework for Exposure, Bias, and Inference in Korean Street Views
by: Wang, Xiaonan, et al.
Published: (2025)

KTRL+F: Knowledge-Augmented In-Document Search
by: Oh, Hanseok, et al.
Published: (2023)

Are they lovers or friends? Evaluating LLMs' Social Reasoning in English and Korean Dialogues
by: Kim, Eunsu, et al.
Published: (2025)

PerMix-RLVR: Preserving Persona Expressivity under Verifiable-Reward Alignment
by: Oh, Jihwan, et al.
Published: (2026)

The Generative AI Paradox on Evaluation: What It Can Solve, It May Not Evaluate
by: Oh, Juhyun, et al.
Published: (2024)

On the Effect of Uncertainty on Layer-wise Inference Dynamics
by: Kim, Sunwoo, et al.
Published: (2025)

AutoMedic: An Automated Evaluation Framework for Clinical Conversational Agents with Medical Dataset Grounding
by: Oh, Gyutaek, et al.
Published: (2025)

Perceptions to Beliefs: Exploring Precursory Inferences for Theory of Mind in Large Language Models
by: Jung, Chani, et al.
Published: (2024)

BAPO: Base-Anchored Preference Optimization for Overcoming Forgetting in Large Language Models Personalization
by: Lee, Gihun, et al.
Published: (2024)

Flex-TravelPlanner: A Benchmark for Flexible Planning with Language Agents
by: Oh, Juhyun, et al.
Published: (2025)

Designing and Evaluating Multi-Chatbot Interface for Human-AI Communication: Preliminary Findings from a Persuasion Task
by: Yoon, Sion, et al.
Published: (2024)

Gated Low-rank Adaptation for personalized Code-Switching Automatic Speech Recognition on the low-spec devices
by: Kim, Gwantae, et al.
Published: (2024)

RoleConflictBench: A Benchmark of Role Conflict Scenarios for Evaluating LLMs' Contextual Sensitivity
by: Shin, Jisu, et al.
Published: (2025)

INSTRUCTIR: A Benchmark for Instruction Following of Information Retrieval Models
by: Oh, Hanseok, et al.
Published: (2024)

Social Bias Benchmark for Generation: A Comparison of Generation and QA-Based Evaluations
by: Jin, Jiho, et al.
Published: (2025)

Optimizing Language Augmentation for Multilingual Large Language Models: A Case Study on Korean
by: Choi, ChangSu, et al.
Published: (2024)

Denoising Table-Text Retrieval for Open-Domain Question Answering
by: Kang, Deokhyung, et al.
Published: (2024)

Style Extraction on Text Embeddings Using VAE and Parallel Dataset
by: Kong, InJin, et al.
Published: (2025)

Multi-FAct: Assessing Factuality of Multilingual LLMs using FActScore
by: Shafayat, Sheikh, et al.
Published: (2024)

Mitigating the Linguistic Gap with Phonemic Representations for Robust Cross-lingual Transfer
by: Jung, Haeji, et al.
Published: (2024)

Spotting Out-of-Character Behavior: Atomic-Level Evaluation of Persona Fidelity in Open-Ended Generation
by: Shin, Jisu, et al.
Published: (2025)

PapersPlease: A Benchmark for Evaluating Motivational Values of Large Language Models Based on ERG Theory
by: Myung, Junho, et al.
Published: (2025)

Successful repigmentation of hypopigmented scars with micropunch grafting with a skin‐seeding technique
by: Dong Seok Kim, et al.
Published: (2024)

TARDiS : Text Augmentation for Refining Diversity and Separability
by: Kim, Kyungmin, et al.
Published: (2025)

AIAP: A No-Code Workflow Builder for Non-Experts with Natural Language and Multi-Agent Collaboration
by: An, Hyunjn, et al.
Published: (2025)

Rethinking Test-Time Scaling for Medical AI: Model and Task-Aware Strategies for LLMs and VLMs
by: Oh, Gyutaek, et al.
Published: (2025)

MentalBench: A DSM-Grounded Benchmark for Evaluating Psychiatric Diagnostic Capability of Large Language Models
by: Song, Hoyun, et al.
Published: (2026)

MEME: Multi-entity & Evolving Memory Evaluation
by: Jung, Seokwon, et al.
Published: (2026)

GECKO: Generative Language Model for English, Code and Korean
by: Oh, Sungwoo, et al.
Published: (2024)

Flex-Judge: Text-Only Reasoning Unleashes Zero-Shot Multimodal Evaluators
by: Ko, Jongwoo, et al.
Published: (2025)

LoCar: Localization-Aware Evaluation of In-Vehicle Assistants through Fine-Grained Sociolinguistic Control
by: Jeong, Seogyeong, et al.
Published: (2026)

Toward Evaluative Thinking: Meta Policy Optimization with Evolving Reward Models
by: Kim, Zae Myung, et al.
Published: (2025)

References Indeed Matter? Reference-Free Preference Optimization for Conversational Query Reformulation
by: Kim, Doyoung, et al.
Published: (2025)

Multi-Drafter Speculative Decoding with Alignment Feedback
by: Kim, Taehyeon, et al.
Published: (2026)

LLM-as-an-Interviewer: Beyond Static Testing Through Dynamic LLM Evaluation
by: Kim, Eunsu, et al.
Published: (2024)

Degradation of 2,4‐dinitrotoluene by iron sulfide/manganese sulfide–biochar composites in the presence of persulfate and hydrogen peroxide
by: Seok‐Young Oh, et al.
Published: (2025)

Instruction Tuning with and without Context: Behavioral Shifts and Downstream Impact
by: Lee, Hyunji, et al.
Published: (2025)

MATE: Meet At The Embedding -- Connecting Images with Long Texts
by: Jang, Young Kyun, et al.
Published: (2024)