| Main Authors: | Oh, Myeong Seok; Kim, Dong-Yun; Oh, Hanseok; Kang, Chaean; Kang, Joeun; Wang, Xiaonan; Park, Hyunjung; Jung, Young Cheol; Kim, Hansaem |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.21211 |
Similar Items
Block-wise Codeword Embedding for Reliable Multi-bit Text Watermarking
by: Kim, Joeun, et al.
Published: (2026)
KULTURE Bench: A Benchmark for Assessing Language Model in Korean Cultural Context
by: Wang, Xiaonan, et al.
Published: (2024)
Toward Reliable VLM: A Fine-Grained Benchmark and Framework for Exposure, Bias, and Inference in Korean Street Views
by: Wang, Xiaonan, et al.
Published: (2025)
KTRL+F: Knowledge-Augmented In-Document Search
by: Oh, Hanseok, et al.
Published: (2023)
Are they lovers or friends? Evaluating LLMs' Social Reasoning in English and Korean Dialogues
by: Kim, Eunsu, et al.
Published: (2025)
PerMix-RLVR: Preserving Persona Expressivity under Verifiable-Reward Alignment
by: Oh, Jihwan, et al.
Published: (2026)
The Generative AI Paradox on Evaluation: What It Can Solve, It May Not Evaluate
by: Oh, Juhyun, et al.
Published: (2024)
On the Effect of Uncertainty on Layer-wise Inference Dynamics
by: Kim, Sunwoo, et al.
Published: (2025)
AutoMedic: An Automated Evaluation Framework for Clinical Conversational Agents with Medical Dataset Grounding
by: Oh, Gyutaek, et al.
Published: (2025)
Perceptions to Beliefs: Exploring Precursory Inferences for Theory of Mind in Large Language Models
by: Jung, Chani, et al.
Published: (2024)
BAPO: Base-Anchored Preference Optimization for Overcoming Forgetting in Large Language Models Personalization
by: Lee, Gihun, et al.
Published: (2024)
Flex-TravelPlanner: A Benchmark for Flexible Planning with Language Agents
by: Oh, Juhyun, et al.
Published: (2025)
Designing and Evaluating Multi-Chatbot Interface for Human-AI Communication: Preliminary Findings from a Persuasion Task
by: Yoon, Sion, et al.
Published: (2024)
Gated Low-rank Adaptation for personalized Code-Switching Automatic Speech Recognition on the low-spec devices
by: Kim, Gwantae, et al.
Published: (2024)
RoleConflictBench: A Benchmark of Role Conflict Scenarios for Evaluating LLMs' Contextual Sensitivity
by: Shin, Jisu, et al.
Published: (2025)
INSTRUCTIR: A Benchmark for Instruction Following of Information Retrieval Models
by: Oh, Hanseok, et al.
Published: (2024)
Social Bias Benchmark for Generation: A Comparison of Generation and QA-Based Evaluations
by: Jin, Jiho, et al.
Published: (2025)
Optimizing Language Augmentation for Multilingual Large Language Models: A Case Study on Korean
by: Choi, ChangSu, et al.
Published: (2024)
Denoising Table-Text Retrieval for Open-Domain Question Answering
by: Kang, Deokhyung, et al.
Published: (2024)
Style Extraction on Text Embeddings Using VAE and Parallel Dataset
by: Kong, InJin, et al.
Published: (2025)
Multi-FAct: Assessing Factuality of Multilingual LLMs using FActScore
by: Shafayat, Sheikh, et al.
Published: (2024)
Mitigating the Linguistic Gap with Phonemic Representations for Robust Cross-lingual Transfer
by: Jung, Haeji, et al.
Published: (2024)
Spotting Out-of-Character Behavior: Atomic-Level Evaluation of Persona Fidelity in Open-Ended Generation
by: Shin, Jisu, et al.
Published: (2025)
PapersPlease: A Benchmark for Evaluating Motivational Values of Large Language Models Based on ERG Theory
by: Myung, Junho, et al.
Published: (2025)
Successful repigmentation of hypopigmented scars with micropunch grafting with a skin‐seeding technique
by: Dong Seok Kim, et al.
Published: (2024)
TARDiS : Text Augmentation for Refining Diversity and Separability
by: Kim, Kyungmin, et al.
Published: (2025)
AIAP: A No-Code Workflow Builder for Non-Experts with Natural Language and Multi-Agent Collaboration
by: An, Hyunjn, et al.
Published: (2025)
Rethinking Test-Time Scaling for Medical AI: Model and Task-Aware Strategies for LLMs and VLMs
by: Oh, Gyutaek, et al.
Published: (2025)
MentalBench: A DSM-Grounded Benchmark for Evaluating Psychiatric Diagnostic Capability of Large Language Models
by: Song, Hoyun, et al.
Published: (2026)
MEME: Multi-entity & Evolving Memory Evaluation
by: Jung, Seokwon, et al.
Published: (2026)
GECKO: Generative Language Model for English, Code and Korean
by: Oh, Sungwoo, et al.
Published: (2024)
Flex-Judge: Text-Only Reasoning Unleashes Zero-Shot Multimodal Evaluators
by: Ko, Jongwoo, et al.
Published: (2025)
LoCar: Localization-Aware Evaluation of In-Vehicle Assistants through Fine-Grained Sociolinguistic Control
by: Jeong, Seogyeong, et al.
Published: (2026)
Toward Evaluative Thinking: Meta Policy Optimization with Evolving Reward Models
by: Kim, Zae Myung, et al.
Published: (2025)
References Indeed Matter? Reference-Free Preference Optimization for Conversational Query Reformulation
by: Kim, Doyoung, et al.
Published: (2025)
Multi-Drafter Speculative Decoding with Alignment Feedback
by: Kim, Taehyeon, et al.
Published: (2026)
LLM-as-an-Interviewer: Beyond Static Testing Through Dynamic LLM Evaluation
by: Kim, Eunsu, et al.
Published: (2024)
Degradation of 2,4‐dinitrotoluene by iron sulfide/manganese sulfide–biochar composites in the presence of persulfate and hydrogen peroxide
by: Seok‐Young Oh, et al.
Published: (2025)
Instruction Tuning with and without Context: Behavioral Shifts and Downstream Impact
by: Lee, Hyunji, et al.
Published: (2025)
MATE: Meet At The Embedding -- Connecting Images with Long Texts
by: Jang, Young Kyun, et al.
Published: (2024)