Library Catalog

Bibliographic Details
Main Authors:	Kim, Sewon, Kim, Jiwon, Shin, Seungwoo, Chung, Hyejin, Moon, Daeun, Kwon, Yejin, Yoon, Hyunsoo
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2508.16921

Similar Items

LogicQA: Logical Anomaly Detection with Vision Language Model Generated Questions
by: Kwon, Yejin, et al.
Published: (2025)

Bidirectional Multimodal Prompt Learning with Scale-Aware Training for Few-Shot Multi-Class Anomaly Detection
by: Lee, Yujin, et al.
Published: (2024)

M3-SLU: Evaluating Speaker-Attributed Reasoning in Multimodal Large Language Models
by: Kwon, Yejin, et al.
Published: (2025)

Talk Isn't Always Cheap: Understanding Failure Modes in Multi-Agent Debate
by: Wynn, Andrea, et al.
Published: (2025)

Reasoning Isn't Enough: Examining Truth-Bias and Sycophancy in LLMs
by: Barkett, Emilio, et al.
Published: (2025)

Inverse Scaling: When Bigger Isn't Better
by: McKenzie, Ian R., et al.
Published: (2023)

Unlearning Isn't Deletion: Investigating Reversibility of Machine Unlearning in LLMs
by: Xu, Xiaoyu, et al.
Published: (2025)

Word Boundary Information Isn't Useful for Encoder Language Models
by: Gow-Smith, Edward, et al.
Published: (2024)

Strong Reasoning Isn't Enough: Evaluating Evidence Elicitation in Interactive Diagnosis
by: Long, Zhuohan, et al.
Published: (2026)

When Meaning Isn't Literal: Exploring Idiomatic Meaning Across Languages and Modalities
by: Das, Sarmistha, et al.
Published: (2026)

Seeing Isn't Believing: Uncovering Blind Spots in Evaluator Vision-Language Models
by: Khan, Mohammed Safi Ur Rahman, et al.
Published: (2026)

Latent Preference Modeling for Cross-Session Personalized Tool Calling
by: Yoon, Yejin, et al.
Published: (2026)

Mathematics Isn't Culture-Free: Probing Cultural Gaps via Entity and Scenario Perturbations
by: Tomar, Aditya, et al.
Published: (2025)

SAIE Framework: Support Alone Isn't Enough -- Advancing LLM Training with Adversarial Remarks
by: Loem, Mengsay, et al.
Published: (2023)

Recall Isn't Enough: Bounding Commitments in Personalized Language Systems
by: Tang, Rui, et al.
Published: (2026)

When Fairness Isn't Statistical: The Limits of Machine Learning in Evaluating Legal Reasoning
by: Barale, Claire, et al.
Published: (2025)

Speculative Verification: Exploiting Information Gain to Refine Speculative Decoding
by: Kim, Sungkyun, et al.
Published: (2025)

Seeing Isn't Believing: Mitigating Belief Inertia via Active Intervention in Embodied Agents
by: Wang, Hanlin, et al.
Published: (2026)

Seeing Isn't Knowing: Do VLMs Know When Not to Answer Spatial Questions (and Why)?
by: Zhang, Yue, et al.
Published: (2026)

When Bigger Isn't Better: A Comprehensive Fairness Evaluation of Political Bias in Multi-News Summarisation
by: Huang, Nannan, et al.
Published: (2026)

Surprise! Uniform Information Density Isn't the Whole Story: Predicting Surprisal Contours in Long-form Discourse
by: Tsipidi, Eleftheria, et al.
Published: (2024)

"Not Aligned" is Not "Malicious": Being Careful about Hallucinations of Large Language Models' Jailbreak
by: Mei, Lingrui, et al.
Published: (2024)

Why Synthetic Isn't Real Yet: A Diagnostic Framework for Contact Center Dialogue Generation
by: Devanathan, Rishikesh, et al.
Published: (2025)

When Correct Isn't Usable: Improving Structured Output Reliability in Small Language Models
by: Galeone, Cosimo, et al.
Published: (2026)

AVCap: Leveraging Audio-Visual Features as Text Tokens for Captioning
by: Kim, Jongsuk, et al.
Published: (2024)

BlendX: Complex Multi-Intent Detection with Blended Patterns
by: Yoon, Yejin, et al.
Published: (2024)

Are Today's LLMs Ready to Explain Well-Being Concepts?
by: Jiang, Bohan, et al.
Published: (2025)

More Isn't Always Better: Balancing Decision Accuracy and Conformity Pressures in Multi-AI Advice
by: Tsuchiya, Yuta, et al.
Published: (2026)

Curveball Steering: The Right Direction To Steer Isn't Always Linear
by: Raval, Shivam, et al.
Published: (2026)

Why and How LLMs Hallucinate: Connecting the Dots with Subsequence Associations
by: Sun, Yiyou, et al.
Published: (2025)

Predicting Psychological Well-Being from Spontaneous Speech using LLMs
by: Loweimi, Erfan, et al.
Published: (2026)

Truth-Aware Context Selection: Mitigating Hallucinations of Large Language Models Being Misled by Untruthful Contexts
by: Yu, Tian, et al.
Published: (2024)

SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks
by: Song, Jiwon, et al.
Published: (2024)

What You Read Isn't What You Hear: Linguistic Sensitivity in Deepfake Speech Detection
by: Nguyen, Binh, et al.
Published: (2025)

CXR-LLAVA: a multimodal large language model for interpreting chest X-ray images
by: Lee, Seowoo, et al.
Published: (2023)

LLM Self Defense: By Self Examination, LLMs Know They Are Being Tricked
by: Phute, Mansi, et al.
Published: (2023)

The LLM Effect: Are Humans Truly Using LLMs, or Are They Being Influenced By Them Instead?
by: Choi, Alexander S., et al.
Published: (2024)

Unplug and Play Language Models: Decomposing Experts in Language Models at Inference Time
by: Yang, Nakyeong, et al.
Published: (2024)

Relevance Isn't All You Need: Scaling RAG Systems With Inference-Time Compute Via Multi-Criteria Reranking
by: LeVine, Will, et al.
Published: (2025)

Bigger Isn't Always Memorizing: Early Stopping Overparameterized Diffusion Models
by: Favero, Alessandro, et al.
Published: (2025)