| Main Authors: | Kim, Sewon, Kim, Jiwon, Shin, Seungwoo, Chung, Hyejin, Moon, Daeun, Kwon, Yejin, Yoon, Hyunsoo |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2508.16921 |
Similar Items
LogicQA: Logical Anomaly Detection with Vision Language Model Generated Questions
by: Kwon, Yejin, et al.
Published: (2025)
Bidirectional Multimodal Prompt Learning with Scale-Aware Training for Few-Shot Multi-Class Anomaly Detection
by: Lee, Yujin, et al.
Published: (2024)
M3-SLU: Evaluating Speaker-Attributed Reasoning in Multimodal Large Language Models
by: Kwon, Yejin, et al.
Published: (2025)
Talk Isn't Always Cheap: Understanding Failure Modes in Multi-Agent Debate
by: Wynn, Andrea, et al.
Published: (2025)
Reasoning Isn't Enough: Examining Truth-Bias and Sycophancy in LLMs
by: Barkett, Emilio, et al.
Published: (2025)
Inverse Scaling: When Bigger Isn't Better
by: McKenzie, Ian R., et al.
Published: (2023)
Unlearning Isn't Deletion: Investigating Reversibility of Machine Unlearning in LLMs
by: Xu, Xiaoyu, et al.
Published: (2025)
Word Boundary Information Isn't Useful for Encoder Language Models
by: Gow-Smith, Edward, et al.
Published: (2024)
Strong Reasoning Isn't Enough: Evaluating Evidence Elicitation in Interactive Diagnosis
by: Long, Zhuohan, et al.
Published: (2026)
When Meaning Isn't Literal: Exploring Idiomatic Meaning Across Languages and Modalities
by: Das, Sarmistha, et al.
Published: (2026)
Seeing Isn't Believing: Uncovering Blind Spots in Evaluator Vision-Language Models
by: Khan, Mohammed Safi Ur Rahman, et al.
Published: (2026)
Latent Preference Modeling for Cross-Session Personalized Tool Calling
by: Yoon, Yejin, et al.
Published: (2026)
Mathematics Isn't Culture-Free: Probing Cultural Gaps via Entity and Scenario Perturbations
by: Tomar, Aditya, et al.
Published: (2025)
SAIE Framework: Support Alone Isn't Enough -- Advancing LLM Training with Adversarial Remarks
by: Loem, Mengsay, et al.
Published: (2023)
Recall Isn't Enough: Bounding Commitments in Personalized Language Systems
by: Tang, Rui, et al.
Published: (2026)
When Fairness Isn't Statistical: The Limits of Machine Learning in Evaluating Legal Reasoning
by: Barale, Claire, et al.
Published: (2025)
Speculative Verification: Exploiting Information Gain to Refine Speculative Decoding
by: Kim, Sungkyun, et al.
Published: (2025)
Seeing Isn't Believing: Mitigating Belief Inertia via Active Intervention in Embodied Agents
by: Wang, Hanlin, et al.
Published: (2026)
Seeing Isn't Knowing: Do VLMs Know When Not to Answer Spatial Questions (and Why)?
by: Zhang, Yue, et al.
Published: (2026)
When Bigger Isn't Better: A Comprehensive Fairness Evaluation of Political Bias in Multi-News Summarisation
by: Huang, Nannan, et al.
Published: (2026)
Surprise! Uniform Information Density Isn't the Whole Story: Predicting Surprisal Contours in Long-form Discourse
by: Tsipidi, Eleftheria, et al.
Published: (2024)
"Not Aligned" is Not "Malicious": Being Careful about Hallucinations of Large Language Models' Jailbreak
by: Mei, Lingrui, et al.
Published: (2024)
Why Synthetic Isn't Real Yet: A Diagnostic Framework for Contact Center Dialogue Generation
by: Devanathan, Rishikesh, et al.
Published: (2025)
When Correct Isn't Usable: Improving Structured Output Reliability in Small Language Models
by: Galeone, Cosimo, et al.
Published: (2026)
AVCap: Leveraging Audio-Visual Features as Text Tokens for Captioning
by: Kim, Jongsuk, et al.
Published: (2024)
BlendX: Complex Multi-Intent Detection with Blended Patterns
by: Yoon, Yejin, et al.
Published: (2024)
Are Today's LLMs Ready to Explain Well-Being Concepts?
by: Jiang, Bohan, et al.
Published: (2025)
More Isn't Always Better: Balancing Decision Accuracy and Conformity Pressures in Multi-AI Advice
by: Tsuchiya, Yuta, et al.
Published: (2026)
Curveball Steering: The Right Direction To Steer Isn't Always Linear
by: Raval, Shivam, et al.
Published: (2026)
Why and How LLMs Hallucinate: Connecting the Dots with Subsequence Associations
by: Sun, Yiyou, et al.
Published: (2025)
Predicting Psychological Well-Being from Spontaneous Speech using LLMs
by: Loweimi, Erfan, et al.
Published: (2026)
Truth-Aware Context Selection: Mitigating Hallucinations of Large Language Models Being Misled by Untruthful Contexts
by: Yu, Tian, et al.
Published: (2024)
SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks
by: Song, Jiwon, et al.
Published: (2024)
What You Read Isn't What You Hear: Linguistic Sensitivity in Deepfake Speech Detection
by: Nguyen, Binh, et al.
Published: (2025)
CXR-LLAVA: a multimodal large language model for interpreting chest X-ray images
by: Lee, Seowoo, et al.
Published: (2023)
LLM Self Defense: By Self Examination, LLMs Know They Are Being Tricked
by: Phute, Mansi, et al.
Published: (2023)
The LLM Effect: Are Humans Truly Using LLMs, or Are They Being Influenced By Them Instead?
by: Choi, Alexander S., et al.
Published: (2024)
Unplug and Play Language Models: Decomposing Experts in Language Models at Inference Time
by: Yang, Nakyeong, et al.
Published: (2024)
Relevance Isn't All You Need: Scaling RAG Systems With Inference-Time Compute Via Multi-Criteria Reranking
by: LeVine, Will, et al.
Published: (2025)
Bigger Isn't Always Memorizing: Early Stopping Overparameterized Diffusion Models
by: Favero, Alessandro, et al.
Published: (2025)