Library Catalog

Bibliographic Details
Main Authors:	Lee, Wonhyuk; Kim, Youngchol; Park, Yunjin; Moon, Junhyung; Jeong, Dongyoung; Park, Wanjin
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2509.23381

Similar Items

Responsible AI Technical Report
by: KT, et al.
Published: (2025)

Adaptive Task Vectors for Large Language Models
by: Kang, Joonseong, et al.
Published: (2025)

SentGuard: Sentence-Level Streaming Guardrails for Large Language Models
by: Yu, Jiaqi, et al.
Published: (2026)

MrGuard: A Multilingual Reasoning Guardrail for Universal LLM Safety
by: Yang, Yahan, et al.
Published: (2025)

ConsisGuard: Aligning Safety Deliberation with Policy Enforcement in LLM Guardrails
by: Wang, Yan, et al.
Published: (2026)

BehaviorSFT: Behavioral Token Conditioning for Clinical Agents Across the Proactivity Spectrum
by: Kim, Yubin, et al.
Published: (2025)

Rare-to-Frequent: Unlocking Compositional Generation Power of Diffusion Models on Rare Concepts with LLM Guidance
by: Park, Dongmin, et al.
Published: (2024)

Does a Large Language Model Really Speak in Human-Like Language?
by: Park, Mose, et al.
Published: (2025)

EMBGuard: Constructing Hazard-Aware Guardrails for Safe Planning in Embodied Agents
by: Choi, Dongwook, et al.
Published: (2026)

Beyond Task-Oriented and Chitchat Dialogues: Proactive and Transition-Aware Conversational Agents
by: Yoon, Yejin, et al.
Published: (2025)

KLAAD: Refining Attention Mechanisms to Reduce Societal Bias in Generative Language Models
by: Kim, Seorin, et al.
Published: (2025)

DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails
by: Deng, Yihe, et al.
Published: (2025)

Spread Preference Annotation: Direct Preference Judgment for Efficient LLM Alignment
by: Kim, Dongyoung, et al.
Published: (2024)

InstaTrans: An Instruction-Aware Translation Framework for Non-English Instruction Datasets
by: Kim, Yungi, et al.
Published: (2024)

Prefixing Attention Sinks can Mitigate Activation Outliers for Large Language Model Quantization
by: Son, Seungwoo, et al.
Published: (2024)

Task-Aware LoRA Adapter Composition via Similarity Retrieval in Vector Databases
by: Adsul, Riya, et al.
Published: (2026)

Beyond One Path: Evaluating and Enhancing Divergent Thinking in Interactive LLM Agents
by: Park, Jihyeong, et al.
Published: (2026)

Beyond Two-Stage Training: Cooperative SFT and RL for LLM Reasoning
by: Chen, Liang, et al.
Published: (2025)

Guarding the Guardrails: A Taxonomy-Driven Approach to Jailbreak Detection
by: Giarrusso, Francesco, et al.
Published: (2025)

LitE-SQL: A Lightweight and Efficient Text-to-SQL Framework with Vector-based Schema Linking and Execution-Guided Self-Correction
by: Piao, Shengmin, et al.
Published: (2025)

WebGuard: Building a Generalizable Guardrail for Web Agents
by: Zheng, Boyuan, et al.
Published: (2025)

LLM-Enhanced Linear Autoencoders for Recommendation
by: Moon, Jaewan, et al.
Published: (2025)

Benchmarking LLM Guardrails in Handling Multilingual Toxicity
by: Yang, Yahan, et al.
Published: (2024)

Moral Outrage Shapes Commitments Beyond Attention: Multimodal Moral Emotions on YouTube in Korea and the US
by: Park, Seongchan, et al.
Published: (2026)

Beyond Line-Level Filtering for the Pretraining Corpora of LLMs
by: Park, Chanwoo, et al.
Published: (2025)

EnSToM: Enhancing Dialogue Systems with Entropy-Scaled Steering Vectors for Topic Maintenance
by: Suh, Heejae, et al.
Published: (2025)

Advancing Beyond Identification: Multi-bit Watermark for Large Language Models
by: Yoo, KiYoon, et al.
Published: (2023)

ThinkGuard: Deliberative Slow Thinking Leads to Cautious Guardrails
by: Wen, Xiaofei, et al.
Published: (2025)

OmniGuard: Unified Omni-Modal Guardrails with Deliberate Reasoning
by: Zhu, Boyu, et al.
Published: (2025)

UniGuard: Towards Universal Safety Guardrails for Jailbreak Attacks on Multimodal Large Language Models
by: Oh, Sejoon, et al.
Published: (2024)

Beyond the Final Answer: Evaluating the Reasoning Trajectories of Tool-Augmented Agents
by: Kim, Wonjoong, et al.
Published: (2025)

ConceptGuard: Neuro-Symbolic Safety Guardrails via Sparse Interpretable Jailbreak Concepts
by: Aswal, Darpan, et al.
Published: (2025)

Maximizing Local Entropy Where It Matters: Prefix-Aware Localized LLM Unlearning
by: Zhai, Naixin, et al.
Published: (2026)

CUE-M: Contextual Understanding and Enhanced Search with Multimodal Large Language Model
by: Go, Dongyoung, et al.
Published: (2024)

Label Words as Local Task Vectors in In-Context Learning
by: Zheng, Bowen, et al.
Published: (2024)

Efficient and Scalable Estimation of Tool Representations in Vector Space
by: Moon, Suhong, et al.
Published: (2024)

InjecGuard: Benchmarking and Mitigating Over-defense in Prompt Injection Guardrail Models
by: Li, Hao, et al.
Published: (2024)

Linguistics-Aware Non-Distortionary LLM Watermarking
by: Park, Shinwoo, et al.
Published: (2026)

PrefixMemory-Tuning: Modernizing Prefix-Tuning by Decoupling the Prefix from Attention
by: Wang, Haonan, et al.
Published: (2025)

Breaking the Pre-Sampling Barrier: Activation-Informed Difficulty-Aware Self-Consistency
by: Yoon, Taewoong, et al.
Published: (2026)