| Main Authors: | Lee, Wonhyuk, Kim, Youngchol, Park, Yunjin, Moon, Junhyung, Jeong, Dongyoung, Park, Wanjin |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2509.23381 |
Similar Items
Responsible AI Technical Report
by: KT, et al.
Published: (2025)
Adaptive Task Vectors for Large Language Models
by: Kang, Joonseong, et al.
Published: (2025)
SentGuard: Sentence-Level Streaming Guardrails for Large Language Models
by: Yu, Jiaqi, et al.
Published: (2026)
MrGuard: A Multilingual Reasoning Guardrail for Universal LLM Safety
by: Yang, Yahan, et al.
Published: (2025)
ConsisGuard: Aligning Safety Deliberation with Policy Enforcement in LLM Guardrails
by: Wang, Yan, et al.
Published: (2026)
BehaviorSFT: Behavioral Token Conditioning for Clinical Agents Across the Proactivity Spectrum
by: Kim, Yubin, et al.
Published: (2025)
Rare-to-Frequent: Unlocking Compositional Generation Power of Diffusion Models on Rare Concepts with LLM Guidance
by: Park, Dongmin, et al.
Published: (2024)
Does a Large Language Model Really Speak in Human-Like Language?
by: Park, Mose, et al.
Published: (2025)
EMBGuard: Constructing Hazard-Aware Guardrails for Safe Planning in Embodied Agents
by: Choi, Dongwook, et al.
Published: (2026)
Beyond Task-Oriented and Chitchat Dialogues: Proactive and Transition-Aware Conversational Agents
by: Yoon, Yejin, et al.
Published: (2025)
KLAAD: Refining Attention Mechanisms to Reduce Societal Bias in Generative Language Models
by: Kim, Seorin, et al.
Published: (2025)
DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails
by: Deng, Yihe, et al.
Published: (2025)
Spread Preference Annotation: Direct Preference Judgment for Efficient LLM Alignment
by: Kim, Dongyoung, et al.
Published: (2024)
InstaTrans: An Instruction-Aware Translation Framework for Non-English Instruction Datasets
by: Kim, Yungi, et al.
Published: (2024)
Prefixing Attention Sinks can Mitigate Activation Outliers for Large Language Model Quantization
by: Son, Seungwoo, et al.
Published: (2024)
Task-Aware LoRA Adapter Composition via Similarity Retrieval in Vector Databases
by: Adsul, Riya, et al.
Published: (2026)
Beyond One Path: Evaluating and Enhancing Divergent Thinking in Interactive LLM Agents
by: Park, Jihyeong, et al.
Published: (2026)
Beyond Two-Stage Training: Cooperative SFT and RL for LLM Reasoning
by: Chen, Liang, et al.
Published: (2025)
Guarding the Guardrails: A Taxonomy-Driven Approach to Jailbreak Detection
by: Giarrusso, Francesco, et al.
Published: (2025)
LitE-SQL: A Lightweight and Efficient Text-to-SQL Framework with Vector-based Schema Linking and Execution-Guided Self-Correction
by: Piao, Shengmin, et al.
Published: (2025)
WebGuard: Building a Generalizable Guardrail for Web Agents
by: Zheng, Boyuan, et al.
Published: (2025)
LLM-Enhanced Linear Autoencoders for Recommendation
by: Moon, Jaewan, et al.
Published: (2025)
Benchmarking LLM Guardrails in Handling Multilingual Toxicity
by: Yang, Yahan, et al.
Published: (2024)
Moral Outrage Shapes Commitments Beyond Attention: Multimodal Moral Emotions on YouTube in Korea and the US
by: Park, Seongchan, et al.
Published: (2026)
Beyond Line-Level Filtering for the Pretraining Corpora of LLMs
by: Park, Chanwoo, et al.
Published: (2025)
EnSToM: Enhancing Dialogue Systems with Entropy-Scaled Steering Vectors for Topic Maintenance
by: Suh, Heejae, et al.
Published: (2025)
Advancing Beyond Identification: Multi-bit Watermark for Large Language Models
by: Yoo, KiYoon, et al.
Published: (2023)
ThinkGuard: Deliberative Slow Thinking Leads to Cautious Guardrails
by: Wen, Xiaofei, et al.
Published: (2025)
OmniGuard: Unified Omni-Modal Guardrails with Deliberate Reasoning
by: Zhu, Boyu, et al.
Published: (2025)
UniGuard: Towards Universal Safety Guardrails for Jailbreak Attacks on Multimodal Large Language Models
by: Oh, Sejoon, et al.
Published: (2024)
Beyond the Final Answer: Evaluating the Reasoning Trajectories of Tool-Augmented Agents
by: Kim, Wonjoong, et al.
Published: (2025)
ConceptGuard: Neuro-Symbolic Safety Guardrails via Sparse Interpretable Jailbreak Concepts
by: Aswal, Darpan, et al.
Published: (2025)
Maximizing Local Entropy Where It Matters: Prefix-Aware Localized LLM Unlearning
by: Zhai, Naixin, et al.
Published: (2026)
CUE-M: Contextual Understanding and Enhanced Search with Multimodal Large Language Model
by: Go, Dongyoung, et al.
Published: (2024)
Label Words as Local Task Vectors in In-Context Learning
by: Zheng, Bowen, et al.
Published: (2024)
Efficient and Scalable Estimation of Tool Representations in Vector Space
by: Moon, Suhong, et al.
Published: (2024)
InjecGuard: Benchmarking and Mitigating Over-defense in Prompt Injection Guardrail Models
by: Li, Hao, et al.
Published: (2024)
Linguistics-Aware Non-Distortionary LLM Watermarking
by: Park, Shinwoo, et al.
Published: (2026)
PrefixMemory-Tuning: Modernizing Prefix-Tuning by Decoupling the Prefix from Attention
by: Wang, Haonan, et al.
Published: (2025)
Breaking the Pre-Sampling Barrier: Activation-Informed Difficulty-Aware Self-Consistency
by: Yoon, Taewoong, et al.
Published: (2026)