Library Catalog

Bibliographic Details
Main Authors:	Kim, Su-Hyeon; Jin, Hyundong; Lee, Yejin; Han, Yo-Sub
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2604.01604

Similar Items

How Does the Thinking Step Influence Model Safety? An Entropy-based Safety Reminder for LRMs
by: Kim, Su-Hyeon, et al.
Published: (2026)

Obfuscation Rules for Detecting and Detoxifying Korean Toxicity
by: Lee, Yejin, et al.
Published: (2025)

Cross-Family Universality of Behavioral Axes via Anchor-Projected Representations
by: Kim, Su-Hyeon, et al.
Published: (2026)

EPIC: Efficient and Parallel Inference under CFG Constraints for Diffusion Language Models
by: Jin, Hyundong, et al.
Published: (2026)

NCO: A Versatile Plug-in for Handling Negative Constraints in Decoding
by: Jin, Hyundong, et al.
Published: (2026)

Detection of LLM-Paraphrased Code and Identification of the Responsible LLM Using Coding Style Features
by: Park, Shinwoo, et al.
Published: (2025)

RegexPSPACE: A Benchmark for Evaluating LLM Reasoning on PSPACE-complete Regex Problems
by: Jin, Hyundong, et al.
Published: (2025)

Steering Language Models Before They Speak: Logit-Level Interventions
by: An, Hyeseon, et al.
Published: (2026)

STAB: Specification-driven Testing for Algorithmic Bottlenecks
by: Lim, Soohan, et al.
Published: (2026)

ECO: Enhanced Code Optimization via Performance-Aware Prompting for Code-LLMs
by: Kim, Su-Hyeon, et al.
Published: (2025)

RV-HATE: Reinforced Multi-Module Voting for Implicit Hate Speech Detection
by: Lee, Yejin, et al.
Published: (2025)

CRaFT: An Explanation-Based Framework for Evaluating Cultural Reasoning in Multilingual Language Models
by: Hossain, Shehenaz, et al.
Published: (2025)

TRAPDOC: Deceiving LLM Users by Injecting Imperceptible Phantom Tokens into Documents
by: Jin, Hyundong, et al.
Published: (2025)

AmpleHate: Amplifying the Attention for Versatile Implicit Hate Detection
by: Lee, Yejin, et al.
Published: (2025)

Repairing Regex Vulnerabilities via Localization-Guided Instructions
by: Sung, Sicheol, et al.
Published: (2025)

DLM-SWAI: Steering Diffusion Language Models Before They Unmask
by: An, Hyeseon, et al.
Published: (2026)

KatFishNet: Detecting LLM-Generated Korean Text through Linguistic Feature Analysis
by: Park, Shinwoo, et al.
Published: (2025)

From Intuition to Calibrated Judgment: A Rubric-Based Expert-Panel Study of Human Detection of LLM-Generated Korean Text
by: Park, Shinwoo, et al.
Published: (2026)

Marking Code Without Breaking It: Code Watermarking for Detecting LLM-Generated Code
by: Kim, Jungin, et al.
Published: (2025)

Sequential Behavioral Watermarking for LLM Agents
by: An, Hyeseon, et al.
Published: (2026)

A Linguistics-Aware LLM Watermarking via Syntactic Predictability
by: Park, Shinwoo, et al.
Published: (2025)

DITTO: A Spoofing Attack Framework on Watermarked LLMs via Knowledge Distillation
by: An, Hyeseon, et al.
Published: (2025)

Adaptive Steering and Remasking for Safe Generation in Diffusion Language Models
by: Lee, Yejin, et al.
Published: (2026)

URECA: The Chain of Two Minimum Set Cover Problems exists behind Adaptation to Shifts in Semantic Code Search
by: Choi, Seok-Ung, et al.
Published: (2025)

Linguistics-Aware Non-Distortionary LLM Watermarking
by: Park, Shinwoo, et al.
Published: (2026)

WaterMod: Modular Token-Rank Partitioning for Probability-Balanced LLM Watermarking
by: Park, Shinwoo, et al.
Published: (2025)

MEC$^3$O: Multi-Expert Consensus for Code Time Complexity Prediction
by: Hahn, Joonghyuk, et al.
Published: (2025)

GuruAgents: Emulating Wise Investors with Prompt-Guided LLM Agents
by: Kim, Yejin, et al.
Published: (2025)

Can Cross-Layer Transcoders Replace Vision Transformer Activations? An Interpretable Perspective on Vision
by: Chatzoudis, Gerasimos, et al.
Published: (2026)

Continual Learning for Multiple Modalities
by: Jin, Hyundong, et al.
Published: (2025)

TCProF: Time-Complexity Prediction SSL Framework
by: Hahn, Joonghyuk, et al.
Published: (2025)

Feature Selection via Dynamic Graph-based Attention Block in MI-based EEG Signals
by: Han, Hyeon-Taek, et al.
Published: (2024)

LogiCase: Effective Test Case Generation from Logical Description in Competitive Programming
by: Sung, Sicheol, et al.
Published: (2025)

A Recommender System for NFT Collectibles with Item Feature
by: Choi, Minjoo, et al.
Published: (2024)

Feature-Guided SAE Steering for Refusal-Rate Control using Contrasting Prompts
by: Bhargav, Samaksh, et al.
Published: (2025)

Latent Preference Modeling for Cross-Session Personalized Tool Calling
by: Yoon, Yejin, et al.
Published: (2026)

RefusalBench: Generative Evaluation of Selective Refusal in Grounded Language Models
by: Muhamed, Aashiq, et al.
Published: (2025)

ContractEval: A Benchmark for Evaluating Contract-Satisfying Assertions in Code Generation
by: Lim, Soohan, et al.
Published: (2025)

Distilling to Hybrid Attention Models via KL-Guided Layer Selection
by: Li, Yanhong, et al.
Published: (2025)

From Refusal Tokens to Refusal Control: Discovering and Steering Category-Specific Refusal Directions
by: Alagharu, Rishab, et al.
Published: (2026)