| Main Authors: | Kim, Su-Hyeon, Jin, Hyundong, Lee, Yejin, Han, Yo-Sub |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.01604 |
Similar Items
How Does the Thinking Step Influence Model Safety? An Entropy-based Safety Reminder for LRMs
by: Kim, Su-Hyeon, et al.
Published: (2026)
Obfuscation Rules for Detecting and Detoxifying Korean Toxicity
by: Lee, Yejin, et al.
Published: (2025)
Cross-Family Universality of Behavioral Axes via Anchor-Projected Representations
by: Kim, Su-Hyeon, et al.
Published: (2026)
EPIC: Efficient and Parallel Inference under CFG Constraints for Diffusion Language Models
by: Jin, Hyundong, et al.
Published: (2026)
NCO: A Versatile Plug-in for Handling Negative Constraints in Decoding
by: Jin, Hyundong, et al.
Published: (2026)
Detection of LLM-Paraphrased Code and Identification of the Responsible LLM Using Coding Style Features
by: Park, Shinwoo, et al.
Published: (2025)
RegexPSPACE: A Benchmark for Evaluating LLM Reasoning on PSPACE-complete Regex Problems
by: Jin, Hyundong, et al.
Published: (2025)
Steering Language Models Before They Speak: Logit-Level Interventions
by: An, Hyeseon, et al.
Published: (2026)
STAB: Specification-driven Testing for Algorithmic Bottlenecks
by: Lim, Soohan, et al.
Published: (2026)
ECO: Enhanced Code Optimization via Performance-Aware Prompting for Code-LLMs
by: Kim, Su-Hyeon, et al.
Published: (2025)
RV-HATE: Reinforced Multi-Module Voting for Implicit Hate Speech Detection
by: Lee, Yejin, et al.
Published: (2025)
CRaFT: An Explanation-Based Framework for Evaluating Cultural Reasoning in Multilingual Language Models
by: Hossain, Shehenaz, et al.
Published: (2025)
TRAPDOC: Deceiving LLM Users by Injecting Imperceptible Phantom Tokens into Documents
by: Jin, Hyundong, et al.
Published: (2025)
AmpleHate: Amplifying the Attention for Versatile Implicit Hate Detection
by: Lee, Yejin, et al.
Published: (2025)
Repairing Regex Vulnerabilities via Localization-Guided Instructions
by: Sung, Sicheol, et al.
Published: (2025)
DLM-SWAI: Steering Diffusion Language Models Before They Unmask
by: An, Hyeseon, et al.
Published: (2026)
KatFishNet: Detecting LLM-Generated Korean Text through Linguistic Feature Analysis
by: Park, Shinwoo, et al.
Published: (2025)
From Intuition to Calibrated Judgment: A Rubric-Based Expert-Panel Study of Human Detection of LLM-Generated Korean Text
by: Park, Shinwoo, et al.
Published: (2026)
Marking Code Without Breaking It: Code Watermarking for Detecting LLM-Generated Code
by: Kim, Jungin, et al.
Published: (2025)
Sequential Behavioral Watermarking for LLM Agents
by: An, Hyeseon, et al.
Published: (2026)
A Linguistics-Aware LLM Watermarking via Syntactic Predictability
by: Park, Shinwoo, et al.
Published: (2025)
DITTO: A Spoofing Attack Framework on Watermarked LLMs via Knowledge Distillation
by: An, Hyeseon, et al.
Published: (2025)
Adaptive Steering and Remasking for Safe Generation in Diffusion Language Models
by: Lee, Yejin, et al.
Published: (2026)
URECA: The Chain of Two Minimum Set Cover Problems exists behind Adaptation to Shifts in Semantic Code Search
by: Choi, Seok-Ung, et al.
Published: (2025)
Linguistics-Aware Non-Distortionary LLM Watermarking
by: Park, Shinwoo, et al.
Published: (2026)
WaterMod: Modular Token-Rank Partitioning for Probability-Balanced LLM Watermarking
by: Park, Shinwoo, et al.
Published: (2025)
MEC³O: Multi-Expert Consensus for Code Time Complexity Prediction
by: Hahn, Joonghyuk, et al.
Published: (2025)
GuruAgents: Emulating Wise Investors with Prompt-Guided LLM Agents
by: Kim, Yejin, et al.
Published: (2025)
Can Cross-Layer Transcoders Replace Vision Transformer Activations? An Interpretable Perspective on Vision
by: Chatzoudis, Gerasimos, et al.
Published: (2026)
Continual Learning for Multiple Modalities
by: Jin, Hyundong, et al.
Published: (2025)
TCProF: Time-Complexity Prediction SSL Framework
by: Hahn, Joonghyuk, et al.
Published: (2025)
Feature Selection via Dynamic Graph-based Attention Block in MI-based EEG Signals
by: Han, Hyeon-Taek, et al.
Published: (2024)
LogiCase: Effective Test Case Generation from Logical Description in Competitive Programming
by: Sung, Sicheol, et al.
Published: (2025)
A Recommender System for NFT Collectibles with Item Feature
by: Choi, Minjoo, et al.
Published: (2024)
Feature-Guided SAE Steering for Refusal-Rate Control using Contrasting Prompts
by: Bhargav, Samaksh, et al.
Published: (2025)
Latent Preference Modeling for Cross-Session Personalized Tool Calling
by: Yoon, Yejin, et al.
Published: (2026)
RefusalBench: Generative Evaluation of Selective Refusal in Grounded Language Models
by: Muhamed, Aashiq, et al.
Published: (2025)
ContractEval: A Benchmark for Evaluating Contract-Satisfying Assertions in Code Generation
by: Lim, Soohan, et al.
Published: (2025)
Distilling to Hybrid Attention Models via KL-Guided Layer Selection
by: Li, Yanhong, et al.
Published: (2025)
From Refusal Tokens to Refusal Control: Discovering and Steering Category-Specific Refusal Directions
by: Alagharu, Rishab, et al.
Published: (2026)