| Main Authors: | Huang, ShiYing, Lin, Liang, Li, Yuer, Luo, Kaiwen, Zhou, Zhenhong, Zhang, An, Dong, Junhao, Wang, Kun, Zeng, Zhigang |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.11679 |
Similar Items
CSSBench: Evaluating the Safety of Lightweight LLMs against Chinese-Specific Adversarial Patterns
by: Zhou, Zhenhong, et al.
Published: (2026)
EchoDistill:Alignment Noisy-to-Clean Self-Distillation for Robust Audio LLMs
by: Lin, Liang, et al.
Published: (2026)
How Alignment and Jailbreak Work: Explain LLM Safety through Intermediate Hidden States
by: Zhou, Zhenhong, et al.
Published: (2024)
MAD-OPD: Breaking the Ceiling in On-Policy Distillation via Multi-Agent Debate
by: Wang, Jianze, et al.
Published: (2026)
Breaking the Ceiling: Exploring the Potential of Jailbreak Attacks through Expanding Strategy Space
by: Huang, Yao, et al.
Published: (2025)
Omni-Safety under Cross-Modality Conflict: Vulnerabilities, Dynamics Mechanisms and Efficient Alignment
by: Wang, Kun, et al.
Published: (2026)
HelpSteer2-Preference: Complementing Ratings with Preferences
by: Wang, Zhilin, et al.
Published: (2024)
Interior Eigensolver Based on Rational Filter with Composite rule
by: Chen, Yuer, et al.
Published: (2023)
Course-Correction: Safety Alignment Using Synthetic Preferences
by: Xu, Rongwu, et al.
Published: (2024)
CeRA: Overcoming the Linear Ceiling of Low-Rank Adaptation via Capacity Expansion
by: Chen, Hung-Hsuan
Published: (2026)
Breaking the Compression Ceiling: Data-Free Pipeline for Ultra-Efficient Delta Compression
by: Wang, Xiaohui, et al.
Published: (2025)
Breaking the Performance Ceiling in Reinforcement Learning requires Inference Strategies
by: Chalumeau, Felix, et al.
Published: (2025)
Hit-RAG: Learning to Reason with Long Contexts via Preference Alignment
by: Liu, Junming, et al.
Published: (2026)
ChronosAudio: A Comprehensive Long-Audio Benchmark for Evaluating Audio-Large Language Models
by: Luo, Kaiwen, et al.
Published: (2026)
RSA-Bench: Benchmarking Audio Large Models in Real-World Acoustic Scenarios
by: Zhang, Yibo, et al.
Published: (2026)
HearSay Benchmark: Do Audio LLMs Leak What They Hear?
by: Wang, Jin, et al.
Published: (2026)
Adaptive Helpfulness-Harmlessness Alignment with Preference Vectors
by: Liang, Ren-Wei, et al.
Published: (2025)
Breaking the Capability Ceiling of LLM Post-Training by Reintroducing Markov States
by: Yuan, Yurun, et al.
Published: (2026)
Pcc-tuning: Breaking the Contrastive Learning Ceiling in Semantic Textual Similarity
by: Zhang, Bowen, et al.
Published: (2024)
Improving 3D Finger Traits Recognition via Generalizable Neural Rendering
by: Xu, Hongbin, et al.
Published: (2024)
Can LLMs Help Decentralized Dispute Arbitration? A Case Study of UMA-Resolved Markets on Polymarket
by: Wen, Junhao, et al.
Published: (2026)
Hidden in the Noise: Unveiling Backdoors in Audio LLMs Alignment through Latent Acoustic Pattern Triggers
by: Lin, Liang, et al.
Published: (2025)
Breaking the Ceiling of the LLM Community by Treating Token Generation as a Classification for Ensembling
by: Yu, Yao-Ching, et al.
Published: (2024)
On the Role of Attention Heads in Large Language Model Safety
by: Zhou, Zhenhong, et al.
Published: (2024)
Attribute-Grounded Selective Reasoning for Artwork Emotion Understanding with Multimodal Large Language Models
by: Zhang, Cheng, et al.
Published: (2026)
Backdoor Collapse: Eliminating Unknown Threats via Known Backdoor Aggregation in Language Models
by: Lin, Liang, et al.
Published: (2025)
Ceiling of Barium Substitution for B‐Site Cation in Organometal Halide Perovskite Solar Cells
by: Kai-Chi Hsiao, et al.
Published: (2024)
Estimation of Riemannian Quantities from Noisy Data via Density Derivatives
by: Chen, Junhao, et al.
Published: (2026)
Bi-Factorial Preference Optimization: Balancing Safety-Helpfulness in Language Models
by: Zhang, Wenxuan, et al.
Published: (2024)
MetaAgent-X : Breaking the Ceiling of Automatic Multi-Agent Systems via End-to-End Reinforcement Learning
by: Zhang, Yaolun, et al.
Published: (2026)
Tavan / Ceiling
by: petaaerial
Published: (2020)
A Geometric Probe of the Accuracy-Robustness Trade-off: Sharp Boundaries in Symmetry-Breaking Dimensional Expansion
by: Bai, Yu, et al.
Published: (2026)
Explaining Human Preferences via Metrics for Structured 3D Reconstruction
by: Langerman, Jack, et al.
Published: (2025)
Jailbreaking Large Language Diffusion Models: Revealing Hidden Safety Flaws in Diffusion-Based Text Generation
by: Zhang, Yuanhe, et al.
Published: (2025)
HelpSteer3-Preference: Open Human-Annotated Preference Data across Diverse Tasks and Languages
by: Wang, Zhilin, et al.
Published: (2025)
Metis-SPECS: Decoupling Multimodal Learning via Self-distilled Preference-based Cold Start
by: Chen, Kun, et al.
Published: (2025)
Balancing Safety and Helpfulness in Healthcare AI Assistants through Iterative Preference Alignment
by: Nghiem, Huy, et al.
Published: (2025)
Does Using Counterfactual Help LLMs Explain Textual Importance in Classification?
by: Tan, Nelvin, et al.
Published: (2025)
ImageVeriBypasser: An image verification code recognition approach based on Convolutional Neural Network
by: Tong Ji, et al.
Published: (2024)
Attention Masks Help Adversarial Attacks to Bypass Safety Detectors
by: Shi, Yunfan
Published: (2024)