| Main Authors: | Xiong, Chen; Wang, Zihao; Zhu, Rui; Ho, Tsung-Yi; Chen, Pin-Yu; Xiong, Jingwei; Tang, Haixu |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2506.06057 |
Similar Items
CoP: Agentic Red-teaming for Large Language Models using Composition of Principles
by: Xiong, Chen, et al.
Published: (2025)
Gradient Cuff: Detecting Jailbreak Attacks on Large Language Models by Exploring Refusal Loss Landscapes
by: Hu, Xiaomeng, et al.
Published: (2024)
Defining and Evaluating Physical Safety for Large Language Models
by: Tang, Yung-Chen, et al.
Published: (2024)
Attention Slipping: A Mechanistic Understanding of Jailbreak Attacks and Defenses in LLMs
by: Hu, Xiaomeng, et al.
Published: (2025)
Can LLMs Predict Polymer Physics Just by Reading Synthesis and Processing Prose?
by: Liu, Yuchu, et al.
Published: (2026)
Steering Externalities: Benign Activation Steering Unintentionally Increases Jailbreak Risk for Large Language Models
by: Xiong, Chen, et al.
Published: (2026)
The Janus Interface: How Fine-Tuning in Large Language Models Amplifies the Privacy Risks
by: Chen, Xiaoyi, et al.
Published: (2023)
Hey AI Can You Grade My Essay?: Automatic Essay Grading
by: Maliha, Maisha, et al.
Published: (2024)
Retention Score: Quantifying Jailbreak Risks for Vision Language Models
by: Li, Zaitang, et al.
Published: (2024)
Token Highlighter: Inspecting and Mitigating Jailbreak Prompts for Large Language Models
by: Hu, Xiaomeng, et al.
Published: (2024)
Chimera: A Lossless Decoding Method for Accelerating Large Language Models Inference by Fusing all Tokens
by: Zeng, Ziqian, et al.
Published: (2024)
Generating Pretraining Tokens from Organic Data for Data-Bound Scaling
by: Yu, Zichun, et al.
Published: (2026)
From Tokens to Lattices: Emergent Lattice Structures in Language Models
by: Xiong, Bo, et al.
Published: (2025)
Duwak: Dual Watermarks in Large Language Models
by: Zhu, Chaoyi, et al.
Published: (2024)
HeySQuAD: A Spoken Question Answering Dataset
by: Wu, Yijing, et al.
Published: (2023)
Hey, wait a minute: on at-issue sensitivity in Language Models
by: Kim, Sanghee J., et al.
Published: (2025)
LLM4DistReconfig: A Fine-tuned Large Language Model for Power Distribution Network Reconfiguration
by: Christou, Panayiotis, et al.
Published: (2025)
Enhancing Patient-Centric Communication: Leveraging LLMs to Simulate Patient Perspectives
by: Ma, Xinyao, et al.
Published: (2025)
Elephant in the Room: Unveiling the Impact of Reward Model Quality in Alignment
by: Liu, Yan, et al.
Published: (2024)
Direct Token Optimization: A Self-contained Approach to Large Language Model Unlearning
by: Lee, Hong kyu, et al.
Published: (2025)
I-LLM: Efficient Integer-Only Inference for Fully-Quantized Low-Bit Large Language Models
by: Hu, Xing, et al.
Published: (2024)
Self-Alignment of Large Language Models via Monopolylogue-based Social Scene Simulation
by: Pang, Xianghe, et al.
Published: (2024)
Fundamental Safety-Capability Trade-offs in Fine-tuning Large Language Models
by: Chen, Pin-Yu, et al.
Published: (2025)
GREAT Score: Global Robustness Evaluation of Adversarial Perturbation using Generative Models
by: Li, Zaitang, et al.
Published: (2023)
NaNa and MiGu: Semantic Data Augmentation Techniques to Enhance Protein Classification in Graph Neural Networks
by: Lan, Yi-Shan, et al.
Published: (2024)
Adversarial Attack-Defense Co-Evolution for LLM Safety Alignment via Tree-Group Dual-Aware Search and Optimization
by: Li, Xurui, et al.
Published: (2025)
Parameter-Efficient Fine-Tuning of Large Language Models via Deconvolution in Subspace
by: Zhang, Jia-Chen, et al.
Published: (2025)
Unraveling the cognitive patterns of Large Language Models through module communities
by: Bhandari, Kushal Raj, et al.
Published: (2025)
LMFlow: An Extensible Toolkit for Finetuning and Inference of Large Foundation Models
by: Diao, Shizhe, et al.
Published: (2023)
Adapting Large Language Models for Education: Foundational Capabilities, Potentials, and Challenges
by: Li, Qingyao, et al.
Published: (2023)
Saliency-driven Dynamic Token Pruning for Large Language Models
by: Tao, Yao, et al.
Published: (2025)
Scalable Token-Level Hallucination Detection in Large Language Models
by: Min, Rui, et al.
Published: (2026)
Aligning Large Language Models with Searcher Preferences
by: Wu, Wei, et al.
Published: (2026)
Improve Decoding Factuality by Token-wise Cross Layer Entropy of Large Language Models
by: Wu, Jialiang, et al.
Published: (2025)
Near-Lossless Model Compression Enables Longer Context Inference in DNA Large Language Models
by: Zhu, Rui, et al.
Published: (2025)
FastMem: Fast Memorization of Prompt Improves Context Awareness of Large Language Models
by: Zhu, Junyi, et al.
Published: (2024)
FuxiMT: Sparsifying Large Language Models for Chinese-Centric Multilingual Machine Translation
by: Zhu, Shaolin, et al.
Published: (2025)
The Devil is in the Neurons: Interpreting and Mitigating Social Biases in Pre-trained Language Models
by: Liu, Yan, et al.
Published: (2024)
Tokens for Learning, Tokens for Unlearning: Mitigating Membership Inference Attacks in Large Language Models via Dual-Purpose Training
by: Tran, Toan, et al.
Published: (2025)
Knowledgeable Agents by Offline Reinforcement Learning from Large Language Model Rollouts
by: Pang, Jing-Cheng, et al.
Published: (2024)