| Main Authors: | Min, Nay Myat, Pham, Long H., Li, Yige, Sun, Jun |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2411.12768 |
Similar Items
Layerwise Convergence Fingerprints for Runtime Misbehavior Detection in Large Language Models
by: Min, Nay Myat, et al.
Published: (2026)
CORVUS: Red-Teaming Hallucination Detectors via Internal Signal Camouflage in Large Language Models
by: Min, Nay Myat, et al.
Published: (2026)
Propaganda AI: An Analysis of Semantic Divergence in Large Language Models
by: Min, Nay Myat, et al.
Published: (2025)
Unified Neural Backdoor Removal with Only Few Clean Samples through Unlearning and Relearning
by: Min, Nay Myat, et al.
Published: (2024)
AutoBackdoor: Automating Backdoor Attacks via LLM Agents
by: Li, Yige, et al.
Published: (2025)
Where Did It Go Wrong? Attributing Undesirable LLM Behaviors via Representation Gradient Tracing
by: Li, Zhe, et al.
Published: (2025)
The Consistency Hypothesis in Uncertainty Quantification for Large Language Models
by: Xiao, Quan, et al.
Published: (2025)
Backdoor4Good: Benchmarking Beneficial Uses of Backdoors in LLMs
by: Li, Yige, et al.
Published: (2026)
Zero-Shot Defense Against Toxic Images via Inherent Multimodal Alignment in LVLMs
by: Zhao, Wei, et al.
Published: (2025)
Instructions as Backdoors: Backdoor Vulnerabilities of Instruction Tuning for Large Language Models
by: Xu, Jiashu, et al.
Published: (2023)
ConsistRM: Improving Generative Reward Models via Consistency-Aware Self-Training
by: Liang, Yu, et al.
Published: (2026)
Automating Steering for Safe Multimodal Large Language Models
by: Wu, Lyucheng, et al.
Published: (2025)
Mitigating Biases for Instruction-following Language Models via Bias Neurons Elimination
by: Yang, Nakyeong, et al.
Published: (2023)
Leveraging Large Language Models for Suicide Detection on Social Media with Limited Labels
by: Nguyen, Vy, et al.
Published: (2024)
Knowledge-based Consistency Testing of Large Language Models
by: Rajan, Sai Sathiesh, et al.
Published: (2024)
Internal Value Alignment in Large Language Models through Controlled Value Vector Activation
by: Jin, Haoran, et al.
Published: (2025)
Bridging Internal Probability and Self-Consistency for Effective and Efficient LLM Reasoning
by: Zhou, Zhi, et al.
Published: (2025)
KnowPhish: Large Language Models Meet Multimodal Knowledge Graphs for Enhancing Reference-Based Phishing Detection
by: Li, Yuexin, et al.
Published: (2024)
SPIN: Sparsifying and Integrating Internal Neurons in Large Language Models for Text Classification
by: Jiao, Difan, et al.
Published: (2023)
A Survey on Sparse Autoencoders: Interpreting the Internal Mechanisms of Large Language Models
by: Shu, Dong, et al.
Published: (2025)
Analyzing And Editing Inner Mechanisms Of Backdoored Language Models
by: Lamparth, Max, et al.
Published: (2023)
Stepwise Self-Consistent Mathematical Reasoning with Large Language Models
by: Zhao, Zilong, et al.
Published: (2024)
Flaming-hot Initiation with Regular Execution Sampling for Large Language Models
by: Chen, Weizhe, et al.
Published: (2024)
Induced Numerical Instability: Hidden Costs in Multimodal Large Language Models
by: Wong, Wai Tuck, et al.
Published: (2026)
PromptIntern: Saving Inference Costs by Internalizing Recurrent Prompt during Large Language Model Fine-tuning
by: Zou, Jiaru, et al.
Published: (2024)
Rethinking Entropy Regularization in Large Reasoning Models
by: Jiang, Yuxian, et al.
Published: (2025)
Logically Consistent Language Models via Neuro-Symbolic Integration
by: Calanzone, Diego, et al.
Published: (2024)
Do Influence Functions Work on Large Language Models?
by: Li, Zhe, et al.
Published: (2024)
Backdoor Token Unlearning: Exposing and Defending Backdoors in Pretrained Language Models
by: Jiang, Peihai, et al.
Published: (2025)
DReSS: Data-driven Regularized Structured Streamlining for Large Language Models
by: Feng, Mingkuan, et al.
Published: (2025)
Towards Consistent Natural-Language Explanations via Explanation-Consistency Finetuning
by: Chen, Yanda, et al.
Published: (2024)
Pre-training Limited Memory Language Models with Internal and External Knowledge
by: Zhao, Linxi, et al.
Published: (2025)
UniGuardian: A Unified Defense for Detecting Prompt Injection, Backdoor Attacks and Adversarial Attacks in Large Language Models
by: Lin, Huawei, et al.
Published: (2025)
FAS: Fast ANN-SNN Conversion for Spiking Large Language Models
by: Chen, Long, et al.
Published: (2025)
Mitigating Backdoor Threats to Large Language Models: Advancement and Challenges
by: Liu, Qin, et al.
Published: (2024)
Consistency Checks for Language Model Forecasters
by: Paleka, Daniel, et al.
Published: (2024)
LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models
by: Nguyen, Nam V., et al.
Published: (2024)
ReLearn: Unlearning via Learning for Large Language Models
by: Xu, Haoming, et al.
Published: (2025)
Unveiling and Addressing Pseudo Forgetting in Large Language Models
by: Sun, Huashan, et al.
Published: (2024)
$π^2$: Structure-Originated Reasoning Data Improves Long-Context Reasoning Ability of Large Language Models
by: Do, Quyet V., et al.
Published: (2026)