| Main Authors: | Zhao, Mengnan, Zhang, Lihe, Yang, Xingyi, Zheng, Tianhang, Yin, Baocai |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2501.00054 |
Similar Items
Unveiling the Backdoor Mechanism Hidden Behind Catastrophic Overfitting in Fast Adversarial Training
by: Zhao, Mengnan, et al.
Published: (2026)
Separable Multi-Concept Erasure from Diffusion Models
by: Zhao, Mengnan, et al.
Published: (2024)
Adversarial Training: A Survey
by: Zhao, Mengnan, et al.
Published: (2024)
Reinforcement Learning with Rubric Anchors
by: Huang, Zenan, et al.
Published: (2025)
Constraining Sequential Model Editing with Editing Anchor Compression
by: Xu, Hao-Xiang, et al.
Published: (2025)
Laying Anchors: Semantically Priming Numerals in Language Modeling
by: Sharma, Mandar, et al.
Published: (2024)
Understanding Post-hoc Explainers: The Case of Anchors
by: Lopardo, Gianluigi, et al.
Published: (2023)
ChatTraffic: Text-to-Traffic Generation via Diffusion Model
by: Zhang, Chengyang, et al.
Published: (2023)
SAGE: Shaping Anchors for Guided Exploration in RLVR of LLMs
by: Lee, Chanuk, et al.
Published: (2026)
Thought Anchors: Which LLM Reasoning Steps Matter?
by: Bogdan, Paul C., et al.
Published: (2025)
Thinking in Latents: Adaptive Anchor Refinement for Implicit Reasoning in LLMs
by: Sheshanarayana, Disha, et al.
Published: (2026)
A Sea of Words: An In-Depth Analysis of Anchors for Text Data
by: Lopardo, Gianluigi, et al.
Published: (2022)
Semantic Anchors in In-Context Learning: Why Small LLMs Cannot Flip Their Labels
by: Kumar, Anantha Padmanaban Krishna
Published: (2025)
Multi-Objective Large Language Model Unlearning
by: Pan, Zibin, et al.
Published: (2024)
Language-Driven Anchors for Zero-Shot Adversarial Robustness
by: Li, Xiao, et al.
Published: (2023)
CoreUnlearn: Rethinking Concept Unlearning through Disentangled Component-Level Erasure in Text-guided Diffusion Models
by: Zhao, Mengnan, et al.
Published: (2026)
Learn while Unlearn: An Iterative Unlearning Framework for Generative Language Models
by: Tang, Haoyu, et al.
Published: (2024)
Towards Robust Knowledge Unlearning: An Adversarial Framework for Assessing and Improving Unlearning Robustness in Large Language Models
by: Yuan, Hongbang, et al.
Published: (2024)
From Anchors to Supervision: Memory-Graph Guided Corpus-Free Unlearning for Large Language Models
by: Li, Wenxuan, et al.
Published: (2026)
Beyond Single Concept Vector: Modeling Concept Subspace in LLMs with Gaussian Distribution
by: Zhao, Haiyan, et al.
Published: (2024)
Exploring Multilingual Probing in Large Language Models: A Cross-Language Analysis
by: Li, Daoyang, et al.
Published: (2024)
Fine-Grained Interpretation of Political Opinions in Large Language Models
by: Hu, Jingyu, et al.
Published: (2025)
Large Language Model Unlearning
by: Yao, Yuanshun, et al.
Published: (2023)
ZeroUnlearn: Few-Shot Knowledge Unlearning in Large Language Models
by: Lin, Yujie, et al.
Published: (2026)
Comparative Analysis of Demonstration Selection Algorithms for LLM In-Context Learning
by: Shu, Dong, et al.
Published: (2024)
Hierarchical Federated Unlearning for Large Language Models
by: Zhong, Yisheng, et al.
Published: (2025)
An Adversarial Perspective on Machine Unlearning for AI Safety
by: Łucki, Jakub, et al.
Published: (2024)
SAIF: A Sparse Autoencoder Framework for Interpreting and Steering Instruction Following of Language Models
by: He, Zirui, et al.
Published: (2025)
MAGIC: A Co-Evolving Attacker-Defender Adversarial Game for Robust LLM Safety
by: Wen, Xiaoyu, et al.
Published: (2026)
Geometric-disentangelment Unlearning
by: Zhou, Duo, et al.
Published: (2025)
Offset Unlearning for Large Language Models
by: Huang, James Y., et al.
Published: (2024)
GUI-AIMA: Aligning Intrinsic Multimodal Attention with a Context Anchor for GUI Grounding
by: Zhou, Shijie, et al.
Published: (2025)
Catastrophic Overfitting: A Potential Blessing in Disguise
by: Zhao, Mengnan, et al.
Published: (2024)
Robust LLM Unlearning with MUDMAN: Meta-Unlearning with Disruption Masking And Normalization
by: Sondej, Filip, et al.
Published: (2025)
A Neuro-inspired Interpretation of Unlearning in Large Language Models through Sample-level Unlearning Difficulty
by: Feng, Xiaohua, et al.
Published: (2025)
Unlearning Isn't Deletion: Investigating Reversibility of Machine Unlearning in LLMs
by: Xu, Xiaoyu, et al.
Published: (2025)
Large Language Model Unlearning via Embedding-Corrupted Prompts
by: Liu, Chris Yuhao, et al.
Published: (2024)
Beyond Input Activations: Identifying Influential Latents by Gradient Sparse Autoencoders
by: Shu, Dong, et al.
Published: (2025)
A Survey on Sparse Autoencoders: Interpreting the Internal Mechanisms of Large Language Models
by: Shu, Dong, et al.
Published: (2025)
Rep2Text: Decoding Full Text from a Single LLM Token Representation
by: Zhao, Haiyan, et al.
Published: (2025)