| Main Authors: | Haider, Muhammad Umair, Rizwan, Hammad, Sajjad, Hassan, Ju, Peizhong, Siddique, A. B. |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2502.06809 |
Similar Items
Multi-Granular Node Pruning for Circuit Discovery
by: Haider, Muhammad Umair, et al.
Published: (2025)
LangFIR: Discovering Sparse Language-Specific Features from Monolingual Data for Language Steering
by: Wong, Sing Hieng, et al.
Published: (2026)
NEAT: Concept driven Neuron Attribution in LLMs
by: Kavuri, Vivek Hruday, et al.
Published: (2025)
Evaluating Sparse Autoencoders for Monosemantic Representation
by: Fereidouni, Moghis, et al.
Published: (2025)
Cross-Layer Discrete Concept Discovery for Interpreting Language Models
by: Garg, Ankur, et al.
Published: (2025)
NeuronTune: Fine-Grained Neuron Modulation for Balanced Safety-Utility Alignment in LLMs
by: Pan, Birong, et al.
Published: (2025)
Confidence Regulation Neurons in Language Models
by: Stolfo, Alessandro, et al.
Published: (2024)
Interpreting the Effects of Quantization on LLMs
by: Singh, Manpreet, et al.
Published: (2025)
Quantifying the Capabilities of LLMs across Scale and Precision
by: Badshah, Sher, et al.
Published: (2024)
CoSy: Evaluating Textual Explanations of Neurons
by: Kopf, Laura, et al.
Published: (2024)
Decomposing Attention To Find Context-Sensitive Neurons
by: Gibson, Alex
Published: (2025)
Universal Neurons in GPT2 Language Models
by: Gurnee, Wes, et al.
Published: (2024)
Finding Culture-Sensitive Neurons in Vision-Language Models
by: Zhao, Xiutian, et al.
Published: (2025)
Language-specific Neurons Do Not Facilitate Cross-Lingual Transfer
by: Mondal, Soumen Kumar, et al.
Published: (2025)
Learnable Privacy Neurons Localization in Language Models
by: Chen, Ruizhe, et al.
Published: (2024)
NeuroAda: Activating Each Neuron's Potential for Parameter-Efficient Fine-Tuning
by: Zhang, Zhi, et al.
Published: (2025)
Zeroth-Order Adaptive Neuron Alignment Based Pruning without Re-Training
by: Cunegatti, Elia, et al.
Published: (2024)
A Single Neuron Is Sufficient to Bypass Safety Alignment in Large Language Models
by: Kazemi, Hamid, et al.
Published: (2026)
Mitigating Biases for Instruction-following Language Models via Bias Neurons Elimination
by: Yang, Nakyeong, et al.
Published: (2023)
SPIN: Sparsifying and Integrating Internal Neurons in Large Language Models for Text Classification
by: Jiao, Difan, et al.
Published: (2023)
Towards Understanding Safety Alignment: A Mechanistic Perspective from Safety Neurons
by: Chen, Jianhui, et al.
Published: (2024)
Polysemy of Synthetic Neurons Towards a New Type of Explanatory Categorical Vector Spaces
by: Pichat, Michael, et al.
Published: (2025)
The Transfer Neurons Hypothesis: An Underlying Mechanism for Language Latent Space Transitions in Multilingual LLMs
by: Tezuka, Hinata, et al.
Published: (2025)
Classification of Safety Events at Nuclear Sites using Large Language Models
by: de Costa, Mishca, et al.
Published: (2024)
Do Neurons Dream of Primitive Operators? Wake-Sleep Compression Rediscovers Schank's Event Semantics
by: Balogh, Peter
Published: (2026)
Query Attribute Modeling: Improving search relevance with Semantic Search and Meta Data Filtering
by: Menon, Karthik, et al.
Published: (2025)
Discrete Flow Matching for Offline-to-Online Reinforcement Learning
by: Khan, Fairoz Nower, et al.
Published: (2026)
Steering Without Breaking: Mechanistically Informed Interventions for Discrete Diffusion Language Models
by: Zhou, Hanhan, et al.
Published: (2026)
Qalb: Largest State-of-the-Art Urdu Large Language Model for 230M Speakers with Systematic Continued Pre-training
by: Hassan, Muhammad Taimoor, et al.
Published: (2026)
Cite Pretrain: Retrieval-Free Knowledge Attribution for Large Language Models
by: Huang, Yukun, et al.
Published: (2025)
Let Models Speak Ciphers: Multiagent Debate through Embeddings
by: Pham, Chau, et al.
Published: (2023)
TokDrift: When LLM Speaks in Subwords but Code Speaks in Grammar
by: Li, Yinxi, et al.
Published: (2025)
LEGAL-UQA: A Low-Resource Urdu-English Dataset for Legal Question Answering
by: Faisal, Faizan, et al.
Published: (2024)
AttributionBench: How Hard is Automatic Attribution Evaluation?
by: Li, Yifei, et al.
Published: (2024)
Learn Globally, Speak Locally: Bridging the Gaps in Multilingual Reasoning
by: Hwang, Jaedong, et al.
Published: (2025)
Graphically Speaking: Unmasking Abuse in Social Media with Conversation Insights
by: Nouri, Célia, et al.
Published: (2025)
Attention Speaks Volumes: Localizing and Mitigating Bias in Language Models
by: Adiga, Rishabh, et al.
Published: (2024)
Planner and Executor: Collaboration between Discrete Diffusion And Autoregressive Models in Reasoning
by: Berrayana, Lina, et al.
Published: (2025)
CoVerRL: Breaking the Consensus Trap in Label-Free Reasoning via Generator-Verifier Co-Evolution
by: Pan, Teng, et al.
Published: (2026)
Neuron-Level Knowledge Attribution in Large Language Models
by: Yu, Zeping, et al.
Published: (2023)