| Main Authors: | Shu, Dong, Zhao, Haiyan, Hu, Jingyu, Liu, Weiru, Payani, Ali, Cheng, Lu, Du, Mengnan |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2501.01346 |
Similar Items
Fine-Grained Interpretation of Political Opinions in Large Language Models
by: Hu, Jingyu, et al.
Published: (2025)
Strategic Demonstration Selection for Improved Fairness in LLM In-Context Learning
by: Hu, Jingyu, et al.
Published: (2024)
Towards Uncovering How Large Language Model Works: An Explainability Perspective
by: Zhao, Haiyan, et al.
Published: (2024)
Universal Activation Verbalizer: A Unified Framework for Cross-Model Activation Explanation
by: Zhao, Haiyan, et al.
Published: (2026)
Beyond Single Concept Vector: Modeling Concept Subspace in LLMs with Gaussian Distribution
by: Zhao, Haiyan, et al.
Published: (2024)
LogitTrace: Detecting Benchmark Contamination via Layerwise Logit Trajectories
by: He, Zirui, et al.
Published: (2025)
SAIF: A Sparse Autoencoder Framework for Interpreting and Steering Instruction Following of Language Models
by: He, Zirui, et al.
Published: (2025)
A Survey on Sparse Autoencoders: Interpreting the Internal Mechanisms of Large Language Models
by: Shu, Dong, et al.
Published: (2025)
SAE-SSV: Supervised Steering in Sparse Representation Spaces for Reliable Control of Language Models
by: He, Zirui, et al.
Published: (2025)
Exploring Multilingual Probing in Large Language Models: A Cross-Language Analysis
by: Li, Daoyang, et al.
Published: (2024)
Rep2Text: Decoding Full Text from a Single LLM Token Representation
by: Zhao, Haiyan, et al.
Published: (2025)
The Impact of Reasoning Step Length on Large Language Models
by: Jin, Mingyu, et al.
Published: (2024)
MONICA: Real-Time Monitoring and Calibration of Chain-of-Thought Sycophancy in Large Reasoning Models
by: Hu, Jingyu, et al.
Published: (2025)
Beyond Input Activations: Identifying Influential Latents by Gradient Sparse Autoencoders
by: Shu, Dong, et al.
Published: (2025)
A Survey on Fairness in Large Language Models
by: Li, Yingji, et al.
Published: (2023)
Large Language Models Can Learn Temporal Reasoning
by: Xiong, Siheng, et al.
Published: (2024)
Language Ranker: A Metric for Quantifying LLM Performance Across High and Low-Resource Languages
by: Li, Zihao, et al.
Published: (2024)
NeuronScope: A Multi-Agent Framework for Explaining Polysemantic Neurons in Language Models
by: Liu, Weiqi, et al.
Published: (2026)
LawLLM: Law Large Language Model for the US Legal System
by: Shu, Dong, et al.
Published: (2024)
Comparative Analysis of Demonstration Selection Algorithms for LLM In-Context Learning
by: Shu, Dong, et al.
Published: (2024)
Knowledge Graph Large Language Model (KG-LLM) for Link Prediction
by: Shu, Dong, et al.
Published: (2024)
Denoising Concept Vectors with Sparse Autoencoders for Improved Language Model Steering
by: Zhao, Haiyan, et al.
Published: (2025)
AALC: Large Language Model Efficient Reasoning via Adaptive Accuracy-Length Control
by: Li, Ruosen, et al.
Published: (2025)
Are Aligned Large Language Models Still Misaligned?
by: Naseem, Usman, et al.
Published: (2026)
Deliberate Reasoning in Language Models as Structure-Aware Planning with an Accurate World Model
by: Xiong, Siheng, et al.
Published: (2024)
Jailbreaking Large Language Models Through Alignment Vulnerabilities in Out-of-Distribution Settings
by: Huang, Yue, et al.
Published: (2024)
Usable XAI: 10 Strategies Towards Exploiting Explainability in the LLM Era
by: Wu, Xuansheng, et al.
Published: (2024)
Emergent and Subliminal Misalignment Through the Lens of Data-Mediated Transfer
by: Askin, Baris, et al.
Published: (2026)
FinChart-Bench: Benchmarking Financial Chart Comprehension in Vision-Language Models
by: Shu, Dong, et al.
Published: (2025)
StakeBench: Evaluating Language Understanding Grounded in Market Commitment
by: Pei, Yunhua, et al.
Published: (2026)
Lens: Rethinking Multilingual Enhancement for Large Language Models
by: Zhao, Weixiang, et al.
Published: (2024)
AlignMMBench: Evaluating Chinese Multimodal Alignment in Large Vision-Language Models
by: Wu, Yuhang, et al.
Published: (2024)
DBR: Divergence-Based Regularization for Debiasing Natural Language Understanding Models
by: Li, Zihao, et al.
Published: (2025)
A Survey of State of the Art Large Vision Language Models: Alignment, Benchmark, Evaluations and Challenges
by: Li, Zongxia, et al.
Published: (2025)
A Survey on Multilingual Large Language Models: Corpora, Alignment, and Bias
by: Xu, Yuemei, et al.
Published: (2024)
Explainable and Interpretable Multimodal Large Language Models: A Comprehensive Survey
by: Dang, Yunkai, et al.
Published: (2024)
Mitigating Degree Bias Adaptively with Hard-to-Learn Nodes in Graph Contrastive Learning
by: Hu, Jingyu, et al.
Published: (2025)
Can Knowledge Editing Really Correct Hallucinations?
by: Huang, Baixiang, et al.
Published: (2024)
LMO-DP: Optimizing the Randomization Mechanism for Differentially Private Fine-Tuning (Large) Language Models
by: Yang, Qin, et al.
Published: (2024)
Enhancing Long Chain-of-Thought Reasoning through Multi-Path Plan Aggregation
by: Xiong, Siheng, et al.
Published: (2025)