| Main Authors: | Abishethvarman, Vadivel, Chandna, Bhavik, Jalan, Pratik, Naseem, Usman |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2506.00973 |
Similar Items
ExtremeAIGC: Benchmarking LMM Vulnerability to AI-Generated Extremist Content
by: Chandna, Bhavik, et al.
Published: (2025)
Dissecting Bias in LLMs: A Mechanistic Interpretability Perspective
by: Chandna, Bhavik, et al.
Published: (2025)
Mechanistic Interpretability for Large Language Model Alignment: Progress, Challenges, and Future Directions
by: Naseem, Usman
Published: (2026)
Benchmarking Large Language Models for Cryptanalysis and Side-Channel Vulnerabilities
by: Maskey, Utsav, et al.
Published: (2025)
PersoBench: Benchmarking Personalized Response Generation in Large Language Models
by: Afzoon, Saleh, et al.
Published: (2024)
From Native Memes to Global Moderation: Cross-Cultural Evaluation of Vision-Language Models for Hateful Meme Detection
by: Wang, Mo, et al.
Published: (2026)
TurnBench-MS: A Benchmark for Evaluating Multi-Turn, Multi-Step Reasoning in Large Language Models
by: Zhang, Yiran, et al.
Published: (2025)
Do Large Language Models Reflect Demographic Pluralism in Safety?
by: Naseem, Usman, et al.
Published: (2026)
3DSPA: A 3D Semantic Point Autoencoder for Evaluating Video Realism
by: Chandna, Bhavik, et al.
Published: (2026)
AlignCultura: Towards Culturally Aligned Large Language Models?
by: Kashyap, Gautam Siddharth, et al.
Published: (2026)
Should LLM Safety Be More Than Refusing Harmful Instructions?
by: Maskey, Utsav, et al.
Published: (2025)
Steering Over-refusals Towards Safety in Retrieval Augmented Generation
by: Maskey, Utsav, et al.
Published: (2025)
CogMem: A Cognitive Memory Architecture for Sustained Multi-Turn Reasoning in Large Language Models
by: Zhang, Yiran, et al.
Published: (2025)
When the Model Said 'No Comment', We Knew Helpfulness Was Dead, Honesty Was Alive, and Safety Was Terrified
by: Kashyap, Gautam Siddharth, et al.
Published: (2026)
Better to Ask in English: Evaluation of Large Language Models on English, Low-resource and Cross-Lingual Settings
by: Dey, Krishno, et al.
Published: (2024)
Seeing the Threat: Vulnerabilities in Vision-Language Models to Adversarial Attack
by: Ren, Juan, et al.
Published: (2025)
DUAL-Bench: Measuring Over-Refusal and Robustness in Vision-Language Models
by: Ren, Kaixuan, et al.
Published: (2025)
Do Personality Traits Interfere? Geometric Limitations of Steering in Large Language Models
by: Bhandari, Pranav, et al.
Published: (2026)
Evaluating Personality Traits in Large Language Models: Insights from Psychological Questionnaires
by: Bhandari, Pranav, et al.
Published: (2025)
Evaluating Multimodal Large Language Models on Educational Textbook Question Answering
by: Alawwad, Hessa A., et al.
Published: (2025)
A Counterfactual Explanation Framework for Retrieval Models
by: Chandna, Bhavik, et al.
Published: (2024)
Fairness Evaluation and Inference Level Mitigation in LLMs
by: Nadeem, Afrozah, et al.
Published: (2025)
Reversal of Thought: Enhancing Large Language Models with Preference-Guided Reverse Reasoning Warm-up
by: Yuan, Jiahao, et al.
Published: (2024)
Can Large Language Models Make Everyone Happy?
by: Naseem, Usman, et al.
Published: (2026)
Framing Political Bias in Multilingual LLMs Across Pakistani Languages
by: Nadeem, Afrozah, et al.
Published: (2025)
Enhancing ESG Impact Type Identification through Early Fusion and Multilingual Models
by: Veeramani, Hariram, et al.
Published: (2024)
VITAL: A New Dataset for Benchmarking Pluralistic Alignment in Healthcare
by: Shetty, Anudeex, et al.
Published: (2025)
Are Aligned Large Language Models Still Misaligned?
by: Naseem, Usman, et al.
Published: (2026)
Evaluating Hierarchical Clinical Document Classification Using Reasoning-Based LLMs
by: Mustafa, Akram, et al.
Published: (2025)
Do Large Language Models Speak All Languages Equally? A Comparative Study in Low-Resource Settings
by: Hasan, Md. Arid, et al.
Published: (2024)
Over-Refusal and Representation Subspaces: A Mechanistic Analysis of Task-Conditioned Refusal in Aligned LLMs
by: Maskey, Utsav, et al.
Published: (2026)
Are Large Language Models Economically Viable for Industry Deployment?
by: Mohammad, Abdullah, et al.
Published: (2026)
Bias Beyond Borders: Political Ideology Evaluation and Steering in Multilingual LLMs
by: Nadeem, Afrozah, et al.
Published: (2026)
ChineseSafe: A Chinese Benchmark for Evaluating Safety in Large Language Models
by: Zhang, Hengxiang, et al.
Published: (2024)
SHIELD: Classifier-Guided Prompting for Robust and Safer LVLMs
by: Ren, Juan, et al.
Published: (2025)
PersoPilot: An Adaptive AI-Copilot for Transparent Contextualized Persona Classification and Personalized Response Generation
by: Afzoon, Saleh, et al.
Published: (2026)
MaiBERT: A Pre-training Corpus and Language Model for Low-Resourced Maithili Language
by: Yadav, Sumit, et al.
Published: (2025)
PersoDPO: Scalable Preference Optimization for Instruction-Adherent, Persona-Grounded Dialogue via Multi-LLM Evaluation
by: Afzoon, Saleh, et al.
Published: (2026)
Steering Towards Fairness: Mitigating Political Bias in LLMs
by: Nadeem, Afrozah, et al.
Published: (2025)
We Think, Therefore We Align LLMs to Helpful, Harmless and Honest Before They Go Wrong
by: Kashyap, Gautam Siddharth, et al.
Published: (2025)