| Main Authors: | Aggarwal, Yash, Gorti, Atmika, Jain, Vinija, Chadha, Aman, Thirunarayan, Krishnaprasad, Gaur, Manas |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.03217 |
Similar Items
Unboxing Occupational Bias: Grounded Debiasing of LLMs with U.S. Labor Data
by: Gorti, Atmika, et al.
Published: (2024)
Mental Health Equity in LLMs: Leveraging Multi-Hop Question Answering to Detect Amplified and Silenced Perspectives
by: Haider, Batool, et al.
Published: (2025)
The Reasoning Trap -- Logical Reasoning as a Mechanistic Pathway to Situational Awareness
by: Sahoo, Subramanyam, et al.
Published: (2026)
Born With a Silver Spoon? Investigating Socioeconomic Bias in Large Language Models
by: Singh, Smriti, et al.
Published: (2024)
Dial E for Ethical Enforcement: institutional VETO power as a governance primitive
by: Sahoo, Subramanyam, et al.
Published: (2026)
Flying Pigs, FaR and Beyond: Evaluating LLM Reasoning in Counterfactual Worlds
by: Joishy, Anish R, et al.
Published: (2025)
COBIAS: Assessing the Contextual Reliability of Bias Benchmarks for Language Models
by: Govil, Priyanshul, et al.
Published: (2024)
TRACEALIGN -- Tracing the Drift: Attributing Alignment Failures to Training-Time Belief Sources in LLMs
by: Das, Amitava, et al.
Published: (2025)
Reasoning or Rhetoric? An Empirical Analysis of Moral Reasoning Explanations in Large Language Models
by: Kasat, Aryan, et al.
Published: (2026)
From Prejudice to Parity: A New Approach to Debiasing Large Language Model Word Embeddings
by: Rakshit, Aishik, et al.
Published: (2024)
Guiding Vision-Language Model Selection for Visual Question-Answering Across Tasks, Domains, and Knowledge Types
by: Sinha, Neelabh, et al.
Published: (2024)
Are Small Language Models Ready to Compete with Large Language Models for Practical Applications?
by: Sinha, Neelabh, et al.
Published: (2024)
D-STEER - Preference Alignment Techniques Learn to Behave, not to Believe -- Beneath the Surface, DPO as Steering Vector Perturbation in Activation Space
by: Raina, Samarth, et al.
Published: (2025)
Personality Shapes Gender Bias in Persona-Conditioned LLM Narratives Across English and Hindi: An Empirical Investigation
by: Kumar, Tanay, et al.
Published: (2026)
AlignGuard-LoRA: Alignment-Preserving Fine-Tuning via Fisher-Guided Decomposition and Riemannian-Geodesic Collision Regularization
by: Das, Amitava, et al.
Published: (2025)
MAAT: Multi-phase Adapter-Aware Targeted Unlearning
by: Yagnik, Suryash, et al.
Published: (2026)
Neural FOXP2 -- Language Specific Neuron Steering for Targeted Language Improvement in LLMs
by: Saha, Anusa, et al.
Published: (2026)
When Shallow Wins: Silent Failures and the Depth-Accuracy Paradox in Latent Reasoning
by: Sahoo, Subramanyam, et al.
Published: (2026)
SAHOO: Safeguarded Alignment for High-Order Optimization Objectives in Recursive Self-Improvement
by: Sahoo, Subramanyam, et al.
Published: (2026)
I Can't Believe It's Not Robust: Catastrophic Collapse of Safety Classifiers under Embedding Drift
by: Sahoo, Subramanyam, et al.
Published: (2026)
Position: The Complexity of Perfect AI Alignment -- Formalizing the RLHF Trilemma
by: Sahoo, Subramanyam, et al.
Published: (2025)
Exploring the Impact of Large Language Models on Recommender Systems: An Extensive Review
by: Vats, Arpita, et al.
Published: (2024)
SleepWalk: A Three-Tier Benchmark for Stress-Testing Instruction-Guided Vision-Language Navigation
by: Rawal, Niyati, et al.
Published: (2026)
LLMsAgainstHate @ NLU of Devanagari Script Languages 2025: Hate Speech Detection and Target Identification in Devanagari Languages via Parameter Efficient Fine-Tuning of LLMs
by: Sidibomma, Rushendra, et al.
Published: (2024)
Towards Robust Evaluation of Unlearning in LLMs via Data Transformations
by: Joshi, Abhinav, et al.
Published: (2024)
Human-Readable Adversarial Prompts: An Investigation into LLM Vulnerabilities Using Situational Context
by: Das, Nilanjana, et al.
Published: (2024)
ECLIPTICA -- A Framework for Switchable LLM Alignment via CITA - Contrastive Instruction-Tuned Alignment
by: Wanaskar, Kapil, et al.
Published: (2026)
AlignMerge - Alignment-Preserving Large Language Model Merging via Fisher-Guided Geometric Constraints
by: Roy, Aniruddha, et al.
Published: (2025)
How Culturally Aware are Vision-Language Models?
by: Burda-Lassen, Olena, et al.
Published: (2024)
Mechanistic Steering of LLMs Reveals Layer-wise Feature Vulnerabilities in Adversarial Settings
by: Das, Nilanjana, et al.
Published: (2026)
IMRNNs: An Efficient Method for Interpretable Dense Retrieval via Embedding Modulation
by: Saxena, Yash, et al.
Published: (2026)
Exploring the Frontier of Vision-Language Models: A Survey of Current Methodologies and Future Directions
by: Ghosh, Akash, et al.
Published: (2024)
A Comprehensive Survey of Accelerated Generation Techniques in Large Language Models
by: Khoshnoodi, Mahsa, et al.
Published: (2024)
Assessing LLM Reliability on Temporally Recent Open-Domain Questions
by: Krishnappa, Pushwitha, et al.
Published: (2026)
Multilingual State Space Models for Structured Question Answering in Indic Languages
by: Vats, Arpita, et al.
Published: (2025)
MOD-X: A Modular Open Decentralized eXchange Framework proposal for Heterogeneous Interoperable Artificial Intelligence Agents
by: Ioannides, Georgios, et al.
Published: (2025)
Decoding the Diversity: A Review of the Indic AI Research Landscape
by: KJ, Sankalp, et al.
Published: (2024)
Refining Text-to-Image Generation: Towards Accurate Training-Free Glyph-Enhanced Image Generation
by: Lakhanpal, Sanyam, et al.
Published: (2024)
Are Language Models Sensitive to Morally Irrelevant Distractors?
by: Shaw, Andrew, et al.
Published: (2026)
Hierarchical Prompting Taxonomy: A Universal Evaluation Framework for Large Language Models Aligned with Human Cognitive Principles
by: Budagam, Devichand, et al.
Published: (2024)