| Main Authors: | Garg, Kartik, Mishra, Shourya, Sinha, Kartikeya, Singh, Ojaswi Pratap, Chopra, Ayush, Rai, Kanishk, Sheikh, Ammar, Maheshwari, Raghav, Chadha, Aman, Jain, Vinija, Das, Amitava |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2511.17937 |
Similar Items
TRACEALIGN -- Tracing the Drift: Attributing Alignment Failures to Training-Time Belief Sources in LLMs
by: Das, Amitava, et al.
Published: (2025)
AlignGuard-LoRA: Alignment-Preserving Fine-Tuning via Fisher-Guided Decomposition and Riemannian-Geodesic Collision Regularization
by: Das, Amitava, et al.
Published: (2025)
Guiding Vision-Language Model Selection for Visual Question-Answering Across Tasks, Domains, and Knowledge Types
by: Sinha, Neelabh, et al.
Published: (2024)
Are Small Language Models Ready to Compete with Large Language Models for Practical Applications?
by: Sinha, Neelabh, et al.
Published: (2024)
Stochastic CHAOS: Why Deterministic Inference Kills, and Distributional Variability Is the Heartbeat of Artifical Cognition
by: Joshi, Tanmay, et al.
Published: (2026)
D-STEER - Preference Alignment Techniques Learn to Behave, not to Believe -- Beneath the Surface, DPO as Steering Vector Perturbation in Activation Space
by: Raina, Samarth, et al.
Published: (2025)
AlignMerge - Alignment-Preserving Large Language Model Merging via Fisher-Guided Geometric Constraints
by: Roy, Aniruddha, et al.
Published: (2025)
Neural FOXP2 -- Language Specific Neuron Steering for Targeted Language Improvement in LLMs
by: Saha, Anusa, et al.
Published: (2026)
ECLIPTICA -- A Framework for Switchable LLM Alignment via CITA - Contrastive Instruction-Tuned Alignment
by: Wanaskar, Kapil, et al.
Published: (2026)
Refining Text-to-Image Generation: Towards Accurate Training-Free Glyph-Enhanced Image Generation
by: Lakhanpal, Sanyam, et al.
Published: (2024)
LLMsAgainstHate @ NLU of Devanagari Script Languages 2025: Hate Speech Detection and Target Identification in Devanagari Languages via Parameter Efficient Fine-Tuning of LLMs
by: Sidibomma, Rushendra, et al.
Published: (2024)
MAAT: Multi-phase Adapter-Aware Targeted Unlearning
by: Yagnik, Suryash, et al.
Published: (2026)
SPINAL -- Scaling-law and Preference Integration in Neural Alignment Layers
by: Das, Arion, et al.
Published: (2026)
Peccavi: Visual Paraphrase Attack Safe and Distortion Free Image Watermarking Technique for AI-Generated Images
by: Dixit, Shreyas, et al.
Published: (2025)
PermaFrost-Attack: Stealth Pretraining Seeding(SPS) for planting Logic Landmines During LLM Training
by: Kumar, Harsh, et al.
Published: (2026)
A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications
by: Sahoo, Pranab, et al.
Published: (2024)
On the Relationship between Sentence Analogy Identification and Sentence Structure Encoding in Large Language Models
by: Wijesiriwardene, Thilini, et al.
Published: (2023)
The What, Why, and How of Context Length Extension Techniques in Large Language Models -- A Detailed Survey
by: Pawar, Saurav, et al.
Published: (2024)
MENTIS: What Belief Changes Under Alignment? Measuring Multi-Scale Latent Torsion in Language Models
by: Saha, Partha Pratim, et al.
Published: (2026)
When Shallow Wins: Silent Failures and the Depth-Accuracy Paradox in Latent Reasoning
by: Sahoo, Subramanyam, et al.
Published: (2026)
Reasoning or Rhetoric? An Empirical Analysis of Moral Reasoning Explanations in Large Language Models
by: Kasat, Aryan, et al.
Published: (2026)
Dial E for Ethical Enforcement: institutional VETO power as a governance primitive
by: Sahoo, Subramanyam, et al.
Published: (2026)
The Reasoning Trap -- Logical Reasoning as a Mechanistic Pathway to Situational Awareness
by: Sahoo, Subramanyam, et al.
Published: (2026)
Born With a Silver Spoon? Investigating Socioeconomic Bias in Large Language Models
by: Singh, Smriti, et al.
Published: (2024)
SAHOO: Safeguarded Alignment for High-Order Optimization Objectives in Recursive Self-Improvement
by: Sahoo, Subramanyam, et al.
Published: (2026)
I Can't Believe It's Not Robust: Catastrophic Collapse of Safety Classifiers under Embedding Drift
by: Sahoo, Subramanyam, et al.
Published: (2026)
Position: The Complexity of Perfect AI Alignment -- Formalizing the RLHF Trilemma
by: Sahoo, Subramanyam, et al.
Published: (2025)
Exploring the Impact of Large Language Models on Recommender Systems: An Extensive Review
by: Vats, Arpita, et al.
Published: (2024)
AMBEDKAR-A Multi-level Bias Elimination through a Decoding Approach with Knowledge Augmentation for Robust Constitutional Alignment of Language Models
by: Mukhopadhyay, Snehasis, et al.
Published: (2025)
KnowledgePrompts: Exploring the Abilities of Large Language Models to Solve Proportional Analogies via Knowledge-Enhanced Prompting
by: Wijesiriwardene, Thilini, et al.
Published: (2024)
YINYANG-ALIGN: Benchmarking Contradictory Objectives and Proposing Multi-Objective Optimization based DPO for Text-to-Image Alignment
by: Das, Amitava, et al.
Published: (2025)
SEPSIS: I Can Catch Your Lies -- A New Paradigm for Deception Detection
by: Rani, Anku, et al.
Published: (2023)
SPHERICAL KV: Angle-Domain Attention and Rate-Distortion Retention for Efficient Long-Context Inference
by: Chauhan, Anay, et al.
Published: (2026)
CONSCIENTIA: Can LLM Agents Learn to Strategize? Emergent Deception and Trust in a Multi-Agent NYC Simulation
by: Sinha, Aarush, et al.
Published: (2026)
Language Models Entangle Language and Culture
by: Jain, Shourya, et al.
Published: (2026)
How Culturally Aware are Vision-Language Models?
by: Burda-Lassen, Olena, et al.
Published: (2024)
SleepWalk: A Three-Tier Benchmark for Stress-Testing Instruction-Guided Vision-Language Navigation
by: Rawal, Niyati, et al.
Published: (2026)
A Comprehensive Survey of Hallucination Mitigation Techniques in Large Language Models
by: Tonmoy, S. M Towhidul Islam, et al.
Published: (2024)
AdversariaL attacK sAfety aLIgnment(ALKALI): Safeguarding LLMs through GRACE: Geometric Representation-Aware Contrastive Enhancement- Introducing Adversarial Vulnerability Quality Index (AVQI)
by: Khanna, Danush, et al.
Published: (2025)
Overview of Factify5WQA: Fact Verification through 5W Question-Answering
by: Suresh, Suryavardan, et al.
Published: (2024)