| Main Authors: | Rath, Prasanjit; Shrawgi, Hari; Agrawal, Parag; Dandapat, Sandipan |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2502.12552 |
Similar Items
SAGE: A Generic Framework for LLM Safety Evaluation
by: Jindal, Madhur, et al.
Published: (2025)
Navigating the Cultural Kaleidoscope: A Hitchhiker's Guide to Sensitivity in Large Language Models
by: Banerjee, Somnath, et al.
Published: (2024)
Socio-Culturally Aware Evaluation Framework for LLM-Based Content Moderation
by: Kumar, Shanu, et al.
Published: (2024)
Enhancing Zero-shot Chain of Thought Prompting via Uncertainty-Guided Strategy Selection
by: Kumar, Shanu, et al.
Published: (2024)
Litmus (Re)Agent: A Benchmark and Agentic System for Predictive Evaluation of Multilingual Models
by: Mittal, Avni, et al.
Published: (2026)
Rethinking Tokenization for Rich Morphology: The Dominance of Unigram over BPE and Morphological Alignment
by: Vemula, Saketh Reddy, et al.
Published: (2025)
Harnessing Large Language Models for Mental Health: Opportunities, Challenges, and Ethical Considerations
by: Pandey, Hari Mohan
Published: (2024)
Exploring Disparity-Accuracy Trade-offs in Face Recognition Systems: The Role of Datasets, Architectures, and Loss Functions
by: Jaiswal, Siddharth D, et al.
Published: (2025)
Wide Reflective Equilibrium in LLM Alignment: Bridging Moral Epistemology and AI Safety
by: Brophy, Matthew
Published: (2025)
CR4T: Rewrite-Based Guardrails for Adolescent LLM Safety
by: An, Heajun, et al.
Published: (2026)
LLM Safety Alignment is Divergence Estimation in Disguise
by: Haldar, Rajdeep, et al.
Published: (2025)
Unmasking the Canvas: A Dynamic Benchmark for Image Generation Jailbreaking and LLM Content Safety
by: Nair, Variath Madhupal Gautham, et al.
Published: (2025)
AI Safety is Stuck in Technical Terms -- A System Safety Response to the International AI Safety Report
by: Dobbe, Roel
Published: (2025)
Safety Cases: A Scalable Approach to Frontier AI Safety
by: Hilton, Benjamin, et al.
Published: (2025)
Safety Cases: How to Justify the Safety of Advanced AI Systems
by: Clymer, Joshua, et al.
Published: (2024)
MindCraft: Revolutionizing Education through AI-Powered Personalized Learning and Mentorship for Rural India
by: Bardia, Arihant, et al.
Published: (2025)
Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey
by: Dong, Zhichen, et al.
Published: (2024)
International Agreements on AI Safety: Review and Recommendations for a Conditional AI Safety Treaty
by: Scholefield, Rebecca, et al.
Published: (2025)
Toxic HallucinAItions: Perturbing Prompts and Tracing LLM Circuits
by: Shimgekar, Soorya Ram, et al.
Published: (2026)
LSSF: Safety Alignment for Large Language Models through Low-Rank Safety Subspace Fusion
by: Zhou, Guanghao, et al.
Published: (2026)
Present and Future of AI in Renewable Energy Domain : A Comprehensive Survey
by: Rashid, Abdur, et al.
Published: (2024)
Toward an African Agenda for AI Safety
by: Segun, Samuel T., et al.
Published: (2025)
Concrete Problems in AI Safety, Revisited
by: Raji, Inioluwa Deborah, et al.
Published: (2023)
Emerging Practices in Frontier AI Safety Frameworks
by: Buhl, Marie Davidsen, et al.
Published: (2025)
AI Safety: Necessary, but insufficient and possibly problematic
by: P, Deepak
Published: (2024)
How Alignment and Jailbreak Work: Explain LLM Safety through Intermediate Hidden States
by: Zhou, Zhenhong, et al.
Published: (2024)
Combining Cost-Constrained Runtime Monitors for AI Safety
by: Hua, Tim Tian, et al.
Published: (2025)
Building Effective Safety Guardrails in AI Education Tools
by: Clark, Hannah-Beth, et al.
Published: (2025)
What Is AI Safety? What Do We Want It to Be?
by: Harding, Jacqueline, et al.
Published: (2025)
The Singapore Consensus on Global AI Safety Research Priorities
by: Bengio, Yoshua, et al.
Published: (2025)
Simple Role Assignment is Extraordinarily Effective for Safety Alignment
by: Ziheng, Zhou, et al.
Published: (2026)
The Ghost in the Grammar: Methodological Anthropomorphism in AI Safety Evaluations
by: Costa, Mariana Lins
Published: (2026)
Upstream and Downstream AI Safety: Both on the Same River?
by: McDermid, John, et al.
Published: (2024)
Taxonomy and Consistency Analysis of Safety Benchmarks for AI Agents
by: Li, Miles Q., et al.
Published: (2026)
Agentic Microphysics: A Manifesto for Generative AI Safety
by: Pierucci, Federico, et al.
Published: (2026)
Probabilistic Analysis of Copyright Disputes and Generative AI Safety
by: Chiba-Okabe, Hiroaki
Published: (2024)
Interoperability in AI Safety Governance: Ethics, Regulations, and Standards
by: Chin, Yik Chan, et al.
Published: (2026)
Evaluating LLM Agent Adherence to Hierarchical Safety Principles: A Lightweight Benchmark for Probing Foundational Controllability Components
by: Potham, Ram
Published: (2025)
Invisible Orchestrators Suppress Protective Behavior and Dissociate Power-Holders: Safety Risks in Multi-Agent LLM Systems
by: Fukui, Hiroki
Published: (2026)
Intelligent Approaches to Predictive Analytics in Occupational Health and Safety in India
by: Saxena, Ritwik Raj
Published: (2024)