:: Library Catalog

Bibliographic Details
Main Authors:	Rath, Prasanjit; Shrawgi, Hari; Agrawal, Parag; Dandapat, Sandipan
Format:	Preprint
Published:	2025
Subjects:	Computers and Society; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2502.12552

Similar Items

SAGE: A Generic Framework for LLM Safety Evaluation
by: Jindal, Madhur, et al.
Published: (2025)

Navigating the Cultural Kaleidoscope: A Hitchhiker's Guide to Sensitivity in Large Language Models
by: Banerjee, Somnath, et al.
Published: (2024)

Socio-Culturally Aware Evaluation Framework for LLM-Based Content Moderation
by: Kumar, Shanu, et al.
Published: (2024)

Enhancing Zero-shot Chain of Thought Prompting via Uncertainty-Guided Strategy Selection
by: Kumar, Shanu, et al.
Published: (2024)

Litmus (Re)Agent: A Benchmark and Agentic System for Predictive Evaluation of Multilingual Models
by: Mittal, Avni, et al.
Published: (2026)

Rethinking Tokenization for Rich Morphology: The Dominance of Unigram over BPE and Morphological Alignment
by: Vemula, Saketh Reddy, et al.
Published: (2025)

Harnessing Large Language Models for Mental Health: Opportunities, Challenges, and Ethical Considerations
by: Pandey, Hari Mohan
Published: (2024)

Exploring Disparity-Accuracy Trade-offs in Face Recognition Systems: The Role of Datasets, Architectures, and Loss Functions
by: Jaiswal, Siddharth D, et al.
Published: (2025)

Wide Reflective Equilibrium in LLM Alignment: Bridging Moral Epistemology and AI Safety
by: Brophy, Matthew
Published: (2025)

CR4T: Rewrite-Based Guardrails for Adolescent LLM Safety
by: An, Heajun, et al.
Published: (2026)

LLM Safety Alignment is Divergence Estimation in Disguise
by: Haldar, Rajdeep, et al.
Published: (2025)

Unmasking the Canvas: A Dynamic Benchmark for Image Generation Jailbreaking and LLM Content Safety
by: Nair, Variath Madhupal Gautham, et al.
Published: (2025)

AI Safety is Stuck in Technical Terms -- A System Safety Response to the International AI Safety Report
by: Dobbe, Roel
Published: (2025)

Safety Cases: A Scalable Approach to Frontier AI Safety
by: Hilton, Benjamin, et al.
Published: (2025)

Safety Cases: How to Justify the Safety of Advanced AI Systems
by: Clymer, Joshua, et al.
Published: (2024)

MindCraft: Revolutionizing Education through AI-Powered Personalized Learning and Mentorship for Rural India
by: Bardia, Arihant, et al.
Published: (2025)

Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey
by: Dong, Zhichen, et al.
Published: (2024)

International Agreements on AI Safety: Review and Recommendations for a Conditional AI Safety Treaty
by: Scholefield, Rebecca, et al.
Published: (2025)

Toxic HallucinAItions: Perturbing Prompts and Tracing LLM Circuits
by: Shimgekar, Soorya Ram, et al.
Published: (2026)

LSSF: Safety Alignment for Large Language Models through Low-Rank Safety Subspace Fusion
by: Zhou, Guanghao, et al.
Published: (2026)

Present and Future of AI in Renewable Energy Domain : A Comprehensive Survey
by: Rashid, Abdur, et al.
Published: (2024)

Toward an African Agenda for AI Safety
by: Segun, Samuel T., et al.
Published: (2025)

Concrete Problems in AI Safety, Revisited
by: Raji, Inioluwa Deborah, et al.
Published: (2023)

Emerging Practices in Frontier AI Safety Frameworks
by: Buhl, Marie Davidsen, et al.
Published: (2025)

AI Safety: Necessary, but insufficient and possibly problematic
by: P, Deepak
Published: (2024)

How Alignment and Jailbreak Work: Explain LLM Safety through Intermediate Hidden States
by: Zhou, Zhenhong, et al.
Published: (2024)

Combining Cost-Constrained Runtime Monitors for AI Safety
by: Hua, Tim Tian, et al.
Published: (2025)

Building Effective Safety Guardrails in AI Education Tools
by: Clark, Hannah-Beth, et al.
Published: (2025)

What Is AI Safety? What Do We Want It to Be?
by: Harding, Jacqueline, et al.
Published: (2025)

The Singapore Consensus on Global AI Safety Research Priorities
by: Bengio, Yoshua, et al.
Published: (2025)

Simple Role Assignment is Extraordinarily Effective for Safety Alignment
by: Ziheng, Zhou, et al.
Published: (2026)

The Ghost in the Grammar: Methodological Anthropomorphism in AI Safety Evaluations
by: Costa, Mariana Lins
Published: (2026)

Upstream and Downstream AI Safety: Both on the Same River?
by: McDermid, John, et al.
Published: (2024)

Taxonomy and Consistency Analysis of Safety Benchmarks for AI Agents
by: Li, Miles Q., et al.
Published: (2026)

Agentic Microphysics: A Manifesto for Generative AI Safety
by: Pierucci, Federico, et al.
Published: (2026)

Probabilistic Analysis of Copyright Disputes and Generative AI Safety
by: Chiba-Okabe, Hiroaki
Published: (2024)

Interoperability in AI Safety Governance: Ethics, Regulations, and Standards
by: Chin, Yik Chan, et al.
Published: (2026)

Evaluating LLM Agent Adherence to Hierarchical Safety Principles: A Lightweight Benchmark for Probing Foundational Controllability Components
by: Potham, Ram
Published: (2025)

Invisible Orchestrators Suppress Protective Behavior and Dissociate Power-Holders: Safety Risks in Multi-Agent LLM Systems
by: Fukui, Hiroki
Published: (2026)

Intelligent Approaches to Predictive Analytics in Occupational Health and Safety in India
by: Saxena, Ritwik Raj
Published: (2024)