:: Library Catalog

Bibliographic Details
Main Authors:	Ahmad, Syed Talal; Lu, Haohui; Liu, Sidong; Lau, Annie; Beheshti, Amin; Dras, Mark; Naseem, Usman
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2503.09103

Similar Items

VITAL: A New Dataset for Benchmarking Pluralistic Alignment in Healthcare
by: Shetty, Anudeex, et al.
Published: (2025)

Agentic Moderation: Multi-Agent Design for Safer Vision-Language Models
by: Ren, Juan, et al.
Published: (2025)

Beyond the Black Box: Demystifying Multi-Turn LLM Reasoning with VISTA
by: Zhang, Yiran, et al.
Published: (2025)

Steering Over-refusals Towards Safety in Retrieval Augmented Generation
by: Maskey, Utsav, et al.
Published: (2025)

Should LLM Safety Be More Than Refusing Harmful Instructions?
by: Maskey, Utsav, et al.
Published: (2025)

PersoDPO: Scalable Preference Optimization for Instruction-Adherent, Persona-Grounded Dialogue via Multi-LLM Evaluation
by: Afzoon, Saleh, et al.
Published: (2026)

PersoPilot: An Adaptive AI-Copilot for Transparent Contextualized Persona Classification and Personalized Response Generation
by: Afzoon, Saleh, et al.
Published: (2026)

CogMem: A Cognitive Memory Architecture for Sustained Multi-Turn Reasoning in Large Language Models
by: Zhang, Yiran, et al.
Published: (2025)

From Generation to Detection: A Multimodal Multi-Task Dataset for Benchmarking Health Misinformation
by: Zhang, Zhihao, et al.
Published: (2025)

PersoBench: Benchmarking Personalized Response Generation in Large Language Models
by: Afzoon, Saleh, et al.
Published: (2024)

Steering Towards Fairness: Mitigating Political Bias in LLMs
by: Nadeem, Afrozah, et al.
Published: (2025)

SHIELD: Classifier-Guided Prompting for Robust and Safer LVLMs
by: Ren, Juan, et al.
Published: (2025)

Fairness Evaluation and Inference Level Mitigation in LLMs
by: Nadeem, Afrozah, et al.
Published: (2025)

Framing Political Bias in Multilingual LLMs Across Pakistani Languages
by: Nadeem, Afrozah, et al.
Published: (2025)

Seeing the Threat: Vulnerabilities in Vision-Language Models to Adversarial Attack
by: Ren, Juan, et al.
Published: (2025)

Over-Refusal and Representation Subspaces: A Mechanistic Analysis of Task-Conditioned Refusal in Aligned LLMs
by: Maskey, Utsav, et al.
Published: (2026)

We Think, Therefore We Align LLMs to Helpful, Harmless and Honest Before They Go Wrong
by: Kashyap, Gautam Siddharth, et al.
Published: (2025)

AlignCultura: Towards Culturally Aligned Large Language Models?
by: Kashyap, Gautam Siddharth, et al.
Published: (2026)

Too Helpful, Too Harmless, Too Honest or Just Right?
by: Kashyap, Gautam Siddharth, et al.
Published: (2025)

When the Model Said 'No Comment', We Knew Helpfulness Was Dead, Honesty Was Alive, and Safety Was Terrified
by: Kashyap, Gautam Siddharth, et al.
Published: (2026)

LLM-Based Multi-Task Bangla Hate Speech Detection: Type, Severity, and Target
by: Hasan, Md Arid, et al.
Published: (2025)

WhatsApp Vaccine Discourse (WhaVax): An Expert-Annotated Dataset and Benchmark for Health Misinformation Detection
by: Santos, Jônatas H. dos, et al.
Published: (2026)

SafeConstellations: Mitigating Over-Refusals in LLMs Through Task-Aware Representation Steering
by: Maskey, Utsav, et al.
Published: (2025)

Simulating Influence Dynamics with LLM Agents
by: Nasim, Mehwish, et al.
Published: (2025)

A Survey on Progress in LLM Alignment from the Perspective of Reward Design
by: Ji, Miaomiao, et al.
Published: (2025)

Modeling and Optimizing User Preferences in AI Copilots: A Comprehensive Survey and Taxonomy
by: Afzoon, Saleh, et al.
Published: (2025)

Competing LLM Agents in a Non-Cooperative Game of Opinion Polarisation
by: Qasmi, Amin, et al.
Published: (2025)

MRGAgents: A Multi-Agent Framework for Improved Medical Report Generation with Med-LVLMs
by: Wang, Pengyu, et al.
Published: (2025)

AI-VaxGuide: An Agentic RAG-Based LLM for Vaccination Decisions
by: Zeggai, Abdellah, et al.
Published: (2025)

MultiHateClip: A Multilingual Benchmark Dataset for Hateful Video Detection on YouTube and Bilibili
by: Wang, Han, et al.
Published: (2024)

Revealing the Truth with ConLLM for Detecting Multi-Modal Deepfakes
by: Kashyap, Gautam Siddharth, et al.
Published: (2026)

VaxPulse: Active Global Vaccine Infodemic Risk Assessment
by: Dimaguila, Gerardo Luis, et al.
Published: (2025)

ChildGuard: A Specialized Dataset for Combatting Child-Targeted Hate Speech
by: Kashyap, Gautam Siddharth, et al.
Published: (2025)

ViMGuard: A Novel Multi-Modal System for Video Misinformation Guarding
by: Kan, Andrew, et al.
Published: (2024)

MSynFD: Multi-hop Syntax aware Fake News Detection
by: Xiao, Liang, et al.
Published: (2024)

UXR Point of View on Product Feature Prioritization Prior To Multi-Million Engineering Commitments
by: Lau, Jonas, et al.
Published: (2025)

Can LLM-Generated Misinformation Be Detected?
by: Chen, Canyu, et al.
Published: (2023)

Myanmar XNLI: Building a Dataset and Exploring Low-resource Approaches to Natural Language Inference with Myanmar
by: Htet, Aung Kyaw, et al.
Published: (2025)

Natural Language-Oriented Programming (NLOP): Towards Democratizing Software Creation
by: Beheshti, Amin
Published: (2024)

Mechanistic Interpretability for Large Language Model Alignment: Progress, Challenges, and Future Directions
by: Naseem, Usman
Published: (2026)