:: Library Catalog

Bibliographic Details
Main authors:	Veeramani, Hariram; Thapa, Surendrabikram; Naseem, Usman
Type:	Preprint
Published:	2024
Subjects:	Computation and Language
Online access:	https://arxiv.org/abs/2402.10772

Similar items

RDR: the Recap, Deliberate, and Respond Method for Enhanced Language Understanding
by: Zi, Yuxin, et al.
Published: (2023)

Mechanistic Interpretability for Large Language Model Alignment: Progress, Challenges, and Future Directions
by: Naseem, Usman
Published: (2026)

Framing Political Bias in Multilingual LLMs Across Pakistani Languages
by: Nadeem, Afrozah, et al.
Published: (2025)

Bias Beyond Borders: Political Ideology Evaluation and Steering in Multilingual LLMs
by: Nadeem, Afrozah, et al.
Published: (2026)

Benchmarking Large Language Models for Cryptanalysis and Side-Channel Vulnerabilities
by: Maskey, Utsav, et al.
Published: (2025)

Seeing the Threat: Vulnerabilities in Vision-Language Models to Adversarial Attack
by: Ren, Juan, et al.
Published: (2025)

DUAL-Bench: Measuring Over-Refusal and Robustness in Vision-Language Models
by: Ren, Kaixuan, et al.
Published: (2025)

Can Reasoning LLMs Enhance Clinical Document Classification?
by: Mustafa, Akram, et al.
Published: (2025)

AlignCultura: Towards Culturally Aligned Large Language Models?
by: Kashyap, Gautam Siddharth, et al.
Published: (2026)

When the Model Said 'No Comment', We Knew Helpfulness Was Dead, Honesty Was Alive, and Safety Was Terrified
by: Kashyap, Gautam Siddharth, et al.
Published: (2026)

SHIELD: Classifier-Guided Prompting for Robust and Safer LVLMs
by: Ren, Juan, et al.
Published: (2025)

Should LLM Safety Be More Than Refusing Harmful Instructions?
by: Maskey, Utsav, et al.
Published: (2025)

Steering Over-refusals Towards Safety in Retrieval Augmented Generation
by: Maskey, Utsav, et al.
Published: (2025)

Over-Refusal and Representation Subspaces: A Mechanistic Analysis of Task-Conditioned Refusal in Aligned LLMs
by: Maskey, Utsav, et al.
Published: (2026)

Self-Explaining Hate Speech Detection with Moral Rationales
by: Vargas, Francielle, et al.
Published: (2026)

PersoBench: Benchmarking Personalized Response Generation in Large Language Models
by: Afzoon, Saleh, et al.
Published: (2024)

Reversal of Thought: Enhancing Large Language Models with Preference-Guided Reverse Reasoning Warm-up
by: Yuan, Jiahao, et al.
Published: (2024)

PersoPilot: An Adaptive AI-Copilot for Transparent Contextualized Persona Classification and Personalized Response Generation
by: Afzoon, Saleh, et al.
Published: (2026)

XGUARD: A Graded Benchmark for Evaluating Safety Failures of Large Language Models on Extremist Content
by: Abishethvarman, Vadivel, et al.
Published: (2025)

CogMem: A Cognitive Memory Architecture for Sustained Multi-Turn Reasoning in Large Language Models
by: Zhang, Yiran, et al.
Published: (2025)

Steering Towards Fairness: Mitigating Political Bias in LLMs
by: Nadeem, Afrozah, et al.
Published: (2025)

Fairness Evaluation and Inference Level Mitigation in LLMs
by: Nadeem, Afrozah, et al.
Published: (2025)

ExtremeAIGC: Benchmarking LMM Vulnerability to AI-Generated Extremist Content
by: Chandna, Bhavik, et al.
Published: (2025)

We Think, Therefore We Align LLMs to Helpful, Harmless and Honest Before They Go Wrong
by: Kashyap, Gautam Siddharth, et al.
Published: (2025)

Too Helpful, Too Harmless, Too Honest or Just Right?
by: Kashyap, Gautam Siddharth, et al.
Published: (2025)

Do Personality Traits Interfere? Geometric Limitations of Steering in Large Language Models
by: Bhandari, Pranav, et al.
Published: (2026)

Beyond the Black Box: Demystifying Multi-Turn LLM Reasoning with VISTA
by: Zhang, Yiran, et al.
Published: (2025)

SafeConstellations: Mitigating Over-Refusals in LLMs Through Task-Aware Representation Steering
by: Maskey, Utsav, et al.
Published: (2025)

Evaluating Hierarchical Clinical Document Classification Using Reasoning-Based LLMs
by: Mustafa, Akram, et al.
Published: (2025)

Enhancing textual textbook question answering with large language models and retrieval augmented generation
by: Alawwad, Hessa Abdulrahman, et al.
Published: (2024)

Better to Ask in English: Evaluation of Large Language Models on English, Low-resource and Cross-Lingual Settings
by: Dey, Krishno, et al.
Published: (2024)

VaxGuard: A Multi-Generator, Multi-Type, and Multi-Role Dataset for Detecting LLM-Generated Vaccine Misinformation
by: Ahmad, Syed Talal, et al.
Published: (2025)

MRG-R1: Reinforcement Learning for Clinically Aligned Medical Report Generation
by: Wang, Pengyu, et al.
Published: (2025)

PersoDPO: Scalable Preference Optimization for Instruction-Adherent, Persona-Grounded Dialogue via Multi-LLM Evaluation
by: Afzoon, Saleh, et al.
Published: (2026)

SemEval-2026 Task 9: Detecting Multilingual, Multicultural and Multievent Online Polarization
by: Naseem, Usman, et al.
Published: (2026)

TurnBench-MS: A Benchmark for Evaluating Multi-Turn, Multi-Step Reasoning in Large Language Models
by: Zhang, Yiran, et al.
Published: (2025)

LVMed-R2: Perception and Reflection-driven Complex Reasoning for Medical Report Generation
by: Wang, Hao, et al.
Published: (2025)

Modeling and Optimizing User Preferences in AI Copilots: A Comprehensive Survey and Taxonomy
by: Afzoon, Saleh, et al.
Published: (2025)

VITAL: A New Dataset for Benchmarking Pluralistic Alignment in Healthcare
by: Shetty, Anudeex, et al.
Published: (2025)

Can Large Language Models Make Everyone Happy?
by: Naseem, Usman, et al.
Published: (2026)