:: Library Catalog

Bibliographic Details
Main Authors:	Kumarage, Tharindu; Bauer, Lisa; Ma, Yao; Rosen, Dan; Guduri, Yashasvi Raghavendra; Rumshisky, Anna; Chang, Kai-Wei; Galstyan, Aram; Gupta, Rahul; Peris, Charith
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2604.22119

Similar Items

ARES: Adaptive Red-Teaming and End-to-End Repair of Policy-Reward System
by: Liang, Jiacheng, et al.
Published: (2026)

Towards Safety Reasoning in LLMs: AI-agentic Deliberation for Policy-embedded CoT Data Creation
by: Kumarage, Tharindu, et al.
Published: (2025)

Kaleidoscopic Teaming in Multi Agent Simulations
by: Mehrabi, Ninareh, et al.
Published: (2025)

Tree-of-Traversals: A Zero-Shot Reasoning Algorithm for Augmenting Black-box Language Models with Knowledge Graphs
by: Markowitz, Elan, et al.
Published: (2024)

K-Edit: Language Model Editing with Contextual Knowledge Awareness
by: Markowitz, Elan, et al.
Published: (2025)

On the steerability of large language models toward data-driven personas
by: Li, Junyi, et al.
Published: (2023)

Attribute Controlled Fine-tuning for Large Language Models: A Case Study on Detoxification
by: Meng, Tao, et al.
Published: (2024)

SWAN: Semantic Watermarking with Abstract Meaning Representation
by: Ye, Ziping, et al.
Published: (2026)

Data Advisor: Dynamic Data Curation for Safety Alignment of Large Language Models
by: Wang, Fei, et al.
Published: (2024)

Prompt Perturbation Consistency Learning for Robust Language Models
by: Qiang, Yao, et al.
Published: (2024)

Harnessing Artificial Intelligence to Combat Online Hate: Exploring the Challenges and Opportunities of Large Language Models in Hate Speech Detection
by: Kumarage, Tharindu, et al.
Published: (2024)

Evaluating Differentially Private Synthetic Data Generation in High-Stakes Domains
by: Ramesh, Krithika, et al.
Published: (2024)

Can Knowledge Graphs Reduce Hallucinations in LLMs? : A Survey
by: Agrawal, Garima, et al.
Published: (2023)

Mindful-RAG: A Study of Points of Failure in Retrieval Augmented Generation
by: Agrawal, Garima, et al.
Published: (2024)

Emergent Abilities in Reduced-Scale Generative Language Models
by: Muckatira, Sherin, et al.
Published: (2024)

Geometry over Density: Few-Shot Cross-Domain OOD Detection
by: Li, Shawn, et al.
Published: (2026)

Sustainable AI Training via Hardware-Software Co-Design on NVIDIA, AMD, and Emerging GPU Architectures
by: Makin, Yashasvi, et al.
Published: (2025)

RedditESS: A Mental Health Social Support Interaction Dataset -- Understanding Effective Social Support to Refine AI-Driven Support Tools
by: Alghamdi, Zeyad, et al.
Published: (2025)

Tokenization Matters: Navigating Data-Scarce Tokenization for Gender Inclusive Language Technologies
by: Ovalle, Anaelia, et al.
Published: (2023)

FLIRT: Feedback Loop In-context Red Teaming
by: Mehrabi, Ninareh, et al.
Published: (2023)

KG-LLM-Bench: A Scalable Benchmark for Evaluating LLM Reasoning on Textualized Knowledge Graphs
by: Markowitz, Elan, et al.
Published: (2025)

Cross-Platform Hate Speech Detection with Weakly Supervised Causal Disentanglement
by: Sheth, Paras, et al.
Published: (2024)

Making Sense Of Distributed Representations With Activation Spectroscopy
by: Reing, Kyle, et al.
Published: (2025)

Learning Morphisms with Gauss-Newton Approximation for Growing Networks
by: Lawton, Neal, et al.
Published: (2024)

Regularizing Calabi-Yau topological conformal field theories using cutoff heat kernels
by: Aulak, Yashasvi
Published: (2024)

Partial Federated Learning
by: Feng, Tiantian, et al.
Published: (2024)

Controlling the Risk of Corrupted Contexts for Language Models via Early-Exiting
by: Wynn, Andrea, et al.
Published: (2025)

NarrativeTime: Dense Temporal Annotation on a Timeline
by: Rogers, Anna, et al.
Published: (2019)

A Survey of AI-generated Text Forensic Systems: Detection, Attribution, and Characterization
by: Kumarage, Tharindu, et al.
Published: (2024)

The Impact of Depression, Anxiety, and Stress on Cognitive Conflict in University Students
by: Yashasvi Walia, et al.
Published: (2025)

Adaptive Video Understanding Agent: Enhancing efficiency with dynamic frame sampling and feedback-driven reasoning
by: Jeoung, Sullam, et al.
Published: (2024)

Can LLMs Improve Multimodal Fact-Checking by Asking Relevant Questions?
by: Beigi, Alimohammad, et al.
Published: (2024)

Deconstructing In-Context Learning: Understanding Prompts via Corruption
by: Shivagunde, Namrata, et al.
Published: (2024)

Scaling Down to Scale Up: A Guide to Parameter-Efficient Fine-Tuning
by: Lialin, Vladislav, et al.
Published: (2023)

A Pre-Training Analogue of Grokking in Language Models: Tracing Delayed Grammatical Generalization
by: Muckatira, Sherin, et al.
Published: (2026)

Beyond Perplexity: A Geometric and Spectral Study of Low-Rank Pre-Training
by: Shivagunde, Namrata, et al.
Published: (2026)

Unraveling circadian rhythms—computational insights into molecular mechanisms
by: Yashasvi Rao, et al.
Published: (2026)

Assessing Visual Privacy Risks in Multimodal AI: A Novel Taxonomy-Grounded Evaluation of Vision-Language Models
by: Tsaprazlis, Efthymios, et al.
Published: (2025)

Ontology-Aware RAG for Improved Question-Answering in Cybersecurity Education
by: Zhao, Chengshuai, et al.
Published: (2024)

Adversarial Déjà Vu: Jailbreak Dictionary Learning for Stronger Generalization to Unseen Attacks
by: Dabas, Mahavir, et al.
Published: (2025)