| Main Authors: | Kumarage, Tharindu, Bauer, Lisa, Ma, Yao, Rosen, Dan, Guduri, Yashasvi Raghavendra, Rumshisky, Anna, Chang, Kai-Wei, Galstyan, Aram, Gupta, Rahul, Peris, Charith |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2604.22119 |
Similar Items
ARES: Adaptive Red-Teaming and End-to-End Repair of Policy-Reward System
by: Liang, Jiacheng, et al.
Published: (2026)
Towards Safety Reasoning in LLMs: AI-agentic Deliberation for Policy-embedded CoT Data Creation
by: Kumarage, Tharindu, et al.
Published: (2025)
Kaleidoscopic Teaming in Multi Agent Simulations
by: Mehrabi, Ninareh, et al.
Published: (2025)
Tree-of-Traversals: A Zero-Shot Reasoning Algorithm for Augmenting Black-box Language Models with Knowledge Graphs
by: Markowitz, Elan, et al.
Published: (2024)
K-Edit: Language Model Editing with Contextual Knowledge Awareness
by: Markowitz, Elan, et al.
Published: (2025)
On the steerability of large language models toward data-driven personas
by: Li, Junyi, et al.
Published: (2023)
Attribute Controlled Fine-tuning for Large Language Models: A Case Study on Detoxification
by: Meng, Tao, et al.
Published: (2024)
SWAN: Semantic Watermarking with Abstract Meaning Representation
by: Ye, Ziping, et al.
Published: (2026)
Data Advisor: Dynamic Data Curation for Safety Alignment of Large Language Models
by: Wang, Fei, et al.
Published: (2024)
Prompt Perturbation Consistency Learning for Robust Language Models
by: Qiang, Yao, et al.
Published: (2024)
Harnessing Artificial Intelligence to Combat Online Hate: Exploring the Challenges and Opportunities of Large Language Models in Hate Speech Detection
by: Kumarage, Tharindu, et al.
Published: (2024)
Evaluating Differentially Private Synthetic Data Generation in High-Stakes Domains
by: Ramesh, Krithika, et al.
Published: (2024)
Can Knowledge Graphs Reduce Hallucinations in LLMs? : A Survey
by: Agrawal, Garima, et al.
Published: (2023)
Mindful-RAG: A Study of Points of Failure in Retrieval Augmented Generation
by: Agrawal, Garima, et al.
Published: (2024)
Emergent Abilities in Reduced-Scale Generative Language Models
by: Muckatira, Sherin, et al.
Published: (2024)
Geometry over Density: Few-Shot Cross-Domain OOD Detection
by: Li, Shawn, et al.
Published: (2026)
Sustainable AI Training via Hardware-Software Co-Design on NVIDIA, AMD, and Emerging GPU Architectures
by: Makin, Yashasvi, et al.
Published: (2025)
RedditESS: A Mental Health Social Support Interaction Dataset -- Understanding Effective Social Support to Refine AI-Driven Support Tools
by: Alghamdi, Zeyad, et al.
Published: (2025)
Tokenization Matters: Navigating Data-Scarce Tokenization for Gender Inclusive Language Technologies
by: Ovalle, Anaelia, et al.
Published: (2023)
FLIRT: Feedback Loop In-context Red Teaming
by: Mehrabi, Ninareh, et al.
Published: (2023)
KG-LLM-Bench: A Scalable Benchmark for Evaluating LLM Reasoning on Textualized Knowledge Graphs
by: Markowitz, Elan, et al.
Published: (2025)
Cross-Platform Hate Speech Detection with Weakly Supervised Causal Disentanglement
by: Sheth, Paras, et al.
Published: (2024)
Making Sense Of Distributed Representations With Activation Spectroscopy
by: Reing, Kyle, et al.
Published: (2025)
Learning Morphisms with Gauss-Newton Approximation for Growing Networks
by: Lawton, Neal, et al.
Published: (2024)
Regularizing Calabi-Yau topological conformal field theories using cutoff heat kernels
by: Aulak, Yashasvi
Published: (2024)
Partial Federated Learning
by: Feng, Tiantian, et al.
Published: (2024)
Controlling the Risk of Corrupted Contexts for Language Models via Early-Exiting
by: Wynn, Andrea, et al.
Published: (2025)
NarrativeTime: Dense Temporal Annotation on a Timeline
by: Rogers, Anna, et al.
Published: (2019)
A Survey of AI-generated Text Forensic Systems: Detection, Attribution, and Characterization
by: Kumarage, Tharindu, et al.
Published: (2024)
The Impact of Depression, Anxiety, and Stress on Cognitive Conflict in University Students
by: Yashasvi Walia, et al.
Published: (2025)
Adaptive Video Understanding Agent: Enhancing efficiency with dynamic frame sampling and feedback-driven reasoning
by: Jeoung, Sullam, et al.
Published: (2024)
Can LLMs Improve Multimodal Fact-Checking by Asking Relevant Questions?
by: Beigi, Alimohammad, et al.
Published: (2024)
Deconstructing In-Context Learning: Understanding Prompts via Corruption
by: Shivagunde, Namrata, et al.
Published: (2024)
Scaling Down to Scale Up: A Guide to Parameter-Efficient Fine-Tuning
by: Lialin, Vladislav, et al.
Published: (2023)
A Pre-Training Analogue of Grokking in Language Models: Tracing Delayed Grammatical Generalization
by: Muckatira, Sherin, et al.
Published: (2026)
Beyond Perplexity: A Geometric and Spectral Study of Low-Rank Pre-Training
by: Shivagunde, Namrata, et al.
Published: (2026)
Unraveling circadian rhythms—computational insights into molecular mechanisms
by: Yashasvi Rao, et al.
Published: (2026)
Assessing Visual Privacy Risks in Multimodal AI: A Novel Taxonomy-Grounded Evaluation of Vision-Language Models
by: Tsaprazlis, Efthymios, et al.
Published: (2025)
Ontology-Aware RAG for Improved Question-Answering in Cybersecurity Education
by: Zhao, Chengshuai, et al.
Published: (2024)
Adversarial Déjà Vu: Jailbreak Dictionary Learning for Stronger Generalization to Unseen Attacks
by: Dabas, Mahavir, et al.
Published: (2025)