| Main Authors: | Afane, Mohamed, Robitschek, Emily, Ouyang, Derek, Ho, Daniel E. |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.19895 |
Similar Items
Benchmarking Legal RAG: The Promise and Limits of AI Statutory Surveys
by: Afane, Mohamed, et al.
Published: (2026)
Can LLMs Help Allocate Public Health Resources? A Case Study on Childhood Lead Testing
by: Afane, Mohamed, et al.
Published: (2025)
A Progressive Visual-Logic-Aligned Framework for Ride-Hailing Adjudication
by: Wu, Weiming, et al.
Published: (2026)
ATP: Adaptive Threshold Pruning for Efficient Data Encoding in Quantum Neural Networks
by: Afane, Mohamed, et al.
Published: (2025)
Deciding When Not to Decide: Indeterminacy-Aware Intrusion Detection with NeutroSENSE
by: Al-Masri, Eyhab
Published: (2025)
Critical Windows of Complexity Control: When Transformers Decide to Reason or Memorize
by: Ali, Sarwan
Published: (2026)
Next-Generation Phishing: How LLM Agents Empower Cyber Attackers
by: Afane, Khalifa, et al.
Published: (2024)
Learning to Decide with AI Assistance under Human-Alignment
by: Benz, Nina Corvelo, et al.
Published: (2026)
Automating Adjudication of Cardiovascular Events Using Large Language Models
by: Sivarajkumar, Sonish, et al.
Published: (2025)
Adjudicator: Correcting Noisy Labels with a KG-Informed Council of LLM Agents
by: You, Doohee, et al.
Published: (2025)
TraceScope: Interactive URL Triage via Decoupled Checklist Adjudication
by: Zhang, Haolin, et al.
Published: (2026)
From Calculation to Adjudication: Examining LLM judges on Mathematical Reasoning Tasks
by: Stephan, Andreas, et al.
Published: (2024)
ConsistencyAI: A Benchmark to Assess LLMs' Factual Consistency When Responding to Different Demographic Groups
by: Banyas, Peter, et al.
Published: (2025)
AIRA_2: Overcoming Bottlenecks in AI Research Agents
by: Hambardzumyan, Karen, et al.
Published: (2026)
Do Benchmarks Underestimate LLM Performance? Evaluating Hallucination Detection With LLM-First Human-Adjudicated Assessment
by: Atasoy, I. F., et al.
Published: (2026)
The Prompt War: How AI Decides on a Military Intervention
by: Chupilkin, Maxim
Published: (2025)
When Language Shapes Thought: Cross-Lingual Transfer of Factual Knowledge in Question Answering
by: Kang, Eojin, et al.
Published: (2025)
Do Automatic Factuality Metrics Measure Factuality? A Critical Evaluation
by: Ramprasad, Sanjana, et al.
Published: (2024)
Learning to Decide with Just Enough: Information-Theoretic Context Summarization for CMDPs
by: Liu, Peidong, et al.
Published: (2025)
Do Proactive Agents Really Need an LLM to Decide When to Wake and What to Anchor?
by: Liu, Xiaoze, et al.
Published: (2026)
SYNFAC-EDIT: Synthetic Imitation Edit Feedback for Factual Alignment in Clinical Summarization
by: Mishra, Prakamya, et al.
Published: (2024)
Artificial Intelligence and Child Custody Adjudication: A Comparative Study of Estonia and Nigeria
by: Folajuwon-Banjo, Emilia Oluwaseun
Published: (2025)
Reason2Decide: Rationale-Driven Multi-Task Learning
by: Hasan, H M Quamran, et al.
Published: (2025)
On the Size Complexity and Decidability of First-Order Progression
by: Classen, Jens, et al.
Published: (2026)
Deciding the Satisfiability of Combined Qualitative Constraint Networks
by: Cohen-Solal, Quentin, et al.
Published: (2026)
Decidable By Construction: Design-Time Verification for Trustworthy AI
by: Haynes, Houston
Published: (2026)
Factuality on Demand: Controlling the Factuality-Informativeness Trade-off in Text Generation
by: Gong, Ziwei, et al.
Published: (2026)
Deciding how to respond: A deliberative framework to guide policymaker responses to AI systems
by: Fourie, Willem
Published: (2025)
Do I Really Know? Learning Factual Self-Verification for Hallucination Reduction
by: Altinisik, Enes, et al.
Published: (2026)
Locomo-Plus: Beyond-Factual Cognitive Memory Evaluation Framework for LLM Agents
by: Li, Yifei, et al.
Published: (2026)
OpenFactCheck: A Unified Framework for Factuality Evaluation of LLMs
by: Iqbal, Hasan, et al.
Published: (2024)
MAD-Fact: A Multi-Agent Debate Framework for Long-Form Factuality Evaluation in LLMs
by: Ning, Yucheng, et al.
Published: (2025)
GeoDecider: A Coarse-to-Fine Agentic Workflow for Explainable Lithology Classification
by: Wang, Jiahao, et al.
Published: (2026)
Overcoming Multi-step Complexity in Multimodal Theory-of-Mind Reasoning: A Scalable Bayesian Planner
by: Zhang, Chunhui, et al.
Published: (2025)
Knowledge Authoring with Factual English, Rules, and Actions
by: Wang, Yuheng
Published: (2024)
Factuality of Large Language Models: A Survey
by: Wang, Yuxia, et al.
Published: (2024)
Evaluating Reliability Asymmetries in Chinese Factual Search and AI Answers
by: Liu, Geng, et al.
Published: (2025)
Knowledgeable In-Context Tuning: Exploring and Exploiting Factual Knowledge for In-Context Learning
by: Wang, Jianing, et al.
Published: (2023)
Tracking vs. Deciding: The Dual-Capability Bottleneck in Searchless Chess Transformers
by: Li, Quanhao, et al.
Published: (2026)
Is Factuality Enhancement a Free Lunch For LLMs? Better Factuality Can Lead to Worse Context-Faithfulness
by: Bi, Baolong, et al.
Published: (2024)