:: Library Catalog

Bibliographic Details
Main Authors:	Lee, Grandee; Wang, Yue; Lye, Che Yee; Peh, Luke
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2605.19529

Similar Items

Criterion Validity of LLM-as-Judge for Business Outcomes in Conversational Commerce
by: Chen, Liang, et al.
Published: (2026)

Prompt Fencing: A Cryptographic Approach to Establishing Security Boundaries in Large Language Model Prompts
by: Peh, Steven
Published: (2025)

Debate Only When Necessary: Adaptive Multiagent Collaboration for Efficient LLM Reasoning
by: Eo, Sugyeong, et al.
Published: (2025)

Concurrent Criterion Validation of a Validity Screen for LLM Confidence Signals via Selective Prediction
by: Cacioli, Jon-Paul
Published: (2026)

On the Collapse of Generative Paths: A Criterion and Correction for Diffusion Steering
by: Lee, Ziseok, et al.
Published: (2025)

HALO: Hardware-aware quantization with low critical-path-delay weights for LLM acceleration
by: Juneja, Rohan, et al.
Published: (2025)

Decipherment-Aware Multilingual Learning in Jointly Trained Language Models
by: Lee, Grandee
Published: (2024)

Case-Specific Rubrics for Clinical AI Evaluation: Methodology, Validation, and LLM-Clinician Agreement Across 823 Encounters
by: Shah, Aaryan, et al.
Published: (2026)

Reality Check: A New Evaluation Ecosystem Is Necessary to Understand AI's Real World Effects
by: Schwartz, Reva, et al.
Published: (2025)

Are We Evaluating the Edit Locality of LLM Model Editing Properly?
by: Liu, Wei, et al.
Published: (2026)

VSPO: Validating Semantic Pitfalls in Ontology via LLM-Based CQ Generation
by: Choi, Hyojun, et al.
Published: (2025)

LinTree: Improving LLM Reasoning with Explicitly Structured Search Histories
by: Kang, Liwei, et al.
Published: (2026)

Learning to Visually Connect Actions and their Effects
by: Parmar, Paritosh, et al.
Published: (2024)

Is Prompt Selection Necessary for Task-Free Online Continual Learning?
by: Park, Seoyoung, et al.
Published: (2026)

Establishing Performance Baselines in Fine-Tuning, Retrieval-Augmented Generation and Soft-Prompting for Non-Specialist LLM Users
by: Dodgson, Jennifer, et al.
Published: (2023)

Are Expressive Encoders Necessary for Discrete Graph Generation?
by: Revolinsky, Jay, et al.
Published: (2026)

Integrated Framework for LLM Evaluation with Answer Generation
by: Lee, Sujeong, et al.
Published: (2025)

Model Tampering Attacks Enable More Rigorous Evaluations of LLM Capabilities
by: Che, Zora, et al.
Published: (2025)

Validating LLM-Generated Programs with Metamorphic Prompt Testing
by: Wang, Xiaoyin, et al.
Published: (2024)

Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM Outputs with Human Preferences
by: Shankar, Shreya, et al.
Published: (2024)

Benchmarking Emergent Coordination in Large-Scale LLM Populations: An Evaluation Framework on the MoltBook Archive
by: Yee, Brandon, et al.
Published: (2026)

CLAVE: An Adaptive Framework for Evaluating Values of LLM Generated Responses
by: Yao, Jing, et al.
Published: (2024)

Pioneering Reliable Assessment in Text-to-Image Knowledge Editing: Leveraging a Fine-Grained Dataset and an Innovative Criterion
by: Gu, Hengrui, et al.
Published: (2024)

Modeling the Data-Generating Process is Necessary for Out-of-Distribution Generalization
by: Kaur, Jivat Neet, et al.
Published: (2022)

Towards Feedback-to-Plan Decisions for Self-Evolving LLM Agents in CUDA Kernel Generation
by: Chong, Yee Hin, et al.
Published: (2026)

Don't Overthink It: Inter-Rollout Action Agreement as a Free Adaptive-Compute Signal for LLM Agents
by: Sethi, Khushal
Published: (2026)

Enhancing Financial Inclusion and Regulatory Challenges: A Critical Analysis of Digital Banks and Alternative Lenders Through Digital Platforms, Machine Learning, and Large Language Models Integration
by: Lee, Luke
Published: (2024)

The Price of Agreement: Measuring LLM Sycophancy in Agentic Financial Applications
by: Zhao, Zhenyu, et al.
Published: (2026)

Interactive Visual Assessment for Text-to-Image Generation Models
by: Mi, Xiaoyue, et al.
Published: (2024)

Dynamic and Adaptive Feature Generation with LLM
by: Zhang, Xinhao, et al.
Published: (2024)

ChainReaction: Causal Chain-Guided Reasoning for Modular and Explainable Causal-Why Video Question Answering
by: Parmar, Paritosh, et al.
Published: (2025)

Are Human-generated Demonstrations Necessary for In-context Learning?
by: Li, Rui, et al.
Published: (2023)

Adaptive Reasoning and Acting in Medical Language Agents
by: Dutta, Abhishek, et al.
Published: (2024)

Medical Image Debiasing by Learning Adaptive Agreement from a Biased Council
by: Luo, Luyang, et al.
Published: (2024)

SMART: Self-Generating and Self-Validating Multi-Dimensional Assessment for LLMs' Mathematical Problem Solving
by: Hou, Yujie, et al.
Published: (2025)

From Natural Language to Solver-Ready Power System Optimization: An LLM-Assisted, Validation-in-the-Loop Framework
by: Hu, Yunkai, et al.
Published: (2025)

Automated Validation of LLM-based Evaluators for Software Engineering Artifacts
by: Fandina, Ora Nova, et al.
Published: (2025)

Judge's Verdict: A Comprehensive Analysis of LLM Judge Capability Through Human Agreement
by: Han, Steve, et al.
Published: (2025)

A Theoretical Analysis of Compositional Generalization in Neural Networks: A Necessary and Sufficient Condition
by: Li, Yuanpeng
Published: (2025)

Chain-of-Trust: A Progressive Trust Evaluation Framework Enabled by Generative AI
by: Zhu, Botao, et al.
Published: (2025)