| Main Authors: | Lee, Grandee; Wang, Yue; Lye, Che Yee; Peh, Luke |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.19529 |
Similar Items
Criterion Validity of LLM-as-Judge for Business Outcomes in Conversational Commerce
by: Chen, Liang, et al.
Published: (2026)
Prompt Fencing: A Cryptographic Approach to Establishing Security Boundaries in Large Language Model Prompts
by: Peh, Steven
Published: (2025)
Debate Only When Necessary: Adaptive Multiagent Collaboration for Efficient LLM Reasoning
by: Eo, Sugyeong, et al.
Published: (2025)
Concurrent Criterion Validation of a Validity Screen for LLM Confidence Signals via Selective Prediction
by: Cacioli, Jon-Paul
Published: (2026)
On the Collapse of Generative Paths: A Criterion and Correction for Diffusion Steering
by: Lee, Ziseok, et al.
Published: (2025)
HALO: Hardware-aware quantization with low critical-path-delay weights for LLM acceleration
by: Juneja, Rohan, et al.
Published: (2025)
Decipherment-Aware Multilingual Learning in Jointly Trained Language Models
by: Lee, Grandee
Published: (2024)
Case-Specific Rubrics for Clinical AI Evaluation: Methodology, Validation, and LLM-Clinician Agreement Across 823 Encounters
by: Shah, Aaryan, et al.
Published: (2026)
Reality Check: A New Evaluation Ecosystem Is Necessary to Understand AI's Real World Effects
by: Schwartz, Reva, et al.
Published: (2025)
Are We Evaluating the Edit Locality of LLM Model Editing Properly?
by: Liu, Wei, et al.
Published: (2026)
VSPO: Validating Semantic Pitfalls in Ontology via LLM-Based CQ Generation
by: Choi, Hyojun, et al.
Published: (2025)
LinTree: Improving LLM Reasoning with Explicitly Structured Search Histories
by: Kang, Liwei, et al.
Published: (2026)
Learning to Visually Connect Actions and their Effects
by: Parmar, Paritosh, et al.
Published: (2024)
Is Prompt Selection Necessary for Task-Free Online Continual Learning?
by: Park, Seoyoung, et al.
Published: (2026)
Establishing Performance Baselines in Fine-Tuning, Retrieval-Augmented Generation and Soft-Prompting for Non-Specialist LLM Users
by: Dodgson, Jennifer, et al.
Published: (2023)
Are Expressive Encoders Necessary for Discrete Graph Generation?
by: Revolinsky, Jay, et al.
Published: (2026)
Integrated Framework for LLM Evaluation with Answer Generation
by: Lee, Sujeong, et al.
Published: (2025)
Model Tampering Attacks Enable More Rigorous Evaluations of LLM Capabilities
by: Che, Zora, et al.
Published: (2025)
Validating LLM-Generated Programs with Metamorphic Prompt Testing
by: Wang, Xiaoyin, et al.
Published: (2024)
Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM Outputs with Human Preferences
by: Shankar, Shreya, et al.
Published: (2024)
Benchmarking Emergent Coordination in Large-Scale LLM Populations: An Evaluation Framework on the MoltBook Archive
by: Yee, Brandon, et al.
Published: (2026)
CLAVE: An Adaptive Framework for Evaluating Values of LLM Generated Responses
by: Yao, Jing, et al.
Published: (2024)
Pioneering Reliable Assessment in Text-to-Image Knowledge Editing: Leveraging a Fine-Grained Dataset and an Innovative Criterion
by: Gu, Hengrui, et al.
Published: (2024)
Modeling the Data-Generating Process is Necessary for Out-of-Distribution Generalization
by: Kaur, Jivat Neet, et al.
Published: (2022)
Towards Feedback-to-Plan Decisions for Self-Evolving LLM Agents in CUDA Kernel Generation
by: Chong, Yee Hin, et al.
Published: (2026)
Don't Overthink It: Inter-Rollout Action Agreement as a Free Adaptive-Compute Signal for LLM Agents
by: Sethi, Khushal
Published: (2026)
Enhancing Financial Inclusion and Regulatory Challenges: A Critical Analysis of Digital Banks and Alternative Lenders Through Digital Platforms, Machine Learning, and Large Language Models Integration
by: Lee, Luke
Published: (2024)
The Price of Agreement: Measuring LLM Sycophancy in Agentic Financial Applications
by: Zhao, Zhenyu, et al.
Published: (2026)
Interactive Visual Assessment for Text-to-Image Generation Models
by: Mi, Xiaoyue, et al.
Published: (2024)
Dynamic and Adaptive Feature Generation with LLM
by: Zhang, Xinhao, et al.
Published: (2024)
ChainReaction: Causal Chain-Guided Reasoning for Modular and Explainable Causal-Why Video Question Answering
by: Parmar, Paritosh, et al.
Published: (2025)
Are Human-generated Demonstrations Necessary for In-context Learning?
by: Li, Rui, et al.
Published: (2023)
Adaptive Reasoning and Acting in Medical Language Agents
by: Dutta, Abhishek, et al.
Published: (2024)
Medical Image Debiasing by Learning Adaptive Agreement from a Biased Council
by: Luo, Luyang, et al.
Published: (2024)
SMART: Self-Generating and Self-Validating Multi-Dimensional Assessment for LLMs' Mathematical Problem Solving
by: Hou, Yujie, et al.
Published: (2025)
From Natural Language to Solver-Ready Power System Optimization: An LLM-Assisted, Validation-in-the-Loop Framework
by: Hu, Yunkai, et al.
Published: (2025)
Automated Validation of LLM-based Evaluators for Software Engineering Artifacts
by: Fandina, Ora Nova, et al.
Published: (2025)
Judge's Verdict: A Comprehensive Analysis of LLM Judge Capability Through Human Agreement
by: Han, Steve, et al.
Published: (2025)
A Theoretical Analysis of Compositional Generalization in Neural Networks: A Necessary and Sufficient Condition
by: Li, Yuanpeng
Published: (2025)
Chain-of-Trust: A Progressive Trust Evaluation Framework Enabled by Generative AI
by: Zhu, Botao, et al.
Published: (2025)