| Main Authors: | Fang, Jinrui, Chen, Runhan, Yang, Xu, Yu, Jian, Xu, Jiawei, Vinod, Ashwin, Shi, Wenqi, Chen, Tianlong, Ji, Heng, Zhai, ChengXiang, Ding, Ying, Zhang, Yuji |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2604.04325 |
Similar Items
An Axiomatic Benchmark for Evaluation of Scientific Novelty Metrics
by: Liu, Miri, et al.
Published: (2026)
MedHallu: A Comprehensive Benchmark for Detecting Medical Hallucinations in Large Language Models
by: Pandit, Shrey, et al.
Published: (2025)
Beyond Reactive Safety: Risk-Aware LLM Alignment via Long-Horizon Simulation
by: Sun, Chenkai, et al.
Published: (2025)
An Investigation of Robustness of LLMs in Mathematical Reasoning: Benchmarking with Mathematically-Equivalent Transformation of Advanced Mathematical Problems
by: Hao, Yuren, et al.
Published: (2025)
Ten Principles of AI Agent Economics
by: Yang, Ke, et al.
Published: (2025)
The Indispensable Role of User Simulation in the Pursuit of AGI
by: Balog, Krisztian, et al.
Published: (2025)
User Simulation for Evaluating Information Access Systems
by: Balog, Krisztian, et al.
Published: (2023)
User Simulation in the Era of Generative AI: User Modeling, Synthetic Data Generation, and System Evaluation
by: Balog, Krisztian, et al.
Published: (2025)
Make Any Collection Navigable: Methods for Constructing and Evaluating Hypergraph of Text
by: Alvarez, Dean E., et al.
Published: (2026)
TinyHelen's First Curriculum: Training and Evaluating Tiny Language Models in a Simpler Language Environment
by: Yang, Ke, et al.
Published: (2024)
Competence-Based Analysis of Language Models
by: Davies, Adam, et al.
Published: (2023)
Interactive Information Need Prediction with Intent and Context
by: Ros, Kevin, et al.
Published: (2025)
Globally Optimal Training of Spiking Neural Networks via Parameter Reconstruction
by: Udupi, Himanshu, et al.
Published: (2026)
JIR-Arena: The First Benchmark Dataset for Just-in-time Information Recommendation
by: Yang, Ke, et al.
Published: (2025)
Teaching with Lies: Curriculum DPO on Synthetic Negatives for Hallucination Detection
by: Pandit, Shrey, et al.
Published: (2025)
User Preference Modeling for Conversational LLM Agents: Weak Rewards from Retrieval-Augmented Interaction
by: Hao, Yuren, et al.
Published: (2026)
From Correctness to Comprehension: AI Agents for Personalized Error Diagnosis in Education
by: Zhang, Yi-Fan, et al.
Published: (2025)
PlugMem: A Task-Agnostic Plugin Memory Module for LLM Agents
by: Yang, Ke, et al.
Published: (2026)
Thought Graph: Generating Thought Process for Biological Reasoning
by: Hsu, Chi-Yang, et al.
Published: (2024)
SimLab: A Platform for Simulation-based Evaluation of Conversational Information Access Systems
by: Bernard, Nolwenn, et al.
Published: (2025)
Persona-DB: Efficient Large Language Model Personalization for Response Prediction with Collaborative Data Refinement
by: Sun, Chenkai, et al.
Published: (2024)
Seed-Guided Fine-Grained Entity Typing in Science and Engineering Domains
by: Zhang, Yu, et al.
Published: (2024)
Mapping from Meaning: Addressing the Miscalibration of Prompt-Sensitive Language Models
by: Cox, Kyle, et al.
Published: (2025)
Scalable Robust Bayesian Co-Clustering with Compositional ELBOs
by: Vinod, Ashwin, et al.
Published: (2025)
Uncertainty-Aware Web-Conditioned Scientific Fact-Checking
by: Vinod, Ashwin, et al.
Published: (2026)
Frantic to Lure Stock Listings: NASDAQ and the NYSE can't compete on stats alone, so they're turning to shtick
Published: (2004)
Edge AI-Enabled Chicken Health Detection Based on Enhanced FCOS-Lite and Knowledge Distillation
by: Tong, Qiang, et al.
Published: (2024)
Fairness in Multi-modal Medical Diagnosis with Demonstration Selection
by: Li, Dawei, et al.
Published: (2025)
Green Collaborative Innovation Network's Dynamic Evolution and Influencing Factors of Logistics Industry in China
by: Xu Runhan, et al.
Published: (2025)
TAMA: A Human-AI Collaborative Thematic Analysis Framework Using Multi-Agent LLMs for Clinical Interviews
by: Xu, Huimin, et al.
Published: (2025)
The Public Library Lure
by: Greenaway, Emerson
Published: (1969)
YT-Pilot: Turning YouTube into Structured Learning Pathways with Context-Aware AI Support
by: Albassam, Dina, et al.
Published: (2026)
Bias and Volatility: A Statistical Framework for Evaluating Large Language Model's Stereotypes and the Associated Generation Inconsistency
by: Liu, Yiran, et al.
Published: (2024)
Flexible Locomotion Learning with Diffusion Model Predictive Control
by: Huang, Runhan, et al.
Published: (2025)
Risk Prognosis and Fire Prediction of Urban Utility Tunnels Using a Hybrid SSA‐LSTM Method
by: Tianlong Xu, et al.
Published: (2025)
Luring Deep-sea Life
Published: (1986)
Lures and wild fish fry
by: Surtida, Augusto P., et al.
Published: (1999)
Position: Open and Closed Large Language Models in Healthcare
by: Xu, Jiawei, et al.
Published: (2025)
Convergence and Near-optimal Sampling for Multivariate Function Approximations in Irregular Domains via Vandermonde with Arnoldi
by: Zhu, Wenqi, et al.
Published: (2023)
MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback
by: Wang, Xingyao, et al.
Published: (2023)