Library Catalog

Bibliographic Details
Main Authors:	Hua, Yining; Na, Hongbin; Ayubcha, Cyrus; Lian, Levi
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2605.23262

Similar Items

Applying and Evaluating Large Language Models in Mental Health Care: A Scoping Review of Human-Assessed Generative Tasks
by: Hua, Yining, et al.
Published: (2024)

Machine Learning in Drug Development for Neurological Diseases: A Review of Blood Brain Barrier Permeability Prediction Models
by: Aryon Eckleel Nabi, et al.
Published: (2025)

Charting the evolution of artificial intelligence mental health chatbots from rule‐based systems to large language models: a systematic review
by: Hua, Yining, et al.
Published: (2025)

The Instrumental Dissolution of Typing: Why AI Challenges the Keyboard Era in Knowledge Work
by: Hua, Wei Roy
Published: (2026)

Large Language Models in Mental Health Care: a Scoping Review
by: Hua, Yining, et al.
Published: (2024)

Teaching AI Through Benchmark Construction: QuestBench as a Course-Based Practice for Accountable Knowledge Work
by: Shen, Haiyang, et al.
Published: (2026)

MineAgent: Towards Remote-Sensing Mineral Exploration with Multimodal Large Language Models
by: Yu, Beibei, et al.
Published: (2024)

WorkArena++: Towards Compositional Planning and Reasoning-based Common Knowledge Work Tasks
by: Boisvert, Léo, et al.
Published: (2024)

Benchmark Health Index: A Systematic Framework for Benchmarking the Benchmarks of LLMs
by: Zhu, Longyuan, et al.
Published: (2026)

Cracking Factual Knowledge: A Comprehensive Analysis of Degenerate Knowledge Neurons in Large Language Models
by: Chen, Yuheng, et al.
Published: (2024)

Poly2Vec: Polymorphic Fourier-Based Encoding of Geospatial Objects for GeoAI Applications
by: Siampou, Maria Despoina, et al.
Published: (2024)

How Do Images Align and Complement LiDAR? Towards a Harmonized Multi-modal 3D Panoptic Segmentation
by: Pan, Yining, et al.
Published: (2025)

CorBenchX: Large-Scale Chest X-Ray Error Dataset and Vision-Language Model Benchmark for Report Error Correction
by: Zou, Jing, et al.
Published: (2025)

MSCoRe: A Benchmark for Multi-Stage Collaborative Reasoning in LLM Agents
by: Lei, Yuzhen, et al.
Published: (2025)

Knowledge Protocol Engineering: A New Paradigm for AI in Domain-Specific Knowledge Work
by: Zhang, Guangwei
Published: (2025)

MEDMKG: Benchmarking Medical Knowledge Exploitation with Multimodal Knowledge Graph
by: Wang, Xiaochen, et al.
Published: (2025)

GTA: A Benchmark for General Tool Agents
by: Wang, Jize, et al.
Published: (2024)

From Hazard Identification to Controller Design: Proactive and LLM-Supported Safety Engineering for ML-Powered Systems
by: Hong, Yining, et al.
Published: (2025)

From Static Spectra to Operando Infrared Dynamics: Physics Informed Flow Modeling and a Benchmark
by: Ye, Shuquan, et al.
Published: (2026)

SKA-Bench: A Fine-Grained Benchmark for Evaluating Structured Knowledge Understanding of LLMs
by: Liu, Zhiqiang, et al.
Published: (2025)

ORLoopBench: Solver-in-the-Loop Benchmarks for Self-Correction and Behavioral Rationality in Operations Research
by: Ao, Ruicheng, et al.
Published: (2026)

OphthBench: A Comprehensive Benchmark for Evaluating Large Language Models in Chinese Ophthalmology
by: Zhou, Chengfeng, et al.
Published: (2025)

Towards Unsupervised Adversarial Document Detection in Retrieval Augmented Generation Systems
by: Levi, Patrick
Published: (2026)

Approaching Low-Cost Cardiac Intelligence with Semi-Supervised Knowledge Distillation
by: Zhou, Rushuang, et al.
Published: (2025)

Prior Knowledge Makes It Possible: From Sublinear Graph Algorithms to LLM Test-Time Methods
by: Blum, Avrim, et al.
Published: (2025)

WorkArena: How Capable Are Web Agents at Solving Common Knowledge Work Tasks?
by: Drouin, Alexandre, et al.
Published: (2024)

Rethinking RL Evaluation: Can Benchmarks Truly Reveal Failures of RL Methods?
by: Chen, Zihan, et al.
Published: (2025)

Towards Knowledgeable Deep Research: Framework and Benchmark
by: Liu, Wenxuan, et al.
Published: (2026)

Beyond Isolation: Multi-Agent Synergy for Improving Knowledge Graph Construction
by: Ye, Hongbin, et al.
Published: (2023)

CORE-Acu: Structured Reasoning Traces and Knowledge Graph Safety Verification for Acupuncture Clinical Decision Support
by: Xu, Liuyi, et al.
Published: (2026)

ClimateIQA: A New Dataset and Benchmark to Advance Vision-Language Models in Meteorology Anomalies Analysis
by: Chen, Jian, et al.
Published: (2024)

Do Deepfake Detectors Work in Reality?
by: Ren, Simiao, et al.
Published: (2025)

Geo2Vec: Shape- and Distance-Aware Neural Representation of Geospatial Entities
by: Chu, Chen, et al.
Published: (2025)

Benchmarking LLM Summaries of Multimodal Clinical Time Series for Remote Monitoring
by: Shukla, Aditya, et al.
Published: (2026)

Fast and Continual Knowledge Graph Embedding via Incremental LoRA
by: Liu, Jiajun, et al.
Published: (2024)

Interactive AI NPCs Powered by LLMs: Technical Report for the CPDC Challenge 2025
by: Huang, Yitian, et al.
Published: (2025)

Data Collection of Real-Life Knowledge Work in Context: The RLKWiC Dataset
by: Bakhshizadeh, Mahta, et al.
Published: (2024)

A Decomposition Modeling Framework for Seasonal Time-Series Forecasting
by: Pang, Yining, et al.
Published: (2024)

An Attack Method for Medical Insurance Claim Fraud Detection based on Generative Adversarial Network
by: Pang, Yining, et al.
Published: (2025)

FieldWorkArena: Agentic AI Benchmark for Real Field Work Tasks
by: Takahashi, Jun, et al.
Published: (2025)