Library Catalog

Bibliographic Details
Main Authors:	Yuan, Jiayi; Zhang, Jiamu; Wen, Andrew; Hu, Xia
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2502.09670

Similar Items

Federated Fine-tuning of Large Language Models under Heterogeneous Tasks and Client Resources
by: Bai, Jiamu, et al.
Published: (2024)

DHP Benchmark: Are LLMs Good NLG Evaluators?
by: Wang, Yicheng, et al.
Published: (2024)

Evaluating Progress in Graph Foundation Models: A Comprehensive Benchmark and New Insights
by: Yu, Xingtong, et al.
Published: (2026)

An Exploration of Higher Education Course Evaluation by Large Language Models
by: Yuan, Bo, et al.
Published: (2024)

CMMaTH: A Chinese Multi-modal Math Skill Evaluation Benchmark for Foundation Models
by: Li, Zhong-Zhi, et al.
Published: (2024)

Language Models can Evaluate Themselves via Probability Discrepancy
by: Xia, Tingyu, et al.
Published: (2024)

Generating Leakage-Free Benchmarks for Robust RAG Evaluation
by: Liu, Jiayi, et al.
Published: (2026)

Cross-Platform Evaluation of Reasoning Capabilities in Foundation Models
by: de Curtò, J., et al.
Published: (2025)

A Few Words Can Distort Graphs: Knowledge Poisoning Attacks on Graph-based Retrieval-Augmented Generation of Large Language Models
by: Wen, Jiayi, et al.
Published: (2025)

Adapting Large Language Models for Education: Foundational Capabilities, Potentials, and Challenges
by: Li, Qingyao, et al.
Published: (2023)

Leveraging Computerized Adaptive Testing for Cost-effective Evaluation of Large Language Models in Medical Benchmarking
by: Zheng, Tianpeng, et al.
Published: (2026)

MM-Eval: A Hierarchical Benchmark for Modern Mongolian Evaluation in LLMs
by: Zhang, Mengyuan, et al.
Published: (2024)

Towards Foundation Models for Knowledge Graph Reasoning
by: Galkin, Mikhail, et al.
Published: (2023)

DSAEval: Evaluating Data Science Agents on a Wide Range of Real-World Data Science Problems
by: Sun, Maojun, et al.
Published: (2026)

Batched Low-Rank Adaptation of Foundation Models
by: Wen, Yeming, et al.
Published: (2023)

Yi: Open Foundation Models by 01.AI
by: AI, 01., et al.
Published: (2024)

SAMGPT: Text-free Graph Foundation Model for Multi-domain Pre-training and Cross-domain Adaptation
by: Yu, Xingtong, et al.
Published: (2025)

QuarkMedBench: A Real-World Scenario Driven Benchmark for Evaluating Large Language Models
by: Wu, Yao, et al.
Published: (2026)

Beyond Completion: A Foundation Model for General Knowledge Graph Reasoning
by: Hua, Yin, et al.
Published: (2025)

FAMA: The First Large-Scale Open-Science Speech Foundation Model for English and Italian
by: Papi, Sara, et al.
Published: (2025)

Evaluating the Effectiveness of the Foundational Models for Q&A Classification in Mental Health care
by: Alhuzali, Hassan, et al.
Published: (2024)

MMAI Gym for Science: Training Liquid Foundation Models for Drug Discovery
by: Kuznetsov, Maksim, et al.
Published: (2026)

A Prompt-Based Knowledge Graph Foundation Model for Universal In-Context Reasoning
by: Cui, Yuanning, et al.
Published: (2024)

Symbol-LLM: Towards Foundational Symbol-centric Interface For Large Language Models
by: Xu, Fangzhi, et al.
Published: (2023)

Luna: An Evaluation Foundation Model to Catch Language Model Hallucinations with High Accuracy and Low Cost
by: Belyi, Masha, et al.
Published: (2024)

DARE-bench: Evaluating Modeling and Instruction Fidelity of LLMs in Data Science
by: Shu, Fan, et al.
Published: (2026)

Elsevier Arena: Human Evaluation of Chemistry/Biology/Health Foundational Large Language Models
by: Thorne, Camilo, et al.
Published: (2024)

FedDTRE: Federated Dialogue Generation Models Powered by Trustworthiness Evaluation
by: Lu, Shule, et al.
Published: (2025)

Weaver: Foundation Models for Creative Writing
by: Wang, Tiannan, et al.
Published: (2024)

Shadow-FT: Tuning Instruct Model via Training on Paired Base Model
by: Wu, Taiqiang, et al.
Published: (2025)

On Speeding Up Language Model Evaluation
by: Zhou, Jin Peng, et al.
Published: (2024)

CodeApex: A Bilingual Programming Evaluation Benchmark for Large Language Models
by: Fu, Lingyue, et al.
Published: (2023)

AI-Driven Automation Can Become the Foundation of Next-Era Science of Science Research
by: Chen, Renqi, et al.
Published: (2025)

MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts
by: Lu, Pan, et al.
Published: (2023)

A Survey on Multilingual Large Language Models: Corpora, Alignment, and Bias
by: Xu, Yuemei, et al.
Published: (2024)

LMFlow: An Extensible Toolkit for Finetuning and Inference of Large Foundation Models
by: Diao, Shizhe, et al.
Published: (2023)

Whispers that Shake Foundations: Analyzing and Mitigating False Premise Hallucinations in Large Language Models
by: Yuan, Hongbang, et al.
Published: (2024)

Fine-Grained Benchmark Generation for Comprehensive Evaluation of Foundation Models
by: Islam, Mohammed Saidul, et al.
Published: (2026)

Foundation Models to Unlock Real-World Evidence from Nationwide Medical Claims
by: Ma, Fan, et al.
Published: (2026)

Me LLaMA: Foundation Large Language Models for Medical Applications
by: Xie, Qianqian, et al.
Published: (2024)