| Main Authors: | Yuan, Jiayi, Zhang, Jiamu, Wen, Andrew, Hu, Xia |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2502.09670 |
Similar Items
Federated Fine-tuning of Large Language Models under Heterogeneous Tasks and Client Resources
by: Bai, Jiamu, et al.
Published: (2024)
DHP Benchmark: Are LLMs Good NLG Evaluators?
by: Wang, Yicheng, et al.
Published: (2024)
Evaluating Progress in Graph Foundation Models: A Comprehensive Benchmark and New Insights
by: Yu, Xingtong, et al.
Published: (2026)
An Exploration of Higher Education Course Evaluation by Large Language Models
by: Yuan, Bo, et al.
Published: (2024)
CMMaTH: A Chinese Multi-modal Math Skill Evaluation Benchmark for Foundation Models
by: Li, Zhong-Zhi, et al.
Published: (2024)
Language Models can Evaluate Themselves via Probability Discrepancy
by: Xia, Tingyu, et al.
Published: (2024)
Generating Leakage-Free Benchmarks for Robust RAG Evaluation
by: Liu, Jiayi, et al.
Published: (2026)
Cross-Platform Evaluation of Reasoning Capabilities in Foundation Models
by: de Curtò, J., et al.
Published: (2025)
A Few Words Can Distort Graphs: Knowledge Poisoning Attacks on Graph-based Retrieval-Augmented Generation of Large Language Models
by: Wen, Jiayi, et al.
Published: (2025)
Adapting Large Language Models for Education: Foundational Capabilities, Potentials, and Challenges
by: Li, Qingyao, et al.
Published: (2023)
Leveraging Computerized Adaptive Testing for Cost-effective Evaluation of Large Language Models in Medical Benchmarking
by: Zheng, Tianpeng, et al.
Published: (2026)
MM-Eval: A Hierarchical Benchmark for Modern Mongolian Evaluation in LLMs
by: Zhang, Mengyuan, et al.
Published: (2024)
Towards Foundation Models for Knowledge Graph Reasoning
by: Galkin, Mikhail, et al.
Published: (2023)
DSAEval: Evaluating Data Science Agents on a Wide Range of Real-World Data Science Problems
by: Sun, Maojun, et al.
Published: (2026)
Batched Low-Rank Adaptation of Foundation Models
by: Wen, Yeming, et al.
Published: (2023)
Yi: Open Foundation Models by 01.AI
by: AI, 01., et al.
Published: (2024)
SAMGPT: Text-free Graph Foundation Model for Multi-domain Pre-training and Cross-domain Adaptation
by: Yu, Xingtong, et al.
Published: (2025)
QuarkMedBench: A Real-World Scenario Driven Benchmark for Evaluating Large Language Models
by: Wu, Yao, et al.
Published: (2026)
Beyond Completion: A Foundation Model for General Knowledge Graph Reasoning
by: Hua, Yin, et al.
Published: (2025)
FAMA: The First Large-Scale Open-Science Speech Foundation Model for English and Italian
by: Papi, Sara, et al.
Published: (2025)
Evaluating the Effectiveness of the Foundational Models for Q&A Classification in Mental Health care
by: Alhuzali, Hassan, et al.
Published: (2024)
MMAI Gym for Science: Training Liquid Foundation Models for Drug Discovery
by: Kuznetsov, Maksim, et al.
Published: (2026)
A Prompt-Based Knowledge Graph Foundation Model for Universal In-Context Reasoning
by: Cui, Yuanning, et al.
Published: (2024)
Symbol-LLM: Towards Foundational Symbol-centric Interface For Large Language Models
by: Xu, Fangzhi, et al.
Published: (2023)
Luna: An Evaluation Foundation Model to Catch Language Model Hallucinations with High Accuracy and Low Cost
by: Belyi, Masha, et al.
Published: (2024)
DARE-bench: Evaluating Modeling and Instruction Fidelity of LLMs in Data Science
by: Shu, Fan, et al.
Published: (2026)
Elsevier Arena: Human Evaluation of Chemistry/Biology/Health Foundational Large Language Models
by: Thorne, Camilo, et al.
Published: (2024)
FedDTRE: Federated Dialogue Generation Models Powered by Trustworthiness Evaluation
by: Lu, Shule, et al.
Published: (2025)
Weaver: Foundation Models for Creative Writing
by: Wang, Tiannan, et al.
Published: (2024)
Shadow-FT: Tuning Instruct Model via Training on Paired Base Model
by: Wu, Taiqiang, et al.
Published: (2025)
On Speeding Up Language Model Evaluation
by: Zhou, Jin Peng, et al.
Published: (2024)
CodeApex: A Bilingual Programming Evaluation Benchmark for Large Language Models
by: Fu, Lingyue, et al.
Published: (2023)
AI-Driven Automation Can Become the Foundation of Next-Era Science of Science Research
by: Chen, Renqi, et al.
Published: (2025)
MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts
by: Lu, Pan, et al.
Published: (2023)
A Survey on Multilingual Large Language Models: Corpora, Alignment, and Bias
by: Xu, Yuemei, et al.
Published: (2024)
LMFlow: An Extensible Toolkit for Finetuning and Inference of Large Foundation Models
by: Diao, Shizhe, et al.
Published: (2023)
Whispers that Shake Foundations: Analyzing and Mitigating False Premise Hallucinations in Large Language Models
by: Yuan, Hongbang, et al.
Published: (2024)
Fine-Grained Benchmark Generation for Comprehensive Evaluation of Foundation Models
by: Islam, Mohammed Saidul, et al.
Published: (2026)
Foundation Models to Unlock Real-World Evidence from Nationwide Medical Claims
by: Ma, Fan, et al.
Published: (2026)
Me LLaMA: Foundation Large Language Models for Medical Applications
by: Xie, Qianqian, et al.
Published: (2024)