| Main Authors: | Moore, Steven, Costello, Eamon, Nguyen, Huy A., Stamper, John |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2405.20529 |
Similar Items
Automated Generation and Tagging of Knowledge Components from Multiple-Choice Questions
by: Moore, Steven, et al.
Published: (2024)
Cognitive Agent Compilation for Explicit Problem Solver Modeling
by: Moon, Hyeongdon, et al.
Published: (2026)
Automatic Evaluation of Healthcare LLMs Beyond Question-Answering
by: Arias-Duart, Anna, et al.
Published: (2025)
Small but Significant: On the Promise of Small Language Models for Accessible AIED
by: Wei, Yumou, et al.
Published: (2025)
Long-Span Question-Answering: Automatic Question Generation and QA-System Ranking via Side-by-Side Evaluation
by: Bohnet, Bernd, et al.
Published: (2024)
AGENT-CQ: Automatic Generation and Evaluation of Clarifying Questions for Conversational Search with LLMs
by: Siro, Clemencia, et al.
Published: (2024)
Automatic Generation of Inference Making Questions for Reading Comprehension Assessments
by: Ma, Wanjing Anya, et al.
Published: (2025)
Automatic Dataset Generation for Knowledge Intensive Question Answering Tasks
by: Yuen, Sizhe, et al.
Published: (2025)
Hallucination-Free Automatic Question & Answer Generation for Intuitive Learning
by: Wang, Nicholas X., et al.
Published: (2026)
Automatic Question & Answer Generation Using Generative Large Language Model (LLM)
by: Ehsan, Md. Alvee, et al.
Published: (2025)
WalledEval: A Comprehensive Safety Evaluation Toolkit for Large Language Models
by: Gupta, Prannaya, et al.
Published: (2024)
SciEvalKit: An Open-source Evaluation Toolkit for Scientific General Intelligence
by: Wang, Yiheng, et al.
Published: (2025)
Are Large Language Models Consistent over Value-laden Questions?
by: Moore, Jared, et al.
Published: (2024)
Confabulation: The Surprising Value of Large Language Model Hallucinations
by: Sui, Peiqi, et al.
Published: (2024)
Comprehensive and Practical Evaluation of Retrieval-Augmented Generation Systems for Medical Question Answering
by: Ngo, Nghia Trung, et al.
Published: (2024)
Bias in the Tails: How Name-conditioned Evaluative Framing in Resume Summaries Destabilizes LLM-based Hiring
by: Nghiem, Huy, et al.
Published: (2026)
CompeteSMoE -- Statistically Guaranteed Mixture of Experts Training via Competition
by: Nguyen, Nam V., et al.
Published: (2025)
Balancing Safety and Helpfulness in Healthcare AI Assistants through Iterative Preference Alignment
by: Nghiem, Huy, et al.
Published: (2025)
From Answers to Questions: EQGBench for Evaluating LLMs' Educational Question Generation
by: Zhou, Chengliang, et al.
Published: (2025)
R-Eval: A Unified Toolkit for Evaluating Domain Knowledge of Retrieval Augmented Large Language Models
by: Tu, Shangqing, et al.
Published: (2024)
Automatic Legal Writing Evaluation of LLMs
by: Pires, Ramon, et al.
Published: (2025)
Sketch: A Toolkit for Streamlining LLM Operations
by: Jiang, Xin, et al.
Published: (2024)
The Impact of Item-Writing Flaws on Difficulty and Discrimination in Item Response Theory
by: Schmucker, Robin, et al.
Published: (2025)
Automatic Generation of Question Hints for Mathematics Problems using Large Language Models in Educational Technology
by: Tonga, Junior Cedric, et al.
Published: (2024)
Training Computer Use Agents to Assess the Usability of Graphical User Interfaces
by: Gao, Alice, et al.
Published: (2026)
Adaptive Stopping for Multi-Turn LLM Reasoning
by: Zhou, Xiaofan, et al.
Published: (2026)
Foundational Automatic Evaluators: Scaling Multi-Task Generative Evaluator Training for Reasoning-Centric Domains
by: Xu, Austin, et al.
Published: (2025)
VLSP 2025 MLQA-TSR Challenge: Vietnamese Multimodal Legal Question Answering on Traffic Sign Regulation
by: Luu, Son T., et al.
Published: (2025)
Feedback Forensics: A Toolkit to Measure AI Personality
by: Findeis, Arduin, et al.
Published: (2025)
VLQA: The First Comprehensive, Large, and High-Quality Vietnamese Dataset for Legal Question Answering
by: Nguyen, Tan-Minh, et al.
Published: (2025)
Evaluating the Fitness of Ontologies for the Task of Question Generation
by: Alkhuzaey, Samah, et al.
Published: (2025)
Qworld: Question-Specific Evaluation Criteria for LLMs
by: Gao, Shanghua, et al.
Published: (2026)
Can we train ASR systems on Code-switch without real code-switch data? Case study for Singapore's languages
by: Nguyen, Tuan, et al.
Published: (2025)
Cross-Attention Watermarking of Large Language Models
by: Baldassini, Folco Bertini, et al.
Published: (2024)
Jury: A Comprehensive Evaluation Toolkit
by: Cavusoglu, Devrim, et al.
Published: (2023)
Towards Automatic Evaluation of Task-Oriented Dialogue Flows
by: Mirtaheri, Mehrnoosh, et al.
Published: (2024)
Submodular Evaluation Subset Selection in Automatic Prompt Optimization
by: Nian, Jinming, et al.
Published: (2026)
LMFlow: An Extensible Toolkit for Finetuning and Inference of Large Foundation Models
by: Diao, Shizhe, et al.
Published: (2023)
JailbreakEval: An Integrated Toolkit for Evaluating Jailbreak Attempts Against Large Language Models
by: Ran, Delong, et al.
Published: (2024)
GR-NLP-TOOLKIT: An Open-Source NLP Toolkit for Modern Greek
by: Loukas, Lefteris, et al.
Published: (2024)