:: Library Catalog

Bibliographic Details
Main Authors:	Sun, Yifei; Li, Yongan; Qin, A. K.; Hou, Sicheng; Pflanzner, Tamas
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence; Computation and Language
Online Access:	https://arxiv.org/abs/2601.11792

Similar Items

Response-free item difficulty modelling for multiple-choice items with fine-tuned transformers: Component-wise representation and multi-task learning
by: Netík, Jan, et al.
Published: (2026)

Research on intelligent generation of structural demolition suggestions based on multi-model collaboration
by: Yang, Zhifeng, et al.
Published: (2025)

Two are better than one: Context window extension with multi-grained self-injection
by: Han, Wei, et al.
Published: (2024)

Investigating the performance of Retrieval-Augmented Generation and fine-tuning for the development of AI-driven knowledge-based systems
by: Lakatos, Robert, et al.
Published: (2024)

MARS: toward more efficient multi-agent collaboration for LLM reasoning
by: Wang, Xiao, et al.
Published: (2025)

THiNK: Can Large Language Models Think-aloud?
by: Yu, Yongan, et al.
Published: (2025)

TripScore: Benchmarking and rewarding real-world travel planning with fine-grained evaluation
by: Qu, Yincen, et al.
Published: (2025)

Can Vision Language Models Be Adaptive in Mathematics Education? A Learner Model-based Rubric Study
by: Gao, Jie, et al.
Published: (2026)

WXImpactBench: A Disruptive Weather Impact Understanding Benchmark for Evaluating Large Language Models
by: Yu, Yongan, et al.
Published: (2025)

Zero-knowledge LLM hallucination detection and mitigation through fine-grained cross-model consistency
by: Goel, Aman, et al.
Published: (2025)

Comparing large language models and human programmers for generating programming code
by: Hou, Wenpin, et al.
Published: (2024)

GRADE: Generating multi-hop QA and fine-gRAined Difficulty matrix for RAG Evaluation
by: Lee, Jeongsoo, et al.
Published: (2025)

Language Ranker: A Lightweight Ranking framework for LLM Decoding
by: Zhang, Chenheng, et al.
Published: (2025)

QA-TOOLBOX: Conversational Question-Answering for process task guidance in manufacturing
by: Manuvinakurike, Ramesh, et al.
Published: (2024)

How secure is AI-generated Code: A Large-Scale Comparison of Large Language Models
by: Tihanyi, Norbert, et al.
Published: (2024)

CoEx -- Co-evolving World-model and Exploration
by: Kim, Minsoo, et al.
Published: (2025)

Can reasoning models comprehend mathematical problems in Chinese ancient texts? An empirical study based on data from Suanjing Shishu
by: Liu, Chang, et al.
Published: (2025)

Discerning minds or generic tutors? Evaluating instructional guidance capabilities in Socratic LLMs
by: Liu, Ying, et al.
Published: (2025)

A systematic framework for generating novel experimental hypotheses from language models
by: Misra, Kanishka, et al.
Published: (2024)

PAREDA: A Multi-Accent Speech Dataset of Natural Language Processing Research Discussions
by: Jin, Sicheng, et al.
Published: (2026)

RPC-Bench: A Fine-grained Benchmark for Research Paper Comprehension
by: Chen, Yelin, et al.
Published: (2026)

ChineseWebText 2.0: Large-Scale High-quality Chinese Web Text with Multi-dimensional and fine-grained information
by: Zhang, Wanyue, et al.
Published: (2024)

When Facts Change: Probing LLMs on Evolving Knowledge with evolveQA
by: Nakshatri, Nishanth Sridhar, et al.
Published: (2025)

Diverse And Private Synthetic Datasets Generation for RAG evaluation: A multi-agent framework
by: Driouich, Ilias, et al.
Published: (2025)

Reshaping MOFs text mining with a dynamic multi-agents framework of large language model
by: Lin, Zuhong, et al.
Published: (2025)

Can LLM generate interesting mathematical research problems?
by: Chen, Xiaoyang, et al.
Published: (2026)

Auto-Cypher: Improving LLMs on Cypher generation via LLM-supervised generation-verification framework
by: Tiwari, Aman, et al.
Published: (2024)

WiseMind: a knowledge-guided multi-agent framework for accurate and empathetic psychiatric diagnosis
by: Wu, Yuqi, et al.
Published: (2025)

Efficient Reasoning Models: A Survey
by: Feng, Sicheng, et al.
Published: (2025)

Knowledge-Graph Based RAG System Evaluation Framework
by: Dong, Sicheng, et al.
Published: (2025)

Fine-grained Verification via Diagnostic Reasoning Supervision for Aspect Sentiment Triplet Extraction
by: Lai, Wenna, et al.
Published: (2026)

Minor SFT loss for LLM fine-tune to increase performance and reduce model deviation
by: Xie, Shiming, et al.
Published: (2024)

Artificial intelligence and inherent mathematical difficulty
by: Dean, Walter, et al.
Published: (2024)

COMET: "Cone of experience" enhanced large multimodal model for mathematical problem generation
by: Liu, Sannyuya, et al.
Published: (2024)

MathDivide: Improved mathematical reasoning by large language models
by: Srivastava, Saksham Sahai, et al.
Published: (2024)

Self-evolving AI agents for protein discovery and directed evolution
by: Tan, Yang, et al.
Published: (2026)

MARS: Co-evolving Dual-System Deep Research via Multi-Agent Reinforcement Learning
by: Chen, Guoxin, et al.
Published: (2025)

Harnessing Chain-of-Thought Metadata for Task Routing and Adversarial Prompt Detection
by: Marinelli, Ryan, et al.
Published: (2025)

ProtAgents: Protein discovery via large language model multi-agent collaborations combining physics and machine learning
by: Ghafarollahi, A., et al.
Published: (2024)

ProdRev: A DNN framework for empowering customers using generative pre-trained transformers
by: Gupta, Aakash, et al.
Published: (2025)