| Main Authors: | Sun, Yifei; Li, Yongan; Qin, A. K.; Hou, Sicheng; Pflanzner, Tamas |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.11792 |
Similar Items
Response-free item difficulty modelling for multiple-choice items with fine-tuned transformers: Component-wise representation and multi-task learning
by: Netík, Jan, et al.
Published: (2026)
Research on intelligent generation of structural demolition suggestions based on multi-model collaboration
by: Yang, Zhifeng, et al.
Published: (2025)
Two are better than one: Context window extension with multi-grained self-injection
by: Han, Wei, et al.
Published: (2024)
Investigating the performance of Retrieval-Augmented Generation and fine-tuning for the development of AI-driven knowledge-based systems
by: Lakatos, Robert, et al.
Published: (2024)
MARS: toward more efficient multi-agent collaboration for LLM reasoning
by: Wang, Xiao, et al.
Published: (2025)
THiNK: Can Large Language Models Think-aloud?
by: Yu, Yongan, et al.
Published: (2025)
TripScore: Benchmarking and rewarding real-world travel planning with fine-grained evaluation
by: Qu, Yincen, et al.
Published: (2025)
Can Vision Language Models Be Adaptive in Mathematics Education? A Learner Model-based Rubric Study
by: Gao, Jie, et al.
Published: (2026)
WXImpactBench: A Disruptive Weather Impact Understanding Benchmark for Evaluating Large Language Models
by: Yu, Yongan, et al.
Published: (2025)
Zero-knowledge LLM hallucination detection and mitigation through fine-grained cross-model consistency
by: Goel, Aman, et al.
Published: (2025)
Comparing large language models and human programmers for generating programming code
by: Hou, Wenpin, et al.
Published: (2024)
GRADE: Generating multi-hop QA and fine-gRAined Difficulty matrix for RAG Evaluation
by: Lee, Jeongsoo, et al.
Published: (2025)
Language Ranker: A Lightweight Ranking framework for LLM Decoding
by: Zhang, Chenheng, et al.
Published: (2025)
QA-TOOLBOX: Conversational Question-Answering for process task guidance in manufacturing
by: Manuvinakurike, Ramesh, et al.
Published: (2024)
How secure is AI-generated Code: A Large-Scale Comparison of Large Language Models
by: Tihanyi, Norbert, et al.
Published: (2024)
CoEx -- Co-evolving World-model and Exploration
by: Kim, Minsoo, et al.
Published: (2025)
Can reasoning models comprehend mathematical problems in Chinese ancient texts? An empirical study based on data from Suanjing Shishu
by: Liu, Chang, et al.
Published: (2025)
Discerning minds or generic tutors? Evaluating instructional guidance capabilities in Socratic LLMs
by: Liu, Ying, et al.
Published: (2025)
A systematic framework for generating novel experimental hypotheses from language models
by: Misra, Kanishka, et al.
Published: (2024)
PAREDA: A Multi-Accent Speech Dataset of Natural Language Processing Research Discussions
by: Jin, Sicheng, et al.
Published: (2026)
RPC-Bench: A Fine-grained Benchmark for Research Paper Comprehension
by: Chen, Yelin, et al.
Published: (2026)
ChineseWebText 2.0: Large-Scale High-quality Chinese Web Text with Multi-dimensional and fine-grained information
by: Zhang, Wanyue, et al.
Published: (2024)
When Facts Change: Probing LLMs on Evolving Knowledge with evolveQA
by: Nakshatri, Nishanth Sridhar, et al.
Published: (2025)
Diverse And Private Synthetic Datasets Generation for RAG evaluation: A multi-agent framework
by: Driouich, Ilias, et al.
Published: (2025)
Reshaping MOFs text mining with a dynamic multi-agents framework of large language model
by: Lin, Zuhong, et al.
Published: (2025)
Can LLM generate interesting mathematical research problems?
by: Chen, Xiaoyang, et al.
Published: (2026)
Auto-Cypher: Improving LLMs on Cypher generation via LLM-supervised generation-verification framework
by: Tiwari, Aman, et al.
Published: (2024)
WiseMind: a knowledge-guided multi-agent framework for accurate and empathetic psychiatric diagnosis
by: Wu, Yuqi, et al.
Published: (2025)
Efficient Reasoning Models: A Survey
by: Feng, Sicheng, et al.
Published: (2025)
Knowledge-Graph Based RAG System Evaluation Framework
by: Dong, Sicheng, et al.
Published: (2025)
Fine-grained Verification via Diagnostic Reasoning Supervision for Aspect Sentiment Triplet Extraction
by: Lai, Wenna, et al.
Published: (2026)
Minor SFT loss for LLM fine-tune to increase performance and reduce model deviation
by: Xie, Shiming, et al.
Published: (2024)
Artificial intelligence and inherent mathematical difficulty
by: Dean, Walter, et al.
Published: (2024)
COMET: "Cone of experience" enhanced large multimodal model for mathematical problem generation
by: Liu, Sannyuya, et al.
Published: (2024)
MathDivide: Improved mathematical reasoning by large language models
by: Srivastava, Saksham Sahai, et al.
Published: (2024)
Self-evolving AI agents for protein discovery and directed evolution
by: Tan, Yang, et al.
Published: (2026)
MARS: Co-evolving Dual-System Deep Research via Multi-Agent Reinforcement Learning
by: Chen, Guoxin, et al.
Published: (2025)
Harnessing Chain-of-Thought Metadata for Task Routing and Adversarial Prompt Detection
by: Marinelli, Ryan, et al.
Published: (2025)
ProtAgents: Protein discovery via large language model multi-agent collaborations combining physics and machine learning
by: Ghafarollahi, A., et al.
Published: (2024)
ProdRev: A DNN framework for empowering customers using generative pre-trained transformers
by: Gupta, Aakash, et al.
Published: (2025)