| Main authors: | Li, Chen; Wang, Weiqi; Hu, Jingcheng; Wei, Yixuan; Zheng, Nanning; Hu, Han; Zhang, Zheng; Peng, Houwen |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online access: | https://arxiv.org/abs/2403.04706 |
Similar documents
Do Large Language Models Possess Sensitive to Sentiment?
by: Liu, Yang, et al.
Published: (2024)
LongReasonArena: A Long Reasoning Benchmark for Large Language Models
by: Ding, Jiayu, et al.
Published: (2025)
Xwin-LM: Strong and Scalable Alignment Practice for LLMs
by: Ni, Bolin, et al.
Published: (2024)
AgentMath: Empowering Mathematical Reasoning for Large Language Models via Tool-Augmented Agent
by: Luo, Haipeng, et al.
Published: (2025)
Confidence Matters: Revisiting Intrinsic Self-Correction Capabilities of Large Language Models
by: Li, Loka, et al.
Published: (2024)
MuBench: Assessment of Multilingual Capabilities of Large Language Models Across 61 Languages
by: Han, Wenhan, et al.
Published: (2025)
AI Scientists Fail Without Strong Implementation Capability
by: Zhu, Minjun, et al.
Published: (2025)
Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning
by: Huan, Maggie, et al.
Published: (2025)
SelfGoal: Your Language Agents Already Know How to Achieve High-level Goals
by: Yang, Ruihan, et al.
Published: (2024)
From Summary to Action: Enhancing Large Language Models for Complex Tasks with Open World APIs
by: Liu, Yulong, et al.
Published: (2024)
TabularMath: Understanding Math Reasoning over Tables with Large Language Models
by: Tian, Shi-Yu, et al.
Published: (2025)
Your Finetuned Large Language Model is Already a Powerful Out-of-distribution Detector
by: Zhang, Andi, et al.
Published: (2024)
Make Your LLM Fully Utilize the Context
by: An, Shengnan, et al.
Published: (2024)
FineMath: A Fine-Grained Mathematical Evaluation Benchmark for Chinese Large Language Models
by: Liu, Yan, et al.
Published: (2024)
Learning From Mistakes Makes LLM Better Reasoner
by: An, Shengnan, et al.
Published: (2023)
OMEGA: Can LLMs Reason Outside the Box in Math? Evaluating Exploratory, Compositional, and Transformative Generalization
by: Sun, Yiyou, et al.
Published: (2025)
ConceptMath: A Bilingual Concept-wise Benchmark for Measuring Mathematical Reasoning of Large Language Models
by: Wu, Yanan, et al.
Published: (2024)
Improving the Language Understanding Capabilities of Large Language Models Using Reinforcement Learning
by: Hu, Bokai, et al.
Published: (2024)
Skywork-Math: Data Scaling Laws for Mathematical Reasoning in Large Language Models -- The Story Goes On
by: Zeng, Liang, et al.
Published: (2024)
MultiMath: Bridging Visual and Mathematical Reasoning for Large Language Models
by: Peng, Shuai, et al.
Published: (2024)
Training-Free Test-Time Contrastive Learning for Large Language Models
by: Zheng, Kaiwen, et al.
Published: (2026)
Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities?
by: Zeng, Zhiyuan, et al.
Published: (2025)
Case-Based or Rule-Based: How Do Transformers Do the Math?
by: Hu, Yi, et al.
Published: (2024)
Assisting Research Proposal Writing with Large Language Models: Evaluation and Refinement
by: Ren, Jing, et al.
Published: (2025)
Let's Reason Formally: Natural-Formal Hybrid Reasoning Enhances LLM's Math Capability
by: Wang, Ruida, et al.
Published: (2025)
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models
by: Yu, Longhui, et al.
Published: (2023)
PLPP: Prompt Learning with Perplexity Is Self-Distillation for Vision-Language Models
by: Liu, Biao, et al.
Published: (2024)
CNSL-bench: Benchmarking the Sign Language Understanding Capabilities of MLLMs on Chinese National Sign Language
by: Zhao, Rui, et al.
Published: (2026)
AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling
by: Liu, Zihan, et al.
Published: (2024)
The LLM Already Knows: Estimating LLM-Perceived Question Difficulty via Hidden Representations
by: Zhu, Yubo, et al.
Published: (2025)
Quantifying Association Capabilities of Large Language Models and Its Implications on Privacy Leakage
by: Shao, Hanyin, et al.
Published: (2023)
R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning
by: Yang, Qi, et al.
Published: (2025)
Math Neurosurgery: Isolating Language Models' Math Reasoning Abilities Using Only Forward Passes
by: Christ, Bryan R., et al.
Published: (2024)
EssayJudge: A Multi-Granular Benchmark for Assessing Automated Essay Scoring Capabilities of Multimodal Large Language Models
by: Su, Jiamin, et al.
Published: (2025)
Debate-to-Detect: Reformulating Misinformation Detection as a Real-World Debate with Large Language Models
by: Han, Chen, et al.
Published: (2025)
DynaMath: A Dynamic Visual Benchmark for Evaluating Mathematical Reasoning Robustness of Vision Language Models
by: Zou, Chengke, et al.
Published: (2024)
Adapting Large Language Models for Education: Foundational Capabilities, Potentials, and Challenges
by: Li, Qingyao, et al.
Published: (2023)
Improving Bilingual Capabilities of Language Models to Support Diverse Linguistic Practices in Education
by: Syamkumar, Anand, et al.
Published: (2024)
FoundaBench: Evaluating Chinese Fundamental Knowledge Capabilities of Large Language Models
by: Li, Wei, et al.
Published: (2024)
IIMedGPT: Promoting Large Language Model Capabilities of Medical Tasks by Efficient Human Preference Alignment
by: Zhang, Yiming, et al.
Published: (2025)