| Main Authors: | Yan, Yunxiang; Sawada, Tomohiro; Goyal, Kartik |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2507.23776 |
Similar Items
Train It and Forget It: Merge Lists are Unnecessary for BPE Inference in Language Models
by: Sawada, Tomohiro, et al.
Published: (2025)
Do Large Language Models have Problem-Solving Capability under Incomplete Information Scenarios?
by: Chen, Yuyan, et al.
Published: (2024)
EvaLearn: Quantifying the Learning Capability and Efficiency of LLMs via Sequential Problem Solving
by: Dou, Shihan, et al.
Published: (2025)
MAP's not dead yet: Uncovering true language model modes by conditioning away degeneracy
by: Yoshida, Davis, et al.
Published: (2023)
FCoReBench: Can Large Language Models Solve Challenging First-Order Combinatorial Reasoning Problems?
by: Mittal, Chinmay, et al.
Published: (2024)
Satisfiability Solving with LLMs: A Matched-Pair Evaluation of Reasoning Capability
by: Zhang, Leizhen, et al.
Published: (2026)
StrategyLLM: Large Language Models as Strategy Generators, Executors, Optimizers, and Evaluators for Problem Solving
by: Gao, Chang, et al.
Published: (2023)
Token-Supervised Value Models for Enhancing Mathematical Problem-Solving Capabilities of Large Language Models
by: Lee, Jung Hyun, et al.
Published: (2024)
When in Doubt, Cascade: Towards Building Efficient and Capable Guardrails
by: Nagireddy, Manish, et al.
Published: (2024)
PlanGEN: A Multi-Agent Framework for Generating Planning and Reasoning Trajectories for Complex Problem Solving
by: Parmar, Mihir, et al.
Published: (2025)
Fine-Tuning Qwen 2.5 3B for Realistic Movie Dialogue Generation
by: Gupta, Kartik
Published: (2025)
PLAN-TUNING: Post-Training Language Models to Learn Step-by-Step Planning for Complex Problem Solving
by: Parmar, Mihir, et al.
Published: (2025)
Logit Arithmetic Elicits Long Reasoning Capabilities Without Training
by: Zhang, Yunxiang, et al.
Published: (2025)
Does Learning Mathematical Problem-Solving Generalize to Broader Reasoning?
by: Zhou, Ruochen, et al.
Published: (2025)
Updating Parametric Knowledge with Context Distillation Retains Post-Training Capabilities
by: Padmanabhan, Shankar, et al.
Published: (2026)
Idea First, Code Later: Disentangling Problem Solving from Code Generation in Evaluating LLMs for Competitive Programming
by: Hadhoud, Sama, et al.
Published: (2026)
Assessing the Capability of LLMs in Solving POSCOMP Questions
by: Viegas, Cayo, et al.
Published: (2025)
ArithmAttack: Evaluating Robustness of LLMs to Noisy Context in Math Problem Solving
by: Abedin, Zain Ul, et al.
Published: (2025)
Data Augmentation with In-Context Learning and Comparative Evaluation in Math Word Problem Solving
by: Yigit, Gulsum, et al.
Published: (2024)
ABBEL: LLM Agents Acting through Belief Bottlenecks Expressed in Language
by: Lidayan, Aly, et al.
Published: (2025)
On Provable Length and Compositional Generalization
by: Ahuja, Kartik, et al.
Published: (2024)
Using LLMs for Knowledge Component-level Correctness Labeling in Open-ended Coding Problems
by: Duan, Zhangqi, et al.
Published: (2026)
Investigating Bias: A Multilingual Pipeline for Generating, Solving, and Evaluating Math Problems with LLMs
by: Mahran, Mariam, et al.
Published: (2025)
GCoder: Improving Large Language Model for Generalized Graph Problem Solving
by: Zhang, Qifan, et al.
Published: (2024)
Evaluating the Generation Capabilities of Large Chinese Language Models
by: Zeng, Hui, et al.
Published: (2023)
Mathify: Evaluating Large Language Models on Mathematical Problem Solving Tasks
by: Anand, Avinash, et al.
Published: (2024)
Collaborative Problem-Solving in an Optimization Game
by: Jeknic, Isidora, et al.
Published: (2025)
IndicDB -- Benchmarking Multilingual Text-to-SQL Capabilities in Indian Languages
by: Dawar, Aviral, et al.
Published: (2026)
MathEDU: Feedback Generation on Problem-Solving Processes for Mathematical Learning Support
by: Hsu, Wei-Ling, et al.
Published: (2025)
Learning to Solve Domain-Specific Calculation Problems with Knowledge-Intensive Programs Generator
by: Liu, Chengyuan, et al.
Published: (2024)
Synthetic Data Generation and Joint Learning for Robust Code-Mixed Translation
by: Kartik, Kartik, et al.
Published: (2024)
Steamroller Problems: An Evaluation of LLM Reasoning Capability with Automated Theorem Prover Strategies
by: McGinness, Lachlan, et al.
Published: (2024)
EduVidQA: Generating and Evaluating Long-form Answers to Student Questions based on Lecture Videos
by: Ray, Sourjyadip, et al.
Published: (2025)
Erasing with Precision: Evaluating Specific Concept Erasure from Text-to-Image Generative Models
by: Fuchi, Masane, et al.
Published: (2025)
QUIET: A Multi-Blank Cascaded Story Cloze Benchmark for LLM Creative Generation Capability
by: Zou, Bo, et al.
Published: (2026)
From Problem-Solving to Teaching Problem-Solving: Aligning LLMs with Pedagogy using Reinforcement Learning
by: Dinucu-Jianu, David, et al.
Published: (2025)
Pi-GPS: Enhancing Geometry Problem Solving by Unleashing the Power of Diagrammatic Information
by: Zhao, Junbo, et al.
Published: (2025)
MapCoder: Multi-Agent Code Generation for Competitive Problem Solving
by: Islam, Md. Ashraful, et al.
Published: (2024)
Faithful Model Evaluation for Model-Based Metrics
by: Goyal, Palash, et al.
Published: (2023)