:: Library Catalog

Bibliographic Details
Main Authors:	Yan, Yunxiang; Sawada, Tomohiro; Goyal, Kartik
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2507.23776

Similar Items

Train It and Forget It: Merge Lists are Unnecessary for BPE Inference in Language Models
by: Sawada, Tomohiro, et al.
Published: (2025)

Do Large Language Models have Problem-Solving Capability under Incomplete Information Scenarios?
by: Chen, Yuyan, et al.
Published: (2024)

EvaLearn: Quantifying the Learning Capability and Efficiency of LLMs via Sequential Problem Solving
by: Dou, Shihan, et al.
Published: (2025)

MAP's not dead yet: Uncovering true language model modes by conditioning away degeneracy
by: Yoshida, Davis, et al.
Published: (2023)

FCoReBench: Can Large Language Models Solve Challenging First-Order Combinatorial Reasoning Problems?
by: Mittal, Chinmay, et al.
Published: (2024)

Satisfiability Solving with LLMs: A Matched-Pair Evaluation of Reasoning Capability
by: Zhang, Leizhen, et al.
Published: (2026)

StrategyLLM: Large Language Models as Strategy Generators, Executors, Optimizers, and Evaluators for Problem Solving
by: Gao, Chang, et al.
Published: (2023)

Token-Supervised Value Models for Enhancing Mathematical Problem-Solving Capabilities of Large Language Models
by: Lee, Jung Hyun, et al.
Published: (2024)

When in Doubt, Cascade: Towards Building Efficient and Capable Guardrails
by: Nagireddy, Manish, et al.
Published: (2024)

PlanGEN: A Multi-Agent Framework for Generating Planning and Reasoning Trajectories for Complex Problem Solving
by: Parmar, Mihir, et al.
Published: (2025)

Fine-Tuning Qwen 2.5 3B for Realistic Movie Dialogue Generation
by: Gupta, Kartik
Published: (2025)

PLAN-TUNING: Post-Training Language Models to Learn Step-by-Step Planning for Complex Problem Solving
by: Parmar, Mihir, et al.
Published: (2025)

Logit Arithmetic Elicits Long Reasoning Capabilities Without Training
by: Zhang, Yunxiang, et al.
Published: (2025)

Does Learning Mathematical Problem-Solving Generalize to Broader Reasoning?
by: Zhou, Ruochen, et al.
Published: (2025)

Updating Parametric Knowledge with Context Distillation Retains Post-Training Capabilities
by: Padmanabhan, Shankar, et al.
Published: (2026)

Idea First, Code Later: Disentangling Problem Solving from Code Generation in Evaluating LLMs for Competitive Programming
by: Hadhoud, Sama, et al.
Published: (2026)

Assessing the Capability of LLMs in Solving POSCOMP Questions
by: Viegas, Cayo, et al.
Published: (2025)

ArithmAttack: Evaluating Robustness of LLMs to Noisy Context in Math Problem Solving
by: Abedin, Zain Ul, et al.
Published: (2025)

Data Augmentation with In-Context Learning and Comparative Evaluation in Math Word Problem Solving
by: Yigit, Gulsum, et al.
Published: (2024)

ABBEL: LLM Agents Acting through Belief Bottlenecks Expressed in Language
by: Lidayan, Aly, et al.
Published: (2025)

On Provable Length and Compositional Generalization
by: Ahuja, Kartik, et al.
Published: (2024)

Using LLMs for Knowledge Component-level Correctness Labeling in Open-ended Coding Problems
by: Duan, Zhangqi, et al.
Published: (2026)

Investigating Bias: A Multilingual Pipeline for Generating, Solving, and Evaluating Math Problems with LLMs
by: Mahran, Mariam, et al.
Published: (2025)

GCoder: Improving Large Language Model for Generalized Graph Problem Solving
by: Zhang, Qifan, et al.
Published: (2024)

Evaluating the Generation Capabilities of Large Chinese Language Models
by: Zeng, Hui, et al.
Published: (2023)

Mathify: Evaluating Large Language Models on Mathematical Problem Solving Tasks
by: Anand, Avinash, et al.
Published: (2024)

Collaborative Problem-Solving in an Optimization Game
by: Jeknic, Isidora, et al.
Published: (2025)

IndicDB -- Benchmarking Multilingual Text-to-SQL Capabilities in Indian Languages
by: Dawar, Aviral, et al.
Published: (2026)

MathEDU: Feedback Generation on Problem-Solving Processes for Mathematical Learning Support
by: Hsu, Wei-Ling, et al.
Published: (2025)

Learning to Solve Domain-Specific Calculation Problems with Knowledge-Intensive Programs Generator
by: Liu, Chengyuan, et al.
Published: (2024)

Synthetic Data Generation and Joint Learning for Robust Code-Mixed Translation
by: Kartik, Kartik, et al.
Published: (2024)

Steamroller Problems: An Evaluation of LLM Reasoning Capability with Automated Theorem Prover Strategies
by: McGinness, Lachlan, et al.
Published: (2024)

EduVidQA: Generating and Evaluating Long-form Answers to Student Questions based on Lecture Videos
by: Ray, Sourjyadip, et al.
Published: (2025)

Erasing with Precision: Evaluating Specific Concept Erasure from Text-to-Image Generative Models
by: Fuchi, Masane, et al.
Published: (2025)

QUIET: A Multi-Blank Cascaded Story Cloze Benchmark for LLM Creative Generation Capability
by: Zou, Bo, et al.
Published: (2026)

From Problem-Solving to Teaching Problem-Solving: Aligning LLMs with Pedagogy using Reinforcement Learning
by: Dinucu-Jianu, David, et al.
Published: (2025)

Pi-GPS: Enhancing Geometry Problem Solving by Unleashing the Power of Diagrammatic Information
by: Zhao, Junbo, et al.
Published: (2025)

MapCoder: Multi-Agent Code Generation for Competitive Problem Solving
by: Islam, Md. Ashraful, et al.
Published: (2024)

Faithful Model Evaluation for Model-Based Metrics
by: Goyal, Palash, et al.
Published: (2023)