| Main Authors: | Parmar, Mihir, Liu, Xin, Goyal, Palash, Chen, Yanfei, Le, Long, Mishra, Swaroop, Mobahi, Hossein, Gu, Jindong, Wang, Zifeng, Nakhost, Hootan, Baral, Chitta, Lee, Chen-Yu, Pfister, Tomas, Palangi, Hamid |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2502.16111 |
Similar Items
PLAN-TUNING: Post-Training Language Models to Learn Step-by-Step Planning for Complex Problem Solving
by: Parmar, Mihir, et al.
Published: (2025)
ScholarPeer: A Context-Aware Multi-Agent Framework for Automated Peer Review
by: Goyal, Palash, et al.
Published: (2026)
Don't Blame the Annotator: Bias Already Starts in the Annotation Instructions
by: Parmar, Mihir, et al.
Published: (2022)
HEART: Emotionally-Driven Test-Time Scaling of Language Models
by: Pinto, Gabriela, et al.
Published: (2025)
LLM-Based Multi-Agent Blackboard System for Information Discovery in Data Science
by: Salemi, Alireza, et al.
Published: (2025)
TarGEN: Targeted Data Generation with Large Language Models
by: Gupta, Himanshu, et al.
Published: (2023)
GuidedSampling: Steering LLMs Towards Diverse Candidate Solutions at Inference-Time
by: Handa, Divij, et al.
Published: (2025)
Mid-Training with Self-Generated Data Improves Reinforcement Learning in Language Models
by: RRV, Aswin, et al.
Published: (2026)
Polymath: A Challenging Multi-modal Mathematical Reasoning Benchmark
by: Gupta, Himanshu, et al.
Published: (2024)
TFRBench: A Reasoning Benchmark for Evaluating Forecasting Systems
by: Ahamed, Md Atik, et al.
Published: (2026)
Synapse: Adaptive Arbitration of Complementary Expertise in Time Series Foundational Models
by: Das, Sarkar Snigdha Sarathi, et al.
Published: (2025)
VISTA: A Test-Time Self-Improving Video Generation Agent
by: Long, Do Xuan, et al.
Published: (2025)
Watch and Learn: Learning to Use Computers from Online Videos
by: Song, Chan Hee, et al.
Published: (2025)
VeriGuard: Enhancing LLM Agent Safety via Verified Code Generation
by: Miculicich, Lesly, et al.
Published: (2025)
CANVAS: Continuity-Aware Narratives via Visual Agentic Storyboarding
by: Mondal, Ishani, et al.
Published: (2026)
Heterogeneous Swarms: Jointly Optimizing Model Roles and Weights for Multi-LLM Systems
by: Feng, Shangbin, et al.
Published: (2025)
Reasoning-Aware Training for Time Series Forecasting
by: Ahamed, Md Atik, et al.
Published: (2026)
LEAF: A Living Benchmark for Event-Augmented Forecasting
by: Tan, Mingtian, et al.
Published: (2026)
Reverse Thinking Makes LLMs Stronger Reasoners
by: Chen, Justin Chih-Yao, et al.
Published: (2024)
Teach Better or Show Smarter? On Instructions and Exemplars in Automatic Prompt Optimization
by: Wan, Xingchen, et al.
Published: (2024)
SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL (extended)
by: Sun, Ruoxi, et al.
Published: (2023)
Magnet: Multi-turn Tool-use Data Synthesis and Distillation via Graph Translation
by: Yin, Fan, et al.
Published: (2025)
Step-by-Step Reasoning to Solve Grid Puzzles: Where do LLMs Falter?
by: Tyagi, Nemika, et al.
Published: (2024)
Investigating the Shortcomings of LLMs in Step-by-Step Legal Reasoning
by: Mishra, Venkatesh, et al.
Published: (2025)
PHANTOM RECALL: When Familiar Puzzles Fool Smart Models
by: Mukhopadhyay, Souradeep, et al.
Published: (2025)
Nexus: An Agentic Framework for Time Series Forecasting
by: Das, Sarkar Snigdha Sarathi, et al.
Published: (2026)
Cutting Through the Noise: Boosting LLM Performance on Math Word Problems
by: Anantheswaran, Ujjwala, et al.
Published: (2024)
Multi-LogiEval: Towards Evaluating Multi-Step Logical Reasoning Ability of Large Language Models
by: Patel, Nisarg, et al.
Published: (2024)
ScientistOne: Towards Human-Level Autonomous Research via Chain-of-Evidence
by: Meng, Rui, et al.
Published: (2026)
ThinkTuning: Instilling Cognitive Reflections without Distillation
by: RRV, Aswin, et al.
Published: (2025)
From Few to Many: Self-Improving Many-Shot Reasoners Through Iterative Optimization and Generation
by: Wan, Xingchen, et al.
Published: (2025)
Towards LogiGLUE: A Brief Survey and A Benchmark for Analyzing Logical Reasoning Capabilities of Language Models
by: Luo, Man, et al.
Published: (2023)
LogicBench: Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models
by: Parmar, Mihir, et al.
Published: (2024)
Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning
by: Deng, Yihe, et al.
Published: (2025)
In Prospect and Retrospect: Reflective Memory Management for Long-term Personalized Dialogue Agents
by: Tan, Zhen, et al.
Published: (2025)
Maestro: Self-Improving Text-to-Image Generation via Agent Orchestration
by: Wan, Xingchen, et al.
Published: (2025)
Towards Compute-Optimal Many-Shot In-Context Learning
by: Golchin, Shahriar, et al.
Published: (2025)
Model Swarms: Collaborative Search to Adapt LLM Experts via Swarm Intelligence
by: Feng, Shangbin, et al.
Published: (2024)
Neglected Hessian component explains mysteries in Sharpness regularization
by: Dauphin, Yann N., et al.
Published: (2024)
Exploring Group and Symmetry Principles in Large Language Models
by: Imani, Shima, et al.
Published: (2024)