| Main authors: | Jayarao, Pratik; Gupta, Himanshu; Varshney, Neeraj; Dwivedi, Chaitanya |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online access: | https://arxiv.org/abs/2509.13332 |
Similar items
Code Mixologist : A Practitioner's Guide to Building Code-Mixed LLMs
by: Gupta, Himanshu, et al.
Published: (2026)
Expert Upcycling: Shifting the Compute-Efficient Frontier of Mixture-of-Experts
by: Dwivedi, Chaitanya, et al.
Published: (2026)
LogicBench: Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models
by: Parmar, Mihir, et al.
Published: (2024)
Judging the Judges: A Systematic Study of Position Bias in LLM-as-a-Judge
by: Shi, Lin, et al.
Published: (2024)
Learning From Mistakes Makes LLM Better Reasoner
by: An, Shengnan, et al.
Published: (2023)
Abstraction-of-Thought Makes Language Models Better Reasoners
by: Hong, Ruixin, et al.
Published: (2024)
Multi-LogiEval: Towards Evaluating Multi-Step Logical Reasoning Ability of Large Language Models
by: Patel, Nisarg, et al.
Published: (2024)
Towards LogiGLUE: A Brief Survey and A Benchmark for Analyzing Logical Reasoning Capabilities of Language Models
by: Luo, Man, et al.
Published: (2023)
Learning to Self-Verify Makes Language Models Better Reasoners
by: Chen, Yuxin, et al.
Published: (2026)
SAT: Balancing Reasoning Accuracy and Efficiency with Stepwise Adaptive Thinking
by: Huang, Weiyang, et al.
Published: (2026)
Stateful KV Cache Management for LLMs: Balancing Space, Time, Accuracy, and Positional Fidelity
by: Poudel, Pratik
Published: (2025)
Mentor-KD: Making Small Language Models Better Multi-step Reasoners
by: Lee, Hojae, et al.
Published: (2024)
JudgeLRM: Large Reasoning Models as a Judge
by: Chen, Nuo, et al.
Published: (2025)
Polymath: A Challenging Multi-modal Mathematical Reasoning Benchmark
by: Gupta, Himanshu, et al.
Published: (2024)
From Accuracy to Robustness: A Study of Rule- and Model-based Verifiers in Mathematical Reasoning
by: Huang, Yuzhen, et al.
Published: (2025)
The Accuracy Paradox in RLHF: When Better Reward Models Don't Yield Better Language Models
by: Chen, Yanjun, et al.
Published: (2024)
Can LLMs Learn by Teaching for Better Reasoning? A Preliminary Study
by: Ning, Xuefei, et al.
Published: (2024)
Making Implicit Premises Explicit in Logical Understanding of Enthymemes
by: Feng, Xuyao, et al.
Published: (2026)
Do Before You Judge: Self-Reference as a Pathway to Better LLM Evaluation
by: Lin, Wei-Hsiang, et al.
Published: (2025)
A Study on Leveraging Search and Self-Feedback for Agent Reasoning
by: K, Karthikeyan, et al.
Published: (2025)
What Makes Low-Bit Quantization-Aware Training Work for Reasoning LLMs? A Systematic Study
by: Lv, Keyu, et al.
Published: (2026)
S^3cMath: Spontaneous Step-level Self-correction Makes Large Language Models Better Mathematical Reasoners
by: Yan, Yuchen, et al.
Published: (2024)
Reasoning Models Better Express Their Confidence
by: Yoon, Dongkeun, et al.
Published: (2025)
Reasoning Is Not Free: Robust Adaptive Cost-Efficient Routing for LLM-as-a-Judge
by: Zhang, Wenbo, et al.
Published: (2026)
Disparities in LLM Reasoning Accuracy and Explanations: A Case Study on African American English
by: Zhou, Runtao, et al.
Published: (2025)
JudgeRLVR: Judge First, Generate Second for Efficient Reasoning
by: Duo, Jiangshan, et al.
Published: (2026)
AI Agents-as-Judge: Automated Assessment of Accuracy, Consistency, Completeness and Clarity for Enterprise Documents
by: Dasgupta, Sudip, et al.
Published: (2025)
Geode: A Zero-shot Geospatial Question-Answering Agent with Explicit Reasoning and Precise Spatio-Temporal Retrieval
by: Gupta, Devashish Vikas, et al.
Published: (2024)
AdaptEvolve: Improving Efficiency of Evolutionary AI Agents through Adaptive Model Selection
by: Ray, Pretam, et al.
Published: (2026)
StepWiser: Stepwise Generative Judges for Wiser Reasoning
by: Xiong, Wei, et al.
Published: (2025)
SwiReasoning: Switch-Thinking in Latent and Explicit for Pareto-Superior Reasoning LLMs
by: Shi, Dachuan, et al.
Published: (2025)
Stance Reasoner: Zero-Shot Stance Detection on Social Media with Explicit Reasoning
by: Taranukhin, Maksym, et al.
Published: (2024)
Debating for Better Reasoning: An Unsupervised Multimodal Approach
by: Adhikari, Ashutosh, et al.
Published: (2025)
Permutation-Consensus Listwise Judging for Robust Factuality Evaluation
by: Huang, Tianyi, et al.
Published: (2026)
Monolingual or Multilingual Instruction Tuning: Which Makes a Better Alpaca
by: Chen, Pinzhen, et al.
Published: (2023)
Supervised Knowledge Makes Large Language Models Better In-context Learners
by: Yang, Linyi, et al.
Published: (2023)
Learning to Plan & Reason for Evaluation with Thinking-LLM-as-a-Judge
by: Saha, Swarnadeep, et al.
Published: (2025)
BaRDa: A Belief and Reasoning Dataset that Separates Factual Accuracy and Reasoning Ability
by: Clark, Peter, et al.
Published: (2023)
Towards Robust ESG Analysis Against Greenwashing Risks: Aspect-Action Analysis with Cross-Category Generalization
by: Ong, Keane, et al.
Published: (2025)
An Empirical Study on Large Language Models in Accuracy and Robustness under Chinese Industrial Scenarios
by: Li, Zongjie, et al.
Published: (2024)