Bibliographic Details
Main Authors: Margapuri, Venkat, Kazanjian, Garik, Kosaraju, Naren
Format: Preprint
Published: 2025
Subjects: Artificial Intelligence
Online Access: https://arxiv.org/abs/2509.20105
author Margapuri, Venkat
Kazanjian, Garik
Kosaraju, Naren
contents Large Language Models (LLMs) often struggle to maintain coherent multi-step reasoning traces, particularly in tasks that require a structured logical flow. This work introduces a quantum-inspired approach to this challenge that incorporates a fidelity-based reward derived from Projected Entangled Pair States (PEPS) into Proximal Policy Optimization (PPO). Unlike prior approaches that rely on direct supervision or contrastive objectives, the proposed method guides learning through structural consistency, offering a novel way to enforce global coherence in generated reasoning traces. The framework is evaluated with multiple coherence metrics on diverse datasets, including GSM8K, StrategyQA, and EntailmentBank, which span arithmetic, intuitive, and entailment-based reasoning. Results show that the quantum-inspired approach significantly improves over supervised, contrastive, and pretrained baselines, highlighting quantum-inspired fidelity as an effective foundation for improving reasoning-trace coherence in LLMs.
format Preprint
id arxiv_https___arxiv_org_abs_2509_20105
institution arXiv
publishDate 2025
record_format arxiv
title PEPS: Quantum-Inspired Reinforcement Learning for Coherent Reasoning Traces in LLMs
topic Artificial Intelligence
url https://arxiv.org/abs/2509.20105
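The record describes the reward only as "fidelity-based" and "derived from PEPS". As a rough illustration of the underlying quantity, here is a minimal sketch of pure-state fidelity, F = |⟨ψ|φ⟩|², for the simplest PEPS case: a product state, which is a PEPS with bond dimension 1. For product states the global overlap factorizes into per-site overlaps, so the global fidelity is the product of per-site fidelities. This is not the authors' method. Treating each reasoning step as one "site" with a small state vector is an assumption made only for illustration. The function names (`local_fidelity`, `product_state_fidelity`) are hypothetical. General PEPS with bond dimension greater than 1 require tensor-network contraction, which this sketch does not attempt.

```python
import math
from typing import Sequence

Vector = Sequence[complex]


def normalize(v: Vector) -> list[complex]:
    """Scale a vector to unit L2 norm so it represents a valid pure state."""
    norm = math.sqrt(sum(abs(x) ** 2 for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [complex(x) / norm for x in v]


def local_fidelity(psi: Vector, phi: Vector) -> float:
    """Pure-state fidelity F = |<psi|phi>|^2 for a single site."""
    if len(psi) != len(phi):
        raise ValueError("site dimensions differ")
    a, b = normalize(psi), normalize(phi)
    # Inner product <a|b> conjugates the left ("bra") vector.
    overlap = sum(x.conjugate() * y for x, y in zip(a, b))
    return abs(overlap) ** 2


def product_state_fidelity(sites_a: Sequence[Vector], sites_b: Sequence[Vector]) -> float:
    """Fidelity of two product states (PEPS with bond dimension 1).

    For |A> = (x)_i |a_i> and |B> = (x)_i |b_i>, <A|B> = prod_i <a_i|b_i>,
    so the global fidelity is the product of the per-site fidelities.
    Hypothetical use: one site per reasoning step, compared with a reference trace.
    """
    if len(sites_a) != len(sites_b):
        raise ValueError("states have different numbers of sites")
    fidelity = 1.0
    for a, b in zip(sites_a, sites_b):
        fidelity *= local_fidelity(a, b)
    return fidelity
```

Because every per-site factor lies in [0, 1], a single incoherent step drives the product toward zero. This is one plausible reason a fidelity-style score could reward global rather than step-local consistency, but how the paper actually builds its PEPS and feeds the score into PPO is not stated in this record.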