Bibliographic Details
Main Authors:	Margapuri, Venkat; Kazanjian, Garik; Kosaraju, Naren
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2509.20105

author	Margapuri, Venkat; Kazanjian, Garik; Kosaraju, Naren
contents	Large Language Models (LLMs) often struggle with maintaining coherent multi-step reasoning traces, particularly in tasks that require a structured logical flow. This work introduces a quantum-inspired approach to address the challenge by incorporating a fidelity-based reward derived from Projected Entangled Pair States (PEPS) into Proximal Policy Optimization. Unlike prior approaches that use direct supervision or contrastive objectives, the proposed method guides learning through structural consistency, offering a novel approach to enforce global coherence in generated reasoning traces. The proposed framework is evaluated using multiple coherence-determining metrics on diverse datasets such as GSM8K, StrategyQA, and EntailmentBank spanning arithmetic, intuitive, and entailment-based reasoning. Results show that the proposed quantum-inspired approach offers significant improvements over supervised, contrastive, and pretrained baseline approaches, highlighting the effectiveness of quantum-inspired fidelity as a foundation to improve reasoning trace coherence in LLMs.
format	Preprint
id	arxiv_https___arxiv_org_abs_2509_20105
institution	arXiv
publishDate	2025
record_format	arxiv
title	PEPS: Quantum-Inspired Reinforcement Learning for Coherent Reasoning Traces in LLMs
topic	Artificial Intelligence
url	https://arxiv.org/abs/2509.20105