Staff View: CrossFormer: Cross-Segment Semantic Fusion for Document Segmentation :: Library Catalog

Bibliographic Details
Main Authors:	Ni, Tongke; Fan, Yang; Zhou, Junru; Wu, Xiangping; Chen, Qingcai
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2503.23671

_version_	1866910901371666432
author	Ni, Tongke; Fan, Yang; Zhou, Junru; Wu, Xiangping; Chen, Qingcai
author_facet	Ni, Tongke; Fan, Yang; Zhou, Junru; Wu, Xiangping; Chen, Qingcai
contents	Text semantic segmentation involves partitioning a document into multiple paragraphs with continuous semantics based on the subject matter, contextual information, and document structure. Traditional approaches have typically relied on preprocessing documents into segments to address input length constraints, resulting in the loss of critical semantic information across segments. To address this, we present CrossFormer, a transformer-based model featuring a novel cross-segment fusion module that dynamically models latent semantic dependencies across document segments, substantially elevating segmentation accuracy. Additionally, CrossFormer can replace rule-based chunk methods within the Retrieval-Augmented Generation (RAG) system, producing more semantically coherent chunks that enhance its efficacy. Comprehensive evaluations confirm CrossFormer's state-of-the-art performance on public text semantic segmentation datasets, alongside considerable gains on RAG benchmarks.
format	Preprint
id	arxiv_https___arxiv_org_abs_2503_23671
institution	arXiv
publishDate	2025
record_format	arxiv
title	CrossFormer: Cross-Segment Semantic Fusion for Document Segmentation
topic	Computation and Language
url	https://arxiv.org/abs/2503.23671

Similar Items