Bibliographic Details
Main Authors: Ni, Tongke; Fan, Yang; Zhou, Junru; Wu, Xiangping; Chen, Qingcai
Format: Preprint
Published: 2025
Subjects: Computation and Language
Online Access: https://arxiv.org/abs/2503.23671
author Ni, Tongke
Fan, Yang
Zhou, Junru
Wu, Xiangping
Chen, Qingcai
contents Text semantic segmentation partitions a document into paragraphs that are each semantically continuous, based on subject matter, contextual information, and document structure. Traditional approaches typically preprocess documents into segments to satisfy input length constraints, which loses critical semantic information across segments. To address this, we present CrossFormer, a transformer-based model featuring a novel cross-segment fusion module that dynamically models latent semantic dependencies across document segments, substantially improving segmentation accuracy. Additionally, CrossFormer can replace rule-based chunking methods in Retrieval-Augmented Generation (RAG) systems, producing more semantically coherent chunks that improve their efficacy. Comprehensive evaluations confirm CrossFormer's state-of-the-art performance on public text semantic segmentation datasets, alongside considerable gains on RAG benchmarks.
format Preprint
id arxiv_https___arxiv_org_abs_2503_23671
institution arXiv
publishDate 2025
record_format arxiv
title CrossFormer: Cross-Segment Semantic Fusion for Document Segmentation
topic Computation and Language
url https://arxiv.org/abs/2503.23671
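The abstract contrasts rule-based chunking with chunking driven by predicted semantic boundaries. The following is a minimal sketch of that contrast, not CrossFormer's actual interface: the function names and the `is_boundary` list are hypothetical, and the boundary flags stand in for whatever a segmentation model would predict per sentence.

```python
from typing import List


def fixed_size_chunks(sentences: List[str], size: int) -> List[List[str]]:
    """Rule-based baseline: group every `size` consecutive sentences."""
    if size < 1:
        raise ValueError("size must be >= 1")
    return [sentences[i:i + size] for i in range(0, len(sentences), size)]


def boundary_chunks(sentences: List[str], is_boundary: List[bool]) -> List[List[str]]:
    """Close a chunk after every sentence flagged as a segment boundary.

    `is_boundary` is a hypothetical stand-in for per-sentence model output.
    """
    if len(sentences) != len(is_boundary):
        raise ValueError("sentences and is_boundary must have equal length")
    chunks: List[List[str]] = []
    current: List[str] = []
    for sentence, boundary in zip(sentences, is_boundary):
        current.append(sentence)
        if boundary:
            chunks.append(current)
            current = []
    if current:  # keep trailing sentences that follow the last boundary
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    doc = ["a", "b", "c", "d", "e"]
    print(fixed_size_chunks(doc, 2))
    print(boundary_chunks(doc, [False, False, True, False, False]))
```

With fixed-size grouping, chunk edges fall wherever the count dictates; with boundary flags, chunk edges follow the predicted topic shifts, which is the property the abstract credits for more coherent RAG chunks.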