Bibliographic Details
Main Authors: Dolina, Michał, Dec, Jakub, Drożdż, Stanisław, Kwapień, Jarosław, Liu, Jin, Stanisz, Tomasz
Format: Preprint
Published: 2025
Subjects: Computation and Language
Online Access: https://arxiv.org/abs/2503.04449
author Dolina, Michał
Dec, Jakub
Drożdż, Stanisław
Kwapień, Jarosław
Liu, Jin
Stanisz, Tomasz
contents Recent research shows that punctuation patterns in texts exhibit universal features across languages. Analysis of Western classical literature reveals that the distribution of distances between consecutive punctuation marks follows a discrete Weibull distribution, a model typically used in survival analysis. Extending this analysis to Chinese literature, represented here by three notable contemporary works, shows that Zipf's law applies to Chinese texts much as it does to Western texts, and that, as in Western texts, including punctuation marks in the word statistics improves adherence to the law. The distribution of distances between punctuation marks in Chinese texts also follows the Weibull model, although long intervals are less frequent than in the English translations. The distribution of intervals between sentence-ending punctuation marks, which represents sentence length, deviates more from this pattern, reflecting greater flexibility in sentence length. This variability supports the formation of complex, multifractal sentence structures, particularly evident in Gao Xingjian's "Soul Mountain". These findings demonstrate that Chinese and Western texts share universal punctuation and word distribution patterns, underscoring that these patterns apply broadly across languages.
format Preprint
id arxiv_https___arxiv_org_abs_2503_04449
institution arXiv
publishDate 2025
record_format arxiv
title Quantifying patterns of punctuation in modern Chinese prose
topic Computation and Language
url https://arxiv.org/abs/2503.04449
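The abstract's central quantitative claim is that the distances between consecutive punctuation marks follow a discrete Weibull distribution. The sketch below illustrates that idea under stated assumptions and is not the authors' procedure. It measures distances in words, uses the Nakagawa-Osaki parametrization P(k) = (1-p)^((k-1)^beta) - (1-p)^(k^beta) for k = 1, 2, ..., and treats a small, hypothetical set of punctuation marks in whitespace-delimited English text. Chinese text would first need word segmentation, which is not attempted here. The parameters are fitted by a crude grid-search maximum likelihood. All function names are hypothetical and do not come from the paper.

```python
import math
import re
from typing import List, Sequence, Tuple

# Hypothetical punctuation set, for illustration only (not the paper's list).
PUNCTUATION = frozenset(".,;:!?")
TOKEN_RE = re.compile(r"[A-Za-z']+|[.,;:!?]")


def interpunctuation_intervals(text: str) -> List[int]:
    """Count the words between consecutive punctuation marks.

    Adjacent marks (e.g. "?!") yield no interval, so every value is >= 1.
    Words after the final punctuation mark are not counted.
    """
    intervals: List[int] = []
    words = 0
    for token in TOKEN_RE.findall(text):
        if token in PUNCTUATION:
            if words > 0:
                intervals.append(words)
            words = 0
        else:
            words += 1
    return intervals


def dw_pmf(k: int, p: float, beta: float) -> float:
    """Discrete Weibull pmf on k >= 1: q^((k-1)^beta) - q^(k^beta), q = 1 - p."""
    q = 1.0 - p
    return q ** ((k - 1) ** beta) - q ** (k ** beta)


def dw_survival(k: int, p: float, beta: float) -> float:
    """Survival function P(K > k) = (1 - p)^(k^beta)."""
    return (1.0 - p) ** (k ** beta)


def log_likelihood(sample: Sequence[int], p: float, beta: float) -> float:
    """Sum of log-probabilities; -inf if any probability underflows to 0."""
    total = 0.0
    for k in sample:
        prob = dw_pmf(k, p, beta)
        if prob <= 0.0:
            return -math.inf
        total += math.log(prob)
    return total


def fit_grid(sample: Sequence[int]) -> Tuple[float, float, float]:
    """Crude MLE: search p in 0.01..0.99 and beta in 0.5..3.0 on a grid."""
    best = (math.nan, math.nan, -math.inf)
    for i in range(1, 100):
        p = i / 100
        for j in range(51):
            beta = 0.5 + j * 0.05
            ll = log_likelihood(sample, p, beta)
            if ll > best[2]:
                best = (p, beta, ll)
    return best


if __name__ == "__main__":
    text = "One two, three four five. Six! Seven eight nine ten; eleven."
    ks = interpunctuation_intervals(text)
    p, beta, ll = fit_grid(ks)
    print(ks, f"p={p:.2f}", f"beta={beta:.2f}", f"logL={ll:.3f}")
```

For beta = 1 the model reduces to a geometric distribution with constant hazard. In general the hazard is h(k) = 1 - (1-p)^(k^beta - (k-1)^beta). It increases with k when beta > 1, which gives the survival-analysis reading: the longer a run of words without punctuation, the more likely a punctuation mark becomes after the next word.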