Bibliographic Details
Main Authors: Ljubešić, Nikola, Rupnik, Peter, Perinčić, Tea
Format: Preprint
Published: 2026
Online Access: https://arxiv.org/abs/2602.03245
_version_ 1866910010046414848
author Ljubešić, Nikola
Rupnik, Peter
Perinčić, Tea
author_facet Ljubešić, Nikola
Rupnik, Peter
Perinčić, Tea
contents This paper documents our efforts in releasing the printed and audio book of the translation of the famous novel The Little Prince into the Chakavian dialect as a computer-readable, AI-ready dataset, with the textual and audio components of the two releases now aligned at the level of each written and spoken word. Our motivation for this release is threefold. The first is our wish to preserve this highly valuable and specific content beyond the small editions of the printed and the audio book. With the dataset published in the CLARIN.SI repository, this content is now at the fingertips of any interested individual. The second motivation is to make the data available for various artificial-intelligence-related usage scenarios, such as the one we already pursue in this paper: adapting the open Whisper-large-v3 automatic speech recognition model, which performs decently on standard Croatian, to Chakavian dialectal speech. We can happily report that adapting the model halves the word error rate on the selected test data, while removing up to two thirds of the error at the character level. We envision many more uses of this dataset beyond the experiments we have already performed, both in artificial intelligence research and application and in dialectal research. The third motivation is our hope that this now highly structured dataset will be transformed into a digital online edition of the work, allowing individuals beyond the research and technology communities to enjoy the beauty of the message of the little boy in the desert, told through the spectacular prism of the Chakavian dialect.
format Preprint
id arxiv_https___arxiv_org_abs_2602_03245
institution arXiv
publishDate 2026
record_format arxiv
title Mići Princ -- A Little Boy Teaching Speech Technologies the Chakavian Dialect
topic Audio and Speech Processing
Computation and Language
url https://arxiv.org/abs/2602.03245