Library Catalog

Bibliographic Details
Main Authors:	Dent, Rasul; Suarez, Pedro Ortiz; Clérice, Thibault; Sagot, Benoît
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2602.08951

Similar Items

KréyoLID From Language Identification Towards Language Mining
by: Dent, Rasul, et al.
Published: (2025)

Molyé: A Corpus-based Approach to Language Contact in Colonial France
by: Dent, Rasul, et al.
Published: (2024)

Reading or Guessing? Visual Grounding Failures of Vision-Language Models for OCR in Ancient Greek Editions
by: Karamolegkou, Antonia, et al.
Published: (2026)

Detecting Sexual Content at the Sentence Level in First Millennium Latin Texts
by: Clérice, Thibault
Published: (2023)

From Text to Source: Results in Detecting Large Language Model-Generated Content
by: Antoun, Wissam, et al.
Published: (2023)

Can Character-based Language Models Improve Downstream Task Performance in Low-Resource and Noisy Language Scenarios?
by: Riabi, Arij, et al.
Published: (2021)

You Actually Look Twice At it (YALTAi): using an object detection approach instead of region segmentation within the Kraken engine
by: Clérice, Thibault
Published: (2022)

Language-Switching Triggers Take a Latent Detour Through Language Models
by: Kulumba, Francis, et al.
Published: (2026)

On the Scaling Laws of Geographical Representation in Language Models
by: Godey, Nathan, et al.
Published: (2024)

Pre-Editorial Normalization for Automatically Transcribed Medieval Manuscripts in Old French and Latin
by: Clérice, Thibault, et al.
Published: (2026)

Why do small language models underperform? Studying Language Model Saturation via the Softmax Bottleneck
by: Godey, Nathan, et al.
Published: (2024)

ModernBERT or DeBERTaV3? Examining Architecture and Data Influence on Transformer Encoder Models Performance
by: Antoun, Wissam, et al.
Published: (2025)

mOSCAR: A Large-scale Multilingual and Multimodal Document-level Corpus
by: Futeral, Matthieu, et al.
Published: (2024)

Testing the Deliteralization Hypothesis in Human and Machine Translation
by: Marmonier, Malik, et al.
Published: (2026)

Hindsight Quality Prediction Experiments in Multi-Candidate Human-Post-Edited Machine Translation
by: Marmonier, Malik, et al.
Published: (2026)

A French Version of the OLDI Seed Corpus
by: Marmonier, Malik, et al.
Published: (2025)

Tree of Problems: Improving structured problem solving with compositionality
by: Zebaze, Armel, et al.
Published: (2024)

In-Context Example Selection via Similarity Search Improves Low-Resource Machine Translation
by: Zebaze, Armel, et al.
Published: (2024)

LLM Reasoning for Machine Translation: Synthetic Data Generation over Thinking Tokens
by: Zebaze, Armel, et al.
Published: (2025)

Explicit Learning and the LLM in Machine Translation
by: Marmonier, Malik, et al.
Published: (2025)

TopXGen: Topic-Diverse Parallel Data Generation for Low-Resource Machine Translation
by: Zebaze, Armel, et al.
Published: (2025)

Compositional Translation: A Novel LLM-based Approach for Low-resource Machine Translation
by: Zebaze, Armel, et al.
Published: (2025)

Making Sentence Embeddings Robust to User-Generated Content
by: Nishimwe, Lydia, et al.
Published: (2024)

Diachronic Document Dataset for Semantic Layout Analysis
by: Clérice, Thibault, et al.
Published: (2024)

Should We Still Pretrain Encoders with Masked Language Modeling?
by: Gisserot-Boukhlef, Hippolyte, et al.
Published: (2025)

How Should We Enhance the Safety of Large Reasoning Models: An Empirical Study
by: Zhang, Zhexin, et al.
Published: (2025)

Anisotropy Is Inherent to Self-Attention in Transformers
by: Godey, Nathan, et al.
Published: (2024)

Towards Zero-Shot Multimodal Machine Translation
by: Futeral, Matthieu, et al.
Published: (2024)

CamemBERT 2.0: A Smarter French Language Model Aged to Perfection
by: Antoun, Wissam, et al.
Published: (2024)

BigO(Bench) -- Can LLMs Generate Code with Controlled Time and Space Complexity?
by: Chambon, Pierre, et al.
Published: (2025)

When your Cousin has the Right Connections: Unsupervised Bilingual Lexicon Induction for Related Data-Imbalanced Languages
by: Bafna, Niyati, et al.
Published: (2023)

Position: AI Evaluation Should Learn from How We Test Humans
by: Zhuang, Yan, et al.
Published: (2023)

Disentangling meaning from language in LLM-based machine translation
by: Lasnier, Théo, et al.
Published: (2026)

Gaperon: A Peppered English-French Generative Language Model Suite
by: Godey, Nathan, et al.
Published: (2025)

Just Because We Camp, Doesn't Mean We Should: The Ethics of Modelling Queer Voices
by: Sigurgeirsson, Atli, et al.
Published: (2024)

We Should Evaluate Real-World Impact
by: Reiter, Ehud
Published: (2025)

We Should Chart an Atlas of All the World's Models
by: Horwitz, Eliahu, et al.
Published: (2025)

How Should We Extract Discrete Audio Tokens from Self-Supervised Models?
by: Mousavi, Pooneh, et al.
Published: (2024)

Patent Representation Learning via Self-supervision
by: Zuo, You, et al.
Published: (2025)

PatentEval: Understanding Errors in Patent Generation
by: Zuo, You, et al.
Published: (2024)