| Main Authors: | Desimone, S. A.; Alemany, L. Alonso |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2604.17398 |
Similar Items
N-gram Prediction and Word Difference Representations for Language Modeling
by: Heo, DongNyeong, et al.
Published: (2024)
PILOT: Steering Synthetic Data Generation with Psychological & Linguistic Output Targeting
by: Cisar, Caitlin, et al.
Published: (2025)
N-gram-like Language Models Predict Reading Time Best
by: Michaelov, James A., et al.
Published: (2026)
Modelling Intertextuality with N-gram Embeddings
by: Xing, Yi
Published: (2025)
Lossless Acceleration of Large Language Model via Adaptive N-gram Parallel Decoding
by: Ou, Jie, et al.
Published: (2024)
An Interpretable N-gram Perplexity Threat Model for Large Language Model Jailbreaks
by: Boreiko, Valentyn, et al.
Published: (2024)
Analysis and Visualization of Linguistic Structures in Large Language Models: Neural Representations of Verb-Particle Constructions in BERT
by: Kissane, Hassane, et al.
Published: (2024)
From N-grams to Pre-trained Multilingual Models For Language Identification
by: Sindane, Thapelo, et al.
Published: (2024)
Identifying Influential N-grams in Confidence Calibration via Regression Analysis
by: Ozaki, Shintaro, et al.
Published: (2026)
Contrastive Decoding for Synthetic Data Generation in Low-Resource Language Modeling
by: Ulm, Jannek, et al.
Published: (2025)
Quantifying the Impact of Structured Output Format on Large Language Models through Causal Inference
by: Yuan, Han, et al.
Published: (2025)
Can Transformers Learn $n$-gram Language Models?
by: Svete, Anej, et al.
Published: (2024)
Infini-gram: Scaling Unbounded n-gram Language Models to a Trillion Tokens
by: Liu, Jiacheng, et al.
Published: (2024)
Transformers Can Represent $n$-gram Language Models
by: Svete, Anej, et al.
Published: (2024)
ResoFilter: Fine-grained Synthetic Data Filtering for Large Language Models through Data-Parameter Resonance Analysis
by: Tu, Zeao, et al.
Published: (2024)
SLOT: Structuring the Output of Large Language Models
by: Wang, Darren Yow-Bang, et al.
Published: (2025)
Enhancing Bangla Language Next Word Prediction and Sentence Completion through Extended RNN with Bi-LSTM Model On N-gram Language
by: Islam, Md Robiul, et al.
Published: (2024)
Contrastive Learning with Enhanced Abstract Representations using Grouped Loss of Abstract Semantic Supervision
by: Suissa, Omri, et al.
Published: (2025)
Large Language Models can Contrastively Refine their Generation for Better Sentence Representation Learning
by: Wang, Huiming, et al.
Published: (2023)
Lngram: N-gram Conditional Memory in Latent Space
by: Zheng, Yunao, et al.
Published: (2026)
Learning to Reduce: Optimal Representations of Structured Data in Prompting Large Language Models
by: Lee, Younghun, et al.
Published: (2024)
The Structured Output Benchmark: A Multi-Source Benchmark for Evaluating Structured Output Quality in Large Language Models
by: Singh, Abhinav Kumar, et al.
Published: (2026)
The Curious Decline of Linguistic Diversity: Training Language Models on Synthetic Text
by: Guo, Yanzhu, et al.
Published: (2023)
Structural Perturbation in Large Language Model Representations through Recursive Symbolic Regeneration
by: Eaglewood, Kathlyn, et al.
Published: (2025)
Linguistic Minimal Pairs Elicit Linguistic Similarity in Large Language Models
by: Zhou, Xinyu, et al.
Published: (2024)
DataGen: Unified Synthetic Dataset Generation via Large Language Models
by: Huang, Yue, et al.
Published: (2024)
Synthetic Data Generation Using Large Language Models: Advances in Text and Code
by: Nadas, Mihai, et al.
Published: (2025)
Decomposed Prompting: Probing Multilingual Linguistic Structure Knowledge in Large Language Models
by: Nie, Ercong, et al.
Published: (2024)
From Architecture to Output: Structural Origins of Hallucination in Large Language Models and the Amplifying Role of Data
by: Sadi, Md. Rejaul Korim Sadi, et al.
Published: (2026)
Infini-gram mini: Exact n-gram Search at the Internet Scale with FM-Index
by: Xu, Hao, et al.
Published: (2025)
Generative Linguistics, Large Language Models, and the Social Nature of Scientific Success
by: Hao, Sophie
Published: (2025)
Evaluating Language Models as Synthetic Data Generators
by: Kim, Seungone, et al.
Published: (2024)
Improving Clinical NLP Performance through Language Model-Generated Synthetic Clinical Data
by: Chen, Shan, et al.
Published: (2024)
Beyond N-gram: Data-Aware X-GRAM Extraction for Efficient Embedding Parameter Scaling
by: Chen, Yilong, et al.
Published: (2026)
Oldies but Goldies: The Potential of Character N-grams for Romanian Texts
by: Lupsa, Dana, et al.
Published: (2025)
Synthetic Data Generation for Phrase Break Prediction with Large Language Model
by: Lee, Hoyeon, et al.
Published: (2025)
HITgram: A Platform for Experimenting with n-gram Language Models
by: Dasgupta, Shibaranjani, et al.
Published: (2024)
Linguistic Intelligence in Large Language Models for Telecommunications
by: Ahmed, Tasnim, et al.
Published: (2024)
Unveiling Linguistic Regions in Large Language Models
by: Zhang, Zhihao, et al.
Published: (2024)
Benchmarking Linguistic Diversity of Large Language Models
by: Guo, Yanzhu, et al.
Published: (2024)