Library Catalog

Bibliographic Details
Main Authors:	Dagan, Gautier; Synnaeve, Gabriel; Rozière, Baptiste
Format:	Preprint
Published:	2024
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2402.01035

Similar Items

Better & Faster Large Language Models via Multi-token Prediction
by: Gloeckle, Fabian, et al.
Published: (2024)

BigO(Bench) -- Can LLMs Generate Code with Controlled Time and Space Complexity?
by: Chambon, Pierre, et al.
Published: (2025)

TestGenEval: A Real World Unit Test Generation and Test Completion Benchmark
by: Jain, Kush, et al.
Published: (2024)

Meta Large Language Model Compiler: Foundation Models of Compiler Optimization
by: Cummins, Chris, et al.
Published: (2024)

CAST: Cross-modal Alignment Similarity Test for Vision Language Models
by: Dagan, Gautier, et al.
Published: (2024)

Plancraft: an evaluation dataset for planning with LLM agents
by: Dagan, Gautier, et al.
Published: (2024)

How²: How to learn from procedural How-to questions
by: Dagan, Gautier, et al.
Published: (2025)

In-domain SSL pre-training and streaming ASR
by: Duret, Jarod, et al.
Published: (2025)

Let your LLM generate a few tokens and you will reduce the need for retrieval
by: Déjean, Hervé
Published: (2024)

Don't Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning
by: Hassid, Michael, et al.
Published: (2025)

Does your data spark joy? Performance gains from domain upsampling at the end of training
by: Blakeney, Cody, et al.
Published: (2024)

Do we really have to filter out random noise in pre-training data for language models?
by: Ru, Jinghan, et al.
Published: (2025)

Is Sanskrit the most token-efficient language? A quantitative study using GPT, Gemini, and SentencePiece
by: Kumar, Anshul
Published: (2026)

What Makes Large Language Models Reason in (Multi-Turn) Code Generation?
by: Zheng, Kunhao, et al.
Published: (2024)

The KoLMogorov Test: Compression by Code Generation
by: Yoran, Ori, et al.
Published: (2025)

Towards Nepali-language LLMs: Efficient GPT training with a Nepali BPE tokenizer
by: Shrestha, Adarsha, et al.
Published: (2025)

CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution
by: Gu, Alex, et al.
Published: (2024)

Pre-training data selection for biomedical domain adaptation using journal impact metrics
by: Laï-king, Mathieu, et al.
Published: (2024)

Adding LLMs to the psycholinguistic norming toolbox: A practical guide to getting the most out of human ratings
by: Conde, Javier, et al.
Published: (2025)

Where is the signal in tokenization space?
by: Geh, Renato Lui, et al.
Published: (2024)

Why do LLMs attend to the first token?
by: Barbero, Federico, et al.
Published: (2025)

Beyond Pairwise: Global Zero-shot Temporal Graph Generation
by: Eirew, Alon, et al.
Published: (2025)

RLEF: Grounding Code LLMs in Execution Feedback with Reinforcement Learning
by: Gehring, Jonas, et al.
Published: (2024)

AnomaLLMy -- Detecting anomalous tokens in black-box LLMs through low-confidence single-token predictions
by: Witold, Waligóra
Published: (2024)

The pitfalls of next-token prediction
by: Bachmann, Gregor, et al.
Published: (2024)

Looking beyond the next token
by: Thankaraj, Abitha, et al.
Published: (2025)

Interpreting token compositionality in LLMs: A robustness analysis
by: Aljaafari, Nura, et al.
Published: (2024)

Comparative analysis of subword tokenization approaches for Indian languages
by: Das, Sudhansu Bala, et al.
Published: (2025)

Contextual morphologically-guided tokenization for Latin encoder models
by: Hudspeth, Marisa, et al.
Published: (2025)

Continual Pre-training of MoEs: How robust is your router?
by: Thérien, Benjamin, et al.
Published: (2025)

Code Llama: Open Foundation Models for Code
by: Rozière, Baptiste, et al.
Published: (2023)

Drop your Decoder: Pre-training with Bag-of-Word Prediction for Dense Passage Retrieval
by: Ma, Guangyuan, et al.
Published: (2024)

COVE: COntext and VEracity prediction for out-of-context images
by: Tonglet, Jonathan, et al.
Published: (2025)

Detecting harassment and defamation in cyberbullying with emotion-adaptive training
by: Yi, Peiling, et al.
Published: (2025)

On multi-token prediction for efficient LLM inference
by: Mehra, Somesh, et al.
Published: (2025)

Extrapolative Weight Averaging Reveals Correctness-Efficiency Frontiers in Code RL
by: Zheng, Kunhao, et al.
Published: (2026)

Replaying pre-training data improves fine-tuning
by: Kotha, Suhas, et al.
Published: (2026)

Evaluating the performance of state-of-the-art ESG domain-specific pre-trained large language models in text classification against existing models and traditional machine learning techniques
by: Chung, Tin Yuet, et al.
Published: (2024)

Fine-tuning large language models for domain adaptation: Exploration of training strategies, scaling, model merging and synergistic capabilities
by: Lu, Wei, et al.
Published: (2024)

EventFull: Complete and Consistent Event Relation Annotation
by: Eirew, Alon, et al.
Published: (2024)