| Main Authors: | Dagan, Gautier; Synnaeve, Gabriel; Rozière, Baptiste |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2402.01035 |
Similar Items
Better & Faster Large Language Models via Multi-token Prediction
by: Gloeckle, Fabian, et al.
Published: (2024)
BigO(Bench) -- Can LLMs Generate Code with Controlled Time and Space Complexity?
by: Chambon, Pierre, et al.
Published: (2025)
TestGenEval: A Real World Unit Test Generation and Test Completion Benchmark
by: Jain, Kush, et al.
Published: (2024)
Meta Large Language Model Compiler: Foundation Models of Compiler Optimization
by: Cummins, Chris, et al.
Published: (2024)
CAST: Cross-modal Alignment Similarity Test for Vision Language Models
by: Dagan, Gautier, et al.
Published: (2024)
Plancraft: an evaluation dataset for planning with LLM agents
by: Dagan, Gautier, et al.
Published: (2024)
$How^{2}$: How to learn from procedural How-to questions
by: Dagan, Gautier, et al.
Published: (2025)
In-domain SSL pre-training and streaming ASR
by: Duret, Jarod, et al.
Published: (2025)
Let your LLM generate a few tokens and you will reduce the need for retrieval
by: Déjean, Hervé
Published: (2024)
Don't Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning
by: Hassid, Michael, et al.
Published: (2025)
Does your data spark joy? Performance gains from domain upsampling at the end of training
by: Blakeney, Cody, et al.
Published: (2024)
Do we really have to filter out random noise in pre-training data for language models?
by: Ru, Jinghan, et al.
Published: (2025)
Is Sanskrit the most token-efficient language? A quantitative study using GPT, Gemini, and SentencePiece
by: Kumar, Anshul
Published: (2026)
What Makes Large Language Models Reason in (Multi-Turn) Code Generation?
by: Zheng, Kunhao, et al.
Published: (2024)
The KoLMogorov Test: Compression by Code Generation
by: Yoran, Ori, et al.
Published: (2025)
Towards Nepali-language LLMs: Efficient GPT training with a Nepali BPE tokenizer
by: Shrestha, Adarsha, et al.
Published: (2025)
CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution
by: Gu, Alex, et al.
Published: (2024)
Pre-training data selection for biomedical domain adaptation using journal impact metrics
by: Laï-king, Mathieu, et al.
Published: (2024)
Adding LLMs to the psycholinguistic norming toolbox: A practical guide to getting the most out of human ratings
by: Conde, Javier, et al.
Published: (2025)
Where is the signal in tokenization space?
by: Geh, Renato Lui, et al.
Published: (2024)
Why do LLMs attend to the first token?
by: Barbero, Federico, et al.
Published: (2025)
Beyond Pairwise: Global Zero-shot Temporal Graph Generation
by: Eirew, Alon, et al.
Published: (2025)
RLEF: Grounding Code LLMs in Execution Feedback with Reinforcement Learning
by: Gehring, Jonas, et al.
Published: (2024)
AnomaLLMy -- Detecting anomalous tokens in black-box LLMs through low-confidence single-token predictions
by: Witold, Waligóra
Published: (2024)
The pitfalls of next-token prediction
by: Bachmann, Gregor, et al.
Published: (2024)
Looking beyond the next token
by: Thankaraj, Abitha, et al.
Published: (2025)
Interpreting token compositionality in LLMs: A robustness analysis
by: Aljaafari, Nura, et al.
Published: (2024)
Comparative analysis of subword tokenization approaches for Indian languages
by: Das, Sudhansu Bala, et al.
Published: (2025)
Contextual morphologically-guided tokenization for Latin encoder models
by: Hudspeth, Marisa, et al.
Published: (2025)
Continual Pre-training of MoEs: How robust is your router?
by: Thérien, Benjamin, et al.
Published: (2025)
Code Llama: Open Foundation Models for Code
by: Rozière, Baptiste, et al.
Published: (2023)
Drop your Decoder: Pre-training with Bag-of-Word Prediction for Dense Passage Retrieval
by: Ma, Guangyuan, et al.
Published: (2024)
COVE: COntext and VEracity prediction for out-of-context images
by: Tonglet, Jonathan, et al.
Published: (2025)
Detecting harassment and defamation in cyberbullying with emotion-adaptive training
by: Yi, Peiling, et al.
Published: (2025)
On multi-token prediction for efficient LLM inference
by: Mehra, Somesh, et al.
Published: (2025)
Extrapolative Weight Averaging Reveals Correctness-Efficiency Frontiers in Code RL
by: Zheng, Kunhao, et al.
Published: (2026)
Replaying pre-training data improves fine-tuning
by: Kotha, Suhas, et al.
Published: (2026)
Evaluating the performance of state-of-the-art esg domain-specific pre-trained large language models in text classification against existing models and traditional machine learning techniques
by: Chung, Tin Yuet, et al.
Published: (2024)
Fine-tuning large language models for domain adaptation: Exploration of training strategies, scaling, model merging and synergistic capabilities
by: Lu, Wei, et al.
Published: (2024)
EventFull: Complete and Consistent Event Relation Annotation
by: Eirew, Alon, et al.
Published: (2024)