| Main Authors: | Gloeckle, Fabian, Idrissi, Badr Youbi, Rozière, Baptiste, Lopez-Paz, David, Synnaeve, Gabriel |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2404.19737 |
Similar Items
Getting the most out of your tokenizer for pre-training and domain adaptation
by: Dagan, Gautier, et al.
Published: (2024)
From Bytes to Ideas: Language Modeling with Autoregressive U-Nets
by: Videau, Mathurin, et al.
Published: (2025)
BigO(Bench) -- Can LLMs Generate Code with Controlled Time and Space Complexity?
by: Chambon, Pierre, et al.
Published: (2025)
The KoLMogorov Test: Compression by Code Generation
by: Yoran, Ori, et al.
Published: (2025)
Meta Large Language Model Compiler: Foundation Models of Compiler Optimization
by: Cummins, Chris, et al.
Published: (2024)
Temperature Matters: Enhancing Watermark Robustness Against Paraphrasing Attacks
by: Idrissi, Badr Youbi, et al.
Published: (2025)
TestGenEval: A Real World Unit Test Generation and Test Completion Benchmark
by: Jain, Kush, et al.
Published: (2024)
What Makes Large Language Models Reason in (Multi-Turn) Code Generation?
by: Zheng, Kunhao, et al.
Published: (2024)
Beyond Multi-Token Prediction: Pretraining LLMs with Future Summaries
by: Mahajan, Divyat, et al.
Published: (2025)
Code Llama: Open Foundation Models for Code
by: Rozière, Baptiste, et al.
Published: (2023)
Better, Faster: Harnessing Self-Improvement in Large Reasoning Models
by: Zhong, Qihuang, et al.
Published: (2026)
TransNormerLLM: A Faster and Better Large Language Model with Improved TransNormer
by: Qin, Zhen, et al.
Published: (2023)
LBPE: Long-token-first Tokenization to Improve Large Language Models
by: Lian, Haoran, et al.
Published: (2024)
Investigating Cultural Alignment of Large Language Models
by: AlKhamissi, Badr, et al.
Published: (2024)
Faster and Better LLMs via Latency-Aware Test-Time Scaling
by: Wang, Zili, et al.
Published: (2025)
SkyLadder: Better and Faster Pretraining via Context Window Scheduling
by: Zhu, Tongyao, et al.
Published: (2025)
Don't Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning
by: Hassid, Michael, et al.
Published: (2025)
Fine-Grained Verifiers: Preference Modeling as Next-token Prediction in Vision-Language Alignment
by: Cui, Chenhang, et al.
Published: (2024)
Implicit Geometry of Next-token Prediction: From Language Sparsity Patterns to Model Representations
by: Zhao, Yize, et al.
Published: (2024)
ProofOptimizer: Training Language Models to Simplify Proofs without Human Demonstrations
by: Gu, Alex, et al.
Published: (2025)
LLMPC: Large Language Model Predictive Control
by: Maher, Gabriel
Published: (2025)
Skip-Thinking: Chunk-wise Chain-of-Thought Distillation Enable Smaller Language Models to Reason Better and Faster
by: Chen, Xiao, et al.
Published: (2025)
Interpretable Next-token Prediction via the Generalized Induction Head
by: Kim, Eunji, et al.
Published: (2024)
Same evaluation, more tokens: On the effect of input length for machine translation evaluation using Large Language Models
by: Domhan, Tobias, et al.
Published: (2025)
KVPruner: Structural Pruning for Faster and Memory-Efficient Large Language Models
by: Lv, Bo, et al.
Published: (2024)
Towards a theory of morphology-driven marking in the lexicon: The case of the state
by: Idrissi, Mohamed El
Published: (2026)
FlashDecoding++: Faster Large Language Model Inference on GPUs
by: Hong, Ke, et al.
Published: (2023)
Prediction hubs are context-informed frequent tokens in LLMs
by: Nielsen, Beatrix M. G., et al.
Published: (2025)
FABSVer: Faster Training and Better Self-Verification for LLM Mathematical Reasoning
by: Pan, Haihui, et al.
Published: (2026)
Prune as You Generate: Online Rollout Pruning for Faster and Better RLVR
by: Xu, Haobo, et al.
Published: (2026)
ReGATE: Learning Faster and Better with Fewer Tokens in MLLMs
by: Li, Chaoyu, et al.
Published: (2025)
Byte-token Enhanced Language Models for Temporal Point Processes Analysis
by: Kong, Quyu, et al.
Published: (2025)
Rethinking Tokenization: Crafting Better Tokenizers for Large Language Models
by: Yang, Jinbiao
Published: (2024)
Debate or Vote: Which Yields Better Decisions in Multi-Agent Large Language Models?
by: Choi, Hyeong Kyu, et al.
Published: (2025)
Rational Metareasoning for Large Language Models
by: De Sabbata, C. Nicolò, et al.
Published: (2024)
CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution
by: Gu, Alex, et al.
Published: (2024)
MTP-S2UT: Enhancing Speech-to-Speech Translation Quality with Multi-token Prediction
by: Wang, Jianjin, et al.
Published: (2025)
Automatic Textbook Formalization
by: Gloeckle, Fabian, et al.
Published: (2026)
SentenceVAE: Enable Next-sentence Prediction for Large Language Models with Faster Speed, Higher Accuracy and Longer Context
by: An, Hongjun, et al.
Published: (2024)
Better Call GPT, Comparing Large Language Models Against Lawyers
by: Martin, Lauren, et al.
Published: (2024)