| Main Authors: | Shi, Weijia, Min, Sewon, Lomeli, Maria, Zhou, Chunting, Li, Margaret, Szilvasy, Gergely, James, Rich, Lin, Xi Victoria, Smith, Noah A., Zettlemoyer, Luke, Yih, Scott, Lewis, Mike |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Online Access: | https://arxiv.org/abs/2310.10638 |
Similar Items
RA-DIT: Retrieval-Augmented Dual Instruction Tuning
by: Lin, Xi Victoria, et al.
Published: (2023)
LMFusion: Adapting Pretrained Language Models for Multimodal Generation
by: Shi, Weijia, et al.
Published: (2024)
SILO Language Models: Isolating Legal Risk In a Nonparametric Datastore
by: Min, Sewon, et al.
Published: (2023)
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models
by: Liang, Weixin, et al.
Published: (2024)
Self-Pruned Key-Value Attention: Learning When to Write by Predicting Future Utility
by: Szilvasy, Gergely, et al.
Published: (2026)
Scaling Retrieval-Based Language Models with a Trillion-Token Datastore
by: Shao, Rulin, et al.
Published: (2024)
Self-Alignment with Instruction Backtranslation
by: Li, Xian, et al.
Published: (2023)
Infini-gram: Scaling Unbounded n-gram Language Models to a Trillion Tokens
by: Liu, Jiacheng, et al.
Published: (2024)
Instruction-tuned Language Models are Better Knowledge Learners
by: Jiang, Zhengbao, et al.
Published: (2024)
Detecting Pretraining Data from Large Language Models
by: Shi, Weijia, et al.
Published: (2023)
Do Membership Inference Attacks Work on Large Language Models?
by: Duan, Michael, et al.
Published: (2024)
Inference-time sparse attention with asymmetric indexing
by: Mazaré, Pierre-Emmanuel, et al.
Published: (2025)
FlexOlmo: Open Language Models for Flexible Data Use
by: Shi, Weijia, et al.
Published: (2025)
ReasonIR: Training Retrievers for Reasoning Tasks
by: Shao, Rulin, et al.
Published: (2025)
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
by: Ma, Xuezhe, et al.
Published: (2024)
Vector search with small radiuses
by: Szilvasy, Gergely, et al.
Published: (2024)
Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models
by: Hu, Yushi, et al.
Published: (2024)
Evaluating Copyright Takedown Methods for Language Models
by: Wei, Boyi, et al.
Published: (2024)
Evaluation data contamination in LLMs: how do we measure it and (when) does it matter?
by: Singh, Aaditya K., et al.
Published: (2024)
MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts
by: Lin, Xi Victoria, et al.
Published: (2024)
EMO: Pretraining Mixture of Experts for Emergent Modularity
by: Wang, Ryan, et al.
Published: (2026)
CAT: Content-Adaptive Image Tokenization
by: Shen, Junhong, et al.
Published: (2025)
ALMA: Alignment with Minimal Annotation
by: Yasunaga, Michihiro, et al.
Published: (2024)
(Mis)Fitting: A Survey of Scaling Laws
by: Li, Margaret, et al.
Published: (2025)
Byte Latent Transformer: Patches Scale Better Than Tokens
by: Pagnoni, Artidoro, et al.
Published: (2024)
The Faiss library
by: Douze, Matthijs, et al.
Published: (2024)
Short window attention enables long-term memorization
by: Cabannes, Loïc, et al.
Published: (2025)
Beyond Language Modeling: An Exploration of Multimodal Pretraining
by: Tong, Shengbang, et al.
Published: (2026)
Breaking the Curse of Multilinguality with Cross-lingual Expert Language Models
by: Blevins, Terra, et al.
Published: (2024)
Does Liking Yellow Imply Driving a School Bus? Semantic Leakage in Language Models
by: Gonen, Hila, et al.
Published: (2024)
Demystifying Prompts in Language Models via Perplexity Estimation
by: Gonen, Hila, et al.
Published: (2022)
MUSE: Machine Unlearning Six-Way Evaluation for Language Models
by: Shi, Weijia, et al.
Published: (2024)
Stochastic activations
by: Lomeli, Maria, et al.
Published: (2025)
Compute Optimal Tokenization
by: Limisiewicz, Tomasz, et al.
Published: (2026)
Slicing and Dicing: Configuring Optimal Mixtures of Experts
by: Li, Margaret, et al.
Published: (2026)
The Hidden Cost of Thinking: Energy Use and Environmental Impact of LMs Beyond Pretraining
by: Morrison, Jacob, et al.
Published: (2026)
CopyBench: Measuring Literal and Non-Literal Reproduction of Copyright-Protected Text in Language Model Generation
by: Chen, Tong, et al.
Published: (2024)
Anchored Decoding: Provably Reducing Copyright Risk for Any Language Model
by: He, Jacqueline, et al.
Published: (2026)
Memory Layers at Scale
by: Berges, Vincent-Pierre, et al.
Published: (2024)
Train Separately, Merge Together: Modular Post-Training with Mixture-of-Experts
by: Morrison, Jacob, et al.
Published: (2026)