:: Library Catalog

Bibliographic Details
Main Authors:	Shi, Weijia; Min, Sewon; Lomeli, Maria; Zhou, Chunting; Li, Margaret; Szilvasy, Gergely; James, Rich; Lin, Xi Victoria; Smith, Noah A.; Zettlemoyer, Luke; Yih, Scott; Lewis, Mike
Format:	Preprint
Published:	2023
Subjects:	Computation and Language; Artificial Intelligence; Machine Learning
Online Access:	https://arxiv.org/abs/2310.10638

Similar Items

RA-DIT: Retrieval-Augmented Dual Instruction Tuning
by: Lin, Xi Victoria, et al.
Published: (2023)

LMFusion: Adapting Pretrained Language Models for Multimodal Generation
by: Shi, Weijia, et al.
Published: (2024)

SILO Language Models: Isolating Legal Risk In a Nonparametric Datastore
by: Min, Sewon, et al.
Published: (2023)

Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models
by: Liang, Weixin, et al.
Published: (2024)

Self-Pruned Key-Value Attention: Learning When to Write by Predicting Future Utility
by: Szilvasy, Gergely, et al.
Published: (2026)

Scaling Retrieval-Based Language Models with a Trillion-Token Datastore
by: Shao, Rulin, et al.
Published: (2024)

Self-Alignment with Instruction Backtranslation
by: Li, Xian, et al.
Published: (2023)

Infini-gram: Scaling Unbounded n-gram Language Models to a Trillion Tokens
by: Liu, Jiacheng, et al.
Published: (2024)

Instruction-tuned Language Models are Better Knowledge Learners
by: Jiang, Zhengbao, et al.
Published: (2024)

Detecting Pretraining Data from Large Language Models
by: Shi, Weijia, et al.
Published: (2023)

Do Membership Inference Attacks Work on Large Language Models?
by: Duan, Michael, et al.
Published: (2024)

Inference-time sparse attention with asymmetric indexing
by: Mazaré, Pierre-Emmanuel, et al.
Published: (2025)

FlexOlmo: Open Language Models for Flexible Data Use
by: Shi, Weijia, et al.
Published: (2025)

ReasonIR: Training Retrievers for Reasoning Tasks
by: Shao, Rulin, et al.
Published: (2025)

Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
by: Ma, Xuezhe, et al.
Published: (2024)

Vector search with small radiuses
by: Szilvasy, Gergely, et al.
Published: (2024)

Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models
by: Hu, Yushi, et al.
Published: (2024)

Evaluating Copyright Takedown Methods for Language Models
by: Wei, Boyi, et al.
Published: (2024)

Evaluation data contamination in LLMs: how do we measure it and (when) does it matter?
by: Singh, Aaditya K., et al.
Published: (2024)

MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts
by: Lin, Xi Victoria, et al.
Published: (2024)

EMO: Pretraining Mixture of Experts for Emergent Modularity
by: Wang, Ryan, et al.
Published: (2026)

CAT: Content-Adaptive Image Tokenization
by: Shen, Junhong, et al.
Published: (2025)

ALMA: Alignment with Minimal Annotation
by: Yasunaga, Michihiro, et al.
Published: (2024)

(Mis)Fitting: A Survey of Scaling Laws
by: Li, Margaret, et al.
Published: (2025)

Byte Latent Transformer: Patches Scale Better Than Tokens
by: Pagnoni, Artidoro, et al.
Published: (2024)

The Faiss library
by: Douze, Matthijs, et al.
Published: (2024)

Short window attention enables long-term memorization
by: Cabannes, Loïc, et al.
Published: (2025)

Beyond Language Modeling: An Exploration of Multimodal Pretraining
by: Tong, Shengbang, et al.
Published: (2026)

Breaking the Curse of Multilinguality with Cross-lingual Expert Language Models
by: Blevins, Terra, et al.
Published: (2024)

Does Liking Yellow Imply Driving a School Bus? Semantic Leakage in Language Models
by: Gonen, Hila, et al.
Published: (2024)

Demystifying Prompts in Language Models via Perplexity Estimation
by: Gonen, Hila, et al.
Published: (2022)

MUSE: Machine Unlearning Six-Way Evaluation for Language Models
by: Shi, Weijia, et al.
Published: (2024)

Stochastic activations
by: Lomeli, Maria, et al.
Published: (2025)

Compute Optimal Tokenization
by: Limisiewicz, Tomasz, et al.
Published: (2026)

Slicing and Dicing: Configuring Optimal Mixtures of Experts
by: Li, Margaret, et al.
Published: (2026)

The Hidden Cost of Thinking: Energy Use and Environmental Impact of LMs Beyond Pretraining
by: Morrison, Jacob, et al.
Published: (2026)

CopyBench: Measuring Literal and Non-Literal Reproduction of Copyright-Protected Text in Language Model Generation
by: Chen, Tong, et al.
Published: (2024)

Anchored Decoding: Provably Reducing Copyright Risk for Any Language Model
by: He, Jacqueline, et al.
Published: (2026)

Memory Layers at Scale
by: Berges, Vincent-Pierre, et al.
Published: (2024)

Train Separately, Merge Together: Modular Post-Training with Mixture-of-Experts
by: Morrison, Jacob, et al.
Published: (2026)