| Main Authors: | Merrill, William; Li, Yanhong; Romero, Tyler; Svete, Anej; Costello, Caia; Dasigi, Pradeep; Groeneveld, Dirk; Heineman, David; Kuehl, Bailey; Lambert, Nathan; Li, Chuan; Lo, Kyle; Malik, Saumya; Matusz, DJ; Minixhofer, Benjamin; Morrison, Jacob; Soldaini, Luca; Timbers, Finbarr; Walsh, Pete; Smith, Noah A.; Hajishirzi, Hannaneh; Sabharwal, Ashish |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.03444 |
Similar Items
Revisiting Padded Transformer Expressivity: Which Architectural Choices Matter and Which Don't
by: Svete, Anej, et al.
Published: (2026)
On the Reasoning Abilities of Masked Diffusion Language Models
by: Svete, Anej, et al.
Published: (2025)
Olmo 3
by: Olmo, Team, et al.
Published: (2025)
HREF: Human Response-Guided Evaluation of Instruction Following in Language Models
by: Lyu, Xinxi, et al.
Published: (2024)
Generalizing Verifiable Instruction Following
by: Pyatkin, Valentina, et al.
Published: (2025)
Critical Batch Size Revisited: A Simple Empirical Approach to Large-Batch Language Model Training
by: Merrill, William, et al.
Published: (2025)
FlexOlmo: Open Language Models for Flexible Data Use
by: Shi, Weijia, et al.
Published: (2025)
Transformers Can Represent $n$-gram Language Models
by: Svete, Anej, et al.
Published: (2024)
Olmix: A Framework for Data Mixing Throughout LM Development
by: Chen, Mayee F., et al.
Published: (2026)
Merge to Learn: Efficiently Adding Skills to Language Models with Model Merging
by: Morrison, Jacob, et al.
Published: (2024)
Context-Free Recognition with Transformers
by: Jerad, Selim, et al.
Published: (2026)
Meta-Reinforcement Learning with Self-Reflection for Agentic Search
by: Xiao, Teng, et al.
Published: (2026)
ReFIT: Relevance Feedback from a Reranker during Inference
by: Reddy, Revanth Gangi, et al.
Published: (2023)
Answer, Assemble, Ace: Understanding How LMs Answer Multiple Choice Questions
by: Wiegreffe, Sarah, et al.
Published: (2024)
OLMES: A Standard for Language Model Evaluations
by: Gu, Yuling, et al.
Published: (2024)
Establishing Task Scaling Laws via Compute-Efficient Model Ladders
by: Bhagia, Akshita, et al.
Published: (2024)
RewardBench 2: Advancing Reward Model Evaluation
by: Malik, Saumya, et al.
Published: (2025)
Organize the Web: Constructing Domains Enhances Pre-Training Data Curation
by: Wettig, Alexander, et al.
Published: (2025)
On Efficiently Representing Regular Languages as RNNs
by: Svete, Anej, et al.
Published: (2024)
Gumbel Counterfactual Generation From Language Models
by: Ravfogel, Shauli, et al.
Published: (2024)
On the Representational Capacity of Recurrent Neural Language Models
by: Nowak, Franz, et al.
Published: (2023)
Unique Hard Attention: A Tale of Two Sides
by: Jerad, Selim, et al.
Published: (2025)
On the Representational Capacity of Neural Language Models with Chain-of-Thought Reasoning
by: Nowak, Franz, et al.
Published: (2024)
Think, Prune, Train, Improve: Scaling Reasoning without Scaling Models
by: Costello, Caia, et al.
Published: (2025)
Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback
by: Miranda, Lester James V., et al.
Published: (2024)
TurnWise: The Gap between Single- and Multi-turn Language Model Capabilities
by: Graf, Victoria, et al.
Published: (2026)
Lower Bounds on the Expressivity of Recurrent Neural Language Models
by: Svete, Anej, et al.
Published: (2024)
Why Are Linear RNNs More Parallelizable?
by: Merrill, William, et al.
Published: (2026)
A Little Depth Goes a Long Way: The Expressive Power of Log-Depth Transformers
by: Merrill, William, et al.
Published: (2025)
Exact Expressive Power of Transformers with Padding
by: Merrill, William, et al.
Published: (2025)
The Expressive Power of Transformers with Chain of Thought
by: Merrill, William, et al.
Published: (2023)
A Logic for Expressing Log-Precision Transformers
by: Merrill, William, et al.
Published: (2022)
Signal and Noise: A Framework for Reducing Uncertainty in Language Model Evaluation
by: Heineman, David, et al.
Published: (2025)
A Geometric Notion of Causal Probing
by: Guerner, Clément, et al.
Published: (2023)
Can Transformers Learn $n$-gram Language Models?
by: Svete, Anej, et al.
Published: (2024)
Formal Aspects of Language Modeling
by: Cotterell, Ryan, et al.
Published: (2023)
APT: Adaptive Pruning and Tuning Pretrained Language Models for Efficient Training and Inference
by: Zhao, Bowen, et al.
Published: (2024)
Paloma: A Benchmark for Evaluating Language Model Fit
by: Magnusson, Ian, et al.
Published: (2023)
2 OLMo 2 Furious
by: OLMo, Team, et al.
Published: (2024)
What's In My Big Data?
by: Elazar, Yanai, et al.
Published: (2023)