| Main Authors: | Greydanus, Sam, Wimpee, Zachary |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2504.00051 |
Similar Items
Scaling Down Deep Learning with MNIST-1D
by: Greydanus, Sam, et al.
Published: (2020)
When Models Examine Themselves: Vocabulary-Activation Correspondence in Self-Referential Processing
by: Dadfar, Zachary Pedram
Published: (2026)
Proving that Cryptic Crossword Clue Answers are Correct
by: Andrews, Martin, et al.
Published: (2024)
Efficient Benchmarking Is Just Feature Selection and Multiple Regression
by: Bowyer, Sam, et al.
Published: (2026)
Towards Detecting Contextual Real-Time Toxicity for In-Game Chat
by: Yang, Zachary, et al.
Published: (2023)
LLM-Select: Feature Selection with Large Language Models
by: Jeong, Daniel P., et al.
Published: (2024)
TPTT: Transforming Pretrained Transformers into Titans
by: Furfaro, Fabien
Published: (2025)
Personalized Language Modeling from Personalized Human Feedback
by: Li, Xinyu, et al.
Published: (2024)
Birdie: Advancing State Space Models with Reward-Driven Objectives and Curricula
by: Blouir, Sam, et al.
Published: (2024)
Medical Adaptation of Large Language and Vision-Language Models: Are We Making Progress?
by: Jeong, Daniel P., et al.
Published: (2024)
Evaluating the Factuality of Zero-shot Summarizers Across Varied Domains
by: Ramprasad, Sanjana, et al.
Published: (2024)
DINT Transformer
by: Cang, Yueyang, et al.
Published: (2025)
Revisiting the Scaling Properties of Downstream Metrics in Large Language Model Training
by: Krajewski, Jakub, et al.
Published: (2025)
VeRO: An Evaluation Harness for Agents to Optimize Agents
by: Ursekar, Varun, et al.
Published: (2026)
The Belief State Transformer
by: Hu, Edward S., et al.
Published: (2024)
Three-Phase Transformer
by: Ayyash, Mohammad R. Abu
Published: (2026)
The Limited Impact of Medical Adaptation of Large Language and Vision-Language Models
by: Jeong, Daniel P., et al.
Published: (2024)
Selective Attention Improves Transformer
by: Leviathan, Yaniv, et al.
Published: (2024)
Fast Byte Latent Transformer
by: Kallini, Julie, et al.
Published: (2026)
Algorithmic Capabilities of Random Transformers
by: Zhong, Ziqian, et al.
Published: (2024)
Transformers Struggle to Learn to Search
by: Saparov, Abulhair, et al.
Published: (2024)
An Evolved Universal Transformer Memory
by: Cetin, Edoardo, et al.
Published: (2024)
On the Ability of Transformers to Verify Plans
by: Sarrof, Yash, et al.
Published: (2026)
Your Transformer is Secretly Linear
by: Razzhigaev, Anton, et al.
Published: (2024)
TransformLLM: Adapting Large Language Models via LLM-Transformed Reading Comprehension Text
by: Arbel, Iftach, et al.
Published: (2024)
On the Spatial Structure of Mixture-of-Experts in Transformers
by: Bershatsky, Daniel, et al.
Published: (2025)
Adaptive Computation Pruning for the Forgetting Transformer
by: Lin, Zhixuan, et al.
Published: (2025)
Strategic Fusion Optimizes Transformer Compression
by: Rahman, Md Shoaibur
Published: (2025)
Transformer-Squared: Self-adaptive LLMs
by: Sun, Qi, et al.
Published: (2025)
The Role of Sparsity for Length Generalization in Transformers
by: Golowich, Noah, et al.
Published: (2025)
An evolutionary perspective on modes of learning in Transformers
by: Ku, Alexander Y., et al.
Published: (2025)
MICE for CATs: Model-Internal Confidence Estimation for Calibrating Agents with Tools
by: Subramani, Nishant, et al.
Published: (2025)
When Can Transformers Count to n?
by: Yehudai, Gilad, et al.
Published: (2024)
Transformer Circuit Faithfulness Metrics are not Robust
by: Miller, Joseph, et al.
Published: (2024)
Representing Rule-based Chatbots with Transformers
by: Friedman, Dan, et al.
Published: (2024)
ALTA: Compiler-Based Analysis of Transformers
by: Shaw, Peter, et al.
Published: (2024)
The Geometric Anatomy of Capability Acquisition in Transformers
by: Billa, Jayadev
Published: (2026)
Towards Infinite-Long Prefix in Transformer
by: Liang, Yingyu, et al.
Published: (2024)
Momentum Streams for Optimizer-Inspired Transformers
by: Gai, Jingchu, et al.
Published: (2026)
Does Transformer Interpretability Transfer to RNNs?
by: Paulo, Gonçalo, et al.
Published: (2024)