:: Library Catalog

Bibliographic Details
Main Authors:	Panigrahi, Abhishek; Saunshi, Nikunj; Lyu, Kaifeng; Miryoosefi, Sobhan; Reddi, Sashank; Kale, Satyen; Kumar, Sanjiv
Format:	Preprint
Published:	2024
Subjects:	Computation and Language; Machine Learning
Online Access:	https://arxiv.org/abs/2402.05913

Similar Items

Landscape-Aware Growing: The Power of a Little LAG
By: Karp, Stefani, et al.
Published: (2024)

On the Inductive Bias of Stacking Towards Improving Reasoning
By: Saunshi, Nikunj, et al.
Published: (2024)

Reasoning with Latent Thoughts: On the Power of Looped Transformers
By: Saunshi, Nikunj, et al.
Published: (2025)

On the Role of Depth and Looping for In-Context Learning with Task Diversity
By: Gatmiry, Khashayar, et al.
Published: (2024)

Can Looped Transformers Learn to Implement Multi-step Gradient Descent for In-context Learning?
By: Gatmiry, Khashayar, et al.
Published: (2024)

A Little Help Goes a Long Way: Efficient LLM Training by Leveraging Small LMs
By: Rawat, Ankit Singh, et al.
Published: (2024)

AdaBoN: Adaptive Best-of-N Alignment
By: Raman, Vinod, et al.
Published: (2025)

The Heuristic Core: Understanding Subnetwork Generalization in Pretrained Language Models
By: Bhaskar, Adithya, et al.
Published: (2024)

Discovering Knowledge-Critical Subnetworks in Pretrained Language Models
By: Bayazit, Deniz, et al.
Published: (2023)

Structured Preconditioners in Adaptive Optimization: A Unified Analysis
By: Xie, Shuo, et al.
Published: (2025)

DistillSpec: Improving Speculative Decoding via Knowledge Distillation
By: Zhou, Yongchao, et al.
Published: (2023)

Analyzing Similarity Metrics for Data Selection for Language Model Pretraining
By: Sam, Dylan, et al.
Published: (2025)

Loss Landscape Degeneracy and Stagewise Development in Transformers
By: Hoogland, Jesse, et al.
Published: (2024)

Hierarchical Retrieval: The Geometry and a Pretrain-Finetune Recipe
By: You, Chong, et al.
Published: (2025)

Learning to Keep a Promise: Scaling Language Model Decoding Parallelism with Learned Asynchronous Decoding
By: Jin, Tian, et al.
Published: (2025)

Asynchronous Local-SGD Training for Language Modeling
By: Liu, Bo, et al.
Published: (2024)

RNNs are not Transformers (Yet): The Key Bottleneck on In-context Retrieval
By: Wen, Kaiyue, et al.
Published: (2024)

Faster Rates For Federated Variational Inequalities
By: Wang, Guanghui, et al.
Published: (2026)

Representing Rule-based Chatbots with Transformers
By: Friedman, Dan, et al.
Published: (2024)

Trainable Transformer in Transformer
By: Panigrahi, Abhishek, et al.
Published: (2023)

On the SDEs and Scaling Rules for Adaptive Gradient Algorithms
By: Malladi, Sadhika, et al.
Published: (2022)

How Learning Rate Decay Wastes Your Best Data in Curriculum-Based LLM Pretraining
By: Luo, Kairong, et al.
Published: (2025)

HierRouter: Coordinated Routing of Specialized Large Language Models via Reinforcement Learning
By: Gupta, Nikunj, et al.
Published: (2025)

In Good GRACEs: Principled Teacher Selection for Knowledge Distillation
By: Panigrahi, Abhishek, et al.
Published: (2025)

Eager Updates For Overlapped Communication and Computation in DiLoCo
By: Kale, Satyen, et al.
Published: (2025)

Data Mixing Can Induce Phase Transitions in Knowledge Acquisition
By: Gu, Xinran, et al.
Published: (2025)

SPA: A Simple but Tough-to-Beat Baseline for Knowledge Injection
By: Tang, Kexian, et al.
Published: (2026)

Towards Efficient Active Learning in NLP via Pretrained Representations
By: Vysogorets, Artem, et al.
Published: (2024)

The Power of Power Law: Asymmetry Enables Compositional Reasoning
By: Wang, Zixuan, et al.
Published: (2026)

Deep sequence models tend to memorize geometrically; it is unclear why
By: Noroozizadeh, Shahriar, et al.
Published: (2025)

Mimetic Initialization Helps State Space Models Learn to Recall
By: Trockman, Asher, et al.
Published: (2024)

Are More Tokens Rational? Inference-Time Scaling in Language Models as Adaptive Resource Rationality
By: Hu, Zhimin, et al.
Published: (2026)

Where to Begin: Efficient Pretraining via Subnetwork Selection and Distillation
By: Krishnakumar, Arjun, et al.
Published: (2025)

On Importance of Pruning and Distillation for Efficient Low Resource NLP
By: Mirashi, Aishwarya, et al.
Published: (2024)

Group-Level Data Selection for Efficient Pretraining
By: Yu, Zichun, et al.
Published: (2025)

Generalizing from SIMPLE to HARD Visual Reasoning: Can We Mitigate Modality Imbalance in VLMs?
By: Park, Simon, et al.
Published: (2025)

Faster Cascades via Speculative Decoding
By: Narasimhan, Harikrishna, et al.
Published: (2024)

Fine-tuning MLLMs Without Forgetting Is Easier Than You Think
By: Li, He, et al.
Published: (2026)

Universal Model Routing for Efficient LLM Inference
By: Jitkrittum, Wittawat, et al.
Published: (2025)

Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates
By: Lyu, Kaifeng, et al.
Published: (2024)