| Main Authors: | Li, Shanda, You, Chong, Guruganesh, Guru, Ainslie, Joshua, Ontanon, Santiago, Zaheer, Manzil, Sanghai, Sumit, Yang, Yiming, Kumar, Sanjiv, Bhojanapalli, Srinadh |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2310.04418 |
Similar Items
Scalable In-context Ranking with Generative Models
by: Gupta, Nilesh, et al.
Published: (2025)
Position Coupling: Improving Length Generalization of Arithmetic Transformers Using Task Structure
by: Cho, Hanseul, et al.
Published: (2024)
HiRE: High Recall Approximate Top-$k$ Estimation for Efficient LLM Inference
by: L, Yashas Samaga B, et al.
Published: (2024)
On student-teacher deviations in distillation: does it pay to disobey?
by: Nagarajan, Vaishnavh, et al.
Published: (2023)
Mimetic Initialization Helps State Space Models Learn to Recall
by: Trockman, Asher, et al.
Published: (2024)
Arithmetic Transformers Can Length-Generalize in Both Operand Length and Count
by: Cho, Hanseul, et al.
Published: (2024)
Autoregressive Ranking: Bridging the Gap Between Dual and Cross Encoders
by: Rozonoyer, Benjamin, et al.
Published: (2026)
Differentially Private Model Merging
by: Yin, Qichuan, et al.
Published: (2026)
Efficient Language Model Architectures for Differentially Private Federated Learning
by: Ro, Jae Hun, et al.
Published: (2024)
Dual-Encoders for Extreme Multi-Label Classification
by: Gupta, Nilesh, et al.
Published: (2023)
Efficient Distributed Optimization under Heavy-Tailed Noise
by: Lee, Su Hyeong, et al.
Published: (2025)
A Statistical Framework for Data-dependent Retrieval-Augmented Models
by: Basu, Soumya, et al.
Published: (2024)
Learning a Fourier Transform for Linear Relative Positional Encodings in Transformers
by: Choromanski, Krzysztof Marcin, et al.
Published: (2023)
Federation over Text: Insight Sharing for Multi-Agent Reasoning
by: Yao, Dixi, et al.
Published: (2026)
Spark Transformer: Reactivating Sparsity in FFN and Attention
by: You, Chong, et al.
Published: (2025)
Advances in Transformers for Robotic Applications: A Review
by: Sanghai, Nikunj, et al.
Published: (2024)
A Fresh Take on Stale Embeddings: Improving Dense Retriever Training with Corrector Networks
by: Monath, Nicholas, et al.
Published: (2024)
Efficient Adaptive Federated Optimization
by: Lee, Su Hyeong, et al.
Published: (2024)
Contracting with a Learning Agent
by: Guruganesh, Guru, et al.
Published: (2024)
Cite Pretrain: Retrieval-Free Knowledge Attribution for Large Language Models
by: Huang, Yukun, et al.
Published: (2025)
Maximal Update Parametrization and Zero-Shot Hyperparameter Transfer for Fourier Neural Operators
by: Li, Shanda, et al.
Published: (2025)
Adaptive Retrieval and Scalable Indexing for k-NN Search with Cross-Encoders
by: Yadav, Nishant, et al.
Published: (2024)
Integrating Planning into Single-Turn Long-Form Text Generation
by: Liang, Yi, et al.
Published: (2024)
Some Series Related to Extended Riemann Hypothesis for Dedekind Zeta Functions
by: Zaheer, Muhammad Atif
Published: (2025)
Understanding Outer Optimizers in Local SGD: Learning Rates, Momentum, and Acceleration
by: Khaled, Ahmed, et al.
Published: (2025)
ExeDec: Execution Decomposition for Compositional Generalization in Neural Program Synthesis
by: Shi, Kensen, et al.
Published: (2023)
Asynchronous Heavy-Tailed Optimization
by: Sun, Junfei, et al.
Published: (2026)
An Improved Collecting Bottle
by: Ainslie, C. N. (Charles Nicholas)
Published: (1915)
HLA‐DPA1*01:228 is the First DPA1 Allele Described With an Alanine at Position 42 of Alpha‐1 Domain
by: Luis Alberto Marin Rubio, et al.
Published: (2025)
HLA‐DRB1*08:130 Is the First DRB1 Allele Described With a Leucine at Position 64 of Beta‐1 Domain
by: Luis Alberto Marin Rubio, et al.
Published: (2025)
CO-Bench: Benchmarking Language Model Agents in Algorithm Search for Combinatorial Optimization
by: Sun, Weiwei, et al.
Published: (2025)
Corpo e arte: uma proposta pedagógica na Educação Física a partir da bola de equilíbrio circense
by: Teresa Ontañón Barragán
Published: (2019)
APRENDENDO A ENSINAR CIRCO: A CURRICULARIZAÇÃO DA EXTENSÃO UNIVERSITÁRIA E SEUS IMPACTOS NA FORMAÇÃO DOS DISCENTES
by: Teresa Ontañón Barragán
Published: (2023)
CIMemories: A Compositional Benchmark for Contextual Integrity of Persistent Memory in LLMs
by: Mireshghallah, Niloofar, et al.
Published: (2025)
Analysis of Plan-based Retrieval for Grounded Text Generation
by: Godbole, Ameya, et al.
Published: (2024)
Deep Reinforcement Learning for Sequential Combinatorial Auctions
by: Ravindranath, Sai Srivatsa, et al.
Published: (2024)
kylieainslie/mitey: mitey 0.3.1
by: Kylie Ainslie
Published: (2026)
India : historia del subcontinente desde las culturas del indo hasta el comienzo del dominio inglés / Ainslie T. Embree y Friedrich Wilhelm; traductores Antón Dieterich, María Isabel Carrillo
by: Embree, Ainslie
Inadequate housing is not neglect: How the family regulation system punishes parents for a housing crisis out of their control
by: Ainslie Martin
Published: (2025)
La enfermedad meningocócica en España, 1990-1997. Cambio en su patrón epidemiológico
by: Salvador de Mateo Ontañón
Published: (2000)