| Main Authors: | Nguyen, John; Wang, Sid; Li, Ke; Wu, Carole-Jean |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Online Access: | https://arxiv.org/abs/2305.15348 |
Similar Items
Stable-LoRA: Stabilizing Feature Learning of Low-Rank Adaptation
by: Wu, Yize, et al.
Published: (2026)
Recurrent Action Transformer with Memory
by: Cherepanov, Egor, et al.
Published: (2023)
Recurrent Diffusion for Large-Scale Parameter Generation
by: Wang, Kai, et al.
Published: (2025)
Adaptation and Fine-tuning with TabPFN for Travelling Salesman Problem
by: Vu, Nguyen Gia Hien, et al.
Published: (2025)
Interpreting Affine Recurrence Learning in GPT-style Transformers
by: Bhargav, Samarth, et al.
Published: (2024)
Understanding Dynamic Compute Allocation in Recurrent Transformers
by: Moosa, Ibraheem Muhammad, et al.
Published: (2026)
TalkLoRA: Communication-Aware Mixture of Low-Rank Adaptation for Large Language Models
by: Mu, Lin, et al.
Published: (2026)
RingFormer: Rethinking Recurrent Transformer with Adaptive Level Signals
by: Heo, Jaemu, et al.
Published: (2025)
Renaissance of RNNs in Streaming Clinical Time Series: Compact Recurrence Remains Competitive with Transformers
by: Tong, Ran, et al.
Published: (2025)
Latent Chain-of-Thought? Decoding the Depth-Recurrent Transformer
by: Lu, Wenquan, et al.
Published: (2025)
Attention Saturation and Gradient Suppression at Inflection Layers: Diagnosing and Mitigating Bottlenecks in Transformer Adaptation
by: Zixian, Wang
Published: (2025)
CAPS: Unifying Attention, Recurrence, and Alignment in Transformer-based Time Series Forecasting
by: Pati, Viresh, et al.
Published: (2026)
Plug-and-Play Transformer Modules for Test-Time Adaptation
by: Chang, Xiangyu, et al.
Published: (2024)
SeqBattNet: A Discrete-State Physics-Informed Neural Network with Aging Adaptation for Battery Modeling
by: Tran, Khoa, et al.
Published: (2025)
CoSA: Compressed Sensing-Based Adaptation of Large Language Models
by: Wei, Songtao, et al.
Published: (2026)
Investigating Recurrent Transformers with Dynamic Halt
by: Chowdhury, Jishnu Ray, et al.
Published: (2024)
GPU Memory Requirement Prediction for Deep Learning Task Based on Bidirectional Gated Recurrent Unit Optimization Transformer
by: Wang, Chao, et al.
Published: (2025)
How Transformers Learn In-Context Recall Tasks? Optimality, Training Dynamics and Generalization
by: Nguyen, Quan, et al.
Published: (2025)
Mixture-of-Subspaces in Low-Rank Adaptation
by: Wu, Taiqiang, et al.
Published: (2024)
Decision Transformer vs. Decision Mamba: Analysing the Complexity of Sequential Decision Making in Atari Games
by: Yan, Ke
Published: (2024)
LlamaRL: A Distributed Asynchronous Reinforcement Learning Framework for Efficient Large-scale LLM Training
by: Wu, Bo, et al.
Published: (2025)
Train at Moving Edge: Online-Verified Prompt Selection for Efficient RL Training of Large Reasoning Model
by: Wu, Jiahao, et al.
Published: (2026)
PETAH: Parameter Efficient Task Adaptation for Hybrid Transformers in a resource-limited Context
by: Augustin, Maximilian, et al.
Published: (2024)
CorDA: Context-Oriented Decomposition Adaptation of Large Language Models for Task-Aware Parameter-Efficient Fine-tuning
by: Yang, Yibo, et al.
Published: (2024)
LoCA: Location-Aware Cosine Adaptation for Parameter-Efficient Fine-Tuning
by: Du, Zhekai, et al.
Published: (2025)
Neural Organ Transplantation (NOT): Checkpoint-Based Modular Adaptation for Transformer Models
by: Al-Zuraiqi, Ahmad
Published: (2026)
Thinking Deeper, Not Longer: Depth-Recurrent Transformers for Compositional Generalization
by: Chen, Hung-Hsuan
Published: (2026)
Loop, Think, & Generalize: Implicit Reasoning in Recurrent-Depth Transformers
by: Kohli, Harsh, et al.
Published: (2026)
Recurrent Stochastic Configuration Networks with Incremental Blocks
by: Dang, Gang, et al.
Published: (2024)
Block-Recurrent Dynamics in Vision Transformers
by: Jacobs, Mozes, et al.
Published: (2025)
PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models
by: Meng, Fanxu, et al.
Published: (2024)
Gate Recurrent Unit for Efficient Industrial Gas Identification
by: Wang, Ding
Published: (2024)
Associative Recurrent Memory Transformer
by: Rodkin, Ivan, et al.
Published: (2024)
Amortized Planning with Large-Scale Transformers: A Case Study on Chess
by: Ruoss, Anian, et al.
Published: (2024)
Collaborate to Adapt: Source-Free Graph Domain Adaptation via Bi-directional Adaptation
by: Zhang, Zhen, et al.
Published: (2024)
Outlier-Efficient Hopfield Layers for Large Transformer-Based Models
by: Hu, Jerry Yao-Chieh, et al.
Published: (2024)
Inverse Approximation Theory for Nonlinear Recurrent Neural Networks
by: Wang, Shida, et al.
Published: (2023)
Quagmires in SFT-RL Post-Training: When High SFT Scores Mislead and What to Use Instead
by: Kang, Feiyang, et al.
Published: (2025)
Bridging Source and Target Domains via Link Prediction for Unsupervised Domain Adaptation on Graphs
by: Wang, Yilong, et al.
Published: (2025)
RecurrentGemma: Moving Past Transformers for Efficient Open Language Models
by: Botev, Aleksandar, et al.
Published: (2024)