:: Library Catalog

Bibliographic Details
Main Authors:	Nguyen, John; Wang, Sid; Li, Ke; Wu, Carole-Jean
Format:	Preprint
Published:	2023
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2305.15348

Similar Items

Stable-LoRA: Stabilizing Feature Learning of Low-Rank Adaptation
by: Wu, Yize, et al.
Published: (2026)

Recurrent Action Transformer with Memory
by: Cherepanov, Egor, et al.
Published: (2023)

Recurrent Diffusion for Large-Scale Parameter Generation
by: Wang, Kai, et al.
Published: (2025)

Adaptation and Fine-tuning with TabPFN for Travelling Salesman Problem
by: Vu, Nguyen Gia Hien, et al.
Published: (2025)

Interpreting Affine Recurrence Learning in GPT-style Transformers
by: Bhargav, Samarth, et al.
Published: (2024)

Understanding Dynamic Compute Allocation in Recurrent Transformers
by: Moosa, Ibraheem Muhammad, et al.
Published: (2026)

TalkLoRA: Communication-Aware Mixture of Low-Rank Adaptation for Large Language Models
by: Mu, Lin, et al.
Published: (2026)

RingFormer: Rethinking Recurrent Transformer with Adaptive Level Signals
by: Heo, Jaemu, et al.
Published: (2025)

Renaissance of RNNs in Streaming Clinical Time Series: Compact Recurrence Remains Competitive with Transformers
by: Tong, Ran, et al.
Published: (2025)

Latent Chain-of-Thought? Decoding the Depth-Recurrent Transformer
by: Lu, Wenquan, et al.
Published: (2025)

Attention Saturation and Gradient Suppression at Inflection Layers: Diagnosing and Mitigating Bottlenecks in Transformer Adaptation
by: Zixian, Wang
Published: (2025)

CAPS: Unifying Attention, Recurrence, and Alignment in Transformer-based Time Series Forecasting
by: Pati, Viresh, et al.
Published: (2026)

Plug-and-Play Transformer Modules for Test-Time Adaptation
by: Chang, Xiangyu, et al.
Published: (2024)

SeqBattNet: A Discrete-State Physics-Informed Neural Network with Aging Adaptation for Battery Modeling
by: Tran, Khoa, et al.
Published: (2025)

CoSA: Compressed Sensing-Based Adaptation of Large Language Models
by: Wei, Songtao, et al.
Published: (2026)

Investigating Recurrent Transformers with Dynamic Halt
by: Chowdhury, Jishnu Ray, et al.
Published: (2024)

GPU Memory Requirement Prediction for Deep Learning Task Based on Bidirectional Gated Recurrent Unit Optimization Transformer
by: Wang, Chao, et al.
Published: (2025)

How Transformers Learn In-Context Recall Tasks? Optimality, Training Dynamics and Generalization
by: Nguyen, Quan, et al.
Published: (2025)

Mixture-of-Subspaces in Low-Rank Adaptation
by: Wu, Taiqiang, et al.
Published: (2024)

Decision Transformer vs. Decision Mamba: Analysing the Complexity of Sequential Decision Making in Atari Games
by: Yan, Ke
Published: (2024)

LlamaRL: A Distributed Asynchronous Reinforcement Learning Framework for Efficient Large-scale LLM Training
by: Wu, Bo, et al.
Published: (2025)

Train at Moving Edge: Online-Verified Prompt Selection for Efficient RL Training of Large Reasoning Model
by: Wu, Jiahao, et al.
Published: (2026)

PETAH: Parameter Efficient Task Adaptation for Hybrid Transformers in a resource-limited Context
by: Augustin, Maximilian, et al.
Published: (2024)

CorDA: Context-Oriented Decomposition Adaptation of Large Language Models for Task-Aware Parameter-Efficient Fine-tuning
by: Yang, Yibo, et al.
Published: (2024)

LoCA: Location-Aware Cosine Adaptation for Parameter-Efficient Fine-Tuning
by: Du, Zhekai, et al.
Published: (2025)

Neural Organ Transplantation (NOT): Checkpoint-Based Modular Adaptation for Transformer Models
by: Al-Zuraiqi, Ahmad
Published: (2026)

Thinking Deeper, Not Longer: Depth-Recurrent Transformers for Compositional Generalization
by: Chen, Hung-Hsuan
Published: (2026)

Loop, Think, & Generalize: Implicit Reasoning in Recurrent-Depth Transformers
by: Kohli, Harsh, et al.
Published: (2026)

Recurrent Stochastic Configuration Networks with Incremental Blocks
by: Dang, Gang, et al.
Published: (2024)

Block-Recurrent Dynamics in Vision Transformers
by: Jacobs, Mozes, et al.
Published: (2025)

PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models
by: Meng, Fanxu, et al.
Published: (2024)

Gate Recurrent Unit for Efficient Industrial Gas Identification
by: Wang, Ding
Published: (2024)

Associative Recurrent Memory Transformer
by: Rodkin, Ivan, et al.
Published: (2024)

Amortized Planning with Large-Scale Transformers: A Case Study on Chess
by: Ruoss, Anian, et al.
Published: (2024)

Collaborate to Adapt: Source-Free Graph Domain Adaptation via Bi-directional Adaptation
by: Zhang, Zhen, et al.
Published: (2024)

Outlier-Efficient Hopfield Layers for Large Transformer-Based Models
by: Hu, Jerry Yao-Chieh, et al.
Published: (2024)

Inverse Approximation Theory for Nonlinear Recurrent Neural Networks
by: Wang, Shida, et al.
Published: (2023)

Quagmires in SFT-RL Post-Training: When High SFT Scores Mislead and What to Use Instead
by: Kang, Feiyang, et al.
Published: (2025)

Bridging Source and Target Domains via Link Prediction for Unsupervised Domain Adaptation on Graphs
by: Wang, Yilong, et al.
Published: (2025)

RecurrentGemma: Moving Past Transformers for Efficient Open Language Models
by: Botev, Aleksandar, et al.
Published: (2024)