| Main Authors: | Yang, Songlin; Kautz, Jan; Hatamizadeh, Ali |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2412.06464 |
Similar Items
Parallelizing Linear Transformers with the Delta Rule over Sequence Length
by: Yang, Songlin, et al.
Published: (2024)
Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention
by: Hatamizadeh, Ali, et al.
Published: (2026)
MambaVision: A Hybrid Mamba-Transformer Vision Backbone
by: Hatamizadeh, Ali, et al.
Published: (2024)
OSDN: Improving Delta Rule with Provable Online Preconditioning in Linear Attention
by: Zhou, Chenyu, et al.
Published: (2026)
An Empirical Study of Mamba-based Language Models
by: Waleffe, Roger, et al.
Published: (2024)
RLP: Reinforcement as a Pretraining Objective
by: Hatamizadeh, Ali, et al.
Published: (2025)
DiffiT: Diffusion Vision Transformers for Image Generation
by: Hatamizadeh, Ali, et al.
Published: (2023)
Gated Linear Attention Transformers with Hardware-Efficient Training
by: Yang, Songlin, et al.
Published: (2023)
DeltaProduct: Improving State-Tracking in Linear RNNs via Householder Products
by: Siems, Julien, et al.
Published: (2025)
Deep Delta Learning
by: Zhang, Yifan, et al.
Published: (2026)
Delta Knowledge Distillation for Large Language Models
by: Cao, Yihan, et al.
Published: (2025)
ViR: Towards Efficient Vision Retention Backbones
by: Hatamizadeh, Ali, et al.
Published: (2023)
BitDelta: Your Fine-Tune May Only Be Worth One Bit
by: Liu, James, et al.
Published: (2024)
A Unified View of Delta Parameter Editing in Post-Trained Large-Scale Models
by: Tang, Qiaoyu, et al.
Published: (2024)
DARE the Extreme: Revisiting Delta-Parameter Pruning For Fine-Tuned Models
by: Deng, Wenlong, et al.
Published: (2024)
FasterViT: Fast Vision Transformers with Hierarchical Attention
by: Hatamizadeh, Ali, et al.
Published: (2023)
Enhancing Delta Compression in LLMs via SVD-based Quantization Error Minimization
by: Xiong, Boya, et al.
Published: (2025)
Delta Activations: A Representation for Finetuned Large Language Models
by: Xu, Zhiqiu, et al.
Published: (2025)
RuleR: Improving LLM Controllability by Rule-based Data Recycling
by: Li, Ming, et al.
Published: (2024)
EfficientXpert: Efficient Domain Adaptation for Large Language Models via Propagation-Aware Pruning
by: Zhao, Songlin, et al.
Published: (2025)
AutoRule: Reasoning Chain-of-thought Extracted Rule-based Rewards Improve Preference Learning
by: Wang, Tevin, et al.
Published: (2025)
Flextron: Many-in-One Flexible Large Language Model
by: Cai, Ruisi, et al.
Published: (2024)
MambaQuant: Quantizing the Mamba Family with Variance Aligned Rotation Methods
by: Xu, Zukang, et al.
Published: (2025)
Representation Learning with Conditional Information Flow Maximization
by: Hu, Dou, et al.
Published: (2024)
Chain of Attack: a Semantic-Driven Contextual Multi-Turn attacker for LLM
by: Yang, Xikang, et al.
Published: (2024)
Differential Mamba
by: Schneider, Nadav, et al.
Published: (2025)
RuleReasoner: Reinforced Rule-based Reasoning via Domain-aware Dynamic Sampling
by: Liu, Yang, et al.
Published: (2025)
Mamba Knockout for Unraveling Factual Information Flow
by: Endy, Nir, et al.
Published: (2025)
Structured Probabilistic Coding
by: Hu, Dou, et al.
Published: (2023)
Rule2Text: Natural Language Explanation of Logical Rules in Knowledge Graphs
by: Shirvani-Mahdavi, Nasim, et al.
Published: (2025)
BroRL: Scaling Reinforcement Learning via Broadened Exploration
by: Hu, Jian, et al.
Published: (2025)
Leveraging Logical Rules in Knowledge Editing: A Cherry on the Top
by: Cheng, Keyuan, et al.
Published: (2024)
Scaling Stick-Breaking Attention: An Efficient Implementation and In-depth Study
by: Tan, Shawn, et al.
Published: (2024)
Exploiting Synergistic Cognitive Biases to Bypass Safety in LLMs
by: Yang, Xikang, et al.
Published: (2025)
Masked Gated Linear Unit
by: Tajima, Yukito, et al.
Published: (2025)
Jamba: A Hybrid Transformer-Mamba Language Model
by: Lieber, Opher, et al.
Published: (2024)
Lost in State Space: Probing Frozen Mamba Representations
by: Wagh, Bhagyashree, et al.
Published: (2026)
DenseMamba: State Space Models with Dense Hidden Connection for Efficient Large Language Models
by: He, Wei, et al.
Published: (2024)
Domain Gating Ensemble Networks for AI-Generated Text Detection
by: Tripathi, Arihant, et al.
Published: (2025)
MambaByte: Token-free Selective State Space Model
by: Wang, Junxiong, et al.
Published: (2024)