:: Library Catalog

Bibliographic Details
Main Authors:	Yang, Songlin; Kautz, Jan; Hatamizadeh, Ali
Format:	Preprint
Published:	2024
Subjects:	Computation and Language; Machine Learning
Online Access:	https://arxiv.org/abs/2412.06464

Similar Items

Parallelizing Linear Transformers with the Delta Rule over Sequence Length
by: Yang, Songlin, et al.
Published: (2024)

Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention
by: Hatamizadeh, Ali, et al.
Published: (2026)

MambaVision: A Hybrid Mamba-Transformer Vision Backbone
by: Hatamizadeh, Ali, et al.
Published: (2024)

OSDN: Improving Delta Rule with Provable Online Preconditioning in Linear Attention
by: Zhou, Chenyu, et al.
Published: (2026)

An Empirical Study of Mamba-based Language Models
by: Waleffe, Roger, et al.
Published: (2024)

RLP: Reinforcement as a Pretraining Objective
by: Hatamizadeh, Ali, et al.
Published: (2025)

DiffiT: Diffusion Vision Transformers for Image Generation
by: Hatamizadeh, Ali, et al.
Published: (2023)

Gated Linear Attention Transformers with Hardware-Efficient Training
by: Yang, Songlin, et al.
Published: (2023)

DeltaProduct: Improving State-Tracking in Linear RNNs via Householder Products
by: Siems, Julien, et al.
Published: (2025)

Deep Delta Learning
by: Zhang, Yifan, et al.
Published: (2026)

Delta Knowledge Distillation for Large Language Models
by: Cao, Yihan, et al.
Published: (2025)

ViR: Towards Efficient Vision Retention Backbones
by: Hatamizadeh, Ali, et al.
Published: (2023)

BitDelta: Your Fine-Tune May Only Be Worth One Bit
by: Liu, James, et al.
Published: (2024)

A Unified View of Delta Parameter Editing in Post-Trained Large-Scale Models
by: Tang, Qiaoyu, et al.
Published: (2024)

DARE the Extreme: Revisiting Delta-Parameter Pruning For Fine-Tuned Models
by: Deng, Wenlong, et al.
Published: (2024)

FasterViT: Fast Vision Transformers with Hierarchical Attention
by: Hatamizadeh, Ali, et al.
Published: (2023)

Enhancing Delta Compression in LLMs via SVD-based Quantization Error Minimization
by: Xiong, Boya, et al.
Published: (2025)

Delta Activations: A Representation for Finetuned Large Language Models
by: Xu, Zhiqiu, et al.
Published: (2025)

RuleR: Improving LLM Controllability by Rule-based Data Recycling
by: Li, Ming, et al.
Published: (2024)

EfficientXpert: Efficient Domain Adaptation for Large Language Models via Propagation-Aware Pruning
by: Zhao, Songlin, et al.
Published: (2025)

AutoRule: Reasoning Chain-of-thought Extracted Rule-based Rewards Improve Preference Learning
by: Wang, Tevin, et al.
Published: (2025)

Flextron: Many-in-One Flexible Large Language Model
by: Cai, Ruisi, et al.
Published: (2024)

MambaQuant: Quantizing the Mamba Family with Variance Aligned Rotation Methods
by: Xu, Zukang, et al.
Published: (2025)

Representation Learning with Conditional Information Flow Maximization
by: Hu, Dou, et al.
Published: (2024)

Chain of Attack: a Semantic-Driven Contextual Multi-Turn attacker for LLM
by: Yang, Xikang, et al.
Published: (2024)

Differential Mamba
by: Schneider, Nadav, et al.
Published: (2025)

RuleReasoner: Reinforced Rule-based Reasoning via Domain-aware Dynamic Sampling
by: Liu, Yang, et al.
Published: (2025)

Mamba Knockout for Unraveling Factual Information Flow
by: Endy, Nir, et al.
Published: (2025)

Structured Probabilistic Coding
by: Hu, Dou, et al.
Published: (2023)

Rule2Text: Natural Language Explanation of Logical Rules in Knowledge Graphs
by: Shirvani-Mahdavi, Nasim, et al.
Published: (2025)

BroRL: Scaling Reinforcement Learning via Broadened Exploration
by: Hu, Jian, et al.
Published: (2025)

Leveraging Logical Rules in Knowledge Editing: A Cherry on the Top
by: Cheng, Keyuan, et al.
Published: (2024)

Scaling Stick-Breaking Attention: An Efficient Implementation and In-depth Study
by: Tan, Shawn, et al.
Published: (2024)

Exploiting Synergistic Cognitive Biases to Bypass Safety in LLMs
by: Yang, Xikang, et al.
Published: (2025)

Masked Gated Linear Unit
by: Tajima, Yukito, et al.
Published: (2025)

Jamba: A Hybrid Transformer-Mamba Language Model
by: Lieber, Opher, et al.
Published: (2024)

Lost in State Space: Probing Frozen Mamba Representations
by: Wagh, Bhagyashree, et al.
Published: (2026)

DenseMamba: State Space Models with Dense Hidden Connection for Efficient Large Language Models
by: He, Wei, et al.
Published: (2024)

Domain Gating Ensemble Networks for AI-Generated Text Detection
by: Tripathi, Arihant, et al.
Published: (2025)

MambaByte: Token-free Selective State Space Model
by: Wang, Junxiong, et al.
Published: (2024)