:: Library Catalog

Bibliographic Details
Main Authors:	Hao, Yongchang; Zhai, Mengyao; Hajimirsadeghi, Hossein; Hosseini, Sepidehsadat; Tung, Frederick
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2503.10571

Similar Items

Prompting-based Temporal Domain Generalization
by: Hosseini, Sepidehsadat, et al.
Published: (2023)

FairNVT: Improving Fairness via Noise Injection in Vision Transformers
by: Tang, Qiaoyue, et al.
Published: (2026)

You Need Reasoning to Learn Reasoning: The Limitations of Label-Free RL in Weak Base Models
by: Roy, Shuvendu, et al.
Published: (2025)

Tree Cross Attention
by: Feng, Leo, et al.
Published: (2023)

Were RNNs All We Needed?
by: Feng, Leo, et al.
Published: (2024)

Memory Efficient Neural Processes via Constant Memory Attention Block
by: Feng, Leo, et al.
Published: (2023)

Attention as an RNN
by: Feng, Leo, et al.
Published: (2024)

Cactus: Accelerating Auto-Regressive Decoding with Constrained Acceptance Speculative Sampling
by: Hao, Yongchang, et al.
Published: (2026)

TabReason: A Reinforcement Learning-Enhanced Reasoning LLM for Explainable Tabular Data Prediction
by: Xu, Tommy, et al.
Published: (2025)

Forget Sharpness: Perturbed Forgetting of Model Biases Within SAM Dynamics
by: Vani, Ankit, et al.
Published: (2024)

ContextPilot: Fast Long-Context Inference via Context Reuse
by: Jiang, Yinsicheng, et al.
Published: (2025)

Flora: Low-Rank Adapters Are Secretly Gradient Compressors
by: Hao, Yongchang, et al.
Published: (2024)

NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks
by: Hao, Yongchang, et al.
Published: (2024)

SPINT: Spatial Permutation-Invariant Neural Transformer for Consistent Intracortical Motor Decoding
by: Le, Trung, et al.
Published: (2025)

Ginger: An Efficient Curvature Approximation with Linear Complexity for General Neural Networks
by: Hao, Yongchang, et al.
Published: (2024)

DEL: Context-Aware Dynamic Exit Layer for Efficient Self-Speculative Decoding
by: Zarch, Hossein Entezari, et al.
Published: (2025)

Revolutionizing Traffic Management with AI-Powered Machine Vision: A Step Toward Smart Cities
by: DolatAbadi, Seyed Hossein Hosseini, et al.
Published: (2025)

Core Context Aware Transformers for Long Context Language Modeling
by: Chen, Yaofo, et al.
Published: (2024)

CADET: Context-Conditioned Ads CTR Prediction With a Decoder-Only Transformer
by: Pardoe, David, et al.
Published: (2026)

LongSpec: Long-Context Lossless Speculative Decoding with Efficient Drafting and Verification
by: Yang, Penghui, et al.
Published: (2025)

Efficient Solutions For An Intriguing Failure of LLMs: Long Context Window Does Not Mean LLMs Can Analyze Long Sequences Flawlessly
by: Hosseini, Peyman, et al.
Published: (2024)

FastKV: Decoupling of Context Reduction and KV Cache Compression for Prefill-Decoding Acceleration
by: Jo, Dongwon, et al.
Published: (2025)

DELTA: Dynamic Layer-Aware Token Attention for Efficient Long-Context Reasoning
by: Zarch, Hossein Entezari, et al.
Published: (2025)

AnyLoss: Transforming Classification Metrics into Loss Functions
by: Han, Doheon, et al.
Published: (2024)

Functional Interpolation for Relative Positions Improves Long Context Transformers
by: Li, Shanda, et al.
Published: (2023)

Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters
by: Shyam, Vasudev, et al.
Published: (2024)

SpecPV: Improving Self-Speculative Decoding for Long-Context Generation via Partial Verification
by: Tan, Zhendong, et al.
Published: (2025)

NExT-GPT: Any-to-Any Multimodal LLM
by: Wu, Shengqiong, et al.
Published: (2023)

Scaling Limits of Long-Context Transformers
by: Bruno, Giuseppe, et al.
Published: (2026)

Decoder-Hybrid-Decoder Architecture for Efficient Reasoning with Long Generation
by: Ren, Liliang, et al.
Published: (2025)

Reviving Any-Subset Autoregressive Models with Principled Parallel Sampling and Speculative Decoding
by: Guo, Gabe, et al.
Published: (2025)

Timer-XL: Long-Context Transformers for Unified Time Series Forecasting
by: Liu, Yong, et al.
Published: (2024)

Variational Linear Attention: Stable Associative Memory for Long-Context Transformers
by: Pandey, Vishal, et al.
Published: (2026)

Attention in Constant Time: Vashista Sparse Attention for Long-Context Decoding with Exponential Guarantees
by: Nobaub, Vashista
Published: (2026)

Fast Inference with Kronecker-Sparse Matrices
by: Gonon, Antoine, et al.
Published: (2024)

Compress, Gather, and Recompute: REFORMing Long-Context Processing in Transformers
by: Song, Woomin, et al.
Published: (2025)

Short Data, Long Context: Distilling Positional Knowledge in Transformers
by: Huber, Patrick, et al.
Published: (2026)

Data Efficient Any Transformer-to-Mamba Distillation via Attention Bridge
by: Wang, Penghao, et al.
Published: (2025)

Fast Inference via Hierarchical Speculative Decoding
by: Mohri, Clara, et al.
Published: (2025)

Remote Sensing-Based Assessment of Economic Development
by: Pan, Yijian, et al.
Published: (2024)