:: Library Catalog

Bibliographic Details
Main Authors:	Wang, Penghao, Zhou, Yuhao, Wu, Mengxuan, Zhang, Panpan, Wang, Zhangyang, Wang, Kai
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2510.19266

Similar Items

ResearchGPT: Benchmarking and Training LLMs for End-to-End Computer Science Research Workflows
by: Wang, Penghao, et al.
Published: (2025)

On-the-Fly Adaptive Distillation of Transformer to Dual-State Linear Attention
by: Ro, Yeonju, et al.
Published: (2025)

InfoMamba: An Attention-Free Hybrid Mamba-Transformer Model
by: Wang, Youjin, et al.
Published: (2026)

Class Incremental Fault Diagnosis under Limited Fault Data via Supervised Contrastive Knowledge Distillation
by: Zhang, Hanrong, et al.
Published: (2025)

Position: Weight Space Should Be a First-Class Generative AI Modality
by: Wang, Zhangyang, et al.
Published: (2026)

The Effect of Attention Head Count on Transformer Approximation
by: Yu, Penghao, et al.
Published: (2025)

Efficient Image Generation with Variadic Attention Heads
by: Walton, Steven, et al.
Published: (2022)

Why Neural Network Can Discover Symbolic Structures with Gradient-based Training: An Algebraic and Geometric Foundation for Neurosymbolic Reasoning
by: Wang, Peihao, et al.
Published: (2025)

Training Dynamics of Transformers to Recognize Word Co-occurrence via Gradient Flow Analysis
by: Yang, Hongru, et al.
Published: (2024)

NetMamba: Efficient Network Traffic Classification via Pre-training Unidirectional Mamba
by: Wang, Tongze, et al.
Published: (2024)

A Teacher-Free Graph Knowledge Distillation Framework with Dual Self-Distillation
by: Wu, Lirong, et al.
Published: (2024)

Ranking and Selection with Simultaneous Input Data Collection
by: Wang, Yuhao, et al.
Published: (2025)

Attention to Mamba: A Recipe for Cross-Architecture Distillation
by: Moudgil, Abhinav, et al.
Published: (2026)

Efficient Attention via Pre-Scoring: Prioritizing Informative Keys in Transformers
by: Li, Zhexiang, et al.
Published: (2025)

FHBench: Towards Efficient and Personalized Federated Learning for Multimodal Healthcare
by: Wang, Penghao, et al.
Published: (2025)

Attention-Mamba: A Mamba-Enhanced Multi-Scale Parallel Inference Network for Medical Image Segmentation
by: Zhang, Yanhua, et al.
Published: (2024)

ImputeINR: Time Series Imputation via Implicit Neural Representations for Disease Diagnosis with Missing Data
by: Li, Mengxuan, et al.
Published: (2025)

Forgetting Any Data at Any Time: A Theoretically Certified Unlearning Framework for Vertical Federated Learning
by: Wang, Linian, et al.
Published: (2025)

The Mamba in the Llama: Distilling and Accelerating Hybrid Models
by: Wang, Junxiong, et al.
Published: (2024)

DKDM: Data-Free Knowledge Distillation for Diffusion Models with Any Architecture
by: Xiang, Qianlong, et al.
Published: (2024)

Data-Distill-Net: A Data Distillation Approach Tailored for Replay-based Continual Learning
by: Liao, Wenyang, et al.
Published: (2025)

Data-Efficient Symbolic Regression via Foundation Model Distillation
by: Ying, Wangyang, et al.
Published: (2025)

Generalization Error Analysis for Sparse Mixture-of-Experts: A Preliminary Study
by: Zhao, Jinze, et al.
Published: (2024)

PackMamba: Efficient Processing of Variable-Length Sequences in Mamba training
by: Xu, Haoran, et al.
Published: (2024)

BitStopper: An Efficient Transformer Attention Accelerator via Stage-fusion and Early Termination
by: Wang, Huizheng, et al.
Published: (2025)

BLADE: Block-Sparse Attention Meets Step Distillation for Efficient Video Generation
by: Gu, Youping, et al.
Published: (2025)

Schrödinger Bridge Mamba for One-Step Speech Enhancement
by: Yang, Jing, et al.
Published: (2025)

Hamming Attention Distillation: Binarizing Keys and Queries for Efficient Long-Context Transformers
by: Horton, Mark, et al.
Published: (2025)

Visual Attention Exploration in Vision-Based Mamba Models
by: Wang, Junpeng, et al.
Published: (2025)

DVMSR: Distillated Vision Mamba for Efficient Super-Resolution
by: Lei, Xiaoyan, et al.
Published: (2024)

Make Optimization Once and for All with Fine-grained Guidance
by: Shi, Mingjia, et al.
Published: (2025)

MSECG: Incorporating Mamba for Robust and Efficient ECG Super-Resolution
by: Lin, Jie, et al.
Published: (2024)

Modeling Cell Dynamics and Interactions with Unbalanced Mean Field Schrödinger Bridge
by: Zhang, Zhenyi, et al.
Published: (2025)

VecFormer: Towards Efficient and Generalizable Graph Transformer with Graph Token Attention
by: Zhou, Jingbo, et al.
Published: (2026)

Principled Architecture-aware Scaling of Hyperparameters
by: Chen, Wuyang, et al.
Published: (2024)

DataDAM: Efficient Dataset Distillation with Attention Matching
by: Sajedi, Ahmad, et al.
Published: (2023)

R-Sparse: Rank-Aware Activation Sparsity for Efficient LLM Inference
by: Zhang, Zhenyu, et al.
Published: (2025)

Evolving Robustness–Exploration Trade-off in Online Reinforcement Learning via Quantile Bayesian Risk MDPs
by: Song, Meichen, et al.
Published: (2026)

Score Distillation Beyond Acceleration: Generative Modeling from Corrupted Data
by: Zhang, Yasi, et al.
Published: (2025)

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
by: Zhao, Jiawei, et al.
Published: (2024)