| Main Authors: | Wang, Penghao, Zhou, Yuhao, Wu, Mengxuan, Zhang, Panpan, Wang, Zhangyang, Wang, Kai |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2510.19266 |
Similar Items
ResearchGPT: Benchmarking and Training LLMs for End-to-End Computer Science Research Workflows
by: Wang, Penghao, et al.
Published: (2025)
On-the-Fly Adaptive Distillation of Transformer to Dual-State Linear Attention
by: Ro, Yeonju, et al.
Published: (2025)
InfoMamba: An Attention-Free Hybrid Mamba-Transformer Model
by: Wang, Youjin, et al.
Published: (2026)
Class Incremental Fault Diagnosis under Limited Fault Data via Supervised Contrastive Knowledge Distillation
by: Zhang, Hanrong, et al.
Published: (2025)
Position: Weight Space Should Be a First-Class Generative AI Modality
by: Wang, Zhangyang, et al.
Published: (2026)
The Effect of Attention Head Count on Transformer Approximation
by: Yu, Penghao, et al.
Published: (2025)
Efficient Image Generation with Variadic Attention Heads
by: Walton, Steven, et al.
Published: (2022)
Why Neural Network Can Discover Symbolic Structures with Gradient-based Training: An Algebraic and Geometric Foundation for Neurosymbolic Reasoning
by: Wang, Peihao, et al.
Published: (2025)
Training Dynamics of Transformers to Recognize Word Co-occurrence via Gradient Flow Analysis
by: Yang, Hongru, et al.
Published: (2024)
NetMamba: Efficient Network Traffic Classification via Pre-training Unidirectional Mamba
by: Wang, Tongze, et al.
Published: (2024)
A Teacher-Free Graph Knowledge Distillation Framework with Dual Self-Distillation
by: Wu, Lirong, et al.
Published: (2024)
Ranking and Selection with Simultaneous Input Data Collection
by: Wang, Yuhao, et al.
Published: (2025)
Attention to Mamba: A Recipe for Cross-Architecture Distillation
by: Moudgil, Abhinav, et al.
Published: (2026)
Efficient Attention via Pre-Scoring: Prioritizing Informative Keys in Transformers
by: Li, Zhexiang, et al.
Published: (2025)
FHBench: Towards Efficient and Personalized Federated Learning for Multimodal Healthcare
by: Wang, Penghao, et al.
Published: (2025)
Attention-Mamba: A Mamba-Enhanced Multi-Scale Parallel Inference Network for Medical Image Segmentation
by: Zhang, Yanhua, et al.
Published: (2024)
ImputeINR: Time Series Imputation via Implicit Neural Representations for Disease Diagnosis with Missing Data
by: Li, Mengxuan, et al.
Published: (2025)
Forgetting Any Data at Any Time: A Theoretically Certified Unlearning Framework for Vertical Federated Learning
by: Wang, Linian, et al.
Published: (2025)
The Mamba in the Llama: Distilling and Accelerating Hybrid Models
by: Wang, Junxiong, et al.
Published: (2024)
DKDM: Data-Free Knowledge Distillation for Diffusion Models with Any Architecture
by: Xiang, Qianlong, et al.
Published: (2024)
Data-Distill-Net: A Data Distillation Approach Tailored for Reply-based Continual Learning
by: Liao, Wenyang, et al.
Published: (2025)
Data-Efficient Symbolic Regression via Foundation Model Distillation
by: Ying, Wangyang, et al.
Published: (2025)
Generalization Error Analysis for Sparse Mixture-of-Experts: A Preliminary Study
by: Zhao, Jinze, et al.
Published: (2024)
PackMamba: Efficient Processing of Variable-Length Sequences in Mamba training
by: Xu, Haoran, et al.
Published: (2024)
BitStopper: An Efficient Transformer Attention Accelerator via Stage-fusion and Early Termination
by: Wang, Huizheng, et al.
Published: (2025)
BLADE: Block-Sparse Attention Meets Step Distillation for Efficient Video Generation
by: Gu, Youping, et al.
Published: (2025)
Schrödinger Bridge Mamba for One-Step Speech Enhancement
by: Yang, Jing, et al.
Published: (2025)
Hamming Attention Distillation: Binarizing Keys and Queries for Efficient Long-Context Transformers
by: Horton, Mark, et al.
Published: (2025)
Visual Attention Exploration in Vision-Based Mamba Models
by: Wang, Junpeng, et al.
Published: (2025)
DVMSR: Distillated Vision Mamba for Efficient Super-Resolution
by: Lei, Xiaoyan, et al.
Published: (2024)
Make Optimization Once and for All with Fine-grained Guidance
by: Shi, Mingjia, et al.
Published: (2025)
MSECG: Incorporating Mamba for Robust and Efficient ECG Super-Resolution
by: Lin, Jie, et al.
Published: (2024)
Modeling Cell Dynamics and Interactions with Unbalanced Mean Field Schrödinger Bridge
by: Zhang, Zhenyi, et al.
Published: (2025)
VecFormer: Towards Efficient and Generalizable Graph Transformer with Graph Token Attention
by: Zhou, Jingbo, et al.
Published: (2026)
Principled Architecture-aware Scaling of Hyperparameters
by: Chen, Wuyang, et al.
Published: (2024)
DataDAM: Efficient Dataset Distillation with Attention Matching
by: Sajedi, Ahmad, et al.
Published: (2023)
R-Sparse: Rank-Aware Activation Sparsity for Efficient LLM Inference
by: Zhang, Zhenyu, et al.
Published: (2025)
Evolving Robustness–Exploration Trade-off in Online Reinforcement Learning via Quantile Bayesian Risk MDPs
by: Song, Meichen, et al.
Published: (2026)
Score Distillation Beyond Acceleration: Generative Modeling from Corrupted Data
by: Zhang, Yasi, et al.
Published: (2025)
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
by: Zhao, Jiawei, et al.
Published: (2024)