| Main Authors: | Wang, Qinsi, Ye, Hancheng, Chung, Ming-Yu, Liu, Yudong, Lin, Yueqian, Kuo, Martin, Ma, Mingyuan, Zhang, Jianyi, Chen, Yiran |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2505.19235 |
Similar Items
CoreInfer: Accelerating Large Language Model Inference with Semantics-Inspired Adaptive Sparse Activation
by: Wang, Qinsi, et al.
Published: (2024)
Keyframe-oriented Vision Token Pruning: Enhancing Efficiency of Large Vision Language Models on Long-Form Video Processing
by: Liu, Yudong, et al.
Published: (2025)
KVCOMM: Online Cross-context KV-cache Communication for Efficient LLM-based Multi-agent Systems
by: Ye, Hancheng, et al.
Published: (2025)
SpeechPrune: Context-aware Token Pruning for Speech Information Retrieval
by: Lin, Yueqian, et al.
Published: (2024)
HippoMM: Hippocampal-inspired Multimodal Memory for Long Audiovisual Event Understanding
by: Lin, Yueqian, et al.
Published: (2025)
Angles Don't Lie: Unlocking Training-Efficient RL Through the Model's Own Signals
by: Wang, Qinsi, et al.
Published: (2025)
FlashFPS: Efficient Farthest Point Sampling for Large-Scale Point Clouds via Pruning and Caching
by: Fu, Yuzhe, et al.
Published: (2026)
ZEUS: Accelerating Diffusion Models with Only Second-Order Predictor
by: Wang, Yixiao, et al.
Published: (2026)
T2S-Bench & Structure-of-Thought: Benchmarking and Prompting Comprehensive Text-to-Structure Reasoning
by: Wang, Qinsi, et al.
Published: (2026)
LLaViDA: A Large Language Vision Driving Assistant for Explicit Reasoning and Enhanced Trajectory Planning
by: Liu, Yudong, et al.
Published: (2025)
FlashSVD: Memory-Efficient Inference with Streaming for Low-Rank Models
by: Shao, Zishan, et al.
Published: (2025)
Latent Bridge: Feature Delta Prediction for Efficient Dual-System Vision-Language-Action Model Inference
by: Liu, Yudong, et al.
Published: (2026)
Voice Evaluation of Reasoning Ability: Diagnosing the Modality-Induced Performance Gap
by: Lin, Yueqian, et al.
Published: (2025)
H-CoT: Hijacking the Chain-of-Thought Safety Reasoning Mechanism to Jailbreak Large Reasoning Models, Including OpenAI o1/o3, DeepSeek-R1, and Gemini 2.0 Flash Thinking
by: Kuo, Martin, et al.
Published: (2025)
Model Reprogramming Demystified: A Neural Tangent Kernel Perspective
by: Chung, Ming-Yu, et al.
Published: (2025)
Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play
by: Wang, Qinsi, et al.
Published: (2025)
AsyncVoice Agent: Real-Time Explanation for LLM Planning and Reasoning
by: Lin, Yueqian, et al.
Published: (2025)
SADA: Stability-guided Adaptive Diffusion Acceleration
by: Jiang, Ting, et al.
Published: (2025)
MARS: Efficient, Adaptive Co-Scheduling for Heterogeneous Agentic Systems
by: Wang, Yifei, et al.
Published: (2026)
Optimal Parameter and Neuron Pruning for Out-of-Distribution Detection
by: Chen, Chao, et al.
Published: (2024)
DA-Cramming: Enhancing Cost-Effective Language Model Pretraining with Dependency Agreement Integration
by: Kuo, Martin, et al.
Published: (2023)
MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer
by: Cao, Jianjian, et al.
Published: (2024)
Structural Pruning of Large Vision Language Models: A Comprehensive Study on Pruning Dynamics, Recovery, and Data Efficiency
by: Huang, Yiran, et al.
Published: (2026)
Attention Debiasing for Token Pruning in Vision Language Models
by: Zhao, Kai, et al.
Published: (2025)
RedVTP: Training-Free Acceleration of Diffusion Vision-Language Models Inference via Masked Token-Guided Visual Token Pruning
by: Xu, Jingqi, et al.
Published: (2025)
Towards Joint Quantization and Token Pruning of Vision-Language Models
by: Li, Xinqing, et al.
Published: (2026)
Focus: A Streaming Concentration Architecture for Efficient Vision-Language Models
by: Wei, Chiyue, et al.
Published: (2025)
Query-Conditioned Evidential Keyframe Sampling for MLLM-Based Long-Form Video Understanding
by: Wang, Yiheng, et al.
Published: (2026)
Balanced Token Pruning: Accelerating Vision Language Models Beyond Local Optimization
by: Li, Kaiyuan, et al.
Published: (2025)
Feather the Throttle: Revisiting Visual Token Pruning for Vision-Language Model Acceleration
by: Endo, Mark, et al.
Published: (2024)
FastAV: Efficient Token Pruning for Audio-Visual Large Language Model Inference
by: Jung, Chaeyoung, et al.
Published: (2026)
AsymVLM: Asymmetric Token Pruning for Efficient Vision-Language Model Inference
by: Feng, Yilin, et al.
Published: (2026)
SlimInfer: Accelerating Long-Context LLM Inference via Dynamic Token Pruning
by: Long, Lingkun, et al.
Published: (2025)
Window-Diffusion: Accelerating Diffusion Language Model Inference with Windowed Token Pruning and Caching
by: Zuo, Fengrui, et al.
Published: (2026)
GenAI at the Edge: Comprehensive Survey on Empowering Edge Devices
by: Navardi, Mozhgan, et al.
Published: (2025)
IoT-MCP: Bridging LLMs and IoT Systems Through Model Context Protocol
by: Yang, Ningyuan, et al.
Published: (2025)
SD-NAE: Generating Natural Adversarial Examples with Stable Diffusion
by: Lin, Yueqian, et al.
Published: (2023)
Exploring Vision Neural Network Pruning via Screening Methodology
by: Wang, Mingyuan, et al.
Published: (2025)
Sparse ActionGen: Accelerating Diffusion Policy with Real-time Pruning
by: Ji, Kangye, et al.
Published: (2026)
FlashSVD v1.5: Making Low-Rank Transformers Inference Actually Fast
by: Wu, Wenhao, et al.
Published: (2026)