| Main Authors: | He, Tao, Huang, Guang, Yang, Yu, Xu, Tianshi, Zhao, Sicheng, Ding, Guiguang, Wang, Pengyang, Tian, Feng |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2506.14158 |
Similar Items
FastVID: Dynamic Density Pruning for Fast Video Large Language Models
by: Shen, Leqi, et al.
Published: (2025)
More is Better: Deep Domain Adaptation with Multiple Sources
by: Zhao, Sicheng, et al.
Published: (2024)
AdaTP: Attention-Debiased Token Pruning for Video Large Language Models
by: Sun, Fengyuan, et al.
Published: (2025)
Towards Efficient Vision-Language Tuning: More Information Density, More Generalizability
by: Hao, Tianxiang, et al.
Published: (2023)
DiscoVLA: Discrepancy Reduction in Vision, Language, and Alignment for Parameter-Efficient Video-Text Retrieval
by: Shen, Leqi, et al.
Published: (2025)
TempMe: Video Temporal Token Merging for Efficient Text-Video Retrieval
by: Shen, Leqi, et al.
Published: (2024)
CAIT: Triple-Win Compression towards High Accuracy, Fast Inference, and Favorable Transferability For ViTs
by: Wang, Ao, et al.
Published: (2023)
Modality Reliability Guided Multimodal Recommendation
by: Dong, Xue, et al.
Published: (2025)
Quasar: Quantized Self-Speculative Acceleration for Rapid Inference via Memory-Efficient Verification
by: Huang, Guang, et al.
Published: (2026)
LLaVA-MLB: Mitigating and Leveraging Attention Bias for Training-Free Video LLMs
by: Shen, Leqi, et al.
Published: (2025)
Quantized Prompt for Efficient Generalization of Vision-Language Models
by: Hao, Tianxiang, et al.
Published: (2024)
Collaborative Speculative Inference for Efficient LLM Inference Serving
by: Gao, Luyao, et al.
Published: (2025)
Efficient Adaptive Rejection Sampling for Accelerating Speculative Decoding in Large Language Models
by: Sun, Chendong, et al.
Published: (2025)
Speculative Decoding for Multi-Sample Inference
by: Li, Yiwei, et al.
Published: (2025)
AdaSD: Adaptive Speculative Decoding for Efficient Language Model Inference
by: Lu, Kuan-Wei, et al.
Published: (2025)
FastOCR: Dynamic Visual Fixation via KV Cache Pruning for Efficient Document Parsing
by: Tang, Zihan, et al.
Published: (2026)
HEQuant: Marrying Homomorphic Encryption and Quantization for Communication-Efficient Private Inference
by: Xu, Tianshi, et al.
Published: (2024)
LLMI3D: MLLM-based 3D Perception from a Single 2D Image
by: Yang, Fan, et al.
Published: (2024)
Evaluating Semantic and Syntactic Understanding in Large Language Models for Payroll Systems
by: Maclean, Hendrika, et al.
Published: (2026)
Syntactic Control of Language Models by Posterior Inference
by: Xefteri, Vicky, et al.
Published: (2025)
FR-Spec: Accelerating Large-Vocabulary Language Models via Frequency-Ranked Speculative Sampling
by: Zhao, Weilin, et al.
Published: (2025)
Syntactic and Semantic Control of Large Language Models via Sequential Monte Carlo
by: Loula, João, et al.
Published: (2025)
Probing Multimodal Large Language Models for Global and Local Semantic Representations
by: Tao, Mingxu, et al.
Published: (2024)
Minions: Accelerating Large Language Model Inference with Aggregated Speculative Execution
by: Wang, Siqi, et al.
Published: (2024)
Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference
by: Zhao, Han, et al.
Published: (2024)
PrivCirNet: Efficient Private Inference via Block Circulant Transformation
by: Xu, Tianshi, et al.
Published: (2024)
HIPPO: Accelerating Video Large Language Models Inference via Holistic-aware Parallel Speculative Decoding
by: Lv, Qitan, et al.
Published: (2026)
HEIE: MLLM-Based Hierarchical Explainable AIGC Image Implausibility Evaluator
by: Yang, Fan, et al.
Published: (2024)
Advancing Reliable Test-Time Adaptation of Vision-Language Models under Visual Variations
by: Liang, Yiwen, et al.
Published: (2025)
Incremental Comprehension of Garden-Path Sentences by Large Language Models: Semantic Interpretation, Syntactic Re-Analysis, and Attention
by: Li, Andrew, et al.
Published: (2024)
Dual Encoder: Exploiting the Potential of Syntactic and Semantic for Aspect Sentiment Triplet Extraction
by: Zhao, Xiaowei, et al.
Published: (2024)
PRISM: Parametrically Refactoring Inference for Speculative Sampling Draft Models
by: Wang, Xuliang, et al.
Published: (2026)
Unlocking Efficiency in Large Language Model Inference: A Comprehensive Survey of Speculative Decoding
by: Xia, Heming, et al.
Published: (2024)
FlowSpec: Continuous Pipelined Speculative Decoding for Efficient Distributed LLM Inference
by: Liu, Xing, et al.
Published: (2025)
Learning Harmonized Representations for Speculative Sampling
by: Zhang, Lefan, et al.
Published: (2024)
HADES: Hardware Accelerated Decoding for Efficient Speculation in Large Language Models
by: Yang, Ze, et al.
Published: (2024)
Dynamic-Width Speculative Beam Decoding for Efficient LLM Inference
by: Qin, Zongyue, et al.
Published: (2024)
SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting
by: Xu, Jiaming, et al.
Published: (2025)
EQO: Exploring Ultra-Efficient Private Inference with Winograd-Based Protocol and Quantization Co-Optimization
by: Zeng, Wenxuan, et al.
Published: (2024)
Large Language Models Enhanced by Plug and Play Syntactic Knowledge for Aspect-based Sentiment Analysis
by: Tian, Yuanhe, et al.
Published: (2025)