Library Catalog

Bibliographic Details
Main authors:	Wang, Zhaode, Yang, Jingbang, Qian, Xinyu, Xing, Shiwen, Jiang, Xiaotang, Lv, Chengfei, Zhang, Shengyu
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online access:	https://arxiv.org/abs/2506.10443

Similar Items

MNN-AECS: Energy Optimization for LLM Decoding on Mobile Devices via Adaptive Core Selection
by: Huang, Zhengxiang, et al.
Published: (2025)

MadaKV: Adaptive Modality-Perception KV Cache Eviction for Efficient Multimodal Long-Context Inference
by: Li, Kunxi, et al.
Published: (2025)

FlowMM: Cross-Modal Information Flow Guided KV Cache Merging for Efficient Multimodal Context Inference
by: Li, Kunxi, et al.
Published: (2025)

MobileKernelBench: Can LLMs Write Efficient Kernels for Mobile Devices?
by: Zou, Xingze, et al.
Published: (2026)

PureKV: Plug-and-Play KV Cache Optimization with Spatial-Temporal Sparse Attention for Vision-Language Large Models
by: Jiang, Zhonghua, et al.
Published: (2025)

AccKV: Towards Efficient Audio-Video LLMs Inference via Adaptive-Focusing and Cross-Calibration KV Cache Optimization
by: Jiang, Zhonghua, et al.
Published: (2025)

RecGPT-Mobile: On-Device Large Language Models for User Intent Understanding in Taobao Feed Recommendation
by: Zhang, Bin, et al.
Published: (2026)

Fast Distributed Inference Serving for Large Language Models
by: Wu, Bingyang, et al.
Published: (2023)

RetentiveKV: State-Space Memory for Uncertainty-Aware Multimodal KV Cache Eviction
by: Liu, Sihao, et al.
Published: (2026)

MobileLLM-Flash: Latency-Guided On-Device LLM Design for Industry Scale Deployment
by: Huang, Hanxian, et al.
Published: (2026)

Efficient Deployment of Large Language Models on Resource-constrained Devices
by: Yao, Zhiwei, et al.
Published: (2025)

ModelGPT: Unleashing LLM's Capabilities for Tailored Model Generation
by: Tang, Zihao, et al.
Published: (2024)

Semantic Trimming and Auxiliary Multi-step Prediction for Generative Recommendation
by: Zhan, Tianyu, et al.
Published: (2026)

Large Language Models Inference Engines based on Spiking Neural Networks
by: Balaji, Adarsha, et al.
Published: (2025)

Large Language Models as Urban Residents: An LLM Agent Framework for Personal Mobility Generation
by: Wang, Jiawei, et al.
Published: (2024)

Collaborative Learning of On-Device Small Model and Cloud-Based Large Model: Advances and Future Directions
by: Niu, Chaoyue, et al.
Published: (2025)

ABQ-LLM: Arbitrary-Bit Quantized Inference Acceleration for Large Language Models
by: Zeng, Chao, et al.
Published: (2024)

AmoebaLLM: Constructing Any-Shape Large Language Models for Efficient and Instant Deployment
by: Fu, Yonggan, et al.
Published: (2024)

Efficient Deployment of Vision-Language Models on Mobile Devices: A Case Study on OnePlus 13R
by: Guerrero, Pablo Robin, et al.
Published: (2025)

Fast Inference for Augmented Large Language Models
by: Shahout, Rana, et al.
Published: (2024)

BEAR: Towards Beam-Search-Aware Optimization for Recommendation with Large Language Models
by: Yang, Weiqin, et al.
Published: (2026)

Fast and Compact Tsetlin Machine Inference on CPUs Using Instruction-Level Optimization
by: Zeng, Yefan, et al.
Published: (2025)

Fast NF4 Dequantization Kernels for Large Language Model Inference
by: Qi, Xiangbo, et al.
Published: (2026)

RetroInfer: A Vector Storage Engine for Scalable Long-Context LLM Inference
by: Chen, Yaoqi, et al.
Published: (2025)

EntroLLM: Entropy Encoded Weight Compression for Efficient Large Language Model Inference on Edge Devices
by: Sanyal, Arnab, et al.
Published: (2025)

Scaling On-Device GPU Inference for Large Generative Models
by: Tang, Jiuqiang, et al.
Published: (2025)

Resource-Efficient Generative AI Model Deployment in Mobile Edge Networks
by: Liang, Yuxin, et al.
Published: (2024)

Graph Neural Networks Automated Design and Deployment on Device-Edge Co-Inference Systems
by: Zhou, Ao, et al.
Published: (2024)

PLMM: Personal Large Language Models on Mobile Devices
by: Gong, Yuanhao
Published: (2023)

Fast-PGM: Fast Probabilistic Graphical Model Learning and Inference
by: Jiang, Jiantong, et al.
Published: (2024)

A Rolling Stone Gathers No Moss: Adaptive Policy Optimization for Stable Self-Evaluation in Large Multimodal Models
by: Wang, Wenkai, et al.
Published: (2025)

Collaboration of Large Language Models and Small Recommendation Models for Device-Cloud Recommendation
by: Lv, Zheqi, et al.
Published: (2025)

Making Language Models Better Tool Learners with Execution Feedback
by: Qiao, Shuofei, et al.
Published: (2023)

I-LLM: Efficient Integer-Only Inference for Fully-Quantized Low-Bit Large Language Models
by: Hu, Xing, et al.
Published: (2024)

A Novel Hat-Shaped Device-Cloud Collaborative Inference Framework for Large Language Models
by: Xie, Zuan, et al.
Published: (2025)

WebLLM: A High-Performance In-Browser LLM Inference Engine
by: Ruan, Charlie F., et al.
Published: (2024)

Memory-Efficient Backpropagation for Fine-Tuning LLMs on Resource-Constrained Mobile Devices
by: Song, Congzheng, et al.
Published: (2025)

GQSA: Group Quantization and Sparsity for Accelerating Large Language Model Inference
by: Zeng, Chao, et al.
Published: (2024)

Understanding Large Language Models in Your Pockets: Performance Study on COTS Mobile Devices
by: Xiao, Jie, et al.
Published: (2024)

LLM-Inference-Bench: Inference Benchmarking of Large Language Models on AI Accelerators
by: Chitty-Venkata, Krishna Teja, et al.
Published: (2024)