| Main Authors: | Guerrero, Pablo Robin; Pan, Yueyang; Kashyap, Sanidhya |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2507.08505 |
Similar Items
Efficient On-Device Agents via Adaptive Context Management
by: Vijayvargiya, Sanidhya, et al.
Published: (2025)
MNN-LLM: A Generic Inference Engine for Fast Large Language Model Deployment on Mobile Devices
by: Wang, Zhaode, et al.
Published: (2025)
Efficient Deployment of Large Language Models on Resource-constrained Devices
by: Yao, Zhiwei, et al.
Published: (2025)
Training Machine Learning Models on Human Spatio-temporal Mobility Data: An Experimental Study [Experiment Paper]
by: Liu, Yueyang, et al.
Published: (2025)
AutoTailor: Automatic and Efficient Adaptive Model Deployment for Diverse Edge Devices
by: Liu, Mengyang, et al.
Published: (2025)
A Study on Inference Latency for Vision Transformers on Mobile Devices
by: Li, Zhuojin, et al.
Published: (2025)
Resource-Efficient Generative AI Model Deployment in Mobile Edge Networks
by: Liang, Yuxin, et al.
Published: (2024)
Understanding Large Language Models in Your Pockets: Performance Study on COTS Mobile Devices
by: Xiao, Jie, et al.
Published: (2024)
MobileLLM-Flash: Latency-Guided On-Device LLM Design for Industry Scale Deployment
by: Huang, Hanxian, et al.
Published: (2026)
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
by: Liu, Zechun, et al.
Published: (2024)
TinyFormer: Efficient Transformer Design and Deployment on Tiny Devices
by: Yang, Jianlei, et al.
Published: (2023)
Interpretable Discovery of One-parameter Subgroups: A Modular Framework for Elliptical, Hyperbolic, and Parabolic Symmetries
by: Karjol, Pavan, et al.
Published: (2025)
Energy-Efficient Vision Transformer Inference for Edge-AI Deployment
by: Amanzhol, Nursultan, et al.
Published: (2025)
On-Device Vision Training, Deployment, and Inference on a Thumb-Sized Microcontroller
by: Ellis, Jeremy
Published: (2026)
Methodology to Deploy CNN-Based Computer Vision Models on Immersive Wearable Devices
by: Malek, Kaveh, et al.
Published: (2024)
Designing and Deploying AI Models for Sustainable Logistics Optimization: A Case Study on Eco-Efficient Supply Chains in the USA
by: Shawon, Reza E Rabbi, et al.
Published: (2025)
On-Demand Multi-Task Sparsity for Efficient Large-Model Deployment on Edge Devices
by: Huang, Lianming, et al.
Published: (2025)
MobileKernelBench: Can LLMs Write Efficient Kernels for Mobile Devices?
by: Zou, Xingze, et al.
Published: (2026)
FLoRA: Enhancing Vision-Language Models with Parameter-Efficient Federated Learning
by: Nguyen, Duy Phuong, et al.
Published: (2024)
Deploying Large AI Models on Resource-Limited Devices with Split Federated Learning
by: Qiang, Xianke, et al.
Published: (2025)
Bridging Embodiment Gaps: Deploying Vision-Language-Action Models on Soft Robots
by: Su, Haochen, et al.
Published: (2025)
TAP-ViTs: Task-Adaptive Pruning for On-Device Deployment of Vision Transformers
by: Wang, Zhibo, et al.
Published: (2026)
LQ-LoRA: Low-rank Plus Quantized Matrix Decomposition for Efficient Language Model Finetuning
by: Guo, Han, et al.
Published: (2023)
Efficient Exact Resistance Distance Computation on Small-Treewidth Graphs: a Labelling Approach
by: Liao, Meihao, et al.
Published: (2025)
Memory-Efficient Backpropagation for Fine-Tuning LLMs on Resource-Constrained Mobile Devices
by: Song, Congzheng, et al.
Published: (2025)
Deeploy: Enabling Energy-Efficient Deployment of Small Language Models On Heterogeneous Microcontrollers
by: Scherer, Moritz, et al.
Published: (2024)
PSK@EEUCA 2026: Fine-Tuning Large Language Models with Synthetic Data Augmentation for Multi-Class Toxicity Detection in Gaming Chat
by: Pulipaka, Srikar Kashyap
Published: (2026)
MobileAIBench: Benchmarking LLMs and LMMs for On-Device Use Cases
by: Murthy, Rithesh, et al.
Published: (2024)
Recurrent Memory-Augmented Transformers with Chunked Attention for Long-Context Language Modeling
by: Kashyap, Ankit
Published: (2025)
Breaking SafetyCore: Exploring the Risks of On-Device AI Deployment
by: Guyomard, Victor, et al.
Published: (2025)
SmallThinker: A Family of Efficient Large Language Models Natively Trained for Local Deployment
by: Song, Yixin, et al.
Published: (2025)
Promote, Suppress, Iterate: How Language Models Answer One-to-Many Factual Queries
by: Yan, Tianyi Lorena, et al.
Published: (2025)
A Cost-Benefit Analysis of On-Premise Large Language Model Deployment: Breaking Even with Commercial LLM Services
by: Pan, Guanzhong, et al.
Published: (2025)
CASCADE: Case-Based Continual Adaptation for Large Language Models During Deployment
by: Guo, Siyuan, et al.
Published: (2026)
DTMM: Deploying TinyML Models on Extremely Weak IoT Devices with Pruning
by: Han, Lixiang, et al.
Published: (2024)
Fed MobiLLM: Efficient Federated LLM Fine-Tuning over Heterogeneous Mobile Devices via Server Assisted Side-Tuning
by: Yang, Xingke, et al.
Published: (2025)
Scaling Up Efficient Small Language Models Serving and Deployment for Semantic Job Search
by: Behdin, Kayhan, et al.
Published: (2025)
AmoebaLLM: Constructing Any-Shape Large Language Models for Efficient and Instant Deployment
by: Fu, Yonggan, et al.
Published: (2024)
EdgeMoE: Empowering Sparse Large Language Models on Mobile Devices
by: Yi, Rongjie, et al.
Published: (2023)
Mixture of Cache-Conditional Experts for Efficient Mobile Device Inference
by: Skliar, Andrii, et al.
Published: (2024)