| Main authors: | Pan, Guanzhong; Chodnekar, Vishal; Roy, Abinas; Wang, Haibo |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online access: | https://arxiv.org/abs/2509.18101 |
Similar documents
A Middle Path for On-Premises LLM Deployment: Preserving Privacy Without Sacrificing Model Confidentiality
by: Huang, Hanbo, et al.
Published: (2024)
VSPrefill: Vertical-Slash Sparse Attention with Lightweight Indexing for Long-Context Prefilling
by: Guanzhong, Chen
Published: (2026)
Multimodal Survival Analysis with Locally Deployable Large Language Models
by: Gögl, Moritz, et al.
Published: (2026)
CATP-LLM: Empowering Large Language Models for Cost-Aware Tool Planning
by: Wu, Duo, et al.
Published: (2024)
AmoebaLLM: Constructing Any-Shape Large Language Models for Efficient and Instant Deployment
by: Fu, Yonggan, et al.
Published: (2024)
Comparing Specialised Small and General Large Language Models on Text Classification: 100 Labelled Samples to Achieve Break-Even Performance
by: Pecher, Branislav, et al.
Published: (2024)
SmallThinker: A Family of Efficient Large Language Models Natively Trained for Local Deployment
by: Song, Yixin, et al.
Published: (2025)
Do Large Language Models Reason Causally Like Us? Even Better?
by: Dettki, Hanna M., et al.
Published: (2025)
Deploying Open-Source Large Language Models: A performance Analysis
by: Bendi-Ouis, Yannis, et al.
Published: (2024)
Performance Control in Early Exiting to Deploy Large Models at the Same Cost of Smaller Ones
by: Mofakhami, Mehrnaz, et al.
Published: (2024)
Provable Benefits of In-Tool Learning for Large Language Models
by: Houliston, Sam, et al.
Published: (2025)
Are Large-Language Models Graph Algorithmic Reasoners?
by: Taylor, Alexander K, et al.
Published: (2024)
On-Premise SLMs vs. Commercial LLMs: Prompt Engineering and Incident Classification in SOCs and CSIRTs
by: Almeida, Gefté, et al.
Published: (2025)
FlattenQuant: Breaking Through the Inference Compute-bound for Large Language Models with Per-tensor Quantization
by: Zhang, Yi, et al.
Published: (2024)
Breaking Model Lock-in: Cost-Efficient Zero-Shot LLM Routing via a Universal Latent Space
by: Yan, Cheng, et al.
Published: (2026)
On-Demand Multi-Task Sparsity for Efficient Large-Model Deployment on Edge Devices
by: Huang, Lianming, et al.
Published: (2025)
Modular Machine Learning: An Indispensable Path towards New-Generation Large Language Models
by: Wang, Xin, et al.
Published: (2025)
Time-LLM: Time Series Forecasting by Reprogramming Large Language Models
by: Jin, Ming, et al.
Published: (2023)
Steering Without Breaking: Mechanistically Informed Interventions for Discrete Diffusion Language Models
by: Zhou, Hanhan, et al.
Published: (2026)
CASCADE: Case-Based Continual Adaptation for Large Language Models During Deployment
by: Guo, Siyuan, et al.
Published: (2026)
ML Compass: Navigating Capability, Cost, and Compliance Trade-offs in AI Model Deployment
by: Digalakis Jr, Vassilis, et al.
Published: (2025)
Premise Selection for a Lean Hammer
by: Zhu, Thomas, et al.
Published: (2025)
Breaking the Factorization Barrier in Diffusion Language Models
by: Li, Ian, et al.
Published: (2026)
SLMQuant: Benchmarking Small Language Model Quantization for Practical Deployment
by: Wang, Jiacheng, et al.
Published: (2025)
CONSTRUCTA: Automating Commercial Construction Schedules in Fabrication Facilities with Large Language Models
by: Zhang, Yifan, et al.
Published: (2025)
Scaling Down to Scale Up: A Cost-Benefit Analysis of Replacing OpenAI's LLM with Open Source SLMs in Production
by: Irugalbandara, Chandra, et al.
Published: (2023)
Large Language Models for Controllable Multi-property Multi-objective Molecule Optimization
by: Dey, Vishal, et al.
Published: (2025)
SLOT: Structuring the Output of Large Language Models
by: Wang, Darren Yow-Bang, et al.
Published: (2025)
Private LLM Inference on Consumer Blackwell GPUs: A Practical Guide for Cost-Effective Local Deployment in SMEs
by: Knoop, Jonathan, et al.
Published: (2026)
Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill?
by: Fan, Chenrui, et al.
Published: (2025)
Advancing Model Refinement: Muon-Optimized Distillation and Quantization for LLM Deployment
by: Sander, Jacob, et al.
Published: (2026)
Theoretical Benefit and Limitation of Diffusion Language Model
by: Feng, Guhao, et al.
Published: (2025)
Budget-Constrained Agentic Large Language Models: Intention-Based Planning for Costly Tool Use
by: Liu, Hanbing, et al.
Published: (2026)
Robust Batch-Level Query Routing for Large Language Models under Cost and Capacity Constraints
by: Markovic-Voronov, Jelena, et al.
Published: (2026)
Renewable Energy Prediction: A Comparative Study of Deep Learning Models for Complex Dataset Analysis
by: Wang, Haibo, et al.
Published: (2025)
Beyond the Black Box: A Statistical Model for LLM Reasoning and Inference
by: Dalal, Siddhartha, et al.
Published: (2024)
$\text{M}^{2}$LLM: Multi-view Molecular Representation Learning with Large Language Models
by: Ju, Jiaxin, et al.
Published: (2025)
EEGAgent: A Unified Framework for Automated EEG Analysis Using Large Language Models
by: Zhao, Sha, et al.
Published: (2025)
Self-Reported Confidence of Large Language Models in Gastroenterology: Analysis of Commercial, Open-Source, and Quantized Models
by: Naderi, Nariman, et al.
Published: (2025)
Position: What Can Large Language Models Tell Us about Time Series Analysis
by: Jin, Ming, et al.
Published: (2024)