| Main Authors: | Bian, Song; Yu, Tao; Venkataraman, Shivaram; Park, Youngsuk |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2510.18245 |
Similar Items
Scaling Inference-Efficient Language Models
by: Bian, Song, et al.
Published: (2025)
What Limits Agentic Systems Efficiency?
by: Bian, Song, et al.
Published: (2025)
LV-XAttn: Distributed Cross-Attention for Long Visual Inputs in Multimodal Large Language Models
by: Chang, Tzu-Tao, et al.
Published: (2025)
PGT-I: Scaling Spatiotemporal GNNs with Memory-Efficient Distributed Training
by: Ockerman, Seth, et al.
Published: (2025)
Inference Optimization of Foundation Models on AI Accelerators
by: Park, Youngsuk, et al.
Published: (2024)
Spectra 1.1: Scaling Laws and Efficient Inference for Ternary Language Models
by: Vaidhya, Tejas, et al.
Published: (2025)
RoSTE: An Efficient Quantization-Aware Supervised Fine-Tuning Approach for Large Language Models
by: Wei, Quan, et al.
Published: (2025)
Wukong: Towards a Scaling Law for Large-Scale Recommendation
by: Zhang, Buyun, et al.
Published: (2024)
A Simple Model of Inference Scaling Laws
by: Levi, Noam
Published: (2024)
Training LLMs with MXFP4
by: Tseng, Albert, et al.
Published: (2025)
Towards Neural Scaling Laws on Graphs
by: Liu, Jingzhe, et al.
Published: (2024)
Variance-reduced Zeroth-Order Methods for Fine-Tuning Language Models
by: Gautam, Tanmay, et al.
Published: (2024)
Towards Neural Scaling Laws for Time Series Foundation Models
by: Yao, Qingren, et al.
Published: (2024)
Tesserae: Scalable Placement Policies for Deep Learning Workloads
by: Bian, Song, et al.
Published: (2025)
Geometric Scaling of Bayesian Inference in LLMs
by: Agarwal, Naman, et al.
Published: (2025)
Evolution Meets Diffusion: Efficient Neural Architecture Generation
by: Zhou, Bingye, et al.
Published: (2025)
Verifier-free Test-Time Sampling for Vision Language Action Models
by: Jang, Suhyeok, et al.
Published: (2025)
PolyThrottle: Energy-efficient Neural Network Inference on Edge Devices
by: Yan, Minghao, et al.
Published: (2023)
Active Inference Meeting Energy-Efficient Control of Parallel and Identical Machines
by: Yeganeh, Yavar Taheri, et al.
Published: (2024)
Towards Embodiment Scaling Laws in Robot Locomotion
by: Ai, Bo, et al.
Published: (2025)
Scaling Laws for Predicting Downstream Performance in LLMs
by: Chen, Yangyi, et al.
Published: (2024)
Scaling Laws Across Model Architectures: A Comparative Analysis of Dense and MoE Models in Large Language Models
by: Wang, Siqi, et al.
Published: (2024)
Sub-Scaling Laws: On the Role of Data Density and Training Strategies in LLMs
by: Chen, Zhengyu, et al.
Published: (2025)
Scaling Laws for Data-Efficient Visual Transfer Learning
by: Yang, Wenxuan, et al.
Published: (2025)
Predicting Task Performance with Context-aware Scaling Laws
by: Montgomery, Kyle, et al.
Published: (2025)
Scaling Law Hypothesis for Multimodal Model
by: Sun, Qingyun, et al.
Published: (2024)
Theoretical Foundations of Scaling Law in Familial Models
by: Song, Huan, et al.
Published: (2025)
Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding
by: Zhao, Yilong, et al.
Published: (2025)
LLMs as Noisy Channels: A Shannon Perspective on Model Capacity and Scaling Laws
by: Ouyang, Xu, et al.
Published: (2026)
A Resource Model For Neural Scaling Law
by: Song, Jinyeop, et al.
Published: (2024)
Bayesian Inverse Problems Meet Flow Matching: Efficient and Flexible Inference via Transformers
by: Sherki, Daniil, et al.
Published: (2025)
Gemstones: A Model Suite for Multi-Faceted Scaling Laws
by: McLeish, Sean, et al.
Published: (2025)
Do Neural Scaling Laws Exist on Graph Self-Supervised Learning?
by: Ma, Qian, et al.
Published: (2024)
On-line Learning in Tree MDPs by Treating Policies as Bandit Arms
by: Shah, Anvay, et al.
Published: (2026)
GeNeRT: A Physics-Informed Approach to Intelligent Wireless Channel Modeling via Generalizable Neural Ray Tracing
by: Bian, Kejia, et al.
Published: (2025)
Incremental IVF Index Maintenance for Streaming Vector Search
by: Mohoney, Jason, et al.
Published: (2024)
Adaptive Training Meets Progressive Scaling: Elevating Efficiency in Diffusion Models
by: Li, Wenhao, et al.
Published: (2023)
NLI: Non-uniform Linear Interpolation Approximation of Nonlinear Operations for Efficient LLMs Inference
by: Yu, Jiangyong, et al.
Published: (2026)
Quant.npu: Enabling Efficient Mobile NPU Inference for on-device LLMs via Fully Static Quantization
by: Zhang, Jinghe, et al.
Published: (2026)
When LLM Meets Time Series: Can LLMs Perform Multi-Step Time Series Reasoning and Inference
by: Ye, Wen, et al.
Published: (2025)