:: Library Catalog

Bibliographic Details
Main Authors:	Bian, Song; Yu, Tao; Venkataraman, Shivaram; Park, Youngsuk
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2510.18245

Similar Items

Scaling Inference-Efficient Language Models
by: Bian, Song, et al.
Published: (2025)

What Limits Agentic Systems Efficiency?
by: Bian, Song, et al.
Published: (2025)

LV-XAttn: Distributed Cross-Attention for Long Visual Inputs in Multimodal Large Language Models
by: Chang, Tzu-Tao, et al.
Published: (2025)

PGT-I: Scaling Spatiotemporal GNNs with Memory-Efficient Distributed Training
by: Ockerman, Seth, et al.
Published: (2025)

Inference Optimization of Foundation Models on AI Accelerators
by: Park, Youngsuk, et al.
Published: (2024)

Spectra 1.1: Scaling Laws and Efficient Inference for Ternary Language Models
by: Vaidhya, Tejas, et al.
Published: (2025)

RoSTE: An Efficient Quantization-Aware Supervised Fine-Tuning Approach for Large Language Models
by: Wei, Quan, et al.
Published: (2025)

Wukong: Towards a Scaling Law for Large-Scale Recommendation
by: Zhang, Buyun, et al.
Published: (2024)

A Simple Model of Inference Scaling Laws
by: Levi, Noam
Published: (2024)

Training LLMs with MXFP4
by: Tseng, Albert, et al.
Published: (2025)

Towards Neural Scaling Laws on Graphs
by: Liu, Jingzhe, et al.
Published: (2024)

Variance-reduced Zeroth-Order Methods for Fine-Tuning Language Models
by: Gautam, Tanmay, et al.
Published: (2024)

Towards Neural Scaling Laws for Time Series Foundation Models
by: Yao, Qingren, et al.
Published: (2024)

Tesserae: Scalable Placement Policies for Deep Learning Workloads
by: Bian, Song, et al.
Published: (2025)

Geometric Scaling of Bayesian Inference in LLMs
by: Agarwal, Naman, et al.
Published: (2025)

Evolution Meets Diffusion: Efficient Neural Architecture Generation
by: Zhou, Bingye, et al.
Published: (2025)

Verifier-free Test-Time Sampling for Vision Language Action Models
by: Jang, Suhyeok, et al.
Published: (2025)

PolyThrottle: Energy-efficient Neural Network Inference on Edge Devices
by: Yan, Minghao, et al.
Published: (2023)

Active Inference Meeting Energy-Efficient Control of Parallel and Identical Machines
by: Yeganeh, Yavar Taheri, et al.
Published: (2024)

Towards Embodiment Scaling Laws in Robot Locomotion
by: Ai, Bo, et al.
Published: (2025)

Scaling Laws for Predicting Downstream Performance in LLMs
by: Chen, Yangyi, et al.
Published: (2024)

Scaling Laws Across Model Architectures: A Comparative Analysis of Dense and MoE Models in Large Language Models
by: Wang, Siqi, et al.
Published: (2024)

Sub-Scaling Laws: On the Role of Data Density and Training Strategies in LLMs
by: Chen, Zhengyu, et al.
Published: (2025)

Scaling Laws for Data-Efficient Visual Transfer Learning
by: Yang, Wenxuan, et al.
Published: (2025)

Predicting Task Performance with Context-aware Scaling Laws
by: Montgomery, Kyle, et al.
Published: (2025)

Scaling Law Hypothesis for Multimodal Model
by: Sun, Qingyun, et al.
Published: (2024)

Theoretical Foundations of Scaling Law in Familial Models
by: Song, Huan, et al.
Published: (2025)

Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding
by: Zhao, Yilong, et al.
Published: (2025)

LLMs as Noisy Channels: A Shannon Perspective on Model Capacity and Scaling Laws
by: Ouyang, Xu, et al.
Published: (2026)

A Resource Model For Neural Scaling Law
by: Song, Jinyeop, et al.
Published: (2024)

Bayesian Inverse Problems Meet Flow Matching: Efficient and Flexible Inference via Transformers
by: Sherki, Daniil, et al.
Published: (2025)

Gemstones: A Model Suite for Multi-Faceted Scaling Laws
by: McLeish, Sean, et al.
Published: (2025)

Do Neural Scaling Laws Exist on Graph Self-Supervised Learning?
by: Ma, Qian, et al.
Published: (2024)

On-line Learning in Tree MDPs by Treating Policies as Bandit Arms
by: Shah, Anvay, et al.
Published: (2026)

GeNeRT: A Physics-Informed Approach to Intelligent Wireless Channel Modeling via Generalizable Neural Ray Tracing
by: Bian, Kejia, et al.
Published: (2025)

Incremental IVF Index Maintenance for Streaming Vector Search
by: Mohoney, Jason, et al.
Published: (2024)

Adaptive Training Meets Progressive Scaling: Elevating Efficiency in Diffusion Models
by: Li, Wenhao, et al.
Published: (2023)

NLI: Non-uniform Linear Interpolation Approximation of Nonlinear Operations for Efficient LLMs Inference
by: Yu, Jiangyong, et al.
Published: (2026)

Quant.npu: Enabling Efficient Mobile NPU Inference for on-device LLMs via Fully Static Quantization
by: Zhang, Jinghe, et al.
Published: (2026)

When LLM Meets Time Series: Can LLMs Perform Multi-Step Time Series Reasoning and Inference
by: Ye, Wen, et al.
Published: (2025)