| Main Author: | Jin, Jidong |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.15871 |
Similar Items
SDQ: Sparse Decomposed Quantization for LLM Inference
by: Jeong, Geonhwa, et al.
Published: (2024)
Leveraging Large Language Models with Chain-of-Thought and Prompt Engineering for Traffic Crash Severity Analysis and Inference
by: Zhen, Hao, et al.
Published: (2024)
NeMo: A Neuron-Level Modularizing-While-Training Approach for Decomposing DNN Models
by: Bi, Xiaohan, et al.
Published: (2025)
Decomposing and Editing Predictions by Modeling Model Computation
by: Shah, Harshay, et al.
Published: (2024)
DL-QAT: Weight-Decomposed Low-Rank Quantization-Aware Training for Large Language Models
by: Ke, Wenjin, et al.
Published: (2025)
Optimized Conformal Selection: Powerful Selective Inference After Conformity Score Optimization
by: Bai, Tian, et al.
Published: (2024)
Decomposing and Measuring Evaluation Awareness
by: Li, Changling, et al.
Published: (2026)
Feature Group Tabular Transformer: A Novel Approach to Traffic Crash Modeling and Causality Analysis
by: Lares, Oscar, et al.
Published: (2024)
BigMac: A Communication-Efficient Mixture-of-Experts Model Structure for Fast Training and Inference
by: Jin, Zewen, et al.
Published: (2025)
Why Gradients Rapidly Increase Near the End of Training
by: Defazio, Aaron
Published: (2025)
Echo: Decoupling Inference and Training for Large-Scale RL Alignment on Heterogeneous Swarms
by: Xiao, Jie, et al.
Published: (2025)
A Decomposable Forward Process in Diffusion Models for Time-Series Forecasting
by: Caldas, Francisco, et al.
Published: (2026)
Encoder-Decoder Diffusion Language Models for Efficient Training and Inference
by: Arriola, Marianne, et al.
Published: (2025)
Revisiting the Relationship between Adversarial and Clean Training: Why Clean Training Can Make Adversarial Training Better
by: Zhou, MingWei, et al.
Published: (2025)
LaMDA: Large Model Fine-Tuning via Spectrally Decomposed Low-Dimensional Adaptation
by: Azizi, Seyedarmin, et al.
Published: (2024)
Model-Distributed Inference for Large Language Models at the Edge
by: Macario, Davide, et al.
Published: (2025)
Fast Inference for Augmented Large Language Models
by: Shahout, Rana, et al.
Published: (2024)
Decomposing Epistemic Uncertainty for Causal Decision Making
by: Rahman, Md Musfiqur, et al.
Published: (2026)
Training Acceleration of Low-Rank Decomposed Networks using Sequential Freezing and Rank Quantization
by: Hajimolahoseini, Habib, et al.
Published: (2023)
Bayesian Inference of Training Dataset Membership
by: Huang, Yongchao
Published: (2025)
End-to-End On-Device Quantization-Aware Training for LLMs at Inference Cost
by: Tan, Qitao, et al.
Published: (2025)
Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention
by: Qiu, Haiquan, et al.
Published: (2025)
Scaling On-Device GPU Inference for Large Generative Models
by: Tang, Jiuqiang, et al.
Published: (2025)
ConformaDecompose: Explaining Uncertainty via Calibration Localization
by: Yapicioglu, Fatima Rabia, et al.
Published: (2026)
OpenELM: An Efficient Language Model Family with Open Training and Inference Framework
by: Mehta, Sachin, et al.
Published: (2024)
MetaSAEs: Joint Training with a Decomposability Penalty Produces More Atomic Sparse Autoencoder Latents
by: Levinson, Matthew
Published: (2026)
Why Do Neural Networks Forget: A Study of Collapse in Continual Learning
by: Zhu, Yunqin, et al.
Published: (2026)
Efficient Large Language Model Inference with Neural Block Linearization
by: Erdogan, Mete, et al.
Published: (2025)
Automatic Calibration for Membership Inference Attack on Large Language Models
by: Zade, Saleh Zare, et al.
Published: (2025)
Purifying Shampoo: Investigating Shampoo's Heuristics by Decomposing its Preconditioner
by: Eschenhagen, Runa, et al.
Published: (2025)
TimeMixer: Decomposable Multiscale Mixing for Time Series Forecasting
by: Wang, Shiyu, et al.
Published: (2024)
LiteVLM: A Low-Latency Vision-Language Model Inference Pipeline for Resource-Constrained Environments
by: Huang, Jin, et al.
Published: (2025)
Refinement Provenance Inference: Detecting LLM-Refined Training Prompts from Model Behavior
by: Yin, Bo, et al.
Published: (2026)
Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models
by: Pan, Bowen, et al.
Published: (2024)
When and Why Adversarial Training Improves PINNs: A Neural Tangent Kernel Perspective
by: Cao, Yuan-dong, et al.
Published: (2026)
STAT: Shrinking Transformers After Training
by: Flynn, Megan, et al.
Published: (2024)
Research on Low-Latency Inference and Training Efficiency Optimization for Graph Neural Network and Large Language Model-Based Recommendation Systems
by: Zhao, Yushang, et al.
Published: (2025)
Compute Aligned Training: Optimizing for Test Time Inference
by: Ousherovitch, Adam, et al.
Published: (2026)
Provable Training Data Identification for Large Language Models
by: Liu, Zhenlong, et al.
Published: (2025)
Decomposed Diffusion Sampler for Accelerating Large-Scale Inverse Problems
by: Chung, Hyungjin, et al.
Published: (2023)