:: Library Catalog

Bibliographic Details
Main Authors:	Huang, Xiusheng; Li, Zhe; Yin, Xuanwu; Wang, Lu; Wang, Yequan; Li, Dong; Barsoum, Emad; Liu, Kang
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2605.18800

Similar Items

Athena: Enhancing Multimodal Reasoning with Data-efficient Process Reward Models
by: Wang, Shuai, et al.
Published: (2025)

DL-QAT: Weight-Decomposed Low-Rank Quantization-Aware Training for Large Language Models
by: Ke, Wenjin, et al.
Published: (2025)

Gumiho: A Hybrid Architecture to Prioritize Early Tokens in Speculative Decoding
by: Li, Jinze, et al.
Published: (2025)

SparK: Query-Aware Unstructured Sparsity with Recoverable KV Cache Channel Pruning
by: Liao, Huanxuan, et al.
Published: (2025)

Jakiro: Boosting Speculative Decoding with Decoupled Multi-Head via MoE
by: Huang, Haiduo, et al.
Published: (2025)

Commonsense Knowledge Editing Based on Free-Text in LLMs
by: Huang, Xiusheng, et al.
Published: (2024)

Reasons and Solutions for the Decline in Model Performance after Editing
by: Huang, Xiusheng, et al.
Published: (2024)

Mutual Enhancement Between Global Tokens and Patch Tokens: From Theory to Practice
by: Huang, Xiusheng, et al.
Published: (2026)

Týr-the-Pruner: Structural Pruning LLMs via Global Sparsity Distribution Optimization
by: Li, Guanchen, et al.
Published: (2025)

SpecVLM: Fast Speculative Decoding in Vision-Language Models
by: Huang, Haiduo, et al.
Published: (2025)

Tensor-Compressed and Fully-Quantized Training of Neural PDE Solvers
by: Lu, Jinming, et al.
Published: (2025)

Learnable Permutation for Structured Sparsity on Transformer Models
by: Li, Zekai, et al.
Published: (2026)

PR2: Predictive Routing Replay for MoE-Based LLM Reinforcement Learning
by: Dong, Daize, et al.
Published: (2026)

Geak: Introducing Triton Kernel AI Agent & Evaluation Benchmarks
by: Wang, Jianghui, et al.
Published: (2025)

Art and Science of Quantizing Large-Scale Models: A Comprehensive Overview
by: Wang, Yanshu, et al.
Published: (2024)

QuZO: Quantized Zeroth-Order Fine-Tuning for Large Language Models
by: Zhou, Jiajun, et al.
Published: (2025)

Transformer-based Graph Neural Networks for Battery Range Prediction in AIoT Battery-Swap Services
by: Li, Zhao, et al.
Published: (2024)

SoftVQ-VAE: Efficient 1-Dimensional Continuous Tokenizer
by: Chen, Hao, et al.
Published: (2024)

TEON: Tensorized Orthonormalization Beyond Layer-Wise Muon for Large Language Model Pre-Training
by: Zhang, Ruijie, et al.
Published: (2026)

AgentKernelArena: Generalization-Aware Benchmarking of GPU Kernel Optimization Agents
by: Younesian, Sharareh, et al.
Published: (2026)

DC-DiT: Adaptive Compute and Elastic Inference for Visual Generation via Dynamic Chunking
by: Haridas, Akash, et al.
Published: (2026)

MagR: Weight Magnitude Reduction for Enhancing Post-Training Quantization
by: Zhang, Aozhong, et al.
Published: (2024)

ModuLoRA: Finetuning 2-Bit LLMs on Consumer GPUs by Integrating with Modular Quantizers
by: Yin, Junjie, et al.
Published: (2023)

The Curse and Blessing of Mean Bias in FP4-Quantized LLM Training
by: Cao, Hengjie, et al.
Published: (2026)

QUARK: Quantization-Enabled Circuit Sharing for Transformer Acceleration by Exploiting Common Patterns in Nonlinear Operations
by: Zhao, Zhixiong, et al.
Published: (2025)

Quantum-Classical Hybrid Quantized Neural Network
by: Li, Wenxin, et al.
Published: (2025)

Muon$^2$: Boosting Muon via Adaptive Second-Moment Preconditioning
by: Liu, Ziyue, et al.
Published: (2026)

NetGPT: Generative Pretrained Transformer for Network Traffic
by: Meng, Xuying, et al.
Published: (2023)

Gradient Based Method for the Fusion of Lattice Quantizers
by: Zhang, Liyuan, et al.
Published: (2025)

Parking Availability Prediction via Fusing Multi-Source Data with A Self-Supervised Learning Enhanced Spatio-Temporal Inverted Transformer
by: Huang, Yin, et al.
Published: (2025)

Agile-Quant: Activation-Guided Quantization for Faster Inference of LLMs on the Edge
by: Shen, Xuan, et al.
Published: (2023)

End-to-End On-Device Quantization-Aware Training for LLMs at Inference Cost
by: Tan, Qitao, et al.
Published: (2025)

Agent Laboratory: Using LLM Agents as Research Assistants
by: Schmidgall, Samuel, et al.
Published: (2025)

A Fast and Flat Federated Learning Method via Weighted Momentum and Sharpness-Aware Minimization
by: Li, Tianle, et al.
Published: (2025)

Towards Superior Quantization Accuracy: A Layer-sensitive Approach
by: Zhang, Feng, et al.
Published: (2025)

APRIL: Active Partial Rollouts in Reinforcement Learning to Tame Long-tail Generation
by: Zhou, Yuzhen, et al.
Published: (2025)

InfoQuant: Shaping Activation Distributions for Low-Bit LLM Quantization
by: Li, Ke, et al.
Published: (2026)

QSpec: Speculative Decoding with Complementary Quantization Schemes
by: Zhao, Juntao, et al.
Published: (2024)

FlatNAS: optimizing Flatness in Neural Architecture Search for Out-of-Distribution Robustness
by: Gambella, Matteo, et al.
Published: (2024)

Astro: Activation-guided Structured Regularization for Outlier-Robust LLM Post-Training Quantization
by: Chen, Xi, et al.
Published: (2026)