| Main authors: | Xiao, He; Yang, Qingyao; Xie, Dirui; Xu, Wendong; Su, Zunhai; Yang, Runming; Zhou, Wenyong; Liu, Haobo; Liu, Zhengwu; Wong, Ngai |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online access: | https://arxiv.org/abs/2508.03332 |
Similar items
PTQTP: Post-Training Quantization to Trit-Planes for Large Language Models
by: Xiao, He, et al.
Published: (2025)
Binary Weight Multi-Bit Activation Quantization for Compute-in-Memory CNN Accelerators
by: Zhou, Wenyong, et al.
Published: (2025)
Beyond Outliers: A Data-Free Layer-wise Mixed-Precision Quantization Approach Driven by Numerical and Structural Dual-Sensitivity
by: Zhang, Hengyuan, et al.
Published: (2026)
Distribution-Aware Hadamard Quantization for Hardware-Efficient Implicit Neural Representations
by: Zhou, Wenyong, et al.
Published: (2025)
Can We Trust LLMs on Memristors? Diving into Reasoning Ability under Non-Ideality
by: Wu, Taiqiang, et al.
Published: (2026)
Quantization Meets Reasoning: Exploring LLM Low-Bit Quantization Degradation for Mathematical Reasoning
by: Li, Zhen, et al.
Published: (2025)
XStreamVGGT: Extremely Memory-Efficient Streaming Vision Geometry Grounded Transformer with KV Cache Compression
by: Su, Zunhai, et al.
Published: (2026)
Enhancing Robustness of Implicit Neural Representations Against Weight Perturbations
by: Zhou, Wenyong, et al.
Published: (2025)
MINR: Efficient Implicit Neural Representations for Multi-Image Encoding
by: Zhou, Wenyong, et al.
Published: (2025)
ROMER: Expert Replacement and Router Calibration for Robust MoE LLMs on Analog Compute-in-Memory Systems
by: Zhou, Wenyong, et al.
Published: (2026)
Timber: Training-free Instruct Model Refining with Base via Effective Rank
by: Wu, Taiqiang, et al.
Published: (2025)
Decomposing Densification in Gaussian Splatting for Faster 3D Scene Reconstruction
by: Huang, Binxiao, et al.
Published: (2025)
HPD: Hybrid Projection Decomposition for Robust State Space Models on Analog CIM Hardware
by: Feng, Yuannuo, et al.
Published: (2025)
HaLoRA: Hardware-aware Low-Rank Adaptation for Large Language Models Based on Hybrid Compute-in-Memory Architecture
by: Wu, Taiqiang, et al.
Published: (2025)
Quantization Meets Reasoning: Exploring and Mitigating Degradation of Low-Bit LLMs in Mathematical Reasoning
by: Li, Zhen, et al.
Published: (2025)
Extending Straight-Through Estimation for Robust Neural Networks on Analog CIM Hardware
by: Feng, Yuannuo, et al.
Published: (2025)
Re-Activating Frozen Primitives for 3D Gaussian Splatting
by: Cheng, Yuxin, et al.
Published: (2025)
QuadINR: Hardware-Efficient Implicit Neural Representations Through Quadratic Activation
by: Zhou, Wenyong, et al.
Published: (2025)
CktFormalizer: Autoformalization of Natural Language into Circuit Representations
by: Xiong, Jing, et al.
Published: (2026)
KVSink: Understanding and Enhancing the Preservation of Attention Sinks in KV Cache Quantization for LLMs
by: Su, Zunhai, et al.
Published: (2025)
Perspective-aware 3D Gaussian Inpainting with Multi-view Consistency
by: Cheng, Yuxin, et al.
Published: (2025)
APTQ: Attention-aware Post-Training Mixed-Precision Quantization for Large Language Models
by: Guan, Ziyi, et al.
Published: (2024)
Revisiting Model Interpolation for Efficient Reasoning
by: Wu, Taiqiang, et al.
Published: (2025)
DoPE: Denoising Rotary Position Embedding
by: Xiong, Jing, et al.
Published: (2025)
Comparing point‐wise and pair‐wise relevance judgment with brain signals
by: Shuqi Zhu, et al.
Published: (2024)
P²-ViT: Power-of-Two Post-Training Quantization and Acceleration for Fully Quantized Vision Transformer
by: Shi, Huihong, et al.
Published: (2024)
AnchorTP: Resilient LLM Inference with State-Preserving Elastic Tensor Parallelism
by: Xu, Wendong, et al.
Published: (2025)
Shadow-FT: Tuning Instruct Model via Training on Paired Base Model
by: Wu, Taiqiang, et al.
Published: (2025)
Nonparametric Teaching for Graph Property Learners
by: Zhang, Chen, et al.
Published: (2025)
Trio-ViT: Post-Training Quantization and Acceleration for Softmax-Free Efficient Vision Transformer
by: Shi, Huihong, et al.
Published: (2024)
Unveiling Super Experts in Mixture-of-Experts Large Language Models
by: Su, Zunhai, et al.
Published: (2025)
A Time- and Energy-Efficient CNN with Dense Connections on Memristor-Based Chips
by: Zhou, Wenyong, et al.
Published: (2025)
Rethinking Kullback-Leibler Divergence in Knowledge Distillation for Large Language Models
by: Wu, Taiqiang, et al.
Published: (2024)
Weight Group-wise Post-Training Quantization for Medical Foundation Model
by: Chen, Yineng, et al.
Published: (2026)
Layer-wise Quantization for Quantized Optimistic Dual Averaging
by: Nguyen, Anh Duc, et al.
Published: (2025)
Quantization Error Propagation: Revisiting Layer-Wise Post-Training Quantization
by: Arai, Yamato, et al.
Published: (2025)
Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation
by: Su, Zunhai, et al.
Published: (2026)
FlexRound: Learnable Rounding based on Element-wise Division for Post-Training Quantization
by: Lee, Jung Hyun, et al.
Published: (2023)
EPTQ: Enhanced Post-Training Quantization via Hessian-guided Network-wise Optimization
by: Gordon, Ofir, et al.
Published: (2023)