| Main Authors: | Yu, Jiangyong, Han, Xiaomeng, Hu, Xing, Xu, Chen, Jiang, Zhe, Yang, Dawei |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.02988 |
Similar Items
I-LLM: Efficient Integer-Only Inference for Fully-Quantized Low-Bit Large Language Models
by: Hu, Xing, et al.
Published: (2024)
OstQuant: Refining Large Language Model Quantization with Orthogonal and Scaling Transformations for Better Distribution Fitting
by: Hu, Xing, et al.
Published: (2025)
Pushing the Limits of BFP on Narrow Precision LLM Inference
by: Wang, Hui, et al.
Published: (2025)
MoEQuant: Enhancing Quantization for Mixture-of-Experts Large Language Models via Expert-Balanced Sampling and Affinity Guidance
by: Hu, Xing, et al.
Published: (2025)
FQ-PETR: Fully Quantized Position Embedding Transformation for Multi-View 3D Object Detection
by: Yu, Jiangyong, et al.
Published: (2025)
RWKVQuant: Quantizing the RWKV Family with Proxy Guided Hybrid of Scalar and Vector Quantization
by: Xu, Chen, et al.
Published: (2025)
MambaQuant: Quantizing the Mamba Family with Variance Aligned Rotation Methods
by: Xu, Zukang, et al.
Published: (2025)
From Retinal Evidence to Safe Decisions: RETINA-SAFE and ECRT for Hallucination Risk Triage in Medical LLMs
by: Yu, Zhe, et al.
Published: (2026)
TORQ: Two-Level Orthogonal Rotation for MXFP4 Quantization
by: Xu, Zukang, et al.
Published: (2026)
MoBiE: Efficient Inference of Mixture of Binary Experts under Post-Training Quantization
by: Zhao, Zhixiong, et al.
Published: (2026)
Compartmentalised Agentic Reasoning for Clinical NLI
by: Jullien, Maël, et al.
Published: (2025)
MQuant: Unleashing the Inference Potential of Multimodal Large Language Models via Full Static Quantization
by: Yu, JiangYong, et al.
Published: (2025)
Neural Operators as Efficient Function Interpolators
by: Niarchos, Vasilis, et al.
Published: (2026)
Affective-NLI: Towards Accurate and Interpretable Personality Recognition in Conversation
by: Wen, Zhiyuan, et al.
Published: (2024)
Attention Slipping: A Mechanistic Understanding of Jailbreak Attacks and Defenses in LLMs
by: Hu, Xiaomeng, et al.
Published: (2025)
ViLegalNLI: Natural Language Inference for Vietnamese Legal Texts
by: Duong, Nhung Thi-Hong, et al.
Published: (2026)
When Benchmarks Leak: Inference-Time Decontamination for LLMs
by: Chai, Jianzhe, et al.
Published: (2026)
KBVQ-MoE: KLT-guided SVD with Bias-Corrected Vector Quantization for MoE Large Language Models
by: Xu, Zukang, et al.
Published: (2026)
Draft-based Approximate Inference for LLMs
by: Galim, Kevin, et al.
Published: (2025)
MorphNLI: A Stepwise Approach to Natural Language Inference Using Text Morphing
by: Negru, Vlad Andrei, et al.
Published: (2025)
Universal Approximation of Nonlinear Operators and Their Derivatives
by: de Feo, Filippo
Published: (2026)
FinNLI: Novel Dataset for Multi-Genre Financial Natural Language Inference Benchmarking
by: Magomere, Jabez, et al.
Published: (2025)
BWLA: Breaking the Barrier of W1AX Post-Training Quantization for LLMs
by: Zhao, Zhixiong, et al.
Published: (2026)
NoiseDiffusion: Correcting Noise for Image Interpolation with Diffusion Models beyond Spherical Linear Interpolation
by: Zheng, PengFei, et al.
Published: (2024)
Revisiting NLI: Towards Cost-Effective and Human-Aligned Metrics for Evaluating LLMs in Question Answering
by: Balamurali, Sai Shridhar, et al.
Published: (2025)
PEANO-ViT: Power-Efficient Approximations of Non-Linearities in Vision Transformers
by: Sadeghi, Mohammad Erfan, et al.
Published: (2024)
OpInf-LLM: Parametric PDE Solving with LLMs via Operator Inference
by: Wang, Zhuoyuan, et al.
Published: (2026)
NeuralMatrix: Compute the Entire Neural Networks with Linear Matrix Operations for Efficient Inference
by: Sun, Ruiqi, et al.
Published: (2023)
LatentAudit: Real-Time White-Box Faithfulness Monitoring for Retrieval-Augmented Generation with Verifiable Deployment
by: Yu, Zhe, et al.
Published: (2026)
NL-Eye: Abductive NLI for Images
by: Ventura, Mor, et al.
Published: (2024)
DLLMQuant: Quantizing Diffusion-based Large Language Models
by: Xu, Chen, et al.
Published: (2025)
KVNAND: Efficient On-Device Large Language Model Inference Using DRAM-Free In-Flash Computing
by: Deng, Lishuo, et al.
Published: (2025)
DLO: Dynamic Layer Operation for Efficient Vertical Scaling of LLMs
by: Tan, Zhen, et al.
Published: (2024)
Revisiting Model Interpolation for Efficient Reasoning
by: Wu, Taiqiang, et al.
Published: (2025)
CAR-SAM: Cross-Attention Reconstruction for Post-Training Quantization of the Segment Anything Model
by: Wen, Houji, et al.
Published: (2026)
Empowering LLMs for Structure-Based Drug Design via Exploration-Augmented Latent Inference
by: Hu, Xuanning, et al.
Published: (2026)
An Efficient Multilevel Preconditioned Nonlinear Conjugate Gradient Method for Incremental Potential Contact
by: Zhang, Yu, et al.
Published: (2026)
PrefixNLI: Detecting Factual Inconsistencies as Soon as They Arise
by: Harary, Sapir, et al.
Published: (2025)
XNLIeu: a dataset for cross-lingual NLI in Basque
by: Heredia, Maite, et al.
Published: (2024)