:: Library Catalog

Bibliographic Details
Main Authors:	Yu, Jiangyong; Han, Xiaomeng; Hu, Xing; Xu, Chen; Jiang, Zhe; Yang, Dawei
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2602.02988

Similar Items

I-LLM: Efficient Integer-Only Inference for Fully-Quantized Low-Bit Large Language Models
by: Hu, Xing, et al.
Published: (2024)

OstQuant: Refining Large Language Model Quantization with Orthogonal and Scaling Transformations for Better Distribution Fitting
by: Hu, Xing, et al.
Published: (2025)

Pushing the Limits of BFP on Narrow Precision LLM Inference
by: Wang, Hui, et al.
Published: (2025)

MoEQuant: Enhancing Quantization for Mixture-of-Experts Large Language Models via Expert-Balanced Sampling and Affinity Guidance
by: Hu, Xing, et al.
Published: (2025)

FQ-PETR: Fully Quantized Position Embedding Transformation for Multi-View 3D Object Detection
by: Yu, Jiangyong, et al.
Published: (2025)

RWKVQuant: Quantizing the RWKV Family with Proxy Guided Hybrid of Scalar and Vector Quantization
by: Xu, Chen, et al.
Published: (2025)

MambaQuant: Quantizing the Mamba Family with Variance Aligned Rotation Methods
by: Xu, Zukang, et al.
Published: (2025)

From Retinal Evidence to Safe Decisions: RETINA-SAFE and ECRT for Hallucination Risk Triage in Medical LLMs
by: Yu, Zhe, et al.
Published: (2026)

TORQ: Two-Level Orthogonal Rotation for MXFP4 Quantization
by: Xu, Zukang, et al.
Published: (2026)

MoBiE: Efficient Inference of Mixture of Binary Experts under Post-Training Quantization
by: Zhao, Zhixiong, et al.
Published: (2026)

Compartmentalised Agentic Reasoning for Clinical NLI
by: Jullien, Maël, et al.
Published: (2025)

MQuant: Unleashing the Inference Potential of Multimodal Large Language Models via Full Static Quantization
by: Yu, JiangYong, et al.
Published: (2025)

Neural Operators as Efficient Function Interpolators
by: Niarchos, Vasilis, et al.
Published: (2026)

Affective-NLI: Towards Accurate and Interpretable Personality Recognition in Conversation
by: Wen, Zhiyuan, et al.
Published: (2024)

Attention Slipping: A Mechanistic Understanding of Jailbreak Attacks and Defenses in LLMs
by: Hu, Xiaomeng, et al.
Published: (2025)

ViLegalNLI: Natural Language Inference for Vietnamese Legal Texts
by: Duong, Nhung Thi-Hong, et al.
Published: (2026)

When Benchmarks Leak: Inference-Time Decontamination for LLMs
by: Chai, Jianzhe, et al.
Published: (2026)

KBVQ-MoE: KLT-guided SVD with Bias-Corrected Vector Quantization for MoE Large Language Models
by: Xu, Zukang, et al.
Published: (2026)

Draft-based Approximate Inference for LLMs
by: Galim, Kevin, et al.
Published: (2025)

MorphNLI: A Stepwise Approach to Natural Language Inference Using Text Morphing
by: Negru, Vlad Andrei, et al.
Published: (2025)

Universal Approximation of Nonlinear Operators and Their Derivatives
by: de Feo, Filippo
Published: (2026)

FinNLI: Novel Dataset for Multi-Genre Financial Natural Language Inference Benchmarking
by: Magomere, Jabez, et al.
Published: (2025)

BWLA: Breaking the Barrier of W1AX Post-Training Quantization for LLMs
by: Zhao, Zhixiong, et al.
Published: (2026)

NoiseDiffusion: Correcting Noise for Image Interpolation with Diffusion Models beyond Spherical Linear Interpolation
by: Zheng, PengFei, et al.
Published: (2024)

Revisiting NLI: Towards Cost-Effective and Human-Aligned Metrics for Evaluating LLMs in Question Answering
by: Balamurali, Sai Shridhar, et al.
Published: (2025)

PEANO-ViT: Power-Efficient Approximations of Non-Linearities in Vision Transformers
by: Sadeghi, Mohammad Erfan, et al.
Published: (2024)

OpInf-LLM: Parametric PDE Solving with LLMs via Operator Inference
by: Wang, Zhuoyuan, et al.
Published: (2026)

NeuralMatrix: Compute the Entire Neural Networks with Linear Matrix Operations for Efficient Inference
by: Sun, Ruiqi, et al.
Published: (2023)

LatentAudit: Real-Time White-Box Faithfulness Monitoring for Retrieval-Augmented Generation with Verifiable Deployment
by: Yu, Zhe, et al.
Published: (2026)

NL-Eye: Abductive NLI for Images
by: Ventura, Mor, et al.
Published: (2024)

DLLMQuant: Quantizing Diffusion-based Large Language Models
by: Xu, Chen, et al.
Published: (2025)

KVNAND: Efficient On-Device Large Language Model Inference Using DRAM-Free In-Flash Computing
by: Deng, Lishuo, et al.
Published: (2025)

DLO: Dynamic Layer Operation for Efficient Vertical Scaling of LLMs
by: Tan, Zhen, et al.
Published: (2024)

Revisiting Model Interpolation for Efficient Reasoning
by: Wu, Taiqiang, et al.
Published: (2025)

CAR-SAM: Cross-Attention Reconstruction for Post-Training Quantization of the Segment Anything Model
by: Wen, Houji, et al.
Published: (2026)

Empowering LLMs for Structure-Based Drug Design via Exploration-Augmented Latent Inference
by: Hu, Xuanning, et al.
Published: (2026)

An Efficient Multilevel Preconditioned Nonlinear Conjugate Gradient Method for Incremental Potential Contact
by: Zhang, Yu, et al.
Published: (2026)

PrefixNLI: Detecting Factual Inconsistencies as Soon as They Arise
by: Harary, Sapir, et al.
Published: (2025)

XNLIeu: a dataset for cross-lingual NLI in Basque
by: Heredia, Maite, et al.
Published: (2024)