:: Library Catalog

Bibliographic Details
Main Authors:	Zhang, Haochen; Yin, Junze; Wang, Guanchu; Liu, Zirui; Yang, Lin F.; Zhang, Tianyi; Shrivastava, Anshumali; Braverman, Vladimir
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2502.05790

Similar Items

CoVE: Compressed Vocabulary Expansion Makes Better LLM-based Recommender Systems
by: Zhang, Haochen, et al.
Published: (2025)

LeanQuant: Accurate and Scalable Large Language Model Quantization with Loss-error-aware Grid
by: Zhang, Tianyi, et al.
Published: (2024)

Support Basis: Fast Attention Beyond Bounded Entries
by: Aliakbarpour, Maryam, et al.
Published: (2025)

Learning Scalable Structural Representations for Link Prediction with Bloom Signatures
by: Zhang, Tianyi, et al.
Published: (2023)

KV Cache is 1 Bit Per Channel: Efficient Large Language Model Inference with Coupled Quantization
by: Zhang, Tianyi, et al.
Published: (2024)

IDentity with Locality: An ideal hash for gene sequence search
by: Desai, Aditya, et al.
Published: (2024)

NoMAD-Attention: Efficient LLM Inference on CPUs Through Multiply-add-free Attention
by: Zhang, Tianyi, et al.
Published: (2024)

High-Dimensional Robust Mean Estimation with Untrusted Batches
by: Aliakbarpour, Maryam, et al.
Published: (2026)

Sketch to Adapt: Fine-Tunable Sketches for Efficient LLM Adaptation
by: Zhang, Tianyi, et al.
Published: (2024)

Confident or Seek Stronger: Exploring Uncertainty-Based On-device LLM Routing From Benchmarking to Generalization
by: Chuang, Yu-Neng, et al.
Published: (2025)

70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float (DFloat11)
by: Zhang, Tianyi, et al.
Published: (2025)

To Compress or Not? Pushing the Frontier of Lossless GenAI Model Weights Compression with Exponent Concentration
by: Yang, Zeyu, et al.
Published: (2025)

Demystifying OPD: Length Inflation and Stabilization Strategies for Large Language Models
by: Luo, Feng, et al.
Published: (2026)

Efficient Alternating Minimization with Applications to Weighted Low Rank Approximation
by: Song, Zhao, et al.
Published: (2023)

Superintelligent Retrieval Agent: The Next Frontier of Information Retrieval
by: Yang, Zeyu, et al.
Published: (2026)

Low Rank Matrix Completion via Robust Alternating Minimization in Nearly Linear Time
by: Gu, Yuzhou, et al.
Published: (2023)

A Dynamic Low-Rank Fast Gaussian Transform
by: Huang, Baihe, et al.
Published: (2022)

DTS: Enhancing Large Reasoning Models via Decoding Tree Sketching
by: Xu, Zicheng, et al.
Published: (2025)

Scout Before You Attend: Sketch-and-Walk Sparse Attention for Efficient LLM Inference
by: Le, Hoang Anh Duy, et al.
Published: (2026)

Personalizing Low-Rank Bayesian Neural Networks Via Federated Learning
by: Zhang, Boning, et al.
Published: (2024)

Low-Rank Compression of Pretrained Models via Randomized Subspace Iteration
by: Pourkamali-Anaraki, Farhad
Published: (2026)

SRLoRA: Subspace Recomposition in Low-Rank Adaptation via Importance-Based Fusion and Reinitialization
by: Yang, Haodong, et al.
Published: (2025)

Assessing and Enhancing Large Language Models in Rare Disease Question-answering
by: Wang, Guanchu, et al.
Published: (2024)

GIPO: Gaussian Importance Sampling Policy Optimization
by: Lu, Chengxuan, et al.
Published: (2026)

Stabilizing Native Low-Rank LLM Pretraining
by: Janson, Paul, et al.
Published: (2026)

REFRAG: Rethinking RAG based Decoding
by: Lin, Xiaoqiang, et al.
Published: (2025)

Self-ensemble: Mitigating Confidence Mis-calibration for Large Language Models
by: Xu, Zicheng, et al.
Published: (2025)

Regularizing Subspace Redundancy of Low-Rank Adaptation
by: Zhu, Yue, et al.
Published: (2025)

From Low Rank Gradient Subspace Stabilization to Low-Rank Weights: Observations, Theories, and Applications
by: Jaiswal, Ajay, et al.
Published: (2024)

Lotus: Efficient LLM Training by Randomized Low-Rank Gradient Projection with Adaptive Subspace Switching
by: Miao, Tianhao, et al.
Published: (2026)

CARAMEL: A Succinct Read-Only Lookup Table via Compressed Static Functions
by: Coleman, Benjamin, et al.
Published: (2023)

Mixture-of-Subspaces in Low-Rank Adaptation
by: Wu, Taiqiang, et al.
Published: (2024)

Scalable Importance Sampling in High Dimensions with Low-Rank Mixture Proposals
by: Kruse, Liam A., et al.
Published: (2025)

CoRA: Optimizing Low-Rank Adaptation with Common Subspace of Large Language Models
by: Xiao, Xiaojun, et al.
Published: (2024)

RACE Attention: A Strictly Linear-Time Attention Layer for Training on Outrageously Large Contexts
by: Joshi, Sahil, et al.
Published: (2025)

Empowering Distributed Training with Sparsity-driven Data Synchronization
by: Wang, Zhuang, et al.
Published: (2023)

Borrowed Geometry: Cross-Distribution Head-Importance Fingerprints of Frozen Pretrained Gemma 4 31B
by: Bektursun, Abay
Published: (2026)

Breaking the Blocks: Continuous Low-Rank Decomposed Scaling for Unified LLM Quantization and Adaptation
by: Tang, Pingzhi, et al.
Published: (2026)

ESPO: Entropy Importance Sampling Policy Optimization
by: Sheng, Yuepeng, et al.
Published: (2025)

How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression?
by: Wu, Jingfeng, et al.
Published: (2023)