| Main authors: | Zhang, Haochen; Yin, Junze; Wang, Guanchu; Liu, Zirui; Yang, Lin F.; Zhang, Tianyi; Shrivastava, Anshumali; Braverman, Vladimir |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online access: | https://arxiv.org/abs/2502.05790 |
Similar items
CoVE: Compressed Vocabulary Expansion Makes Better LLM-based Recommender Systems
by: Zhang, Haochen, et al.
Published: (2025)
LeanQuant: Accurate and Scalable Large Language Model Quantization with Loss-error-aware Grid
by: Zhang, Tianyi, et al.
Published: (2024)
Support Basis: Fast Attention Beyond Bounded Entries
by: Aliakbarpour, Maryam, et al.
Published: (2025)
Learning Scalable Structural Representations for Link Prediction with Bloom Signatures
by: Zhang, Tianyi, et al.
Published: (2023)
KV Cache is 1 Bit Per Channel: Efficient Large Language Model Inference with Coupled Quantization
by: Zhang, Tianyi, et al.
Published: (2024)
IDentity with Locality: An ideal hash for gene sequence search
by: Desai, Aditya, et al.
Published: (2024)
NoMAD-Attention: Efficient LLM Inference on CPUs Through Multiply-add-free Attention
by: Zhang, Tianyi, et al.
Published: (2024)
High-Dimensional Robust Mean Estimation with Untrusted Batches
by: Aliakbarpour, Maryam, et al.
Published: (2026)
Sketch to Adapt: Fine-Tunable Sketches for Efficient LLM Adaptation
by: Zhang, Tianyi, et al.
Published: (2024)
Confident or Seek Stronger: Exploring Uncertainty-Based On-device LLM Routing From Benchmarking to Generalization
by: Chuang, Yu-Neng, et al.
Published: (2025)
70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float (DFloat11)
by: Zhang, Tianyi, et al.
Published: (2025)
To Compress or Not? Pushing the Frontier of Lossless GenAI Model Weights Compression with Exponent Concentration
by: Yang, Zeyu, et al.
Published: (2025)
Demystifying OPD: Length Inflation and Stabilization Strategies for Large Language Models
by: Luo, Feng, et al.
Published: (2026)
Efficient Alternating Minimization with Applications to Weighted Low Rank Approximation
by: Song, Zhao, et al.
Published: (2023)
Superintelligent Retrieval Agent: The Next Frontier of Information Retrieval
by: Yang, Zeyu, et al.
Published: (2026)
Low Rank Matrix Completion via Robust Alternating Minimization in Nearly Linear Time
by: Gu, Yuzhou, et al.
Published: (2023)
A Dynamic Low-Rank Fast Gaussian Transform
by: Huang, Baihe, et al.
Published: (2022)
DTS: Enhancing Large Reasoning Models via Decoding Tree Sketching
by: Xu, Zicheng, et al.
Published: (2025)
Scout Before You Attend: Sketch-and-Walk Sparse Attention for Efficient LLM Inference
by: Le, Hoang Anh Duy, et al.
Published: (2026)
Personalizing Low-Rank Bayesian Neural Networks Via Federated Learning
by: Zhang, Boning, et al.
Published: (2024)
Low-Rank Compression of Pretrained Models via Randomized Subspace Iteration
by: Pourkamali-Anaraki, Farhad
Published: (2026)
SRLoRA: Subspace Recomposition in Low-Rank Adaptation via Importance-Based Fusion and Reinitialization
by: Yang, Haodong, et al.
Published: (2025)
Assessing and Enhancing Large Language Models in Rare Disease Question-answering
by: Wang, Guanchu, et al.
Published: (2024)
GIPO: Gaussian Importance Sampling Policy Optimization
by: Lu, Chengxuan, et al.
Published: (2026)
Stabilizing Native Low-Rank LLM Pretraining
by: Janson, Paul, et al.
Published: (2026)
REFRAG: Rethinking RAG based Decoding
by: Lin, Xiaoqiang, et al.
Published: (2025)
Self-ensemble: Mitigating Confidence Mis-calibration for Large Language Models
by: Xu, Zicheng, et al.
Published: (2025)
Regularizing Subspace Redundancy of Low-Rank Adaptation
by: Zhu, Yue, et al.
Published: (2025)
From Low Rank Gradient Subspace Stabilization to Low-Rank Weights: Observations, Theories, and Applications
by: Jaiswal, Ajay, et al.
Published: (2024)
Lotus: Efficient LLM Training by Randomized Low-Rank Gradient Projection with Adaptive Subspace Switching
by: Miao, Tianhao, et al.
Published: (2026)
CARAMEL: A Succinct Read-Only Lookup Table via Compressed Static Functions
by: Coleman, Benjamin, et al.
Published: (2023)
Mixture-of-Subspaces in Low-Rank Adaptation
by: Wu, Taiqiang, et al.
Published: (2024)
Scalable Importance Sampling in High Dimensions with Low-Rank Mixture Proposals
by: Kruse, Liam A., et al.
Published: (2025)
CoRA: Optimizing Low-Rank Adaptation with Common Subspace of Large Language Models
by: Xiao, Xiaojun, et al.
Published: (2024)
RACE Attention: A Strictly Linear-Time Attention Layer for Training on Outrageously Large Contexts
by: Joshi, Sahil, et al.
Published: (2025)
Empowering Distributed Training with Sparsity-driven Data Synchronization
by: Wang, Zhuang, et al.
Published: (2023)
Borrowed Geometry: Cross-Distribution Head-Importance Fingerprints of Frozen Pretrained Gemma 4 31B
by: Bektursun, Abay
Published: (2026)
Breaking the Blocks: Continuous Low-Rank Decomposed Scaling for Unified LLM Quantization and Adaptation
by: Tang, Pingzhi, et al.
Published: (2026)
ESPO: Entropy Importance Sampling Policy Optimization
by: Sheng, Yuepeng, et al.
Published: (2025)
How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression?
by: Wu, Jingfeng, et al.
Published: (2023)