| Main Authors: | Nguyen, Huy, Ho, Nhat, Rinaldo, Alessandro |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2503.03213 |
Related records
On Least Square Estimation in Softmax Gating Mixture of Experts
By: Nguyen, Huy, et al.
Published: (2024)
Sigmoid Gating is More Sample Efficient than Softmax Gating in Mixture of Experts
By: Nguyen, Huy, et al.
Published: (2024)
On Bayesian Softmax-Gated Mixture-of-Experts Models
By: Bariletto, Nicola, et al.
Published: (2026)
On Minimax Estimation of Parameters in Softmax-Contaminated Mixture of Experts
By: Yan, Fanqi, et al.
Published: (2025)
A General Theory for Softmax Gating Multinomial Logistic Mixture of Experts
By: Nguyen, Huy, et al.
Published: (2023)
Statistical Perspective of Top-K Sparse Softmax Gating Mixture of Experts
By: Nguyen, Huy, et al.
Published: (2023)
On Expert Estimation in Hierarchical Mixture of Experts: Beyond Softmax Gating Functions
By: Nguyen, Huy, et al.
Published: (2024)
Sigmoid Self-Attention has Lower Sample Complexity than Softmax Self-Attention: A Mixture-of-Experts Perspective
By: Yan, Fanqi, et al.
Published: (2025)
Is Temperature Sample Efficient for Softmax Gaussian Mixture of Experts?
By: Nguyen, Huy, et al.
Published: (2024)
Rethinking Multinomial Logistic Mixture of Experts with Sigmoid Gating Function
By: Pham, Tuan Minh, et al.
Published: (2026)
A Statistical Theory of Gated Attention through the Lens of Hierarchical Mixture of Experts
By: Nguyen, Viet, et al.
Published: (2026)
Towards Convergence Rates for Parameter Estimation in Gaussian-gated Mixture of Experts
By: Nguyen, Huy, et al.
Published: (2023)
Quadratic Gating Mixture of Experts: Statistical Insights into Self-Attention
By: Akbarian, Pedram, et al.
Published: (2024)
On Parameter Estimation in Deviated Gaussian Mixture of Experts
By: Nguyen, Huy, et al.
Published: (2024)
On DeepSeekMoE: Statistical Benefits of Shared Experts and Normalized Sigmoid Gating
By: Nguyen, Huy, et al.
Published: (2025)
Dendrograms of Mixing Measures for Softmax-Gated Gaussian Mixture of Experts: Consistency without Model Sweeps
By: Hai, Do Tien, et al.
Published: (2025)
Improving Minimax Estimation Rates for Contaminated Mixture of Multinomial Logistic Experts via Expert Heterogeneity
By: Yan, Fanqi, et al.
Published: (2026)
Fast Model Selection and Stable Optimization for Softmax-Gated Multinomial-Logistic Mixture of Experts Models
By: Tran, TrungKhang, et al.
Published: (2026)
Understanding Expert Structures on Minimax Parameter Estimation in Contaminated Mixture of Experts
By: Yan, Fanqi, et al.
Published: (2024)
Statistical Advantages of Perturbing Cosine Router in Mixture of Experts
By: Nguyen, Huy, et al.
Published: (2024)
FuseMoE: Mixture-of-Experts Transformers for Fleximodal Fusion
By: Han, Xing, et al.
Published: (2024)
Mixture of Experts Meets Prompt-Based Continual Learning
By: Le, Minh, et al.
Published: (2024)
RepLoRA: Reparameterizing Low-Rank Adaptation via the Perspective of Mixture of Experts
By: Truong, Tuan, et al.
Published: (2025)
One-Prompt Strikes Back: Sparse Mixture of Experts for Prompt-based Continual Learning
By: Le, Minh, et al.
Published: (2025)
Convergence Rates for Latent Mixing Measures in Infinite Homoscedastic Location-Scale Mixture Models
By: Bariletto, Nicola, et al.
Published: (2026)
CompeteSMoE -- Effective Training of Sparse Mixture of Experts via Competition
By: Pham, Quang, et al.
Published: (2024)
Revisit Visual Prompt Tuning: The Expressiveness of Prompt Experts
By: Le, Minh, et al.
Published: (2025)
On Zero-Initialized Attention: Optimal Prompt and Gating Factor Estimation
By: Diep, Nghiem T., et al.
Published: (2025)
Model Selection for Gaussian-gated Gaussian Mixture of Experts Using Dendrograms of Mixing Measures
By: Thai, Tuan, et al.
Published: (2025)
Modeling Expert Interactions in Sparse Mixture of Experts via Graph Structures
By: Nguyen-Nhat, Minh-Khoi, et al.
Published: (2025)
On the Convergence and Straightness of Rectified Flow
By: Bansal, Vansh, et al.
Published: (2024)
Revisiting Prefix-tuning: Statistical Benefits of Reparameterization among Prompts
By: Le, Minh, et al.
Published: (2024)
On Global Convergence Rates for Federated Softmax Policy Gradient under Heterogeneous Environments
By: Labbi, Safwan, et al.
Published: (2025)
Mixture-of-Experts under Finite-Rate Gating: Communication--Generalization Trade-offs
By: Khalesi, Ali, et al.
Published: (2026)
Optimal Transport Aggregation for Distributed Mixture-of-Experts
By: Chamroukhi, Faïcel, et al.
Published: (2023)
Gaussian Process-Gated Hierarchical Mixtures of Experts
By: Liu, Yuhao, et al.
Published: (2023)
Hypernetwork-Driven Low-Rank Adaptation Across Attention Heads
By: Diep, Nghiem T., et al.
Published: (2025)
Fast Estimation of Wasserstein Distances via Regression on Sliced Wasserstein Distances
By: Nguyen, Khai, et al.
Published: (2025)
Expert Merging in Sparse Mixture of Experts with Nash Bargaining
By: Nguyen, Dung V., et al.
Published: (2025)
A Minimal Bifurcation Model of Load Imbalance in a Softmax Mixture-of-Experts Router
By: Kiselev, O. M.
Published: (2026)