:: Library Catalog

Bibliographic details
Main Authors:	Nguyen, Huy; Ho, Nhat; Rinaldo, Alessandro
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online access:	https://arxiv.org/abs/2503.03213

Related records

On Least Square Estimation in Softmax Gating Mixture of Experts
By: Nguyen, Huy, et al.
Published: (2024)

Sigmoid Gating is More Sample Efficient than Softmax Gating in Mixture of Experts
By: Nguyen, Huy, et al.
Published: (2024)

On Bayesian Softmax-Gated Mixture-of-Experts Models
By: Bariletto, Nicola, et al.
Published: (2026)

On Minimax Estimation of Parameters in Softmax-Contaminated Mixture of Experts
By: Yan, Fanqi, et al.
Published: (2025)

A General Theory for Softmax Gating Multinomial Logistic Mixture of Experts
By: Nguyen, Huy, et al.
Published: (2023)

Statistical Perspective of Top-K Sparse Softmax Gating Mixture of Experts
By: Nguyen, Huy, et al.
Published: (2023)

On Expert Estimation in Hierarchical Mixture of Experts: Beyond Softmax Gating Functions
By: Nguyen, Huy, et al.
Published: (2024)

Sigmoid Self-Attention has Lower Sample Complexity than Softmax Self-Attention: A Mixture-of-Experts Perspective
By: Yan, Fanqi, et al.
Published: (2025)

Is Temperature Sample Efficient for Softmax Gaussian Mixture of Experts?
By: Nguyen, Huy, et al.
Published: (2024)

Rethinking Multinomial Logistic Mixture of Experts with Sigmoid Gating Function
By: Pham, Tuan Minh, et al.
Published: (2026)

A Statistical Theory of Gated Attention through the Lens of Hierarchical Mixture of Experts
By: Nguyen, Viet, et al.
Published: (2026)

Towards Convergence Rates for Parameter Estimation in Gaussian-gated Mixture of Experts
By: Nguyen, Huy, et al.
Published: (2023)

Quadratic Gating Mixture of Experts: Statistical Insights into Self-Attention
By: Akbarian, Pedram, et al.
Published: (2024)

On Parameter Estimation in Deviated Gaussian Mixture of Experts
By: Nguyen, Huy, et al.
Published: (2024)

On DeepSeekMoE: Statistical Benefits of Shared Experts and Normalized Sigmoid Gating
By: Nguyen, Huy, et al.
Published: (2025)

Dendrograms of Mixing Measures for Softmax-Gated Gaussian Mixture of Experts: Consistency without Model Sweeps
By: Hai, Do Tien, et al.
Published: (2025)

Improving Minimax Estimation Rates for Contaminated Mixture of Multinomial Logistic Experts via Expert Heterogeneity
By: Yan, Fanqi, et al.
Published: (2026)

Fast Model Selection and Stable Optimization for Softmax-Gated Multinomial-Logistic Mixture of Experts Models
By: Tran, TrungKhang, et al.
Published: (2026)

Understanding Expert Structures on Minimax Parameter Estimation in Contaminated Mixture of Experts
By: Yan, Fanqi, et al.
Published: (2024)

Statistical Advantages of Perturbing Cosine Router in Mixture of Experts
By: Nguyen, Huy, et al.
Published: (2024)

FuseMoE: Mixture-of-Experts Transformers for Fleximodal Fusion
By: Han, Xing, et al.
Published: (2024)

Mixture of Experts Meets Prompt-Based Continual Learning
By: Le, Minh, et al.
Published: (2024)

RepLoRA: Reparameterizing Low-Rank Adaptation via the Perspective of Mixture of Experts
By: Truong, Tuan, et al.
Published: (2025)

One-Prompt Strikes Back: Sparse Mixture of Experts for Prompt-based Continual Learning
By: Le, Minh, et al.
Published: (2025)

Convergence Rates for Latent Mixing Measures in Infinite Homoscedastic Location-Scale Mixture Models
By: Bariletto, Nicola, et al.
Published: (2026)

CompeteSMoE -- Effective Training of Sparse Mixture of Experts via Competition
By: Pham, Quang, et al.
Published: (2024)

Revisit Visual Prompt Tuning: The Expressiveness of Prompt Experts
By: Le, Minh, et al.
Published: (2025)

On Zero-Initialized Attention: Optimal Prompt and Gating Factor Estimation
By: Diep, Nghiem T., et al.
Published: (2025)

Model Selection for Gaussian-gated Gaussian Mixture of Experts Using Dendrograms of Mixing Measures
By: Thai, Tuan, et al.
Published: (2025)

Modeling Expert Interactions in Sparse Mixture of Experts via Graph Structures
By: Nguyen-Nhat, Minh-Khoi, et al.
Published: (2025)

On the Convergence and Straightness of Rectified Flow
By: Bansal, Vansh, et al.
Published: (2024)

Revisiting Prefix-tuning: Statistical Benefits of Reparameterization among Prompts
By: Le, Minh, et al.
Published: (2024)

On Global Convergence Rates for Federated Softmax Policy Gradient under Heterogeneous Environments
By: Labbi, Safwan, et al.
Published: (2025)

Mixture-of-Experts under Finite-Rate Gating: Communication--Generalization Trade-offs
By: Khalesi, Ali, et al.
Published: (2026)

Optimal Transport Aggregation for Distributed Mixture-of-Experts
By: Chamroukhi, Faïcel, et al.
Published: (2023)

Gaussian Process-Gated Hierarchical Mixtures of Experts
By: Liu, Yuhao, et al.
Published: (2023)

Hypernetwork-Driven Low-Rank Adaptation Across Attention Heads
By: Diep, Nghiem T., et al.
Published: (2025)

Fast Estimation of Wasserstein Distances via Regression on Sliced Wasserstein Distances
By: Nguyen, Khai, et al.
Published: (2025)

Expert Merging in Sparse Mixture of Experts with Nash Bargaining
By: Nguyen, Dung V., et al.
Published: (2025)

A Minimal Bifurcation Model of Load Imbalance in a Softmax Mixture-of-Experts Router
By: Kiselev, O. M.
Published: (2026)