| Main Authors: | Jawahar, Ganesh, Yang, Haichuan, Xiong, Yunyang, Liu, Zechun, Wang, Dilin, Sun, Fei, Li, Meng, Pappu, Aasish, Oguz, Barlas, Abdul-Mageed, Muhammad, Lakshmanan, Laks V. S., Krishnamoorthi, Raghuraman, Chandra, Vikas |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2306.04845 |
Similar Items
LLM Performance Predictors are good initializers for Architecture Search
by: Jawahar, Ganesh, et al.
Published: (2023)
MobileMoE: Scaling On-Device Mixture of Experts
by: Chen, Yanbei, et al.
Published: (2026)
PathFusion: Path-consistent Lidar-Camera Deep Feature Fusion
by: Wu, Lemeng, et al.
Published: (2022)
Multi-agent Architecture Search via Agentic Supernet
by: Zhang, Guibin, et al.
Published: (2025)
Autoregressive + Chain of Thought = Recurrent: Recurrence's Role in Language Models' Computability and a Revisit of Recurrent Transformer
by: Zhang, Xiang, et al.
Published: (2024)
Post-training an LLM for RAG? Train on Self-Generated Demonstrations
by: Finlayson, Matthew, et al.
Published: (2025)
Subnet-Aware Dynamic Supernet Training for Neural Architecture Search
by: Jeon, Jeimin, et al.
Published: (2025)
On Supernet Transfer Learning for Effective Task Adaptation
by: Singh, Prabhant, et al.
Published: (2024)
DetoxLLM: A Framework for Detoxification with Explanations
by: Khondaker, Md Tawkat Islam, et al.
Published: (2024)
Fast Data Aware Neural Architecture Search via Supernet Accelerated Evaluation
by: Njor, Emil, et al.
Published: (2025)
Progressive Supernet Training for Efficient Visual Autoregressive Modeling
by: Chen, Xiaoyue, et al.
Published: (2025)
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
by: Liu, Zechun, et al.
Published: (2024)
Efficient Supernet Training with Orthogonal Softmax for Scalable ASR Model Compression
by: Xu, Jingjing, et al.
Published: (2025)
DepthShrinker: A New Compression Paradigm Towards Boosting Real-Hardware Efficiency of Compact Neural Networks
by: Fu, Yonggan, et al.
Published: (2022)
SpinQuant: LLM quantization with learned rotations
by: Liu, Zechun, et al.
Published: (2024)
MedNNS: Supernet-based Medical Task-Adaptive Neural Network Search
by: Mecharbat, Lotfi Abdelkrim, et al.
Published: (2025)
SqueezeSAM: User friendly mobile interactive segmentation
by: Varadarajan, Balakrishnan, et al.
Published: (2023)
Efficient Track Anything
by: Xiong, Yunyang, et al.
Published: (2024)
Agent-as-a-Judge: Evaluate Agents with Agents
by: Zhuge, Mingchen, et al.
Published: (2024)
PASS: Probabilistic Agentic Supernet Sampling for Interpretable and Adaptive Chest X-Ray Reasoning
by: Feng, Yushi, et al.
Published: (2025)
SuperTickets: Drawing Task-Agnostic Lottery Tickets from Supernets via Jointly Architecture Searching and Parameter Pruning
by: You, Haoran, et al.
Published: (2022)
Mixed-precision Supernet Training from Vision Foundation Models using Low Rank Adapter
by: Sakuma, Yuiko, et al.
Published: (2024)
DeepFedNAS: Efficient Hardware-Aware Architecture Adaptation for Heterogeneous IoT Federations via Pareto-Guided Supernet Training
by: Khan, Bostan, et al.
Published: (2026)
AdaS&S: a One-Shot Supernet Approach for Automatic Embedding Size Search in Deep Recommender System
by: Wei, He, et al.
Published: (2024)
Routing-Free Mixture-of-Experts
by: Liu, Yilun, et al.
Published: (2026)
Multilingual Routing in Mixture-of-Experts
by: Bandarkar, Lucas, et al.
Published: (2025)
MobileLLM-R1: Exploring the Limits of Sub-Billion Language Model Reasoners with Open Training Recipes
by: Zhao, Changsheng, et al.
Published: (2025)
Exploring Expert Specialization through Unsupervised Training in Sparse Mixture of Experts
by: Nikolic, Strahinja, et al.
Published: (2025)
Efficient Universal Perception Encoder
by: Zhu, Chenchen, et al.
Published: (2026)
Spatio-Semantic Expert Routing Architecture with Mixture-of-Experts for Referring Image Segmentation
by: Dalaq, Alaa, et al.
Published: (2026)
KRAFT: A Knowledge Graph-Based Framework for Automated Map Conflation
by: Hashemi, Farnoosh, et al.
Published: (2025)
ParetoQ: Improving Scaling Laws in Extremely Low-bit LLM Quantization
by: Liu, Zechun, et al.
Published: (2025)
Small Vision-Language Models are Smart Compressors for Long Video Understanding
by: Fei, Junjie, et al.
Published: (2026)
EdgeTAM: On-Device Track Anything Model
by: Zhou, Chong, et al.
Published: (2025)
Omni-Router: Sharing Routing Decisions in Sparse Mixture-of-Experts for Speech Recognition
by: Gu, Zijin, et al.
Published: (2025)
Maximum Score Routing For Mixture-of-Experts
by: Dong, Bowen, et al.
Published: (2025)
Cross-Modal Consistency in Multimodal Large Language Models
by: Zhang, Xiang, et al.
Published: (2024)
LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding
by: Shen, Xiaoqian, et al.
Published: (2024)
RouteHijack: Routing-Aware Attack on Mixture-of-Experts LLMs
by: Xu, Zhiyuan, et al.
Published: (2026)
Geometric Routing Enables Causal Expert Control in Mixture of Experts
by: Ternovtsii, Ivan, et al.
Published: (2026)