:: Library Catalog

Bibliographic Details
Main Authors:	Jawahar, Ganesh; Yang, Haichuan; Xiong, Yunyang; Liu, Zechun; Wang, Dilin; Sun, Fei; Li, Meng; Pappu, Aasish; Oguz, Barlas; Abdul-Mageed, Muhammad; Lakshmanan, Laks V. S.; Krishnamoorthi, Raghuraman; Chandra, Vikas
Format:	Preprint
Published:	2023
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2306.04845

Similar Items

LLM Performance Predictors are good initializers for Architecture Search
by: Jawahar, Ganesh, et al.
Published: (2023)

MobileMoE: Scaling On-Device Mixture of Experts
by: Chen, Yanbei, et al.
Published: (2026)

PathFusion: Path-consistent Lidar-Camera Deep Feature Fusion
by: Wu, Lemeng, et al.
Published: (2022)

Multi-agent Architecture Search via Agentic Supernet
by: Zhang, Guibin, et al.
Published: (2025)

Autoregressive + Chain of Thought = Recurrent: Recurrence's Role in Language Models' Computability and a Revisit of Recurrent Transformer
by: Zhang, Xiang, et al.
Published: (2024)

Post-training an LLM for RAG? Train on Self-Generated Demonstrations
by: Finlayson, Matthew, et al.
Published: (2025)

Subnet-Aware Dynamic Supernet Training for Neural Architecture Search
by: Jeon, Jeimin, et al.
Published: (2025)

On Supernet Transfer Learning for Effective Task Adaptation
by: Singh, Prabhant, et al.
Published: (2024)

DetoxLLM: A Framework for Detoxification with Explanations
by: Khondaker, Md Tawkat Islam, et al.
Published: (2024)

Fast Data Aware Neural Architecture Search via Supernet Accelerated Evaluation
by: Njor, Emil, et al.
Published: (2025)

Progressive Supernet Training for Efficient Visual Autoregressive Modeling
by: Chen, Xiaoyue, et al.
Published: (2025)

MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
by: Liu, Zechun, et al.
Published: (2024)

Efficient Supernet Training with Orthogonal Softmax for Scalable ASR Model Compression
by: Xu, Jingjing, et al.
Published: (2025)

DepthShrinker: A New Compression Paradigm Towards Boosting Real-Hardware Efficiency of Compact Neural Networks
by: Fu, Yonggan, et al.
Published: (2022)

SpinQuant: LLM quantization with learned rotations
by: Liu, Zechun, et al.
Published: (2024)

MedNNS: Supernet-based Medical Task-Adaptive Neural Network Search
by: Mecharbat, Lotfi Abdelkrim, et al.
Published: (2025)

SqueezeSAM: User friendly mobile interactive segmentation
by: Varadarajan, Balakrishnan, et al.
Published: (2023)

Efficient Track Anything
by: Xiong, Yunyang, et al.
Published: (2024)

Agent-as-a-Judge: Evaluate Agents with Agents
by: Zhuge, Mingchen, et al.
Published: (2024)

PASS: Probabilistic Agentic Supernet Sampling for Interpretable and Adaptive Chest X-Ray Reasoning
by: Feng, Yushi, et al.
Published: (2025)

SuperTickets: Drawing Task-Agnostic Lottery Tickets from Supernets via Jointly Architecture Searching and Parameter Pruning
by: You, Haoran, et al.
Published: (2022)

Mixed-precision Supernet Training from Vision Foundation Models using Low Rank Adapter
by: Sakuma, Yuiko, et al.
Published: (2024)

DeepFedNAS: Efficient Hardware-Aware Architecture Adaptation for Heterogeneous IoT Federations via Pareto-Guided Supernet Training
by: Khan, Bostan, et al.
Published: (2026)

AdaS&S: a One-Shot Supernet Approach for Automatic Embedding Size Search in Deep Recommender System
by: Wei, He, et al.
Published: (2024)

Routing-Free Mixture-of-Experts
by: Liu, Yilun, et al.
Published: (2026)

Multilingual Routing in Mixture-of-Experts
by: Bandarkar, Lucas, et al.
Published: (2025)

MobileLLM-R1: Exploring the Limits of Sub-Billion Language Model Reasoners with Open Training Recipes
by: Zhao, Changsheng, et al.
Published: (2025)

Exploring Expert Specialization through Unsupervised Training in Sparse Mixture of Experts
by: Nikolic, Strahinja, et al.
Published: (2025)

Efficient Universal Perception Encoder
by: Zhu, Chenchen, et al.
Published: (2026)

Spatio-Semantic Expert Routing Architecture with Mixture-of-Experts for Referring Image Segmentation
by: Dalaq, Alaa, et al.
Published: (2026)

KRAFT: A Knowledge Graph-Based Framework for Automated Map Conflation
by: Hashemi, Farnoosh, et al.
Published: (2025)

ParetoQ: Improving Scaling Laws in Extremely Low-bit LLM Quantization
by: Liu, Zechun, et al.
Published: (2025)

Small Vision-Language Models are Smart Compressors for Long Video Understanding
by: Fei, Junjie, et al.
Published: (2026)

EdgeTAM: On-Device Track Anything Model
by: Zhou, Chong, et al.
Published: (2025)

Omni-Router: Sharing Routing Decisions in Sparse Mixture-of-Experts for Speech Recognition
by: Gu, Zijin, et al.
Published: (2025)

Maximum Score Routing For Mixture-of-Experts
by: Dong, Bowen, et al.
Published: (2025)

Cross-Modal Consistency in Multimodal Large Language Models
by: Zhang, Xiang, et al.
Published: (2024)

LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding
by: Shen, Xiaoqian, et al.
Published: (2024)

RouteHijack: Routing-Aware Attack on Mixture-of-Experts LLMs
by: Xu, Zhiyuan, et al.
Published: (2026)

Geometric Routing Enables Causal Expert Control in Mixture of Experts
by: Ternovtsii, Ivan, et al.
Published: (2026)