| Main Authors: | Chang, Ernie, Paltenghi, Matteo, Li, Yang, Lin, Pin-Jie, Zhao, Changsheng, Huber, Patrick, Liu, Zechun, Rabatin, Rastislav, Shi, Yangyang, Chandra, Vikas |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2410.03083 |
Similar Items
Target-Aware Language Modeling via Granular Data Sampling
by: Chang, Ernie, et al.
Published: (2024)
Navigating the Minefield of MT Beam Search in Cascaded Streaming Speech Translation
by: Rabatin, Rastislav, et al.
Published: (2024)
Self-Vocabularizing Training for Neural Machine Translation
by: Lin, Pin-Jie, et al.
Published: (2025)
Basis Selection: Low-Rank Decomposition of Pretrained Large Language Models for Target Applications
by: Li, Yang, et al.
Published: (2024)
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
by: Liu, Zechun, et al.
Published: (2024)
AutoMixer: Checkpoint Artifacts as Automatic Data Mixers
by: Chang, Ernie, et al.
Published: (2025)
MobileLLM-R1: Exploring the Limits of Sub-Billion Language Model Reasoners with Open Training Recipes
by: Zhao, Changsheng, et al.
Published: (2025)
MobileMoE: Scaling On-Device Mixture of Experts
by: Chen, Yanbei, et al.
Published: (2026)
RPRA: Predicting an LLM-Judge for Efficient but Performant Inference
by: Ashley, Dylan R., et al.
Published: (2026)
ParetoQ: Improving Scaling Laws in Extremely Low-bit LLM Quantization
by: Liu, Zechun, et al.
Published: (2025)
Breaking Down Power Barriers in On-Device Streaming ASR: Insights and Solutions
by: Li, Yang, et al.
Published: (2024)
SpinQuant: LLM quantization with learned rotations
by: Liu, Zechun, et al.
Published: (2024)
Wink: Recovering from Misbehaviors in Coding Agents
by: Nanda, Rahul, et al.
Published: (2026)
Agent-as-a-Judge: Evaluate Agents with Agents
by: Zhuge, Mingchen, et al.
Published: (2024)
Non-Monotonic Attention-based Read/Write Policy Learning for Simultaneous Translation
by: Ahmed, Zeeshan, et al.
Published: (2025)
Short Data, Long Context: Distilling Positional Knowledge in Transformers
by: Huber, Patrick, et al.
Published: (2026)
CoSMoEs: Compact Sparse Mixture of Experts
by: Huber, Patrick, et al.
Published: (2025)
dTRPO: Trajectory Reduction in Policy Optimization of Diffusion Large Language Models
by: Zhang, Wenxuan, et al.
Published: (2026)
A Survey on Testing and Analysis of Quantum Software
by: Paltenghi, Matteo, et al.
Published: (2024)
QITE: Assembly-Level, Cross-Platform Testing of Quantum Computing Platforms
by: Paltenghi, Matteo, et al.
Published: (2025)
Analyzing Quantum Programs with LintQ: A Static Analysis Framework for Qiskit
by: Paltenghi, Matteo, et al.
Published: (2023)
Scaling Data-Constrained Language Models
by: Muennighoff, Niklas, et al.
Published: (2023)
Folding Attention: Memory and Power Optimization for On-Device Transformer-based Streaming Speech Recognition
by: Li, Yang, et al.
Published: (2023)
EgoAVU: Egocentric Audio-Visual Understanding
by: Seth, Ashish, et al.
Published: (2026)
Exploring Audio Hallucination in Egocentric Video Understanding
by: Seth, Ashish, et al.
Published: (2026)
REAP: Automatic Curation of Coding Agent Benchmarks from Interactive Production Usage
by: Jha, Smriti, et al.
Published: (2026)
Measuring Social Biases in Masked Language Models by Proxy of Prediction Quality
by: Zalkikar, Rahul, et al.
Published: (2024)
Neural Computers
by: Zhuge, Mingchen, et al.
Published: (2026)
Mixture-of-Supernets: Improving Weight-Sharing Supernet Training with Architecture-Routed Mixture-of-Experts
by: Jawahar, Ganesh, et al.
Published: (2023)
Morello: Compiling Fast Neural Networks with Dynamic Programming and Spatial Compression
by: Kaufman, Samuel J., et al.
Published: (2025)
Param$Δ$ for Direct Weight Mixing: Post-Train Large Language Model at Zero Cost
by: Cao, Sheng, et al.
Published: (2025)
RapidUn: Influence-Driven Parameter Reweighting for Efficient Large Language Model Unlearning
by: Zhao, Guoshenghui, et al.
Published: (2025)
Prescriptive Scaling Laws for Data Constrained Training
by: Lovelace, Justin, et al.
Published: (2026)
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
by: Liu, Dongyang, et al.
Published: (2024)
Small Vision-Language Models are Smart Compressors for Long Video Understanding
by: Fei, Junjie, et al.
Published: (2026)
Exploring the Effectiveness and Consistency of Task Selection in Intermediate-Task Transfer Learning
by: Lin, Pin-Jie, et al.
Published: (2024)
CoDa: Constrained Generation based Data Augmentation for Low-Resource NLP
by: Evuru, Chandra Kiran Reddy, et al.
Published: (2024)
DepthLM: Metric Depth From Vision Language Models
by: Cai, Zhipeng, et al.
Published: (2025)
Infinity-MM: Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data
by: Gu, Shuhao, et al.
Published: (2024)
Quality or Quantity? On Data Scale and Diversity in Adapting Large Language Models for Low-Resource Translation
by: Iyer, Vivek, et al.
Published: (2024)