| Main Authors: | Diao, Shizhe, Yang, Yu, Fu, Yonggan, Dong, Xin, Su, Dan, Kliegl, Markus, Chen, Zijia, Belcak, Peter, Suhara, Yoshi, Yin, Hongxu, Patwary, Mostofa, Lin, Yingyan, Kautz, Jan, Molchanov, Pavlo |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2504.13161 |
Similar Items
Small Language Models are the Future of Agentic AI
by: Belcak, Peter, et al.
Published: (2025)
Hymba: A Hybrid-head Architecture for Small Language Models
by: Dong, Xin, et al.
Published: (2024)
Nemotron Elastic: Towards Efficient Many-in-One Reasoning LLMs
by: Taghibakhshi, Ali, et al.
Published: (2025)
Nemotron-CC: Transforming Common Crawl into a Refined Long-Horizon Pretraining Dataset
by: Su, Dan, et al.
Published: (2024)
Nemotron-Flash: Towards Latency-Optimal Hybrid Small Language Models
by: Fu, Yonggan, et al.
Published: (2025)
Universal Deep Research: Bring Your Own Model and Strategy
by: Belcak, Peter, et al.
Published: (2025)
Minifinetuning: Low-Data Generation Domain Adaptation through Corrective Self-Distillation
by: Belcak, Peter, et al.
Published: (2025)
LongMamba: Enhancing Mamba's Long Context Capabilities via Training-Free Receptive Field Enlargement
by: Ye, Zhifan, et al.
Published: (2025)
ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration
by: Su, Hongjin, et al.
Published: (2025)
LLM Pruning and Distillation in Practice: The Minitron Approach
by: Sreenivas, Sharath Turuvekere, et al.
Published: (2024)
Minitron-SSM: Efficient Hybrid Language Model Compression through Group-Aware SSM Pruning
by: Taghibakhshi, Ali, et al.
Published: (2025)
VILA: On Pre-training for Visual Language Models
by: Lin, Ji, et al.
Published: (2023)
Compact Language Models via Pruning and Knowledge Distillation
by: Muralidharan, Saurav, et al.
Published: (2024)
Efficient-DLM: From Autoregressive to Diffusion Language Models, and Beyond in Speed
by: Fu, Yonggan, et al.
Published: (2025)
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
by: Liu, Shih-Yang, et al.
Published: (2026)
Nemotron-CC-Math: A 133 Billion-Token-Scale High Quality Math Pretraining Dataset
by: Mahabadi, Rabeeh Karimi, et al.
Published: (2025)
LaCache: Ladder-Shaped KV Caching for Efficient Long-Context Modeling of Large Language Models
by: Shi, Dachuan, et al.
Published: (2025)
Source Identification in Abstractive Summarization
by: Suhara, Yoshi, et al.
Published: (2024)
Fast-dLLM v2: Efficient Block-Diffusion LLM
by: Wu, Chengyue, et al.
Published: (2025)
LITA: Language Instructed Temporal-Localization Assistant
by: Huang, De-An, et al.
Published: (2024)
Flextron: Many-in-One Flexible Large Language Model
by: Cai, Ruisi, et al.
Published: (2024)
ProfBench: Multi-Domain Rubrics requiring Professional Knowledge to Answer and Judge
by: Wang, Zhilin, et al.
Published: (2025)
Nemotron-CrossThink: Scaling Self-Learning beyond Math Reasoning
by: Akter, Syeda Nahida, et al.
Published: (2025)
AM-RADIO: Agglomerative Vision Foundation Model -- Reduce All Domains Into One
by: Ranzinger, Mike, et al.
Published: (2023)
$R^2$-dLLM: Accelerating Diffusion Large Language Models via Spatio-Temporal Redundancy Reduction
by: Du, Zhenbang, et al.
Published: (2026)
FasterViT: Fast Vision Transformers with Hierarchical Attention
by: Hatamizadeh, Ali, et al.
Published: (2023)
Noisy Pairing and Partial Supervision for Stylized Opinion Summarization
by: Iso, Hayate, et al.
Published: (2022)
Large Language Models are Inconsistent and Biased Evaluators
by: Stureborg, Rickard, et al.
Published: (2024)
Scaling Vision Pre-Training to 4K Resolution
by: Shi, Baifeng, et al.
Published: (2025)
DLER: Doing Length pEnalty Right - Incentivizing More Intelligence per Token via Reinforcement Learning
by: Liu, Shih-Yang, et al.
Published: (2025)
RADIOv2.5: Improved Baselines for Agglomerative Vision Foundation Models
by: Heinrich, Greg, et al.
Published: (2024)
MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models
by: Fang, Gongfan, et al.
Published: (2024)
BroRL: Scaling Reinforcement Learning via Broadened Exploration
by: Hu, Jian, et al.
Published: (2025)
TiDAR: Think in Diffusion, Talk in Autoregression
by: Liu, Jingyu, et al.
Published: (2025)
Adaptive Sharpness-Aware Pruning for Robust Sparse Networks
by: Bair, Anna, et al.
Published: (2023)
When2Call: When (not) to Call Tools
by: Ross, Hayley, et al.
Published: (2025)
Step Out and Seek Around: On Warm-Start Training with Incremental Data
by: Shen, Maying, et al.
Published: (2024)
Nemotron-4 15B Technical Report
by: Parmar, Jupinder, et al.
Published: (2024)
Agent Explorative Policy Optimization for Multimodal Agentic Reasoning
by: Kang, Minki, et al.
Published: (2026)
Entropy-Regularized Process Reward Model
by: Zhang, Hanning, et al.
Published: (2024)