| Main Authors: | Zhuo, Zhijian; Zeng, Yutao; Wang, Ya; Zhang, Sijun; Yang, Jian; Li, Xiaoqing; Zhou, Xun; Ma, Jinwen |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2503.04598 |
Similar Items
Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models
by: Zhuo, Zhijian, et al.
Published: (2024)
Scale-Distribution Decoupling: Enabling Stable and Effective Training of Large Language Models
by: Wang, Ya, et al.
Published: (2025)
Efficient Pretraining Length Scaling
by: Wu, Bohong, et al.
Published: (2025)
SpanNorm: Reconciling Training Stability and Performance in Deep Transformers
by: Wang, Chao, et al.
Published: (2026)
Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling
by: Huang, Hongzhi, et al.
Published: (2025)
Pronunciation-Lexicon Free Training for Phoneme-based Crosslingual ASR via Joint Stochastic Approximation
by: Yusuyin, Saierdaer, et al.
Published: (2025)
Understanding In-Context Learning Beyond Transformers: An Investigation of State Space and Hybrid Architectures
by: Wang, Shenran, et al.
Published: (2025)
From Similarity to Structure: Training-free LLM Context Compression with Hybrid Graph Priors
by: Zhou, Yitian, et al.
Published: (2026)
Frac-Connections: Fractional Extension of Hyper-Connections
by: Zhu, Defa, et al.
Published: (2025)
Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models
by: NVIDIA, et al.
Published: (2025)
TimeViper: A Hybrid Mamba-Transformer Vision-Language Model for Efficient Long Video Understanding
by: Xu, Boshen, et al.
Published: (2025)
Boosting Summarization with Normalizing Flows and Aggressive Training
by: Yang, Yu, et al.
Published: (2023)
Distilling to Hybrid Attention Models via KL-Guided Layer Selection
by: Li, Yanhong, et al.
Published: (2025)
NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model
by: NVIDIA, et al.
Published: (2025)
Hybrid Quantum Transformer for Language Generation
by: Kong, Desheng, et al.
Published: (2025)
HNCSE: Advancing Sentence Embeddings via Hybrid Contrastive Learning with Hard Negatives
by: Liu, Wenxiao, et al.
Published: (2024)
Towards Efficient Post-Training via Fourier-Driven Adapter Architectures
by: Bae, Donggyun, et al.
Published: (2025)
Nemotron 3 Super: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning
by: NVIDIA, et al.
Published: (2026)
Nemotron 3 Nano: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning
by: NVIDIA, et al.
Published: (2025)
Enabling and Analyzing How to Efficiently Extract Information from Hybrid Long Documents with LLMs
by: Yue, Chongjian, et al.
Published: (2023)
HES-SQL: Hybrid Reasoning for Efficient Text-to-SQL with Structural Skeleton Guidance
by: Qiu, Suming, et al.
Published: (2025)
SALMAN: Stability Analysis of Language Models Through the Maps Between Graph-based Manifolds
by: Cheng, Wuxinlin, et al.
Published: (2025)
NormTab: Improving Symbolic Reasoning in LLMs Through Tabular Data Normalization
by: Nahid, Md Mahadi Hasan, et al.
Published: (2024)
Toward Adaptive Large Language Models Structured Pruning via Hybrid-grained Weight Importance Assessment
by: Liu, Jun, et al.
Published: (2024)
Hybrid CNN-Transformer Architecture for Arabic Speech Emotion Recognition
by: Gheffari, Youcef Soufiane, et al.
Published: (2026)
Native Hybrid Attention for Efficient Sequence Modeling
by: Du, Jusen, et al.
Published: (2025)
Every Attention Matters: An Efficient Hybrid Architecture for Long-Context Reasoning
by: Ling Team, et al.
Published: (2025)
Affine-Scaled Attention: Towards Flexible and Stable Transformer Attention
by: Bae, Jeongin, et al.
Published: (2026)
Piccolo2: General Text Embedding with Multi-task Hybrid Loss Training
by: Huang, Junqin, et al.
Published: (2024)
Bounded Hyperbolic Tangent: A Stable and Efficient Alternative to Pre-Layer Normalization in Large Language Models
by: Byun, Hoyoon, et al.
Published: (2025)
Hybrid-SQuAD: Hybrid Scholarly Question Answering Dataset
by: Taffa, Tilahun Abedissa, et al.
Published: (2024)
H-Mem: A Novel Memory Mechanism for Evolving and Retrieving Agent Memory via a Hybrid Structure
by: Yu, Jiawei, et al.
Published: (2026)
Heterogeneous Subgraph Transformer for Fake News Detection
by: Zhang, Yuchen, et al.
Published: (2024)
Detecting AI-Generated Sentences in Human-AI Collaborative Hybrid Texts: Challenges, Strategies, and Insights
by: Zeng, Zijie, et al.
Published: (2024)
A$^2$FM: An Adaptive Agent Foundation Model for Tool-Aware Hybrid Reasoning
by: Chen, Qianben, et al.
Published: (2025)
An Isotropic Approach to Efficient Uncertainty Quantification with Gradient Norms
by: Grünefeld, Nils, et al.
Published: (2026)
Efficient Training for Cross-lingual Speech Language Models
by: Zhou, Yan, et al.
Published: (2026)
DuCCAE: A Hybrid Engine for Immersive Conversation via Collaboration, Augmentation, and Evolution
by: Shen, Xin, et al.
Published: (2026)
MiniCPM-SALA: Hybridizing Sparse and Linear Attention for Efficient Long-Context Modeling
by: MiniCPM Team, et al.
Published: (2026)
Division-of-Thoughts: Harnessing Hybrid Language Model Synergy for Efficient On-Device Agents
by: Shao, Chenyang, et al.
Published: (2025)