:: Library Catalog

Bibliographic Details
Main Authors:	Zhuo, Zhijian; Zeng, Yutao; Wang, Ya; Zhang, Sijun; Yang, Jian; Li, Xiaoqing; Zhou, Xun; Ma, Jinwen
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Artificial Intelligence; Machine Learning
Online Access:	https://arxiv.org/abs/2503.04598

Similar Items

Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models
by: Zhuo, Zhijian, et al.
Published: (2024)

Scale-Distribution Decoupling: Enabling Stable and Effective Training of Large Language Models
by: Wang, Ya, et al.
Published: (2025)

Efficient Pretraining Length Scaling
by: Wu, Bohong, et al.
Published: (2025)

SpanNorm: Reconciling Training Stability and Performance in Deep Transformers
by: Wang, Chao, et al.
Published: (2026)

Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling
by: Huang, Hongzhi, et al.
Published: (2025)

Pronunciation-Lexicon Free Training for Phoneme-based Crosslingual ASR via Joint Stochastic Approximation
by: Yusuyin, Saierdaer, et al.
Published: (2025)

Understanding In-Context Learning Beyond Transformers: An Investigation of State Space and Hybrid Architectures
by: Wang, Shenran, et al.
Published: (2025)

From Similarity to Structure: Training-free LLM Context Compression with Hybrid Graph Priors
by: Zhou, Yitian, et al.
Published: (2026)

Frac-Connections: Fractional Extension of Hyper-Connections
by: Zhu, Defa, et al.
Published: (2025)

Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models
by: NVIDIA, et al.
Published: (2025)

TimeViper: A Hybrid Mamba-Transformer Vision-Language Model for Efficient Long Video Understanding
by: Xu, Boshen, et al.
Published: (2025)

Boosting Summarization with Normalizing Flows and Aggressive Training
by: Yang, Yu, et al.
Published: (2023)

Distilling to Hybrid Attention Models via KL-Guided Layer Selection
by: Li, Yanhong, et al.
Published: (2025)

NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model
by: NVIDIA, et al.
Published: (2025)

Hybrid Quantum Transformer for Language Generation
by: Kong, Desheng, et al.
Published: (2025)

HNCSE: Advancing Sentence Embeddings via Hybrid Contrastive Learning with Hard Negatives
by: Liu, Wenxiao, et al.
Published: (2024)

Towards Efficient Post-Training via Fourier-Driven Adapter Architectures
by: Bae, Donggyun, et al.
Published: (2025)

Nemotron 3 Super: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning
by: NVIDIA, et al.
Published: (2026)

Nemotron 3 Nano: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning
by: NVIDIA, et al.
Published: (2025)

Enabling and Analyzing How to Efficiently Extract Information from Hybrid Long Documents with LLMs
by: Yue, Chongjian, et al.
Published: (2023)

HES-SQL: Hybrid Reasoning for Efficient Text-to-SQL with Structural Skeleton Guidance
by: Qiu, Suming, et al.
Published: (2025)

SALMAN: Stability Analysis of Language Models Through the Maps Between Graph-based Manifolds
by: Cheng, Wuxinlin, et al.
Published: (2025)

NormTab: Improving Symbolic Reasoning in LLMs Through Tabular Data Normalization
by: Nahid, Md Mahadi Hasan, et al.
Published: (2024)

Toward Adaptive Large Language Models Structured Pruning via Hybrid-grained Weight Importance Assessment
by: Liu, Jun, et al.
Published: (2024)

Hybrid CNN-Transformer Architecture for Arabic Speech Emotion Recognition
by: Gheffari, Youcef Soufiane, et al.
Published: (2026)

Native Hybrid Attention for Efficient Sequence Modeling
by: Du, Jusen, et al.
Published: (2025)

Every Attention Matters: An Efficient Hybrid Architecture for Long-Context Reasoning
by: Ling Team, et al.
Published: (2025)

Affine-Scaled Attention: Towards Flexible and Stable Transformer Attention
by: Bae, Jeongin, et al.
Published: (2026)

Piccolo2: General Text Embedding with Multi-task Hybrid Loss Training
by: Huang, Junqin, et al.
Published: (2024)

Bounded Hyperbolic Tangent: A Stable and Efficient Alternative to Pre-Layer Normalization in Large Language Models
by: Byun, Hoyoon, et al.
Published: (2025)

Hybrid-SQuAD: Hybrid Scholarly Question Answering Dataset
by: Taffa, Tilahun Abedissa, et al.
Published: (2024)

H-Mem: A Novel Memory Mechanism for Evolving and Retrieving Agent Memory via a Hybrid Structure
by: Yu, Jiawei, et al.
Published: (2026)

Heterogeneous Subgraph Transformer for Fake News Detection
by: Zhang, Yuchen, et al.
Published: (2024)

Detecting AI-Generated Sentences in Human-AI Collaborative Hybrid Texts: Challenges, Strategies, and Insights
by: Zeng, Zijie, et al.
Published: (2024)

A$^2$FM: An Adaptive Agent Foundation Model for Tool-Aware Hybrid Reasoning
by: Chen, Qianben, et al.
Published: (2025)

An Isotropic Approach to Efficient Uncertainty Quantification with Gradient Norms
by: Grünefeld, Nils, et al.
Published: (2026)

Efficient Training for Cross-lingual Speech Language Models
by: Zhou, Yan, et al.
Published: (2026)

DuCCAE: A Hybrid Engine for Immersive Conversation via Collaboration, Augmentation, and Evolution
by: Shen, Xin, et al.
Published: (2026)

MiniCPM-SALA: Hybridizing Sparse and Linear Attention for Efficient Long-Context Modeling
by: MiniCPM Team, et al.
Published: (2026)

Division-of-Thoughts: Harnessing Hybrid Language Model Synergy for Efficient On-Device Agents
by: Shao, Chenyang, et al.
Published: (2025)