Bibliographic Details
Main Authors:	Zhu, Defa; Huang, Hongzhi; Zhou, Jundong; Huang, Zihao; Zeng, Yutao; Wu, Banggu; Min, Qiyang; Zhou, Xun
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence; Computation and Language
Online Access:	https://arxiv.org/abs/2503.14125

Similar Items

Hyper-Connections
by: Zhu, Defa, et al.
Published: (2024)

Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling
by: Huang, Hongzhi, et al.
Published: (2025)

Ultra-Sparse Memory Network
by: Huang, Zihao, et al.
Published: (2024)

Expert Race: A Flexible Routing Strategy for Scaling Diffusion Transformer with Mixture of Experts
by: Yuan, Yike, et al.
Published: (2025)

UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning
by: Huang, Zihao, et al.
Published: (2025)

mHC: Manifold-Constrained Hyper-Connections
by: Xie, Zhenda, et al.
Published: (2025)

Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models
by: Zhuo, Zhijian, et al.
Published: (2024)

Implicit Reward as the Bridge: A Unified View of SFT and DPO Connections
by: Wang, Bo, et al.
Published: (2025)

HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization
by: Zhuo, Zhijian, et al.
Published: (2025)

Textual Aesthetics in Large Language Models
by: Jiang, Lingjie, et al.
Published: (2024)

Multi-Head Mixture-of-Experts
by: Wu, Xun, et al.
Published: (2024)

HyperDAS: Towards Automating Mechanistic Interpretability with Hypernetworks
by: Sun, Jiuding, et al.
Published: (2025)

Are Transformers Able to Reason by Connecting Separated Knowledge in Training Data?
by: Yin, Yutong, et al.
Published: (2025)

HyperSteer: Activation Steering at Scale with Hypernetworks
by: Sun, Jiuding, et al.
Published: (2025)

Solo Connection: A Parameter Efficient Fine-Tuning Technique for Transformers
by: Pathak, Harsh Nilesh, et al.
Published: (2025)

From Accuracy to Robustness: A Study of Rule- and Model-based Verifiers in Mathematical Reasoning
by: Huang, Yuzhen, et al.
Published: (2025)

MUDDFormer: Breaking Residual Bottlenecks in Transformers via Multiway Dynamic Dense Connections
by: Xiao, Da, et al.
Published: (2025)

Scaling DPPs for RAG: Density Meets Diversity
by: Sun, Xun, et al.
Published: (2026)

SkillNet: Create, Evaluate, and Connect AI Skills
by: Liang, Yuan, et al.
Published: (2026)

DiLoCoX: A Low-Communication Large-Scale Training Framework for Decentralized Cluster
by: Qi, Ji, et al.
Published: (2025)

DOCCI: Descriptions of Connected and Contrasting Images
by: Onoe, Yasumasa, et al.
Published: (2024)

Deja vu: Contrastive Historical Modeling with Prefix-tuning for Temporal Knowledge Graph Reasoning
by: Peng, Miao, et al.
Published: (2024)

A Survey of Quantized Graph Representation Learning: Connecting Graph Structures with Large Language Models
by: Lin, Qika, et al.
Published: (2025)

How to Connect Speech Foundation Models and Large Language Models? What Matters and What Does Not
by: Verdini, Francesco, et al.
Published: (2024)

Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data
by: Treutlein, Johannes, et al.
Published: (2024)

Transformers Can Learn Connectivity in Some Graphs but Not Others
by: Roy, Amit, et al.
Published: (2025)

CoSPlay: Cooperative Self-Play at Test-Time with Self-Generated Code and Unit Test
by: Hu, Zhangyi, et al.
Published: (2026)

A Case Study of Selected PTQ Baselines for Reasoning LLMs on Ascend NPU
by: Luo, Yuchen, et al.
Published: (2026)

HyperMLP: An Integrated Perspective for Sequence Modeling
by: Lu, Jiecheng, et al.
Published: (2026)

Efficient Prompt Optimization Through the Lens of Best Arm Identification
by: Shi, Chengshuai, et al.
Published: (2024)

Probing to Refine: Reinforcement Distillation of LLMs via Explanatory Inversion
by: Tan, Zhen, et al.
Published: (2026)

Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement Learning
by: Dong, Guanting, et al.
Published: (2025)

KromHC: Manifold-Constrained Hyper-Connections with Kronecker-Product Residual Matrices
by: Zhou, Wuyang, et al.
Published: (2026)

AgroLLM: Connecting Farmers and Agricultural Practices through Large Language Models for Enhanced Knowledge Transfer and Practical Application
by: Samuel, Dinesh Jackson, et al.
Published: (2025)

Token-Level LLM Collaboration via FusionRoute
by: Xiong, Nuoya, et al.
Published: (2026)

Understanding Machine Unlearning Through the Lens of Mode Connectivity
by: Cheng, Jiali, et al.
Published: (2025)

To Think or Not to Think: Exploring the Unthinking Vulnerability in Large Reasoning Models
by: Zhu, Zihao, et al.
Published: (2025)

An Analysis of Hyper-Parameter Optimization Methods for Retrieval Augmented Generation
by: Orbach, Matan, et al.
Published: (2025)

Sequential Large Language Model-Based Hyper-parameter Optimization
by: Mahammadli, Kanan, et al.
Published: (2024)

EnvScaler: Scaling Tool-Interactive Environments for LLM Agent via Programmatic Synthesis
by: Song, Xiaoshuai, et al.
Published: (2026)