| Main Authors: | Zhu, Defa, Huang, Hongzhi, Zhou, Jundong, Huang, Zihao, Zeng, Yutao, Wu, Banggu, Min, Qiyang, Zhou, Xun |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2503.14125 |
Similar Items
Hyper-Connections
by: Zhu, Defa, et al.
Published: (2024)
Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling
by: Huang, Hongzhi, et al.
Published: (2025)
Ultra-Sparse Memory Network
by: Huang, Zihao, et al.
Published: (2024)
Expert Race: A Flexible Routing Strategy for Scaling Diffusion Transformer with Mixture of Experts
by: Yuan, Yike, et al.
Published: (2025)
UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning
by: Huang, Zihao, et al.
Published: (2025)
mHC: Manifold-Constrained Hyper-Connections
by: Xie, Zhenda, et al.
Published: (2025)
Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models
by: Zhuo, Zhijian, et al.
Published: (2024)
Implicit Reward as the Bridge: A Unified View of SFT and DPO Connections
by: Wang, Bo, et al.
Published: (2025)
HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization
by: Zhuo, Zhijian, et al.
Published: (2025)
Textual Aesthetics in Large Language Models
by: Jiang, Lingjie, et al.
Published: (2024)
Multi-Head Mixture-of-Experts
by: Wu, Xun, et al.
Published: (2024)
HyperDAS: Towards Automating Mechanistic Interpretability with Hypernetworks
by: Sun, Jiuding, et al.
Published: (2025)
Are Transformers Able to Reason by Connecting Separated Knowledge in Training Data?
by: Yin, Yutong, et al.
Published: (2025)
HyperSteer: Activation Steering at Scale with Hypernetworks
by: Sun, Jiuding, et al.
Published: (2025)
Solo Connection: A Parameter Efficient Fine-Tuning Technique for Transformers
by: Pathak, Harsh Nilesh, et al.
Published: (2025)
From Accuracy to Robustness: A Study of Rule- and Model-based Verifiers in Mathematical Reasoning
by: Huang, Yuzhen, et al.
Published: (2025)
MUDDFormer: Breaking Residual Bottlenecks in Transformers via Multiway Dynamic Dense Connections
by: Xiao, Da, et al.
Published: (2025)
Scaling DPPs for RAG: Density Meets Diversity
by: Sun, Xun, et al.
Published: (2026)
SkillNet: Create, Evaluate, and Connect AI Skills
by: Liang, Yuan, et al.
Published: (2026)
DiLoCoX: A Low-Communication Large-Scale Training Framework for Decentralized Cluster
by: Qi, Ji, et al.
Published: (2025)
DOCCI: Descriptions of Connected and Contrasting Images
by: Onoe, Yasumasa, et al.
Published: (2024)
Deja vu: Contrastive Historical Modeling with Prefix-tuning for Temporal Knowledge Graph Reasoning
by: Peng, Miao, et al.
Published: (2024)
A Survey of Quantized Graph Representation Learning: Connecting Graph Structures with Large Language Models
by: Lin, Qika, et al.
Published: (2025)
How to Connect Speech Foundation Models and Large Language Models? What Matters and What Does Not
by: Verdini, Francesco, et al.
Published: (2024)
Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data
by: Treutlein, Johannes, et al.
Published: (2024)
Transformers Can Learn Connectivity in Some Graphs but Not Others
by: Roy, Amit, et al.
Published: (2025)
CoSPlay: Cooperative Self-Play at Test-Time with Self-Generated Code and Unit Test
by: Hu, Zhangyi, et al.
Published: (2026)
A Case Study of Selected PTQ Baselines for Reasoning LLMs on Ascend NPU
by: Luo, Yuchen, et al.
Published: (2026)
HyperMLP: An Integrated Perspective for Sequence Modeling
by: Lu, Jiecheng, et al.
Published: (2026)
Efficient Prompt Optimization Through the Lens of Best Arm Identification
by: Shi, Chengshuai, et al.
Published: (2024)
Probing to Refine: Reinforcement Distillation of LLMs via Explanatory Inversion
by: Tan, Zhen, et al.
Published: (2026)
Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement Learning
by: Dong, Guanting, et al.
Published: (2025)
KromHC: Manifold-Constrained Hyper-Connections with Kronecker-Product Residual Matrices
by: Zhou, Wuyang, et al.
Published: (2026)
AgroLLM: Connecting Farmers and Agricultural Practices through Large Language Models for Enhanced Knowledge Transfer and Practical Application
by: Samuel, Dinesh Jackson, et al.
Published: (2025)
Token-Level LLM Collaboration via FusionRoute
by: Xiong, Nuoya, et al.
Published: (2026)
Understanding Machine Unlearning Through the Lens of Mode Connectivity
by: Cheng, Jiali, et al.
Published: (2025)
To Think or Not to Think: Exploring the Unthinking Vulnerability in Large Reasoning Models
by: Zhu, Zihao, et al.
Published: (2025)
An Analysis of Hyper-Parameter Optimization Methods for Retrieval Augmented Generation
by: Orbach, Matan, et al.
Published: (2025)
Sequential Large Language Model-Based Hyper-parameter Optimization
by: Mahammadli, Kanan, et al.
Published: (2024)
EnvScaler: Scaling Tool-Interactive Environments for LLM Agent via Programmatic Synthesis
by: Song, Xiaoshuai, et al.
Published: (2026)