| Main Authors: | Cheng, Xin, Zeng, Wangding, Dai, Damai, Chen, Qinyu, Wang, Bingxuan, Xie, Zhenda, Huang, Kezhao, Yu, Xingkai, Hao, Zhewen, Li, Yukun, Zhang, Han, Zhang, Huishuai, Zhao, Dongyan, Liang, Wenfeng |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.07372 |
Similar Items
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
by: Yuan, Jingyang, et al.
Published: (2025)
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
by: Dai, Damai, et al.
Published: (2024)
MAC-Lookup: Multi-Axis Conditional Lookup Model for Underwater Image Enhancement
by: Yi, Fanghai, et al.
Published: (2025)
Evidence-Enhanced Triplet Generation Framework for Hallucination Alleviation in Generative Question Answering
by: Du, Haowei, et al.
Published: (2024)
mHC: Manifold-Constrained Hyper-Connections
by: Xie, Zhenda, et al.
Published: (2025)
Understanding Multimodal Hallucination with Parameter-Free Representation Alignment
by: Wang, Yueqian, et al.
Published: (2024)
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence
by: DeepSeek-AI, et al.
Published: (2024)
TupleChain: Fast Lookup of OpenFlow Table with Multifaceted Scalability
by: Li, Yanbiao, et al.
Published: (2024)
ReasVQA: Advancing VideoQA with Imperfect Reasoning Process
by: Liang, Jianxin, et al.
Published: (2025)
FREAK: A Fine-grained Hallucination Evaluation Benchmark for Advanced MLLMs
by: Yin, Zhihan, et al.
Published: (2026)
Latent Preference Coding: Aligning Large Language Models via Discrete Latent Codes
by: Gong, Zhuocheng, et al.
Published: (2025)
Efficient Continual Pre-training by Mitigating the Stability Gap
by: Guo, Yiduo, et al.
Published: (2024)
ProactiveVideoQA: A Comprehensive Benchmark Evaluating Proactive Interactions in Video Large Language Models
by: Wang, Yueqian, et al.
Published: (2025)
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
by: Wu, Zhiyu, et al.
Published: (2024)
Multi-Satellite Beam Hopping and Power Allocation Using Deep Reinforcement Learning
by: Xie, Xia, et al.
Published: (2025)
Beyond Isolated Facts: Synthesizing Narrative and Grounded Supervision for VideoQA
by: Liang, Jianxin, et al.
Published: (2025)
VideoLLM Knows When to Speak: Enhancing Time-Sensitive Video Comprehension with Video-Text Duet Interaction Format
by: Wang, Yueqian, et al.
Published: (2024)
xRAG: Extreme Context Compression for Retrieval-augmented Generation with One Token
by: Cheng, Xin, et al.
Published: (2024)
Shorten After You're Right: Lazy Length Penalties for Reasoning RL
by: Yuan, Danlong, et al.
Published: (2025)
The Collusion of Memory and Nonlinearity in Stochastic Approximation With Constant Stepsize
by: Huo, Dongyan, et al.
Published: (2024)
Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling
by: Chen, Xiaokang, et al.
Published: (2025)
Exploring Activation Patterns of Parameters in Language Models
by: Wang, Yudong, et al.
Published: (2024)
Language Models Encode the Value of Numbers Linearly
by: Zhu, Fangwei, et al.
Published: (2024)
De-Anonymization at Scale via Tournament-Style Attribution
by: Zhang, Lirui, et al.
Published: (2026)
SWE-MiniSandbox: Container-Free Reinforcement Learning for Building Software Engineering Agents
by: Yuan, Danlong, et al.
Published: (2026)
ReMamba: Equip Mamba with Effective Long-Sequence Modeling
by: Yuan, Danlong, et al.
Published: (2024)
MMDuet2: Enhancing Proactive Interaction of Video MLLMs with Multi-Turn Reinforcement Learning
by: Wang, Yueqian, et al.
Published: (2025)
Two-Step Diffusion: Fast Sampling and Reliable Prediction for 3D Keller--Segel and KPP Equations in Fluid Flows
by: Shen, Zhenda, et al.
Published: (2026)
Invertible Bloom Lookup Tables with Less Memory and Randomness
by: Fleischhacker, Nils, et al.
Published: (2023)
Not All Demonstration Examples are Equally Beneficial: Reweighting Demonstration Examples for In-Context Learning
by: Yang, Zhe, et al.
Published: (2023)
Lego Sketch: A Scalable Memory-augmented Neural Network for Sketching Data Streams
by: Feng, Yuan, et al.
Published: (2025)
SMES: Towards Scalable Multi-Task Recommendation via Expert Sparsity
by: Zhang, Yukun, et al.
Published: (2026)
LOOKAT: Lookup-Optimized Key-Attention for Memory-Efficient Transformers
by: Karmore, Aryan
Published: (2026)
Beyond Dense Connectivity: Explicit Sparsity for Scalable Recommendation
by: Yu, Yantao, et al.
Published: (2026)
Asymptotic Product-form Steady-state for Multiclass Queueing Networks: A Reentrant Line Case Study
by: Dai, Jim, et al.
Published: (2024)
DeltaLLM: A Training-Free Framework Exploiting Temporal Sparsity for Efficient Edge LLM Inference
by: Qi, Jiawen, et al.
Published: (2025)
StyleChat: Learning Recitation-Augmented Memory in LLMs for Stylized Dialogue Generation
by: Li, Jinpeng, et al.
Published: (2024)
Beyond Fixed Benchmarks and Worst-Case Attacks: Dynamic Boundary Evaluation for Language Models
by: Wang, Haoxiang, et al.
Published: (2026)
OLion: Approaching the Hadamard Ideal by Intersecting Spectral and $\ell_{\infty}$ Implicit Biases
by: Wang, Zixiao, et al.
Published: (2026)
GeoBuildBench: A Benchmark for Interactive and Executable Geometry Construction from Natural Language
by: Kim, Jinwoong, et al.
Published: (2026)