:: Library Catalog

Bibliographic Details
Main Authors:	Cheng, Xin; Zeng, Wangding; Dai, Damai; Chen, Qinyu; Wang, Bingxuan; Xie, Zhenda; Huang, Kezhao; Yu, Xingkai; Hao, Zhewen; Li, Yukun; Zhang, Han; Zhang, Huishuai; Zhao, Dongyan; Liang, Wenfeng
Format:	Preprint
Published:	2026
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2601.07372

Similar Items

Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
by: Yuan, Jingyang, et al.
Published: (2025)

DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
by: Dai, Damai, et al.
Published: (2024)

MAC-Lookup: Multi-Axis Conditional Lookup Model for Underwater Image Enhancement
by: Yi, Fanghai, et al.
Published: (2025)

Evidence-Enhanced Triplet Generation Framework for Hallucination Alleviation in Generative Question Answering
by: Du, Haowei, et al.
Published: (2024)

mHC: Manifold-Constrained Hyper-Connections
by: Xie, Zhenda, et al.
Published: (2025)

Understanding Multimodal Hallucination with Parameter-Free Representation Alignment
by: Wang, Yueqian, et al.
Published: (2024)

DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence
by: DeepSeek-AI, et al.
Published: (2024)

TupleChain: Fast Lookup of OpenFlow Table with Multifaceted Scalability
by: Li, Yanbiao, et al.
Published: (2024)

ReasVQA: Advancing VideoQA with Imperfect Reasoning Process
by: Liang, Jianxin, et al.
Published: (2025)

FREAK: A Fine-grained Hallucination Evaluation Benchmark for Advanced MLLMs
by: Yin, Zhihan, et al.
Published: (2026)

Latent Preference Coding: Aligning Large Language Models via Discrete Latent Codes
by: Gong, Zhuocheng, et al.
Published: (2025)

Efficient Continual Pre-training by Mitigating the Stability Gap
by: Guo, Yiduo, et al.
Published: (2024)

ProactiveVideoQA: A Comprehensive Benchmark Evaluating Proactive Interactions in Video Large Language Models
by: Wang, Yueqian, et al.
Published: (2025)

DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
by: Wu, Zhiyu, et al.
Published: (2024)

Multi-Satellite Beam Hopping and Power Allocation Using Deep Reinforcement Learning
by: Xie, Xia, et al.
Published: (2025)

Beyond Isolated Facts: Synthesizing Narrative and Grounded Supervision for VideoQA
by: Liang, Jianxin, et al.
Published: (2025)

VideoLLM Knows When to Speak: Enhancing Time-Sensitive Video Comprehension with Video-Text Duet Interaction Format
by: Wang, Yueqian, et al.
Published: (2024)

xRAG: Extreme Context Compression for Retrieval-augmented Generation with One Token
by: Cheng, Xin, et al.
Published: (2024)

Shorten After You're Right: Lazy Length Penalties for Reasoning RL
by: Yuan, Danlong, et al.
Published: (2025)

The Collusion of Memory and Nonlinearity in Stochastic Approximation With Constant Stepsize
by: Huo, Dongyan, et al.
Published: (2024)

Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling
by: Chen, Xiaokang, et al.
Published: (2025)

Exploring Activation Patterns of Parameters in Language Models
by: Wang, Yudong, et al.
Published: (2024)

Language Models Encode the Value of Numbers Linearly
by: Zhu, Fangwei, et al.
Published: (2024)

De-Anonymization at Scale via Tournament-Style Attribution
by: Zhang, Lirui, et al.
Published: (2026)

SWE-MiniSandbox: Container-Free Reinforcement Learning for Building Software Engineering Agents
by: Yuan, Danlong, et al.
Published: (2026)

ReMamba: Equip Mamba with Effective Long-Sequence Modeling
by: Yuan, Danlong, et al.
Published: (2024)

MMDuet2: Enhancing Proactive Interaction of Video MLLMs with Multi-Turn Reinforcement Learning
by: Wang, Yueqian, et al.
Published: (2025)

Two-Step Diffusion: Fast Sampling and Reliable Prediction for 3D Keller–Segel and KPP Equations in Fluid Flows
by: Shen, Zhenda, et al.
Published: (2026)

Invertible Bloom Lookup Tables with Less Memory and Randomness
by: Fleischhacker, Nils, et al.
Published: (2023)

Not All Demonstration Examples are Equally Beneficial: Reweighting Demonstration Examples for In-Context Learning
by: Yang, Zhe, et al.
Published: (2023)

Lego Sketch: A Scalable Memory-augmented Neural Network for Sketching Data Streams
by: Feng, Yuan, et al.
Published: (2025)

SMES: Towards Scalable Multi-Task Recommendation via Expert Sparsity
by: Zhang, Yukun, et al.
Published: (2026)

LOOKAT: Lookup-Optimized Key-Attention for Memory-Efficient Transformers
by: Karmore, Aryan
Published: (2026)

Beyond Dense Connectivity: Explicit Sparsity for Scalable Recommendation
by: Yu, Yantao, et al.
Published: (2026)

Asymptotic Product-form Steady-state for Multiclass Queueing Networks: A Reentrant Line Case Study
by: Dai, Jim, et al.
Published: (2024)

DeltaLLM: A Training-Free Framework Exploiting Temporal Sparsity for Efficient Edge LLM Inference
by: Qi, Jiawen, et al.
Published: (2025)

StyleChat: Learning Recitation-Augmented Memory in LLMs for Stylized Dialogue Generation
by: Li, Jinpeng, et al.
Published: (2024)

Beyond Fixed Benchmarks and Worst-Case Attacks: Dynamic Boundary Evaluation for Language Models
by: Wang, Haoxiang, et al.
Published: (2026)

OLion: Approaching the Hadamard Ideal by Intersecting Spectral and $\ell_{\infty}$ Implicit Biases
by: Wang, Zixiao, et al.
Published: (2026)

GeoBuildBench: A Benchmark for Interactive and Executable Geometry Construction from Natural Language
by: Kim, Jinwoong, et al.
Published: (2026)