:: Library Catalog

Bibliographic Details
Main Authors:	Shao, Jintian; Huang, Hongyi; Wu, Jiayi; Zhang, Beiwen; Wu, ZhiYu; Shan, You; Zheng, MingKai
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Computation and Language
Online Access:	https://arxiv.org/abs/2505.10222

Similar Items

VQ-Logits: Compressing the Output Bottleneck of Large Language Models via Vector Quantized Logits
by: Shao, Jintian, et al.
Published: (2025)

Towards Analyzing and Understanding the Limitations of VAPO: A Theoretical Perspective
by: Shao, Jintian, et al.
Published: (2025)

EulerFormer: Sequential User Behavior Modeling with Complex Vector Attention
by: Tian, Zhen, et al.
Published: (2024)

DUFOMap: Efficient Dynamic Awareness Mapping
by: Duberg, Daniel, et al.
Published: (2024)

CipherFormer: Efficient Transformer Private Inference with Low Round Complexity
by: Wang, Weize, et al.
Published: (2024)

Interactive Multi-Head Self-Attention with Linear Complexity
by: Kang, Hankyul, et al.
Published: (2024)

RecurFormer: Not All Transformer Heads Need Self-Attention
by: Yan, Ruiqing, et al.
Published: (2024)

Power-Law Decay Loss for Large Language Model Finetuning: A Theory Perspective
by: Shao, Jintian
Published: (2025)

ReCQR: Incorporating conversational query rewriting to improve Multimodal Image Retrieval
by: Hu, Yuan, et al.
Published: (2026)

Multi-party Agent Relation Sampling for Multi-party Ad Hoc Teamwork
by: Zhang, Beiwen, et al.
Published: (2025)

Inter2Former: Dynamic Hybrid Attention for Efficient High-Precision Interactive
by: Huang, You, et al.
Published: (2025)

SpikeVideoFormer: An Efficient Spike-Driven Video Transformer with Hamming Attention and $\mathcal{O}(T)$ Complexity
by: Zou, Shihao, et al.
Published: (2025)

AnchorFormer: Differentiable Anchor Attention for Efficient Vision Transformer
by: Shan, Jiquan, et al.
Published: (2025)

MSDCC‐Net: A Fine‐Scale Remote Sensing Extraction Method for Bare Surface Land in Rare Earth Mining Areas Based on Multi‐Scale Attention Mechanisms
by: Yingming Cai, et al.
Published: (2026)

From Complex Dynamics to DynFormer: Rethinking Transformers for PDEs
by: Lai, Pengyu, et al.
Published: (2026)

LATTE: Low-Precision Approximate Attention with Head-wise Trainable Threshold for Efficient Transformer
by: Wang, Jiing-Ping, et al.
Published: (2024)

CoT is Not True Reasoning, It Is Just a Tight Constraint to Imitate: A Theory Perspective
by: Shao, Jintian, et al.
Published: (2025)

ParFormer: A Vision Transformer with Parallel Mixer and Sparse Channel Attention Patch Embedding
by: Setyawan, Novendra, et al.
Published: (2024)

Scalable Complexity Control Facilitates Reasoning Ability of LLMs
by: Hang, Liangkai, et al.
Published: (2025)

PosFormer: Recognizing Complex Handwritten Mathematical Expression with Position Forest Transformer
by: Guan, Tongkun, et al.
Published: (2024)

CHAI: Clustered Head Attention for Efficient LLM Inference
by: Agarwal, Saurabh, et al.
Published: (2024)

MatFormer: Nested Transformer for Elastic Inference
by: Devvrit, et al.
Published: (2023)

AttentionInfluence: Adopting Attention Head Influence for Weak-to-Strong Pretraining Data Selection
by: Hua, Kai, et al.
Published: (2025)

Comet: A Communication-efficient and Performant Approximation for Private Transformer Inference
by: Xu, Xiangrui, et al.
Published: (2024)

Sample Complexity and Representation Ability of Test-time Scaling Paradigms
by: Huang, Baihe, et al.
Published: (2025)

Complexity Equals (Almost) Anything
by: Myers, Robert C., et al.
Published: (2024)

Subsystem Complexity and Measurements in Holography
by: Jian, Shao-Kai, et al.
Published: (2023)

Singular Vectors of Attention Heads Align with Features
by: Franco, Gabriel, et al.
Published: (2026)

Causal Inference with Complex Treatments: A Survey
by: Wang, Yingrong, et al.
Published: (2024)

Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs
by: Ji, Tao, et al.
Published: (2025)

SGFormer: Single-Layer Graph Transformers with Approximation-Free Linear Complexity
by: Wu, Qitian, et al.
Published: (2024)

NoiseFormer -- Noise Diffused Symmetric Attention Transformer
by: Kumar, Phani, et al.
Published: (2026)

RTA-Former: Reverse Transformer Attention for Polyp Segmentation
by: Li, Zhikai, et al.
Published: (2024)

Advances in Gait Alterations and Rehabilitation After Anterior Cruciate Ligament Reconstruction: Biomechanics and Emerging Technologies
by: TszLeung Yu, et al.
Published: (2026)

Castling-ViT: Compressing Self-Attention via Switching Towards Linear-Angular Attention at Vision Transformer Inference
by: You, Haoran, et al.
Published: (2022)

VecFormer: Towards Efficient and Generalizable Graph Transformer with Graph Token Attention
by: Zhou, Jingbo, et al.
Published: (2026)

Complexity=Anything: Singularity Probes
by: Jørstad, Eivind, et al.
Published: (2023)

Ring Structure in the Complex Plane: A Fingerprint of non-Hermitian Mobility Edge
by: Li, Shan-Zhong, et al.
Published: (2024)

CoFlow: Coordinated Few-Step Flow for Offline Multi-Agent Decision Making
by: Zou, Guowei, et al.
Published: (2026)