| Main Authors: | Gao, Xinyu, Wang, Shaonan, Ding, Nai |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2602.15846 |
Similar Items
How Syntax Specialization Emerges in Language Models
by: Duan, Xufeng, et al.
Published: (2025)
Bitune: Leveraging Bidirectional Attention to Improve Decoder-Only LLMs
by: Kopiczko, Dawid J., et al.
Published: (2024)
How Do Decoder-Only LLMs Perceive Users? Rethinking Attention Masking for User Representation Learning
by: Yuan, Jiahao, et al.
Published: (2026)
ViCA: Efficient Multimodal LLMs with Vision-Only Cross-Attention
by: Liu, Wenjie, et al.
Published: (2026)
Memorization and Knowledge Injection in Gated LLMs
by: Pan, Xu, et al.
Published: (2025)
Improving In-context Learning of Multilingual Generative Language Models with Cross-lingual Alignment
by: Li, Chong, et al.
Published: (2023)
Cross-Attention Speculative Decoding
by: Zhong, Wei, et al.
Published: (2025)
DOA: Training-Free Decoder-Only Attention Policy for Long-Form Simultaneous Translation with SpeechLLMs
by: Papi, Sara, et al.
Published: (2026)
Decoder-Only LLMs are Better Controllers for Diffusion Models
by: Dong, Ziyi, et al.
Published: (2025)
Decoding the Multimodal Mind: Generalizable Brain-to-Text Translation via Multimodal Alignment and Adaptive Routing
by: Ye, Chunyu, et al.
Published: (2025)
Encoder-Decoder or Decoder-Only? Revisiting Encoder-Decoder Large Language Model
by: Zhang, Biao, et al.
Published: (2025)
Don't Fine-Tune, Decode: Syntax Error-Free Tool Use via Constrained Decoding
by: Zhang, Kexun, et al.
Published: (2023)
You Only Cache Once: Decoder-Decoder Architectures for Language Models
by: Sun, Yutao, et al.
Published: (2024)
A Comprehensive Study of Decoder-Only LLMs for Text-to-Image Generation
by: Wang, Andrew Z., et al.
Published: (2025)
Varying Shades of Wrong: Aligning LLMs with Wrong Answers Only
by: Yao, Jihan, et al.
Published: (2024)
Is More Data Worth the Cost? Dataset Scaling Laws in a Tiny Attention-Only Decoder
by: Wiegand, Götz-Henrik, et al.
Published: (2026)
On The Adaptation of Unlimiformer for Decoder-Only Transformers
by: Ahrabian, Kian, et al.
Published: (2024)
From Syntax to Emotion: A Mechanistic Analysis of Emotion Inference in LLMs
by: Shu, Bangzhao, et al.
Published: (2026)
SASST: Leveraging Syntax-Aware Chunking and LLMs for Simultaneous Speech Translation
by: Yang, Zeyu, et al.
Published: (2025)
Prompt Decorators: A Declarative and Composable Syntax for Reasoning, Formatting, and Control in LLMs
by: Heris, Mostapha Kalami
Published: (2025)
In-context KV-Cache Eviction for LLMs via Attention-Gate
by: Zeng, Zihao, et al.
Published: (2024)
For-Value: Efficient Forward-Only Data Valuation for finetuning LLMs and VLMs
by: Deng, Wenlong, et al.
Published: (2025)
X-Instruction: Aligning Language Model in Low-resource Languages with Self-curated Cross-lingual Instructions
by: Li, Chong, et al.
Published: (2024)
Evaluation of LLMs on Syntax-Aware Code Fill-in-the-Middle Tasks
by: Gong, Linyuan, et al.
Published: (2024)
EG-MLA: Embedding-Gated Multi-head Latent Attention for Scalable and Efficient LLMs
by: Cai, Zhengge, et al.
Published: (2025)
SyntaxShap: Syntax-aware Explainability Method for Text Generation
by: Amara, Kenza, et al.
Published: (2024)
Sneaking Syntax into Transformer Language Models with Tree Regularization
by: Nandi, Ananjan, et al.
Published: (2024)
You Only Need Minimal RLVR Training: Extrapolating LLMs via Rank-1 Trajectories
by: Wei, Zhepei, et al.
Published: (2026)
$\text{R}^2\text{R}$: A Route-to-Rerank Post-Training Framework for Multi-Domain Decoder-Only Rerankers
by: Wang, Xinyu, et al.
Published: (2025)
Stepwise Reasoning Checkpoint Analysis: A Test Time Scaling Method to Enhance LLMs' Reasoning
by: Wang, Zezhong, et al.
Published: (2025)
How Powerful are Decoder-Only Transformer Neural Models?
by: Roberts, Jesse
Published: (2023)
Formal Constraints on Dependency Syntax
by: Gómez-Rodríguez, et al.
Published: (2026)
Infusing Prompts with Syntax and Semantics
by: Labate, Anton Bulle, et al.
Published: (2024)
Morphology and Syntax of the Tamil Language
by: Sarveswaran, Kengatharaiyer
Published: (2024)
OPT-Tree: Speculative Decoding with Adaptive Draft Tree Structure
by: Wang, Jikai, et al.
Published: (2024)
Gender Disambiguation in Machine Translation: Diagnostic Evaluation in Decoder-Only Architectures
by: Manna, Chiara, et al.
Published: (2026)
LLaMA based Punctuation Restoration With Forward Pass Only Decoding
by: Pang, Yutong, et al.
Published: (2024)
Active Use of Latent Constituency Representation in both Humans and Large Language Models
by: Liu, Wei, et al.
Published: (2024)
Checkpoint-GCG: Auditing and Attacking Fine-Tuning-Based Prompt Injection Defenses
by: Yang, Xiaoxue, et al.
Published: (2025)
Multilingual Pretraining and Instruction Tuning Improve Cross-Lingual Knowledge Alignment, But Only Shallowly
by: Gao, Changjiang, et al.
Published: (2024)