| Main Authors: | Zhao, Mingkuan, Hu, Wentao, Wang, Jiayin, Lai, Xin, Huang, Tianchen, Min, Yuheng, Yan, Rui, Zhu, Xiaoyan |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2511.09596 |
Similar Items
Fast Quiet-STaR: Thinking Without Thought Tokens
by: Huang, Wei, et al.
Published: (2025)
D-SMART: Enhancing LLM Dialogue Consistency via Dynamic Structured Memory And Reasoning Tree
by: Lei, Xiang, et al.
Published: (2025)
Train-Attention: Meta-Learning Where to Focus in Continual Knowledge Learning
by: Seo, Yeongbin, et al.
Published: (2024)
WeDLM: Reconciling Diffusion Language Models with Standard Causal Attention for Fast Inference
by: Liu, Aiwei, et al.
Published: (2025)
When Does Content-Based Routing Work? Representation Requirements for Selective Attention in Hybrid Sequence Models
by: Basu, Abhinaba
Published: (2026)
From Brazilian Portuguese to European Portuguese
by: Sanches, João, et al.
Published: (2024)
Towards Effective and Efficient Continual Pre-training of Large Language Models
by: Chen, Jie, et al.
Published: (2024)
Prompt Engineering and the Effectiveness of Large Language Models in Enhancing Human Productivity
by: Anam, Rizal Khoirul
Published: (2025)
Transactional Attention: Semantic Sponsorship for KV-Cache Retention
by: Basu, Abhinaba
Published: (2026)
How Pruning Reshapes Features: Sparse Autoencoder Analysis of Weight-Pruned Language Models
by: Borobia, Hector, et al.
Published: (2026)
CLMN: Concept based Language Models via Neural Symbolic Reasoning
by: Yang, Yibo
Published: (2025)
Fact Grounded Attention: Eliminating Hallucination in Large Language Models Through Attention Level Knowledge Integration
by: Gupta, Aayush
Published: (2025)
Softmax Linear Attention: Reclaiming Global Competition
by: Xu, Mingwei, et al.
Published: (2026)
Co-NAML-LSTUR: A Combined Model with Attentive Multi-View Learning and Long- and Short-term User Representations for News Recommendation
by: Nguyen, Minh Hoang, et al.
Published: (2025)
Mitigating LLM Hallucinations through Domain-Grounded Tiered Retrieval
by: Haque, Md. Asraful, et al.
Published: (2026)
Multipole Semantic Attention: A Fast Approximation of Softmax Attention for Pretraining
by: Mitchell, Rupert, et al.
Published: (2025)
Align-to-Distill: Trainable Attention Alignment for Knowledge Distillation in Neural Machine Translation
by: Jin, Heegon, et al.
Published: (2024)
Investigating Neuron Ablation in Attention Heads: The Case for Peak Activation Centering
by: Pochinkov, Nicholas, et al.
Published: (2024)
Enhancing OCR for Sino-Vietnamese Language Processing via Fine-tuned PaddleOCRv5
by: Nguyen, Minh Hoang, et al.
Published: (2025)
Towards Probabilistic Question Answering Over Tabular Data
by: Shen, Chen, et al.
Published: (2025)
RHealthTwin: Towards Responsible and Multimodal Digital Twins for Personalized Well-being
by: Ferdousi, Rahatara, et al.
Published: (2025)
Advancing Explainability in Neural Machine Translation: Analytical Metrics for Attention and Alignment Consistency
by: Mishra, Anurag
Published: (2024)
δ-STEAL: LLM Stealing Attack with Local Differential Privacy
by: Dang, Kieu, et al.
Published: (2025)
Multi-Model Synthetic Training for Mission-Critical Small Language Models
by: Platt, Nolan, et al.
Published: (2025)
PaperAudit-Bench: Benchmarking Error Detection in Research Papers for Critical Automated Peer Review
by: Tu, Songjun, et al.
Published: (2026)
PairCFR: Enhancing Model Training on Paired Counterfactually Augmented Data through Contrastive Learning
by: Qiu, Xiaoqi, et al.
Published: (2024)
Bi-Attention HateXplain : Taking into account the sequential aspect of data during explainability in a multi-task context
by: Mondjo, Ghislain Dorian Tchuente
Published: (2026)
How much do LLMs learn from negative examples?
by: Hamdan, Shadi, et al.
Published: (2025)
Communicative Agents for Slideshow Storytelling Video Generation based on LLMs
by: Fan, Jingxing, et al.
Published: (2025)
An Epidemiological Knowledge Graph extracted from the World Health Organization's Disease Outbreak News
by: Consoli, Sergio, et al.
Published: (2025)
MedMemoryBench: Benchmarking Agent Memory in Personalized Healthcare
by: Wang, Yihao, et al.
Published: (2026)
From Noise to Diversity: Random Embedding Injection in LLM Reasoning
by: Kim, Heejun, et al.
Published: (2026)
NOTAI.AI: Explainable Detection of Machine-Generated Text via Curvature and Feature Attribution
by: Breneur, Oleksandr Marchenko, et al.
Published: (2026)
Rethinking the Multilingual Reasoning Gap with Layer Swap
by: Lasbordes, Maxence, et al.
Published: (2026)
Cache-to-Cache: Direct Semantic Communication Between Large Language Models
by: Fu, Tianyu, et al.
Published: (2025)
An Iterative Optimizing Framework for Radiology Report Summarization with ChatGPT
by: Ma, Chong, et al.
Published: (2023)
The Knesset Corpus: An Annotated Corpus of Hebrew Parliamentary Proceedings
by: Goldin, Gili, et al.
Published: (2024)
Math Natural Language Inference: this should be easy!
by: de Paiva, Valeria, et al.
Published: (2025)
New Skills or Sharper Primitives? A Probabilistic Perspective on the Emergence of Reasoning in RLVR
by: Wang, Zhilin, et al.
Published: (2026)
Pitfalls in Evaluating Interpretability Agents
by: Haklay, Tal, et al.
Published: (2026)