| Main Authors: | Omidi, Parsa, Huang, Xingshuai, Laborieux, Axel, Nikpour, Bahareh, Shi, Tianyu, Eshaghi, Armaghan |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2508.10824 |
Similar Items
Hierarchical Chain-of-Thought Prompting: Enhancing LLM Reasoning Performance and Efficiency
by: Huang, Xingshuai, et al.
Published: (2026)
Beyond the Limits: A Survey of Techniques to Extend the Context Length in Large Language Models
by: Wang, Xindi, et al.
Published: (2024)
Language-Guided Reinforcement Learning for Hard Attention in Few-Shot Learning
by: Nikpour, Bahareh, et al.
Published: (2023)
Improving equilibrium propagation without weight symmetry through Jacobian homeostasis
by: Laborieux, Axel, et al.
Published: (2023)
Theories of synaptic memory consolidation and intelligent plasticity for continual learning
by: Zenke, Friedemann, et al.
Published: (2024)
Leveraging Distillation Techniques for Document Understanding: A Case Study with FLAN-T5
by: Lamott, Marcel, et al.
Published: (2024)
An Evolved Universal Transformer Memory
by: Cetin, Edoardo, et al.
Published: (2024)
Review, Remask, Refine (R3): Process-Guided Block Diffusion for Text Generation
by: Mounier, Nikita, et al.
Published: (2025)
Goal-Conditioned Data Augmentation for Offline Reinforcement Learning
by: Huang, Xingshuai, et al.
Published: (2024)
Enhancing Length Extrapolation in Sequential Models with Pointer-Augmented Neural Memory
by: Le, Hung, et al.
Published: (2024)
Selective Attention: Enhancing Transformer through Principled Context Control
by: Zhang, Xuechen, et al.
Published: (2024)
DRDT3: Diffusion-Refined Decision Test-Time Training Model
by: Huang, Xingshuai, et al.
Published: (2025)
LifeAlign: Lifelong Alignment for Large Language Models with Memory-Augmented Focalized Preference Optimization
by: Li, Junsong, et al.
Published: (2025)
The Impact of Quantization on the Robustness of Transformer-based Text Classifiers
by: Neshaei, Seyed Parsa, et al.
Published: (2024)
Design Principle Transfer in Neural Architecture Search via Large Language Models
by: Zhou, Xun, et al.
Published: (2024)
Towards Principled Design of Mixture-of-Experts Language Models under Memory and Inference Constraints
by: Liew, Seng Pei, et al.
Published: (2026)
PanGu-$π$: Enhancing Language Model Architectures via Nonlinearity Compensation
by: Wang, Yunhe, et al.
Published: (2023)
Efficacy of Large Language Models in Systematic Reviews
by: Shah, Aaditya, et al.
Published: (2024)
A Systematic Review of Key Retrieval-Augmented Generation (RAG) Systems: Progress, Gaps, and Future Directions
by: Oche, Agada Joseph, et al.
Published: (2025)
Can Memory-Augmented Language Models Generalize on Reasoning-in-a-Haystack Tasks?
by: Das, Payel, et al.
Published: (2025)
On the Power of Convolution Augmented Transformer
by: Li, Mingchen, et al.
Published: (2024)
Gated Associative Memory: A Parallel O(N) Architecture for Efficient Sequence Modeling
by: Acharya, Rishiraj
Published: (2025)
Chain-of-Defensive-Thought: Structured Reasoning Elicits Robustness in Large Language Models against Reference Corruption
by: Wang, Wenxiao, et al.
Published: (2025)
MemGuard: Preventing Memory Contamination in Long-Term Memory-Augmented Large Language Models
by: Ha, Hyeonjeong, et al.
Published: (2026)
Decoupling Scores and Text: The Politeness Principle in Peer Review
by: Wen, Yingxuan
Published: (2026)
HypoBench: Towards Systematic and Principled Benchmarking for Hypothesis Generation
by: Liu, Haokun, et al.
Published: (2025)
Balcony: A Lightweight Approach to Dynamic Inference of Generative Language Models
by: Jamialahmadi, Benyamin, et al.
Published: (2025)
Catching rationalization in the act: detecting motivated reasoning before and after CoT via activation probing
by: Mirtaheri, Parsa, et al.
Published: (2026)
DTRNet: Dynamic Token Routing Network to Reduce Quadratic Costs in Transformers
by: Sharma, Aman, et al.
Published: (2025)
POET-X: Memory-efficient LLM Training by Scaling Orthogonal Transformation
by: Qiu, Zeju, et al.
Published: (2026)
Generalization or Hallucination? Understanding Out-of-Context Reasoning in Transformers
by: Huang, Yixiao, et al.
Published: (2025)
Latent Recurrent Transformer: Architecture Exploration, Training Strategies, and Scaling Behavior
by: Huang, Zeyi, et al.
Published: (2026)
Memory-Efficient Looped Transformer: Decoupling Compute from Memory in Looped Language Models
by: Vendrell, Victor Conchello, et al.
Published: (2026)
Towards Robust Few-Shot Text Classification Using Transformer Architectures and Dual Loss Strategies
by: Han, Xu, et al.
Published: (2025)
Data Augmentations for Improved (Large) Language Model Generalization
by: Feder, Amir, et al.
Published: (2023)
A Systematic Review of Federated Generative Models
by: Gargary, Ashkan Vedadi, et al.
Published: (2024)
MetaState: Persistent Working Memory Enhances Reasoning in Discrete Diffusion Language Models
by: Xia, Kejing, et al.
Published: (2026)
Born a Transformer -- Always a Transformer? On the Effect of Pretraining on Architectural Abilities
by: Jobanputra, Mayank, et al.
Published: (2025)
Efficient Systematic Reviews: Literature Filtering with Transformers & Transfer Learning
by: Hawkins, John, et al.
Published: (2024)
CLIBE: Detecting Dynamic Backdoors in Transformer-based NLP Models
by: Zeng, Rui, et al.
Published: (2024)