Library Catalog

Bibliographic Details
Main Authors:	Omidi, Parsa, Huang, Xingshuai, Laborieux, Axel, Nikpour, Bahareh, Shi, Tianyu, Eshaghi, Armaghan
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Computation and Language
Online Access:	https://arxiv.org/abs/2508.10824

Similar Items

Hierarchical Chain-of-Thought Prompting: Enhancing LLM Reasoning Performance and Efficiency
by: Huang, Xingshuai, et al.
Published: (2026)

Beyond the Limits: A Survey of Techniques to Extend the Context Length in Large Language Models
by: Wang, Xindi, et al.
Published: (2024)

Language-Guided Reinforcement Learning for Hard Attention in Few-Shot Learning
by: Nikpour, Bahareh, et al.
Published: (2023)

Improving equilibrium propagation without weight symmetry through Jacobian homeostasis
by: Laborieux, Axel, et al.
Published: (2023)

Theories of synaptic memory consolidation and intelligent plasticity for continual learning
by: Zenke, Friedemann, et al.
Published: (2024)

Leveraging Distillation Techniques for Document Understanding: A Case Study with FLAN-T5
by: Lamott, Marcel, et al.
Published: (2024)

An Evolved Universal Transformer Memory
by: Cetin, Edoardo, et al.
Published: (2024)

Review, Remask, Refine (R3): Process-Guided Block Diffusion for Text Generation
by: Mounier, Nikita, et al.
Published: (2025)

Goal-Conditioned Data Augmentation for Offline Reinforcement Learning
by: Huang, Xingshuai, et al.
Published: (2024)

Enhancing Length Extrapolation in Sequential Models with Pointer-Augmented Neural Memory
by: Le, Hung, et al.
Published: (2024)

Selective Attention: Enhancing Transformer through Principled Context Control
by: Zhang, Xuechen, et al.
Published: (2024)

DRDT3: Diffusion-Refined Decision Test-Time Training Model
by: Huang, Xingshuai, et al.
Published: (2025)

LifeAlign: Lifelong Alignment for Large Language Models with Memory-Augmented Focalized Preference Optimization
by: Li, Junsong, et al.
Published: (2025)

The Impact of Quantization on the Robustness of Transformer-based Text Classifiers
by: Neshaei, Seyed Parsa, et al.
Published: (2024)

Design Principle Transfer in Neural Architecture Search via Large Language Models
by: Zhou, Xun, et al.
Published: (2024)

Towards Principled Design of Mixture-of-Experts Language Models under Memory and Inference Constraints
by: Liew, Seng Pei, et al.
Published: (2026)

PanGu-π: Enhancing Language Model Architectures via Nonlinearity Compensation
by: Wang, Yunhe, et al.
Published: (2023)

Efficacy of Large Language Models in Systematic Reviews
by: Shah, Aaditya, et al.
Published: (2024)

A Systematic Review of Key Retrieval-Augmented Generation (RAG) Systems: Progress, Gaps, and Future Directions
by: Oche, Agada Joseph, et al.
Published: (2025)

Can Memory-Augmented Language Models Generalize on Reasoning-in-a-Haystack Tasks?
by: Das, Payel, et al.
Published: (2025)

On the Power of Convolution Augmented Transformer
by: Li, Mingchen, et al.
Published: (2024)

Gated Associative Memory: A Parallel O(N) Architecture for Efficient Sequence Modeling
by: Acharya, Rishiraj
Published: (2025)

Chain-of-Defensive-Thought: Structured Reasoning Elicits Robustness in Large Language Models against Reference Corruption
by: Wang, Wenxiao, et al.
Published: (2025)

MemGuard: Preventing Memory Contamination in Long-Term Memory-Augmented Large Language Models
by: Ha, Hyeonjeong, et al.
Published: (2026)

Decoupling Scores and Text: The Politeness Principle in Peer Review
by: Wen, Yingxuan
Published: (2026)

HypoBench: Towards Systematic and Principled Benchmarking for Hypothesis Generation
by: Liu, Haokun, et al.
Published: (2025)

Balcony: A Lightweight Approach to Dynamic Inference of Generative Language Models
by: Jamialahmadi, Benyamin, et al.
Published: (2025)

Catching rationalization in the act: detecting motivated reasoning before and after CoT via activation probing
by: Mirtaheri, Parsa, et al.
Published: (2026)

DTRNet: Dynamic Token Routing Network to Reduce Quadratic Costs in Transformers
by: Sharma, Aman, et al.
Published: (2025)

POET-X: Memory-efficient LLM Training by Scaling Orthogonal Transformation
by: Qiu, Zeju, et al.
Published: (2026)

Generalization or Hallucination? Understanding Out-of-Context Reasoning in Transformers
by: Huang, Yixiao, et al.
Published: (2025)

Latent Recurrent Transformer: Architecture Exploration, Training Strategies, and Scaling Behavior
by: Huang, Zeyi, et al.
Published: (2026)

Memory-Efficient Looped Transformer: Decoupling Compute from Memory in Looped Language Models
by: Vendrell, Victor Conchello, et al.
Published: (2026)

Towards Robust Few-Shot Text Classification Using Transformer Architectures and Dual Loss Strategies
by: Han, Xu, et al.
Published: (2025)

Data Augmentations for Improved (Large) Language Model Generalization
by: Feder, Amir, et al.
Published: (2023)

A Systematic Review of Federated Generative Models
by: Gargary, Ashkan Vedadi, et al.
Published: (2024)

MetaState: Persistent Working Memory Enhances Reasoning in Discrete Diffusion Language Models
by: Xia, Kejing, et al.
Published: (2026)

Born a Transformer -- Always a Transformer? On the Effect of Pretraining on Architectural Abilities
by: Jobanputra, Mayank, et al.
Published: (2025)

Efficient Systematic Reviews: Literature Filtering with Transformers & Transfer Learning
by: Hawkins, John, et al.
Published: (2024)

CLIBE: Detecting Dynamic Backdoors in Transformer-based NLP Models
by: Zeng, Rui, et al.
Published: (2024)