:: Library Catalog

Bibliographic Details
Main Authors:	Gopalakrishnan, Anand; Csordás, Robert; Schmidhuber, Jürgen; Mozer, Michael C.
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence; Computation and Language
Online Access:	https://arxiv.org/abs/2509.10534

Similar Items

Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing
by: Piękos, Piotr, et al.
Published: (2025)

Recurrent Complex-Weighted Autoencoders for Unsupervised Object Discovery
by: Gopalakrishnan, Anand, et al.
Published: (2024)

MoEUT: Mixture-of-Experts Universal Transformers
by: Csordás, Róbert, et al.
Published: (2024)

SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention
by: Csordás, Róbert, et al.
Published: (2023)

MrT5: Dynamic Token Merging for Efficient Byte-level Language Models
by: Kallini, Julie, et al.
Published: (2024)

Thoughtbubbles: an Unsupervised Method for Parallel Thinking in Latent Space
by: Liu, Houjun, et al.
Published: (2025)

Self-Organising Neural Discrete Representation Learning à la Kohonen
by: Irie, Kazuki, et al.
Published: (2023)

Language Agents as Optimizable Graphs
by: Zhuge, Mingchen, et al.
Published: (2024)

What is in a name? Mitigating Name Bias in Text Embeddings via Anonymization
by: Manchanda, Sahil, et al.
Published: (2025)

Fantastic Bugs and Where to Find Them in AI Benchmarks
by: Truong, Sang, et al.
Published: (2025)

Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead
by: Balachandran, Vidhisha, et al.
Published: (2025)

Where Do Reasoning Models Refuse?
by: Yamaguchi, Kureha, et al.
Published: (2025)

Decomposing Elements of Problem Solving: What "Math" Does RL Teach?
by: Qin, Tian, et al.
Published: (2025)

Aligners: Decoupling LLMs and Alignment
by: Ngweta, Lilian, et al.
Published: (2024)

Rotary Positional Embeddings as Phase Modulation: Theoretical Bounds on the RoPE Base for Long-Context Transformers
by: Liu, Feilong
Published: (2026)

CoCA: Fusing Position Embedding with Collinear Constrained Attention in Transformers for Long Context Window Extending
by: Zhu, Shiyi, et al.
Published: (2023)

Where does output diversity collapse in post-training?
by: Karouzos, Constantinos, et al.
Published: (2026)

Improving Discrete Optimisation Via Decoupled Straight-Through Estimator
by: Shah, Rushi, et al.
Published: (2024)

What Should Embeddings Embed? Autoregressive Models Represent Latent Generating Distributions
by: Zhang, Liyi, et al.
Published: (2024)

Metalearning Continual Learning Algorithms
by: Irie, Kazuki, et al.
Published: (2023)

Where Norms and References Collide: Evaluating LLMs on Normative Reasoning
by: Abrams, Mitchell, et al.
Published: (2026)

Do Language Models Use Their Depth Efficiently?
by: Csordás, Róbert, et al.
Published: (2025)

AKReF: An argumentative knowledge representation framework for structured argumentation
by: Bhattacharjee, Debarati, et al.
Published: (2025)

Where Pretraining writes and Alignment reads: the asymmetry of Transformer weight space
by: Ruscio, Valeria, et al.
Published: (2026)

Leviathan: Decoupling Input and Output Representations in Language Models
by: Batley, Reza T., et al.
Published: (2026)

Fantastic Reasoning Behaviors and Where to Find Them: Unsupervised Discovery of the Reasoning Process
by: Zhang, Zhenyu, et al.
Published: (2025)

Where Does Toxicity Live? Mechanistic Localization and Targeted Suppression in Language Models
by: Beniwal, Himanshu, et al.
Published: (2026)

Structured Query Construction via Knowledge Graph Embedding
by: Wang, Ruijie, et al.
Published: (2019)

Position: Uncertainty Quantification Needs Reassessment for Large-language Model Agents
by: Kirchhof, Michael, et al.
Published: (2025)

Where Rollouts Begin: Low-Load, High-Leverage First-Token Diversification for RLVR
by: Kim, Soeun, et al.
Published: (2026)

Finetune Once: Decoupling General & Domain Learning with Dynamic Boosted Annealing
by: Tang, Yang, et al.
Published: (2025)

DPIC: Decoupling Prompt and Intrinsic Characteristics for LLM Generated Text Detection
by: Yu, Xiao, et al.
Published: (2023)

Proxy-RLHF: Decoupling Generation and Alignment in Large Language Model with Proxy
by: Zhu, Yu, et al.
Published: (2024)

Tabular Embeddings for Tables with Bi-Dimensional Hierarchical Metadata and Nesting
by: Shrestha, Gyanendra, et al.
Published: (2025)

Where Did It Go Wrong? Attributing Undesirable LLM Behaviors via Representation Gradient Tracing
by: Li, Zhe, et al.
Published: (2025)

Decoupling Knowledge and Reasoning in Transformers: A Modular Architecture with Generalized Cross-Attention
by: Guo, Zhenyu, et al.
Published: (2025)

Jakiro: Boosting Speculative Decoding with Decoupled Multi-Head via MoE
by: Huang, Haiduo, et al.
Published: (2025)

Decoupling Reasoning and Confidence: Resurrecting Calibration in Reinforcement Learning from Verifiable Rewards
by: Ma, Zhengzhao, et al.
Published: (2026)

ctELM: Decoding and Manipulating Embeddings of Clinical Trials with Embedding Language Models
by: Ondov, Brian, et al.
Published: (2026)

Measuring In-Context Computation Complexity via Hidden State Prediction
by: Herrmann, Vincent, et al.
Published: (2025)