| Main Authors: | Gopalakrishnan, Anand; Csordás, Robert; Schmidhuber, Jürgen; Mozer, Michael C. |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2509.10534 |
Similar Items
Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing
by: Piękos, Piotr, et al.
Published: (2025)
Recurrent Complex-Weighted Autoencoders for Unsupervised Object Discovery
by: Gopalakrishnan, Anand, et al.
Published: (2024)
MoEUT: Mixture-of-Experts Universal Transformers
by: Csordás, Róbert, et al.
Published: (2024)
SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention
by: Csordás, Róbert, et al.
Published: (2023)
MrT5: Dynamic Token Merging for Efficient Byte-level Language Models
by: Kallini, Julie, et al.
Published: (2024)
Thoughtbubbles: an Unsupervised Method for Parallel Thinking in Latent Space
by: Liu, Houjun, et al.
Published: (2025)
Self-Organising Neural Discrete Representation Learning à la Kohonen
by: Irie, Kazuki, et al.
Published: (2023)
Language Agents as Optimizable Graphs
by: Zhuge, Mingchen, et al.
Published: (2024)
What is in a name? Mitigating Name Bias in Text Embeddings via Anonymization
by: Manchanda, Sahil, et al.
Published: (2025)
Fantastic Bugs and Where to Find Them in AI Benchmarks
by: Truong, Sang, et al.
Published: (2025)
Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead
by: Balachandran, Vidhisha, et al.
Published: (2025)
Where Do Reasoning Models Refuse?
by: Yamaguchi, Kureha, et al.
Published: (2025)
Decomposing Elements of Problem Solving: What "Math" Does RL Teach?
by: Qin, Tian, et al.
Published: (2025)
Aligners: Decoupling LLMs and Alignment
by: Ngweta, Lilian, et al.
Published: (2024)
Rotary Positional Embeddings as Phase Modulation: Theoretical Bounds on the RoPE Base for Long-Context Transformers
by: Liu, Feilong
Published: (2026)
CoCA: Fusing Position Embedding with Collinear Constrained Attention in Transformers for Long Context Window Extending
by: Zhu, Shiyi, et al.
Published: (2023)
Where does output diversity collapse in post-training?
by: Karouzos, Constantinos, et al.
Published: (2026)
Improving Discrete Optimisation Via Decoupled Straight-Through Estimator
by: Shah, Rushi, et al.
Published: (2024)
What Should Embeddings Embed? Autoregressive Models Represent Latent Generating Distributions
by: Zhang, Liyi, et al.
Published: (2024)
Metalearning Continual Learning Algorithms
by: Irie, Kazuki, et al.
Published: (2023)
Where Norms and References Collide: Evaluating LLMs on Normative Reasoning
by: Abrams, Mitchell, et al.
Published: (2026)
Do Language Models Use Their Depth Efficiently?
by: Csordás, Róbert, et al.
Published: (2025)
AKReF: An argumentative knowledge representation framework for structured argumentation
by: Bhattacharjee, Debarati, et al.
Published: (2025)
Where Pretraining writes and Alignment reads: the asymmetry of Transformer weight space
by: Ruscio, Valeria, et al.
Published: (2026)
Leviathan: Decoupling Input and Output Representations in Language Models
by: Batley, Reza T., et al.
Published: (2026)
Fantastic Reasoning Behaviors and Where to Find Them: Unsupervised Discovery of the Reasoning Process
by: Zhang, Zhenyu, et al.
Published: (2025)
Where Does Toxicity Live? Mechanistic Localization and Targeted Suppression in Language Models
by: Beniwal, Himanshu, et al.
Published: (2026)
Structured Query Construction via Knowledge Graph Embedding
by: Wang, Ruijie, et al.
Published: (2019)
Position: Uncertainty Quantification Needs Reassessment for Large-language Model Agents
by: Kirchhof, Michael, et al.
Published: (2025)
Where Rollouts Begin: Low-Load, High-Leverage First-Token Diversification for RLVR
by: Kim, Soeun, et al.
Published: (2026)
Finetune Once: Decoupling General & Domain Learning with Dynamic Boosted Annealing
by: Tang, Yang, et al.
Published: (2025)
DPIC: Decoupling Prompt and Intrinsic Characteristics for LLM Generated Text Detection
by: Yu, Xiao, et al.
Published: (2023)
Proxy-RLHF: Decoupling Generation and Alignment in Large Language Model with Proxy
by: Zhu, Yu, et al.
Published: (2024)
Tabular Embeddings for Tables with Bi-Dimensional Hierarchical Metadata and Nesting
by: Shrestha, Gyanendra, et al.
Published: (2025)
Where Did It Go Wrong? Attributing Undesirable LLM Behaviors via Representation Gradient Tracing
by: Li, Zhe, et al.
Published: (2025)
Decoupling Knowledge and Reasoning in Transformers: A Modular Architecture with Generalized Cross-Attention
by: Guo, Zhenyu, et al.
Published: (2025)
Jakiro: Boosting Speculative Decoding with Decoupled Multi-Head via MoE
by: Huang, Haiduo, et al.
Published: (2025)
Decoupling Reasoning and Confidence: Resurrecting Calibration in Reinforcement Learning from Verifiable Rewards
by: Ma, Zhengzhao, et al.
Published: (2026)
ctELM: Decoding and Manipulating Embeddings of Clinical Trials with Embedding Language Models
by: Ondov, Brian, et al.
Published: (2026)
Measuring In-Context Computation Complexity via Hidden State Prediction
by: Herrmann, Vincent, et al.
Published: (2025)