| Main Author: | Jiang, Yuhang |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2606.00926 |
Similar Items
Task-driven Layerwise Additive Activation Intervention
by: Nguyen, Hieu Trung, et al.
Published: (2025)
From Compression to Expression: A Layerwise Analysis of In-Context Learning
by: Jiang, Jiachen, et al.
Published: (2025)
Explicitly Encoding Structural Symmetry is Key to Length Generalization in Arithmetic Tasks
by: Sabbaghi, Mahdi, et al.
Published: (2024)
Detection vs. Execution: Single-Bucket Probes Miss Half the Mamba-2 State Sink
by: Jiang, Yuhang
Published: (2026)
Layerwise Recall and the Geometry of Interwoven Knowledge in LLMs
by: Lei, Ge, et al.
Published: (2025)
Outlier-weighed Layerwise Sampling for LLM Fine-tuning
by: Li, Pengxiang, et al.
Published: (2024)
Layerwise Change of Knowledge in Neural Networks
by: Cheng, Xu, et al.
Published: (2024)
Encoding Agent Trajectories as Representations with Sequence Transformers
by: Tsiligkaridis, Athanasios, et al.
Published: (2024)
Multilingual Language Models Encode Script Over Linguistic Structure
by: Verma, Aastha A K, et al.
Published: (2026)
Adaptive Large Language Models By Layerwise Attention Shortcuts
by: Verma, Prateek, et al.
Published: (2024)
R2T: Rule-Encoded Loss Functions for Low-Resource Sequence Tagging
by: Keita, Mamadou K., et al.
Published: (2025)
LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning
by: Pan, Rui, et al.
Published: (2024)
The UNDO Flip-Flop: A Controlled Probe for Reversible Semantic State Management in State Space Model
by: Zhou, Hongxu
Published: (2026)
GLiClass: Generalist Lightweight Model for Sequence Classification Tasks
by: Stepanov, Ihor, et al.
Published: (2025)
Pretrained Generative Language Models as General Learning Frameworks for Sequence-Based Tasks
by: Fauber, Ben
Published: (2024)
Structured Recurrent Mixers for Massively Parallelized Sequence Generation
by: Badger, Benjamin L.
Published: (2026)
What Do Language Models Learn in Context? The Structured Task Hypothesis
by: Li, Jiaoda, et al.
Published: (2024)
Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task
by: Li, Kenneth, et al.
Published: (2022)
Sequences of Logits Reveal the Low Rank Structure of Language Models
by: Golowich, Noah, et al.
Published: (2025)
On the "Induction Bias" in Sequence Models
by: Ebrahimi, M. Reza, et al.
Published: (2026)
Assessing Episodic Memory in LLMs with Sequence Order Recall Tasks
by: Pink, Mathis, et al.
Published: (2024)
Plan, Verify and Fill: A Structured Parallel Decoding Approach for Diffusion Language Models
by: Li, Miao, et al.
Published: (2026)
Struc-EMB: The Potential of Structure-Aware Encoding in Language Embeddings
by: Liu, Shikun, et al.
Published: (2025)
Training Large Reasoning Models Efficiently via Progressive Thought Encoding
by: Zhang, Zeliang, et al.
Published: (2026)
On the Geometry of Positional Encodings in Transformers
by: Cirrincione, Giansalvo
Published: (2026)
Large Language Models Encode Semantics and Alignment in Linearly Separable Representations
by: Saglam, Baturay, et al.
Published: (2025)
SpikingSSMs: Learning Long Sequences with Sparse and Parallel Spiking State Space Models
by: Shen, Shuaijie, et al.
Published: (2024)
Sparse Autoencoder Decomposition of Clinical Sequence Model Representations: Feature Complexity, Task Specialisation, and Mortality Prediction
by: Sainsbury, Chris, et al.
Published: (2026)
Language Models over Canonical Byte-Pair Encodings
by: Vieira, Tim, et al.
Published: (2025)
Sequence-to-Sequence Spanish Pre-trained Language Models
by: Araujo, Vladimir, et al.
Published: (2023)
Long-range Modeling and Processing of Multimodal Event Sequences
by: Li, Jichu, et al.
Published: (2026)
ParaScopes: What do Language Models Activations Encode About Future Text?
by: Pochinkov, Nicky, et al.
Published: (2025)
Transforming Chatbot Text: A Sequence-to-Sequence Approach
by: Reddy, Natesh, et al.
Published: (2025)
Temporal Tokenization Strategies for Event Sequence Modeling with Large Language Models
by: Liu, Zefang, et al.
Published: (2025)
Mitigating Reversal Curse in Large Language Models via Semantic-aware Permutation Training
by: Guo, Qingyan, et al.
Published: (2024)
Neural Sequence-to-Sequence Modeling with Attention by Leveraging Deep Learning Architectures for Enhanced Contextual Understanding in Abstractive Text Summarization
by: Challagundla, Bhavith Chandra, et al.
Published: (2024)
Learning-Time Encoding Shapes Unlearning in LLMs
by: Wu, Ruihan, et al.
Published: (2025)
The Necessity of Imperfection: Reversing Model Collapse via Simulating Cognitive Boundedness
by: Jiang, Zhongjie
Published: (2025)
Insertion Language Models: Sequence Generation with Arbitrary-Position Insertions
by: Patel, Dhruvesh, et al.
Published: (2025)
Enhanced Structured State Space Models via Grouped FIR Filtering and Attention Sink Mechanisms
by: Meng, Tian, et al.
Published: (2024)