| Main Authors: | Yueyu, Lin; Zhiyuan, Li; Yue, Peter; Xiao, Liu |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2501.15570 |
Similar Items
WuNeng: Hybrid State with Attention
by: Xiao, Liu, et al.
Published: (2025)
Cross-attention for State-based model RWKV-7
by: Xiao, Liu, et al.
Published: (2025)
State Tuning: State-based Test-Time Scaling on RWKV-7
by: Xiao, Liu, et al.
Published: (2025)
Millions of States: Designing a Scalable MoE Architecture with RWKV-7 Meta-learner
by: Xiao, Liu, et al.
Published: (2025)
Modeling Bilingual Sentence Processing: Evaluating RNN and Transformer Architectures for Cross-Language Structural Priming
by: Zhang, Demi, et al.
Published: (2024)
Efficient Language Modeling for Low-Resource Settings with Hybrid RNN-Transformer Architectures
by: Lindenmaier, Gabriel, et al.
Published: (2025)
Born a Transformer -- Always a Transformer? On the Effect of Pretraining on Architectural Abilities
by: Jobanputra, Mayank, et al.
Published: (2025)
A Study of the Plausibility of Attention between RNN Encoders in Natural Language Inference
by: Nguyen, Duc Hau, et al.
Published: (2025)
HealthcareNLP: where are we and what is next?
by: Han, Lifeng, et al.
Published: (2025)
RAT: Bridging RNN Efficiency and Attention Accuracy via Chunk-based Sequence Modeling
by: Wei, Xiuying, et al.
Published: (2025)
PretrainZero: Reinforcement Active Pretraining
by: Xing, Xingrun, et al.
Published: (2025)
A Family of Pretrained Transformer Language Models for Russian
by: Zmitrovich, Dmitry, et al.
Published: (2023)
Finding Challenging Metaphors that Confuse Pretrained Language Models
by: Li, Yucheng, et al.
Published: (2024)
The Emergence of Chunking Structures with Hierarchical RNN
by: Wu, Zijun, et al.
Published: (2023)
Pretrained Multilingual Transformers Reveal Quantitative Distance Between Human Languages
by: Zhao, Yue, et al.
Published: (2026)
Do we still need Human Annotators? Prompting Large Language Models for Aspect Sentiment Quad Prediction
by: Hellwig, Nils Constantin, et al.
Published: (2025)
StateX: Enhancing RNN Recall via Post-training State Expansion
by: Shen, Xingyu, et al.
Published: (2025)
Craw4LLM: Efficient Web Crawling for LLM Pretraining
by: Yu, Shi, et al.
Published: (2025)
Learning Less Is More: Premature Upper-Layer Attention Specialization Hurts Language Model Pretraining
by: Zhu, Jinchang, et al.
Published: (2026)
Retrieval-Pretrained Transformer: Long-range Language Modeling with Self-retrieval
by: Rubin, Ohad, et al.
Published: (2023)
RNN Generalization to Omega-Regular Languages
by: Pert, Charles, et al.
Published: (2025)
Pooling Attention: Evaluating Pretrained Transformer Embeddings for Deception Classification
by: Mamtani, Sumit, et al.
Published: (2025)
Cleaner Pretraining Corpus Curation with Neural Web Scraping
by: Xu, Zhipeng, et al.
Published: (2024)
Learning Transductions and Alignments with RNN Seq2seq Models
by: Wang, Zhengxiang
Published: (2023)
Improving the Completeness and Comparability of Segment Disclosures: A Large Language Model Approach
by: Liu, Yue, et al.
Published: (2026)
Generative Pretrained Structured Transformers: Unsupervised Syntactic Language Models at Scale
by: Hu, Xiang, et al.
Published: (2024)
Efficient Sparse Attention needs Adaptive Token Release
by: Zhang, Chaoran, et al.
Published: (2024)
The Fine Line: Navigating Large Language Model Pretraining with Down-streaming Capability Analysis
by: Yang, Chen, et al.
Published: (2024)
Deciphering the Impact of Pretraining Data on Large Language Models through Machine Unlearning
by: Zhao, Yang, et al.
Published: (2024)
Can Pretrained Language Models Derive Correct Semantics from Corrupt Subwords under Noise?
by: Li, Xinzhe, et al.
Published: (2023)
GhostRNN: Reducing State Redundancy in RNN with Cheap Operations
by: Zhou, Hang, et al.
Published: (2024)
Bias A-head? Analyzing Bias in Transformer-Based Language Model Attention Heads
by: Yang, Yi, et al.
Published: (2023)
Multilingual Knowledge Graph Completion from Pretrained Language Models with Knowledge Constraints
by: Song, Ran, et al.
Published: (2024)
LLM-as-RNN: A Recurrent Language Model for Memory Updates and Sequence Prediction
by: Lu, Yuxing, et al.
Published: (2026)
PonderLM: Pretraining Language Models to Ponder in Continuous Space
by: Zeng, Boyi, et al.
Published: (2025)
Rephrasing Electronic Health Records for Pretraining Clinical Language Models
by: Liu, Jinghui, et al.
Published: (2024)
Pretraining Language Models Using Translationese
by: Doshi, Meet, et al.
Published: (2024)
Geographic Adaptation of Pretrained Language Models
by: Hofmann, Valentin, et al.
Published: (2022)
Pretraining Language Models for Diachronic Linguistic Change Discovery
by: Fittschen, Elisabeth, et al.
Published: (2025)
A One-Layer Decoder-Only Transformer is a Two-Layer RNN: With an Application to Certified Robustness
by: Zhang, Yuhao, et al.
Published: (2024)