| Main Authors: | Li, Hongkang, Wang, Meng, Lu, Songtao, Cui, Xiaodong, Chen, Pin-Yu |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2402.15607 |
Similar books/articles
Can Mamba Learn In Context with Outliers? A Theoretical Generalization Analysis
by: Li, Hongkang, et al.
Published: (2025)
Training Nonlinear Transformers for Chain-of-Thought Inference: A Theoretical Generalization Analysis
by: Li, Hongkang, et al.
Published: (2024)
When is Task Vector Provably Effective for Model Editing? A Generalization Analysis of Nonlinear Transformers
by: Li, Hongkang, et al.
Published: (2025)
Learning on Transformers is Provable Low-Rank and Sparse: A One-layer Analysis
by: Li, Hongkang, et al.
Published: (2024)
What Improves the Generalization of Graph Transformers? A Theoretical Dive into the Self-attention and Positional Encoding
by: Li, Hongkang, et al.
Published: (2024)
How does promoting the minority fraction affect generalization? A theoretical study of the one-hidden-layer neural network on group imbalance
by: Li, Hongkang, et al.
Published: (2024)
A Framework for Quantifying How Pre-Training and Context Benefit In-Context Learning
by: Song, Bingqing, et al.
Published: (2025)
Theoretical Learning Performance of Graph Neural Networks: The Impact of Jumping Connections and Layer-wise Sparsification
by: Sun, Jiawei, et al.
Published: (2025)
Transformers Learn the Optimal DDPM Denoiser for Multi-Token GMMs
by: Li, Hongkang, et al.
Published: (2026)
SF-DQN: Provable Knowledge Transfer using Successor Feature for Deep Reinforcement Learning
by: Zhang, Shuai, et al.
Published: (2024)
Understanding the Generalization of In-Context Learning in Transformers: An Empirical Study
by: Zhang, Xingxuan, et al.
Published: (2025)
Provable In-Context Learning of Nonlinear Regression with Transformers
by: Li, Hongbo, et al.
Published: (2025)
Optimality and NP-Hardness of Transformers in Learning Markovian Dynamical Functions
by: Ding, Yanna, et al.
Published: (2025)
Visual prompting reimagined: The power of the Activation Prompts
by: Zhang, Yihua, et al.
Published: (2026)
A Theoretical Analysis of Mamba's Training Dynamics: Filtering Relevant Features for Generalization in State Space Models
by: Shandirasegaran, Mugunthan, et al.
Published: (2026)
How do Transformers perform In-Context Autoregressive Learning?
by: Sander, Michael E., et al.
Published: (2024)
Understanding Generalization and Forgetting in In-Context Continual Learning
by: Li, Guangyu, et al.
Published: (2026)
Joint Unsupervised and Supervised Training for Automatic Speech Recognition via Bilevel Optimization
by: Saif, A F M, et al.
Published: (2024)
On the Role of Transformer Feed-Forward Layers in Nonlinear In-Context Learning
by: Sun, Haoyuan, et al.
Published: (2025)
Understanding In-Context Learning for Nonlinear Regression with Transformers: Attention as Featurizer
by: Hsu, Alexander, et al.
Published: (2026)
Do pretrained Transformers Learn In-Context by Gradient Descent?
by: Shen, Lingfeng, et al.
Published: (2023)
How Transformers Learn In-Context Recall Tasks? Optimality, Training Dynamics and Generalization
by: Nguyen, Quan, et al.
Published: (2025)
In-Context Compositional Learning via Sparse Coding Transformer
by: Chen, Wei, et al.
Published: (2025)
Objective Soups: Multilingual Multi-Task Modeling for Speech Processing
by: Saif, A F M, et al.
Published: (2025)
Learning Mutual Excitation for Hand-to-Hand and Human-to-Human Interaction Recognition
by: Liu, Mengyuan, et al.
Published: (2024)
Transformers Meet In-Context Learning: A Universal Approximation Theory
by: Li, Gen, et al.
Published: (2025)
Transformers Learn Nonlinear Features In Context: Nonconvex Mean-field Dynamics on the Attention Landscape
by: Kim, Juno, et al.
Published: (2024)
Generative Pre-Trained Transformer for Symbolic Regression Base In-Context Reinforcement Learning
by: Li, Yanjie, et al.
Published: (2024)
Node Identifiers: Compact, Discrete Representations for Efficient Graph Learning
by: Luo, Yuankai, et al.
Published: (2024)
Computational Safety for Generative AI: A Signal Processing Perspective
by: Chen, Pin-Yu
Published: (2025)
Meta-Learning Transformers to Improve In-Context Generalization
by: Braccaioli, Lorenzo, et al.
Published: (2025)
How Data Mixing Shapes In-Context Learning: Asymptotic Equivalence for Transformers with MLPs
by: Demir, Samet, et al.
Published: (2025)
How Do Transformers Learn Variable Binding in Symbolic Programs?
by: Wu, Yiwei, et al.
Published: (2025)
Improving Transformers using Faithful Positional Encoding
by: Idé, Tsuyoshi, et al.
Published: (2024)
In-Context In-Context Learning with Transformer Neural Processes
by: Ashman, Matthew, et al.
Published: (2024)
In-Context Learning with Representations: Contextual Generalization of Trained Transformers
by: Yang, Tong, et al.
Published: (2024)
Benchmarking General-Purpose In-Context Learning
by: Wang, Fan, et al.
Published: (2024)
How Do Transformers Learn to Associate Tokens: Gradient Leading Terms Bring Mechanistic Interpretability
by: Im, Shawn, et al.
Published: (2026)
How Transformers Utilize Multi-Head Attention in In-Context Learning? A Case Study on Sparse Linear Regression
by: Chen, Xingwu, et al.
Published: (2024)
General-Purpose In-Context Learning by Meta-Learning Transformers
by: Kirsch, Louis, et al.
Published: (2022)