| Main Authors: | Alokhina, Anastasiia; Li, Pan |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2512.12805 |
Similar Items
Sharper Generalization Bounds for Transformer
by: Li, Yawen, et al.
Published: (2026)
Effective Sample Size and Generalization Bounds for Temporal Networks
by: Gahtan, Barak, et al.
Published: (2025)
From Generic Correlation to Input-Specific Credit in On-Policy Self Distillation
by: Shen, Guobin, et al.
Published: (2026)
VariViT: A Vision Transformer for Variable Image Sizes
by: Varma, Aswathi, et al.
Published: (2026)
Two Heads Are Better than One: Simulating Large Transformers with Small Ones
by: Yu, Hantao, et al.
Published: (2025)
Towards Attributions of Input Variables in a Coalition
by: Zheng, Xinhao, et al.
Published: (2023)
STAR-VAE: Latent Variable Transformers for Scalable and Controllable Molecular Generation
by: Kwon, Bum Chul, et al.
Published: (2025)
Provably Overwhelming Transformer Models with Designed Inputs
by: Stambler, Lev, et al.
Published: (2025)
ALERT Open Dataset and Input-Size-Agnostic Vision Transformer for Driver Activity Recognition using IR-UWB
by: Park, Jeongjun, et al.
Published: (2025)
GraphMaker: Can Diffusion Models Generate Large Attributed Graphs?
by: Li, Mufei, et al.
Published: (2023)
MMET: A Multi-Input and Multi-Scale Transformer for Efficient PDEs Solving
by: Luo, Yichen, et al.
Published: (2025)
Learning with Noisy Labels by Adaptive Gradient-Based Outlier Removal
by: Sedova, Anastasiia, et al.
Published: (2023)
Understanding the Training and Generalization of Pretrained Transformer for Sequential Decision Making
by: Wang, Hanzhao, et al.
Published: (2024)
It Ain't That Bad: Understanding the Mysterious Performance Drop in OOD Generalization for Generative Transformer Models
by: Xu, Xingcheng, et al.
Published: (2023)
From Large to Small: Transferring CUDA Optimization Expertise via Reasoning Graph
by: Gong, Junfeng, et al.
Published: (2025)
LouisKV: Efficient KV Cache Retrieval for Long Input-Output Sequences
by: Wu, Wenbo, et al.
Published: (2025)
Stable Minima Cannot Overfit in Univariate ReLU Networks: Generalization by Large Step Sizes
by: Qiao, Dan, et al.
Published: (2024)
From Local to Cluster: A Unified Framework for Causal Discovery with Latent Variables
by: Li, Zongyu
Published: (2026)
AIGC for Industrial Time Series: From Deep Generative Models to Large Generative Models
by: Ren, Lei, et al.
Published: (2024)
Towards Better Generalization via Distributional Input Projection Network
by: Hao, Yifan, et al.
Published: (2025)
Negligible in Size, Significant in Effect: On Scale Vectors in Large Language Models
by: Wang, Mingze, et al.
Published: (2026)
From Classical Probabilistic Latent Variable Models to Modern Generative AI: A Unified Perspective
by: Chen, Tianhua
Published: (2025)
Taming the Entropy Cliff: Variable Codebook Size Quantization for Autoregressive Visual Generation
by: Zheng, Bowen, et al.
Published: (2026)
Measures of Variability for Risk-averse Policy Gradient
by: Luo, Yudong, et al.
Published: (2025)
Batch-Size Independent Regret Bounds for Combinatorial Semi-Bandits with Probabilistically Triggered Arms or Independent Arms
by: Liu, Xutong, et al.
Published: (2022)
READ: Recurrent Adaptation of Large Transformers
by: Nguyen, John, et al.
Published: (2023)
Partial Parameter Updates for Efficient Distributed Training
by: Filippova, Anastasiia, et al.
Published: (2025)
Beyond One-Size-Fits-All: Tailored Benchmarks for Efficient Evaluation
by: Yuan, Peiwen, et al.
Published: (2025)
Stochastic KV Routing: Enabling Adaptive Depth-Wise Cache Sharing
by: Filippova, Anastasiia, et al.
Published: (2026)
Out-of-Variable Generalization for Discriminative Models
by: Guo, Siyuan, et al.
Published: (2023)
Upper and Lower Bounds for Distributionally Robust Off-Dynamics Reinforcement Learning
by: Liu, Zhishuai, et al.
Published: (2024)
TimeXer: Empowering Transformers for Time Series Forecasting with Exogenous Variables
by: Wang, Yuxuan, et al.
Published: (2024)
Understanding Domain-Size Generalization in Markov Logic Networks
by: Chen, Florian, et al.
Published: (2024)
The Impact of Fine-tuning Large Language Models on Automated Program Repair
by: Macháček, Roman, et al.
Published: (2025)
On Vanishing Variance in Transformer Length Generalization
by: Li, Ruining, et al.
Published: (2025)
2DXformer: Dual Transformers for Wind Power Forecasting with Dual Exogenous Variables
by: Zhang, Yajuan, et al.
Published: (2025)
Tackling Size Generalization of Graph Neural Networks on Biological Data from a Spectral Perspective
by: Li, Gaotang, et al.
Published: (2023)
Traj-Transformer: Diffusion Models with Transformer for GPS Trajectory Generation
by: Zhang, Zhiyang, et al.
Published: (2025)
Human-like Cognitive Generalization for Large Models via Brain-in-the-loop Supervision
by: Chen, Jiaxuan, et al.
Published: (2025)
A Generalization Bound for Nearly-Linear Networks
by: Golikov, Eugene
Published: (2024)