Library Catalog

Bibliographic Details
Main Authors:	Alokhina, Anastasiia; Li, Pan
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2512.12805

Similar Items

Sharper Generalization Bounds for Transformer
by: Li, Yawen, et al.
Published: (2026)

Effective Sample Size and Generalization Bounds for Temporal Networks
by: Gahtan, Barak, et al.
Published: (2025)

From Generic Correlation to Input-Specific Credit in On-Policy Self Distillation
by: Shen, Guobin, et al.
Published: (2026)

VariViT: A Vision Transformer for Variable Image Sizes
by: Varma, Aswathi, et al.
Published: (2026)

Two Heads Are Better than One: Simulating Large Transformers with Small Ones
by: Yu, Hantao, et al.
Published: (2025)

Towards Attributions of Input Variables in a Coalition
by: Zheng, Xinhao, et al.
Published: (2023)

STAR-VAE: Latent Variable Transformers for Scalable and Controllable Molecular Generation
by: Kwon, Bum Chul, et al.
Published: (2025)

Provably Overwhelming Transformer Models with Designed Inputs
by: Stambler, Lev, et al.
Published: (2025)

ALERT Open Dataset and Input-Size-Agnostic Vision Transformer for Driver Activity Recognition using IR-UWB
by: Park, Jeongjun, et al.
Published: (2025)

GraphMaker: Can Diffusion Models Generate Large Attributed Graphs?
by: Li, Mufei, et al.
Published: (2023)

MMET: A Multi-Input and Multi-Scale Transformer for Efficient PDEs Solving
by: Luo, Yichen, et al.
Published: (2025)

Learning with Noisy Labels by Adaptive Gradient-Based Outlier Removal
by: Sedova, Anastasiia, et al.
Published: (2023)

Understanding the Training and Generalization of Pretrained Transformer for Sequential Decision Making
by: Wang, Hanzhao, et al.
Published: (2024)

It Ain't That Bad: Understanding the Mysterious Performance Drop in OOD Generalization for Generative Transformer Models
by: Xu, Xingcheng, et al.
Published: (2023)

From Large to Small: Transferring CUDA Optimization Expertise via Reasoning Graph
by: Gong, Junfeng, et al.
Published: (2025)

LouisKV: Efficient KV Cache Retrieval for Long Input-Output Sequences
by: Wu, Wenbo, et al.
Published: (2025)

Stable Minima Cannot Overfit in Univariate ReLU Networks: Generalization by Large Step Sizes
by: Qiao, Dan, et al.
Published: (2024)

From Local to Cluster: A Unified Framework for Causal Discovery with Latent Variables
by: Li, Zongyu
Published: (2026)

AIGC for Industrial Time Series: From Deep Generative Models to Large Generative Models
by: Ren, Lei, et al.
Published: (2024)

Towards Better Generalization via Distributional Input Projection Network
by: Hao, Yifan, et al.
Published: (2025)

Negligible in Size, Significant in Effect: On Scale Vectors in Large Language Models
by: Wang, Mingze, et al.
Published: (2026)

From Classical Probabilistic Latent Variable Models to Modern Generative AI: A Unified Perspective
by: Chen, Tianhua
Published: (2025)

Taming the Entropy Cliff: Variable Codebook Size Quantization for Autoregressive Visual Generation
by: Zheng, Bowen, et al.
Published: (2026)

Measures of Variability for Risk-averse Policy Gradient
by: Luo, Yudong, et al.
Published: (2025)

Batch-Size Independent Regret Bounds for Combinatorial Semi-Bandits with Probabilistically Triggered Arms or Independent Arms
by: Liu, Xutong, et al.
Published: (2022)

READ: Recurrent Adaptation of Large Transformers
by: Nguyen, John, et al.
Published: (2023)

Partial Parameter Updates for Efficient Distributed Training
by: Filippova, Anastasiia, et al.
Published: (2025)

Beyond One-Size-Fits-All: Tailored Benchmarks for Efficient Evaluation
by: Yuan, Peiwen, et al.
Published: (2025)

Stochastic KV Routing: Enabling Adaptive Depth-Wise Cache Sharing
by: Filippova, Anastasiia, et al.
Published: (2026)

Out-of-Variable Generalization for Discriminative Models
by: Guo, Siyuan, et al.
Published: (2023)

Upper and Lower Bounds for Distributionally Robust Off-Dynamics Reinforcement Learning
by: Liu, Zhishuai, et al.
Published: (2024)

TimeXer: Empowering Transformers for Time Series Forecasting with Exogenous Variables
by: Wang, Yuxuan, et al.
Published: (2024)

Understanding Domain-Size Generalization in Markov Logic Networks
by: Chen, Florian, et al.
Published: (2024)

The Impact of Fine-tuning Large Language Models on Automated Program Repair
by: Macháček, Roman, et al.
Published: (2025)

On Vanishing Variance in Transformer Length Generalization
by: Li, Ruining, et al.
Published: (2025)

2DXformer: Dual Transformers for Wind Power Forecasting with Dual Exogenous Variables
by: Zhang, Yajuan, et al.
Published: (2025)

Tackling Size Generalization of Graph Neural Networks on Biological Data from a Spectral Perspective
by: Li, Gaotang, et al.
Published: (2023)

Traj-Transformer: Diffusion Models with Transformer for GPS Trajectory Generation
by: Zhang, Zhiyang, et al.
Published: (2025)

Human-like Cognitive Generalization for Large Models via Brain-in-the-loop Supervision
by: Chen, Jiaxuan, et al.
Published: (2025)

A Generalization Bound for Nearly-Linear Networks
by: Golikov, Eugene
Published: (2024)