| Main Authors: | Watanabe, Chihiro, Suzuki, Taiji |
|---|---|
| Format: | Preprint |
| Published: | 2021 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2103.14203 |
Similar Items
AutoLL: Automatic Linear Layout of Graphs based on Deep Neural Network
by: Watanabe, Chihiro, et al.
Published: (2021)
Mean-field Analysis on Two-layer Neural Networks from a Kernel Perspective
by: Takakura, Shokichi, et al.
Published: (2024)
Self-Supervised Learning for Sparse Matrix Reordering
by: Li, Ziwei, et al.
Published: (2026)
Transformers Learn Nonlinear Features In Context: Nonconvex Mean-field Dynamics on the Attention Landscape
by: Kim, Juno, et al.
Published: (2024)
The Mechanism of Weak-to-Strong Generalization: Feature Elicitation from Latent Knowledge
by: Awano, Ryoya, et al.
Published: (2026)
State Space Models are Provably Comparable to Transformers in Dynamic Token Selection
by: Nishikawa, Naoki, et al.
Published: (2024)
Transformers Provably Solve Parity Efficiently with Chain of Thought
by: Kim, Juno, et al.
Published: (2024)
Test time training enhances in-context learning of nonlinear functions
by: Kuwataka, Kento, et al.
Published: (2025)
Transformers as Measure-Theoretic Associative Memory: A Statistical Perspective and Minimax Optimality
by: Kawata, Ryotaro, et al.
Published: (2026)
In-Context Learning Is Provably Bayesian Inference: A Generalization Theory for Meta-Learning
by: Wakayama, Tomoya, et al.
Published: (2025)
Approximation and Estimation Ability of Transformers for Sequence-to-Sequence Functions with Infinite Dimensional Input
by: Takakura, Shokichi, et al.
Published: (2023)
Empirical Cumulative Distribution Function Clustering for LLM-based Agent System Analysis
by: Watanabe, Chihiro, et al.
Published: (2026)
MultiwayPAM: Multiway Partitioning Around Medoids for LLM-as-a-Judge Score Analysis
by: Watanabe, Chihiro, et al.
Published: (2026)
Direct Density Ratio Optimization: A Statistically Consistent Approach to Aligning Large Language Models
by: Higuchi, Rei, et al.
Published: (2025)
High-Dimensional Kernel Methods under Covariate Shift: Data-Dependent Implicit Regularization
by: Chen, Yihang, et al.
Published: (2024)
Towards a Unified Analysis of Neural Networks in Nonparametric Instrumental Variable Regression: Optimization and Generalization
by: Chen, Zonghao, et al.
Published: (2025)
Bridging the Gap between Sparse Matrix Reordering and Factorization: A Deep Learning Framework for Fill-in Reduction
by: Li, Ziwei, et al.
Published: (2026)
Optimality and Adaptivity of Deep Neural Features for Instrumental Variable Regression
by: Kim, Juno, et al.
Published: (2025)
From Saddle Points Toward Global Minima: A Newton-Type Method on Wasserstein Space
by: Lascu, Razvan-Andrei, et al.
Published: (2026)
Degrees of Freedom for Linear Attention: Distilling Softmax Attention with Optimal Feature Efficiency
by: Nishikawa, Naoki, et al.
Published: (2025)
Transformers are Minimax Optimal Nonparametric In-Context Learners
by: Kim, Juno, et al.
Published: (2024)
Mamba Can Learn Low-Dimensional Targets In-Context via Test-Time Feature Learning
by: Oh, Junsoo, et al.
Published: (2025)
Factorization-in-Loop: Proximal Fill-in Minimization for Sparse Matrix Reordering
by: Li, Ziwei, et al.
Published: (2025)
Hessian-guided Perturbed Wasserstein Gradient Flows for Escaping Saddle Points
by: Yamamoto, Naoya, et al.
Published: (2025)
Convergence Error Analysis of Reflected Gradient Langevin Dynamics for Globally Optimizing Non-Convex Constrained Problems
by: Sato, Kanji, et al.
Published: (2022)
On the Optimization and Generalization of Two-layer Transformers with Sign Gradient Descent
by: Li, Bingrui, et al.
Published: (2024)
A Relative-Budget Theory for Reinforcement Learning with Verifiable Rewards in Large Language Model Reasoning
by: Wachi, Akifumi, et al.
Published: (2026)
Order Matters: Improving Domain Adaptation by Reordering Data
by: Napoli, Andrea, et al.
Published: (2026)
Zero-Flow Encoders
by: Wang, Yakun, et al.
Published: (2026)
Direct Distributional Optimization for Provable Alignment of Diffusion Models
by: Kawata, Ryotaro, et al.
Published: (2025)
Learning sum of diverse features: computational hardness and efficient gradient-based training for ridge combinations
by: Oko, Kazusato, et al.
Published: (2024)
Pretrained transformer efficiently learns low-dimensional target functions in-context
by: Oko, Kazusato, et al.
Published: (2024)
Generalization Bound of Gradient Flow through Training Trajectory and Data-dependent Kernel
by: Chen, Yilan, et al.
Published: (2025)
GE2E-AC: Generalized End-to-End Loss Training for Accent Classification
by: Watanabe, Chihiro, et al.
Published: (2024)
How Neural Reward Models Learn Features for Policy Optimization: A Single-Index Analysis
by: Higuchi, Rei, et al.
Published: (2026)
On the Learning Dynamics of Two-layer Linear Networks with Label Noise SGD
by: Zhang, Tongcheng, et al.
Published: (2026)
Intrinsic Wasserstein Rates for Score-Based Generative Models on Smooth Manifolds
by: Fu, Guoji, et al.
Published: (2026)
Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit
by: Lee, Jason D., et al.
Published: (2024)
Metastable Dynamics of Chain-of-Thought Reasoning: Provable Benefits of Search, RL and Distillation
by: Kim, Juno, et al.
Published: (2025)
From Shortcut to Induction Head: How Data Diversity Shapes Algorithm Selection in Transformers
by: Kawata, Ryotaro, et al.
Published: (2025)