| Main Authors: | Kinoshita, Yuri; Nishikawa, Naoki; Toyoizumi, Taro |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2603.14830 |
Similar Items
A provable control of sensitivity of neural networks through a direct parameterization of the overall bi-Lipschitzness
by: Kinoshita, Yuri, et al.
Published: (2024)
Mixture of Experts Provably Detect and Learn the Latent Cluster Structure in Gradient-Based Learning
by: Kawata, Ryotaro, et al.
Published: (2025)
Degrees of Freedom for Linear Attention: Distilling Softmax Attention with Optimal Feature Efficiency
by: Nishikawa, Naoki, et al.
Published: (2025)
Cortex and subcortex play distinct roles over learning when cortical memory is limited
by: Farrell, Matthew, et al.
Published: (2026)
State Space Models are Provably Comparable to Transformers in Dynamic Token Selection
by: Nishikawa, Naoki, et al.
Published: (2024)
Gradient-Based Non-Linear Inverse Learning
by: Abhishake, et al.
Published: (2024)
Distilling Linearized Behavior into Non-Linear Fine-Tuning for Effective Task Arithmetic
by: Sommariva, Thomas, et al.
Published: (2026)
Sample-Efficient Linear Representation Learning from Non-IID Non-Isotropic Data
by: Zhang, Thomas T. C. K., et al.
Published: (2023)
Learning Task-Agnostic Representations through Multi-Teacher Distillation
by: Formont, Philippe, et al.
Published: (2025)
DataDAM: Efficient Dataset Distillation with Attention Matching
by: Sajedi, Ahmad, et al.
Published: (2023)
From Low Intrinsic Dimensionality to Non-Vacuous Generalization Bounds in Deep Multi-Task Learning
by: Zakerinia, Hossein, et al.
Published: (2025)
Optimal Task Order for Continual Learning of Multiple Tasks
by: Li, Ziyan, et al.
Published: (2025)
Learning Shared Representations for Multi-Task Linear Bandits
by: Lin, Jiabin, et al.
Published: (2026)
Multi-Task Representation Learning for Conservative Linear Bandits
by: Lin, Jiabin, et al.
Published: (2026)
Causality-Induced Positional Encoding for Transformer-Based Representation Learning of Non-Sequential Features
by: Xu, Kaichen, et al.
Published: (2025)
Disentangling and Mitigating the Impact of Task Similarity for Continual Learning
by: Hiratani, Naoki
Published: (2024)
Reshaping Neural Representation via Associative, Presynaptic Short-Term Plasticity
by: Shimizu, Genki, et al.
Published: (2026)
Learning Dynamical Systems Encoding Non-Linearity within Space Curvature
by: Fichera, Bernardo, et al.
Published: (2024)
Near-optimal and Efficient First-Order Algorithm for Multi-Task Learning with Shared Linear Representation
by: Ding, Shihong, et al.
Published: (2026)
Blurred Encoding for Trajectory Representation Learning
by: Zhou, Silin, et al.
Published: (2025)
Comparison of Autoencoder Encodings for ECG Representation in Downstream Prediction Tasks
by: Harvey, Christopher J., et al.
Published: (2024)
What is Dataset Distillation Learning?
by: Yang, William, et al.
Published: (2024)
Random Gradient-Free Optimization in Infinite Dimensional Spaces
by: Peixoto, Caio Lins, et al.
Published: (2025)
Transformers Implement Functional Gradient Descent to Learn Non-Linear Functions In Context
by: Cheng, Xiang, et al.
Published: (2023)
Dataset Distillation-based Hybrid Federated Learning on Non-IID Data
by: Shi, Xiufang, et al.
Published: (2024)
Spectral Gradient Surgery for Domain-Generalizable Dataset Distillation
by: Oh, Minyoung, et al.
Published: (2026)
Large Language Models Encode Semantics and Alignment in Linearly Separable Representations
by: Saglam, Baturay, et al.
Published: (2025)
Mask-Encoded Sparsification: Mitigating Biased Gradients in Communication-Efficient Split Learning
by: Zhou, Wenxuan, et al.
Published: (2024)
On the Diversity and Realism of Distilled Dataset: An Efficient Dataset Distillation Paradigm
by: Sun, Peng, et al.
Published: (2023)
Learning Linear Regression with Low-Rank Tasks in-Context
by: Takanami, Kaito, et al.
Published: (2025)
LDAdam: Adaptive Optimization from Low-Dimensional Gradient Statistics
by: Robert, Thomas, et al.
Published: (2024)
Learning from Linear Algebra: A Graph Neural Network Approach to Preconditioner Design for Conjugate Gradient Solvers
by: Trifonov, Vladislav, et al.
Published: (2024)
Data-to-Model Distillation: Data-Efficient Learning Framework
by: Sajedi, Ahmad, et al.
Published: (2024)
Dynamics and Representation Structure of Local Approximations to Gradient-Based Learning in Linear Recurrent Neural Networks
by: Williams, Ezekiel, et al.
Published: (2026)
Chebyshev Policies and the Mountain Car Problem: Reinforcement Learning for Low-Dimensional Control Tasks
by: Huber, Stefan, et al.
Published: (2026)
Nonparametric Instrumental Variable Regression through Stochastic Approximate Gradients
by: Fonseca, Yuri, et al.
Published: (2024)
On Learning Representations for Tabular Data Distillation
by: Kang, Inwon, et al.
Published: (2025)
High-Dimensional Search, Low-Dimensional Solution: Decoupling Optimization from Representation
by: Kalyoncuoglu, Yusuf, et al.
Published: (2025)
Learning to Flow from Generative Pretext Tasks for Neural Architecture Encoding
by: Kim, Sunwoo, et al.
Published: (2025)
Exploring the Potential of QEEGNet for Cross-Task and Cross-Dataset Electroencephalography Encoding with Quantum Machine Learning
by: Chen, Chi-Sheng, et al.
Published: (2025)