:: Library Catalog

Bibliographic Details
Main authors:	Sanford, Clayton; Hsu, Daniel; Telgarsky, Matus
Format:	Preprint
Published:	2024
Subjects:	Machine Learning
Online access:	https://arxiv.org/abs/2402.09268

Similar Items

One-layer transformers fail to solve the induction heads task
by: Sanford, Clayton, et al.
Published: (2024)

Fast attention mechanisms: a tale of parallelism
by: Liu, Jingwen, et al.
Published: (2025)

On Achieving Optimal Adversarial Test Error
by: Li, Justin D., et al.
Published: (2023)

Astral Space: Convex Analysis at Infinity
by: Dudík, Miroslav, et al.
Published: (2022)

Benefits of Early Stopping in Gradient Descent for Overparameterized Logistic Regression
by: Wu, Jingfeng, et al.
Published: (2025)

Spectrum Extraction and Clipping for Implicitly Linear Layers
by: Boroojeny, Ali Ebrahimpour, et al.
Published: (2024)

Large Stepsize Gradient Descent for Logistic Loss: Non-Monotonicity of the Loss Improves Optimization Efficiency
by: Wu, Jingfeng, et al.
Published: (2024)

Basic Inequalities for First-Order Optimization with Applications to Statistical Risk Analysis
by: Paik, Seunghoon, et al.
Published: (2025)

When Do Transformers Outperform Feedforward and Recurrent Networks? A Statistical Perspective
by: Mousavi-Hosseini, Alireza, et al.
Published: (2025)

Lost in Tokenization: Fundamental Trade-offs in Graph Tokenization for Transformers
by: Bechler-Speicher, Maya, et al.
Published: (2026)

Depth-Width tradeoffs in Algorithmic Reasoning of Graph Tasks with Transformers
by: Yehudai, Gilad, et al.
Published: (2025)

Understanding Transformer Reasoning Capabilities via Graph Algorithms
by: Sanford, Clayton, et al.
Published: (2024)

Lower bounds for one-layer transformers that compute parity
by: Hsu, Daniel
Published: (2026)

Next-Token Prediction and Regret Minimization
by: Mohri, Mehryar, et al.
Published: (2026)

Tensor-based computation of the Koopman generator via operator logarithm
by: Kishimoto, Tatsuya, et al.
Published: (2026)

Best of Both Worlds: Advantages of Hybrid Graph Sequence Models
by: Behrouz, Ali, et al.
Published: (2024)

Engineering Verifiable Modularity in Transformers via Per-Layer Supervision
by: Kerce, J. Clayton
Published: (2026)

Fixed Universal Transformers
by: Liu, Jingwen, et al.
Published: (2026)

Interpretable-by-Design Transformers via Architectural Stream Independence
by: Kerce, Clayton, et al.
Published: (2026)

The Dual-Stream Transformer: Channelized Architecture for Interpretable Language Modeling
by: Kerce, J. Clayton, et al.
Published: (2026)

Transformer-Based Approaches for Sensor-Based Human Activity Recognition: Opportunities and Challenges
by: Leite, Clayton Souza, et al.
Published: (2024)

Stream separation improves Bregman conditioning in transformers
by: Kerce, James Clayton
Published: (2026)

Adversarial Debiasing for Unbiased Parameter Recovery
by: Sanford, Luke C, et al.
Published: (2025)

The Implicit Bias of Gradient Descent on Separable Multiclass Data
by: Ravi, Hrithik, et al.
Published: (2024)

Dimension lower bounds for linear approaches to function approximation
by: Hsu, Daniel
Published: (2025)

Enhancing Motion Variation in Text-to-Motion Models via Pose and Video Conditioned Editing
by: Leite, Clayton, et al.
Published: (2024)

Flow Straight and Fast in Hilbert Space: Functional Rectified Flow
by: Zhang, Jianxin, et al.
Published: (2025)

Label Embedding via Low-Coherence Matrices
by: Zhang, Jianxin, et al.
Published: (2023)

Unified Binary and Multiclass Margin-Based Classification
by: Wang, Yutong, et al.
Published: (2023)

Hierarchical Motion Captioning Utilizing External Text Data Source
by: Leite, Clayton, et al.
Published: (2025)

Discovering influential text using convolutional neural networks
by: Ayers, Megan, et al.
Published: (2024)

Transformers Provably Learn Sparse Token Selection While Fully-Connected Nets Cannot
by: Wang, Zixuan, et al.
Published: (2024)

Data-Driven Hamiltonian Reduction for Superconducting Qubits via Meta-Learning
by: Sanford, Arielle, et al.
Published: (2026)

Multi-group Learning for Hierarchical Groups
by: Deng, Samuel, et al.
Published: (2024)

Simple and near-optimal algorithms for hidden stratification and multi-group learning
by: Tosh, Christopher, et al.
Published: (2021)

Learning Compositional Functions with Transformers from Easy-to-Hard Data
by: Wang, Zixuan, et al.
Published: (2025)

Transformer Model for Alzheimer's Disease Progression Prediction Using Longitudinal Visit Sequences
by: Moghaddami, Mahdi, et al.
Published: (2025)

Can AI-predicted complexes teach machine learning to compute drug binding affinity?
by: Hsu, Wei-Tse, et al.
Published: (2025)

Deep Learning Methods for Detecting Thermal Runaway Events in Battery Production Lines
by: Athanasopoulos, Athanasios, et al.
Published: (2025)

Unifying Interpretability and Explainability for Alzheimer's Disease Progression Prediction
by: Ali, Raja Farrukh, et al.
Published: (2024)