| Main Authors: | Novikov, Georgii, Oseledets, Ivan |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2407.15545 |
Similar Items
Tensor-Train Point Cloud Compression and Efficient Approximate Nearest-Neighbor Search
by: Novikov, Georgii, et al.
Published: (2024)
Quasi-Random Physics-informed Neural Networks
by: Yu, Tianchi, et al.
Published: (2025)
Spectral Informed Neural Network: An Efficient and Low-Memory PINN
by: Yu, Tianchi, et al.
Published: (2024)
RECE: Reduced Cross-Entropy Loss for Large-Catalogue Sequential Recommenders
by: Gusak, Danil, et al.
Published: (2024)
Diagonal Batching Unlocks Parallelism in Recurrent Memory Transformers for Long Contexts
by: Sivtsov, Danil, et al.
Published: (2025)
Linearly Constrained Weights: Reducing Activation Shift for Faster Training of Neural Networks
by: Kutsuna, Takuro
Published: (2024)
On the Spatial Structure of Mixture-of-Experts in Transformers
by: Bershatsky, Daniel, et al.
Published: (2025)
Exploring the Hidden Capacity of LLMs for One-Step Text Generation
by: Mezentsev, Gleb, et al.
Published: (2025)
Binding threshold units with artificial oscillatory neurons
by: Fanaskov, Vladimir, et al.
Published: (2025)
MLPMoE: Zero-Shot Architectural Metamorphosis of Dense LLM MLPs into Static Mixture-of-Experts
by: Novikov, Ivan
Published: (2025)
Faster Language Models with Better Multi-Token Prediction Using Tensor Decomposition
by: Basharin, Artem, et al.
Published: (2024)
Run LoRA Run: Faster and Lighter LoRA Implementations
by: Cherniuk, Daria, et al.
Published: (2023)
FreshGNN: Reducing Memory Access via Stable Historical Embeddings for Graph Neural Network Training
by: Huang, Kezhao, et al.
Published: (2023)
The Rogue Scalpel: Activation Steering Compromises LLM Safety
by: Korznikov, Anton, et al.
Published: (2025)
Monitoring Neural Training with Topology: A Footprint-Predictable Collapse Index
by: Kalinowski, Alexander
Published: (2026)
Learning from Linear Algebra: A Graph Neural Network Approach to Preconditioner Design for Conjugate Gradient Solvers
by: Trifonov, Vladislav, et al.
Published: (2024)
Spectral Analysis of the Weighted Frobenius Objective
by: Trifonov, Vladislav, et al.
Published: (2025)
Bayesian Inverse Problems Meet Flow Matching: Efficient and Flexible Inference via Transformers
by: Sherki, Daniil, et al.
Published: (2025)
Memory Faults in Activation-sparse Quantized Deep Neural Networks: Analysis and Mitigation using Sharpness-aware Training
by: Malhotra, Akul, et al.
Published: (2024)
Reducing Smoothness with Expressive Memory Enhanced Hierarchical Graph Neural Networks
by: Bailie, Thomas, et al.
Published: (2025)
ConDiff: A Challenging Dataset for Neural Solvers of Partial Differential Equations
by: Trifonov, Vladislav, et al.
Published: (2024)
Explicit Flow Matching: On The Theory of Flow Matching Algorithms with Applications
by: Ryzhakov, Gleb, et al.
Published: (2024)
MaxInfo: A Training-Free Key-Frame Selection Method Using Maximum Volume for Enhanced Video Understanding
by: Li, Pengyi, et al.
Published: (2025)
Message-Passing GNNs Fail to Approximate Sparse Triangular Factorizations
by: Trifonov, Vladislav, et al.
Published: (2025)
DNN Memory Footprint Reduction via Post-Training Intra-Layer Multi-Precision Quantization
by: Ghavami, Behnam, et al.
Published: (2024)
Topology-based Representative Datasets to Reduce Neural Network Training Resources
by: Gonzalez-Diaz, Rocio, et al.
Published: (2019)
Framework GNN-AID: Graph Neural Network Analysis Interpretation and Defense
by: Lukyanov, Kirill, et al.
Published: (2025)
A case study of spatiotemporal forecasting techniques for weather forecasting
by: Sofi, Shakir Showkat, et al.
Published: (2022)
Sparse and Transferable Universal Singular Vectors Attack
by: Kuvshinova, Kseniia, et al.
Published: (2024)
Inverting Non-Injective Functions with Twin Neural Network Regression
by: Wetzel, Sebastian J.
Published: (2026)
Black-Box Approximation and Optimization with Hierarchical Tucker Decomposition
by: Ryzhakov, Gleb, et al.
Published: (2024)
Scalable Cross-Entropy Loss for Sequential Recommendations with Large Item Catalogs
by: Mezentsev, Gleb, et al.
Published: (2024)
Back to Basics: Revisiting Exploration in Reinforcement Learning for LLM Reasoning via Generative Probabilities
by: Li, Pengyi, et al.
Published: (2026)
NNTile: a machine learning framework capable of training extremely large GPT language models on a single node
by: Mikhalev, Aleksandr, et al.
Published: (2025)
Training Memory in Deep Neural Networks: Mechanisms, Evidence, and Measurement Gaps
by: Sevetlidis, Vasileios, et al.
Published: (2026)
Marchuk: Efficient Global Weather Forecasting from Mid-Range to Sub-Seasonal Scales via Flow Matching
by: Kuzhamuratov, Arsen, et al.
Published: (2026)
OASIS: Online Activation Subspace Learning for Memory-Efficient Training
by: Choudhary, Sakshi, et al.
Published: (2026)
FRUGAL: Memory-Efficient Optimization by Reducing State Overhead for Scalable Training
by: Zmushko, Philip, et al.
Published: (2024)
OUI as a Structural Observable: Towards an Activation-Centric View of Neural Network Training
by: Fernández-Hernández, Alberto, et al.
Published: (2026)
Semiring Activation in Neural Networks
by: Smets, Bart M. N., et al.
Published: (2024)