| Main authors: | Malviya, Pranshu; Mordido, Gonçalo; Baratin, Aristide; Harikandeh, Reza Babanezhad; Huang, Jerry; Lacoste-Julien, Simon; Pascanu, Razvan; Chandar, Sarath |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Subjects: | |
| Online access: | https://arxiv.org/abs/2307.09638 |
Similar items
Torque-Aware Momentum
by: Malviya, Pranshu, et al.
Published: (2024)
Lookbehind-SAM: k steps back, 1 step forward
by: Mordido, Gonçalo, et al.
Published: (2023)
Manifold Metric: A Loss Landscape Approach for Predicting Model Performance
by: Malviya, Pranshu, et al.
Published: (2024)
Layerwise LQR for Geometry-Aware Optimization of Deep Networks
by: Dufort-Labbé, Simon, et al.
Published: (2026)
Exploring Quantization for Efficient Pre-Training of Transformer Language Models
by: Chitsaz, Kamran, et al.
Published: (2024)
Why Don't Prompt-Based Fairness Metrics Correlate?
by: Zayed, Abdelrahman, et al.
Published: (2024)
Should We Attend More or Less? Modulating Attention for Fairness
by: Zayed, Abdelrahman, et al.
Published: (2023)
CoPeP: Benchmarking Continual Pretraining for Protein Language Models
by: Patil, Darshan, et al.
Published: (2026)
Navigating Potholes with Geometry-Aware Sharpness Minimization
by: Dufort-Labbé, Simon, et al.
Published: (2026)
Maxwell's Demon at Work: Efficient Pruning by Leveraging Saturation of Neurons
by: Dufort-Labbé, Simon, et al.
Published: (2024)
Revisiting Adam for Streaming Reinforcement Learning
by: Gogianu, Florin, et al.
Published: (2026)
Iterative Methods via Locally Evolving Set Process
by: Zhou, Baojian, et al.
Published: (2024)
Convergence of Steepest Descent and Adam under Non-Uniform Smoothness
by: Vaswani, Sharan, et al.
Published: (2026)
Mastering Memory Tasks with World Models
by: Samsami, Mohammad Reza, et al.
Published: (2024)
Dissecting Discrete Soft Actor-Critic: Limitations and Principled Alternatives
by: Asad, Reza, et al.
Published: (2025)
Any-Property-Conditional Molecule Generation with Self-Criticism using Spanning Trees
by: Jolicoeur-Martineau, Alexia, et al.
Published: (2024)
Lattice: Learning to Efficiently Compress the Memory
by: Karami, Mahdi, et al.
Published: (2025)
Lazy vs hasty: linearization in deep networks impacts learning schedule based on example difficulty
by: George, Thomas, et al.
Published: (2022)
Context-Aware Assistant Selection for Improved Inference Acceleration with Large Language Models
by: Huang, Jerry, et al.
Published: (2024)
Towards Practical Tool Usage for Continually Learning LLMs
by: Huang, Jerry, et al.
Published: (2024)
Retrieval-Augmented Decision Transformer: External Memory for In-context RL
by: Schmied, Thomas, et al.
Published: (2024)
Dual Optimistic Ascent (PI Control) is the Augmented Lagrangian Method in Disguise
by: Ramirez, Juan, et al.
Published: (2025)
Optimizers Qualitatively Alter Solutions And We Should Leverage This
by: Pascanu, Razvan, et al.
Published: (2025)
LLMs Can't Play Hangman: On the Necessity of a Private Working Memory for Language Agents
by: Baldelli, Davide, et al.
Published: (2026)
Armijo Line-search Can Make (Stochastic) Gradient Descent Provably Faster
by: Vaswani, Sharan, et al.
Published: (2025)
Do Large Language Models Know How Much They Know?
by: Prato, Gabriele, et al.
Published: (2025)
Do Robot Snakes Dream like Electric Sheep? Investigating the Effects of Architectural Inductive Biases on Hallucination
by: Huang, Jerry, et al.
Published: (2024)
EpiK-Eval: Evaluation for Language Models as Epistemic Models
by: Prato, Gabriele, et al.
Published: (2023)
Neural Coherence : Find higher performance to out-of-distribution tasks from few samples
by: Guiroy, Simon, et al.
Published: (2025)
Bias in Motion: Theoretical Insights into the Dynamics of Bias in SGD Training
by: Jain, Anchit, et al.
Published: (2024)
Dialectics of Alignment: Harnessing Unsafe Knowledge for Dynamic Safety Routing
by: Hashemzadeh, Maryam, et al.
Published: (2026)
Deep Grokking: Would Deep Neural Networks Generalize Better?
by: Fan, Simin, et al.
Published: (2024)
Faithfulness Measurable Masked Language Models
by: Madsen, Andreas, et al.
Published: (2023)
Are self-explanations from Large Language Models faithful?
by: Madsen, Andreas, et al.
Published: (2024)
Steering Large Language Model Activations in Sparse Spaces
by: Bayat, Reza, et al.
Published: (2025)
(Accelerated) Noise-adaptive Stochastic Heavy-Ball Momentum
by: Dang, Anh, et al.
Published: (2024)
NoProp: Training Neural Networks without Full Back-propagation or Full Forward-propagation
by: Li, Qinyu, et al.
Published: (2025)
Meta-learning how to Share Credit among Macro-Actions
by: Hosu, Ionel-Alexandru, et al.
Published: (2025)
Latent Space Representations of Neural Algorithmic Reasoners
by: Mirjanić, Vladimir V., et al.
Published: (2023)
Perplexity Cannot Always Tell Right from Wrong
by: Veličković, Petar, et al.
Published: (2026)