:: Library Catalog

Bibliographic Details
Main authors:	Malviya, Pranshu; Mordido, Gonçalo; Baratin, Aristide; Harikandeh, Reza Babanezhad; Huang, Jerry; Lacoste-Julien, Simon; Pascanu, Razvan; Chandar, Sarath
Format:	Preprint
Published:	2023
Subjects:	Machine Learning; Artificial Intelligence
Online access:	https://arxiv.org/abs/2307.09638

Similar Items

Torque-Aware Momentum
by: Malviya, Pranshu, et al.
Published: (2024)

Lookbehind-SAM: k steps back, 1 step forward
by: Mordido, Gonçalo, et al.
Published: (2023)

Manifold Metric: A Loss Landscape Approach for Predicting Model Performance
by: Malviya, Pranshu, et al.
Published: (2024)

Layerwise LQR for Geometry-Aware Optimization of Deep Networks
by: Dufort-Labbé, Simon, et al.
Published: (2026)

Exploring Quantization for Efficient Pre-Training of Transformer Language Models
by: Chitsaz, Kamran, et al.
Published: (2024)

Why Don't Prompt-Based Fairness Metrics Correlate?
by: Zayed, Abdelrahman, et al.
Published: (2024)

Should We Attend More or Less? Modulating Attention for Fairness
by: Zayed, Abdelrahman, et al.
Published: (2023)

CoPeP: Benchmarking Continual Pretraining for Protein Language Models
by: Patil, Darshan, et al.
Published: (2026)

Navigating Potholes with Geometry-Aware Sharpness Minimization
by: Dufort-Labbé, Simon, et al.
Published: (2026)

Maxwell's Demon at Work: Efficient Pruning by Leveraging Saturation of Neurons
by: Dufort-Labbé, Simon, et al.
Published: (2024)

Revisiting Adam for Streaming Reinforcement Learning
by: Gogianu, Florin, et al.
Published: (2026)

Iterative Methods via Locally Evolving Set Process
by: Zhou, Baojian, et al.
Published: (2024)

Convergence of Steepest Descent and Adam under Non-Uniform Smoothness
by: Vaswani, Sharan, et al.
Published: (2026)

Mastering Memory Tasks with World Models
by: Samsami, Mohammad Reza, et al.
Published: (2024)

Dissecting Discrete Soft Actor-Critic: Limitations and Principled Alternatives
by: Asad, Reza, et al.
Published: (2025)

Any-Property-Conditional Molecule Generation with Self-Criticism using Spanning Trees
by: Jolicoeur-Martineau, Alexia, et al.
Published: (2024)

Lattice: Learning to Efficiently Compress the Memory
by: Karami, Mahdi, et al.
Published: (2025)

Lazy vs hasty: linearization in deep networks impacts learning schedule based on example difficulty
by: George, Thomas, et al.
Published: (2022)

Context-Aware Assistant Selection for Improved Inference Acceleration with Large Language Models
by: Huang, Jerry, et al.
Published: (2024)

Towards Practical Tool Usage for Continually Learning LLMs
by: Huang, Jerry, et al.
Published: (2024)

Retrieval-Augmented Decision Transformer: External Memory for In-context RL
by: Schmied, Thomas, et al.
Published: (2024)

Dual Optimistic Ascent (PI Control) is the Augmented Lagrangian Method in Disguise
by: Ramirez, Juan, et al.
Published: (2025)

Optimizers Qualitatively Alter Solutions And We Should Leverage This
by: Pascanu, Razvan, et al.
Published: (2025)

LLMs Can't Play Hangman: On the Necessity of a Private Working Memory for Language Agents
by: Baldelli, Davide, et al.
Published: (2026)

Armijo Line-search Can Make (Stochastic) Gradient Descent Provably Faster
by: Vaswani, Sharan, et al.
Published: (2025)

Do Large Language Models Know How Much They Know?
by: Prato, Gabriele, et al.
Published: (2025)

Do Robot Snakes Dream like Electric Sheep? Investigating the Effects of Architectural Inductive Biases on Hallucination
by: Huang, Jerry, et al.
Published: (2024)

EpiK-Eval: Evaluation for Language Models as Epistemic Models
by: Prato, Gabriele, et al.
Published: (2023)

Neural Coherence : Find higher performance to out-of-distribution tasks from few samples
by: Guiroy, Simon, et al.
Published: (2025)

Bias in Motion: Theoretical Insights into the Dynamics of Bias in SGD Training
by: Jain, Anchit, et al.
Published: (2024)

Dialectics of Alignment: Harnessing Unsafe Knowledge for Dynamic Safety Routing
by: Hashemzadeh, Maryam, et al.
Published: (2026)

Deep Grokking: Would Deep Neural Networks Generalize Better?
by: Fan, Simin, et al.
Published: (2024)

Faithfulness Measurable Masked Language Models
by: Madsen, Andreas, et al.
Published: (2023)

Are self-explanations from Large Language Models faithful?
by: Madsen, Andreas, et al.
Published: (2024)

Steering Large Language Model Activations in Sparse Spaces
by: Bayat, Reza, et al.
Published: (2025)

(Accelerated) Noise-adaptive Stochastic Heavy-Ball Momentum
by: Dang, Anh, et al.
Published: (2024)

NoProp: Training Neural Networks without Full Back-propagation or Full Forward-propagation
by: Li, Qinyu, et al.
Published: (2025)

Meta-learning how to Share Credit among Macro-Actions
by: Hosu, Ionel-Alexandru, et al.
Published: (2025)

Latent Space Representations of Neural Algorithmic Reasoners
by: Mirjanić, Vladimir V., et al.
Published: (2023)

Perplexity Cannot Always Tell Right from Wrong
by: Veličković, Petar, et al.
Published: (2026)