| Main Authors: | Balagansky, Nikita, Maksimov, Ian, Gavrilov, Daniil |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2410.07656 |
Similar Items
Analyze Feature Flow to Enhance Interpretation and Steering in Language Models
by: Laptev, Daniil, et al.
Published: (2025)
Learn Your Reference Model for Real Good Alignment
by: Gorbatovski, Alexey, et al.
Published: (2024)
Train One Sparse Autoencoder Across Multiple Sparsity Budgets to Preserve Interpretability and Accuracy
by: Balagansky, Nikita, et al.
Published: (2025)
Next Embedding Prediction Makes World Models Stronger
by: Bredis, George, et al.
Published: (2026)
Kronecker Factorization Improves Efficiency and Interpretability of Sparse Autoencoders
by: Kurochkin, Vadim, et al.
Published: (2025)
Diffusion Language Models Generation Can Be Halted Early
by: Vaina, Sofia Maria Lo Cicero, et al.
Published: (2023)
Teach Old SAEs New Domain Tricks with Boosting
by: Koriagin, Nikita, et al.
Published: (2025)
You Do Not Fully Utilize Transformer's Representation Capacity
by: Gerasimov, Gleb, et al.
Published: (2025)
Small Vectors, Big Effects: A Mechanistic Study of RL-Induced Reasoning via Steering Vectors
by: Sinii, Viacheslav, et al.
Published: (2025)
Trust-Region Behavior Blending for On-Policy Distillation
by: Plyusov, Daniil, et al.
Published: (2026)
Steering LLM Reasoning Through Bias-Only Adaptation
by: Sinii, Viacheslav, et al.
Published: (2025)
Linear Transformers with Learnable Kernel Functions are Better In-Context Models
by: Aksenov, Yaroslav, et al.
Published: (2024)
Guided Star-Shaped Masked Diffusion
by: Meshchaninov, Viacheslav, et al.
Published: (2025)
Revisiting Non-Acyclic GFlowNets in Discrete Environments
by: Morozov, Nikita, et al.
Published: (2025)
Learning Shortest Paths with Generative Flow Networks
by: Morozov, Nikita, et al.
Published: (2026)
gfnx: Fast and Scalable Library for Generative Flow Networks in JAX
by: Tiapkin, Daniil, et al.
Published: (2025)
VARAN: Variational Inference for Self-Supervised Speech Models Fine-Tuning on Downstream Tasks
by: Diatlova, Daria, et al.
Published: (2025)
Adversarial Schrödinger Bridge Matching
by: Gushchin, Nikita, et al.
Published: (2024)
The Differences Between Direct Alignment Algorithms are a Blur
by: Gorbatovski, Alexey, et al.
Published: (2025)
F-GRPO: Don't Let Your Policy Learn the Obvious and Forget the Rare
by: Plyusov, Daniil, et al.
Published: (2026)
Enhancing Vision-Language Model Training with Reinforcement Learning in Synthetic Worlds for Real-World Success
by: Bredis, George, et al.
Published: (2025)
Inverse Bridge Matching Distillation
by: Gushchin, Nikita, et al.
Published: (2025)
ESSA: Evolutionary Strategies for Scalable Alignment
by: Korotyshova, Daria, et al.
Published: (2025)
Evolution of SAE Features Across Layers in LLMs
by: Balcells, Daniel, et al.
Published: (2024)
Scalable Knowledge Editing for Mixture-of-Experts LLMs via Tensor-Structured Updates
by: Maksimov, Roman, et al.
Published: (2026)
PPFS: Predictive Permutation Feature Selection
by: Hassan, Atif, et al.
Published: (2021)
Learning Unbiased Permutations via Flow Matching
by: Min, Yimeng, et al.
Published: (2026)
Fair Feature Importance Scores via Feature Occlusion and Permutation
by: Little, Camille, et al.
Published: (2026)
Optimizing Backward Policies in GFlowNets via Trajectory Likelihood Maximization
by: Gritsaev, Timofei, et al.
Published: (2024)
Generative Flow Networks as Entropy-Regularized RL
by: Tiapkin, Daniil, et al.
Published: (2023)
Trustworthy Feature Importance Avoids Unrestricted Permutations
by: Borgonovo, Emanuele, et al.
Published: (2026)
Analysis of Linear Mode Connectivity via Permutation-Based Weight Matching: With Insights into Other Permutation Search Methods
by: Ito, Akira, et al.
Published: (2024)
Improving Generalization by Permutation Routing Across Model Copies
by: Kashiwamura, Shuhei, et al.
Published: (2026)
Unlocking the Duality between Flow and Field Matching
by: Shlenskii, Daniil, et al.
Published: (2026)
UPath: Universal Planner Across Topological Heterogeneity For Grid-Based Pathfinding
by: Ananikian, Aleksandr, et al.
Published: (2026)
Adaptive Set-Mass Calibration with Conformal Prediction
by: Kazantsev, Daniil, et al.
Published: (2025)
On the Equivalence of Optimal Transport Problem and Action Matching with Optimal Vector Fields
by: Kornilov, Nikita, et al.
Published: (2025)
Bayesian Inverse Problems Meet Flow Matching: Efficient and Flexible Inference via Transformers
by: Sherki, Daniil, et al.
Published: (2025)
AI Methods for Permutation Circuit Synthesis Across Generic Topologies
by: Villar, Victor, et al.
Published: (2025)
Midpoint Generative Models
by: Shlenskii, Daniil, et al.
Published: (2026)