| Main Authors: | Nayak, Nihal V., Rodriguez-Diaz, Paula, Hulkund, Neha, Beery, Sara, Alvarez-Melis, David |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2602.14696 |
Similar Items
Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't
by: Dang, Quy-Anh, et al.
Published: (2025)
What is the Right Notion of Distance between Predict-then-Optimize Tasks?
by: Rodriguez-Diaz, Paula, et al.
Published: (2024)
Boomerang Distillation Enables Zero-Shot Model Size Interpolation
by: Kangaslahti, Sara, et al.
Published: (2025)
Machine Unlearning Doesn't Do What You Think: Lessons for Generative AI Policy and Research
by: Cooper, A. Feder, et al.
Published: (2024)
MedCalc-Bench Doesn't Measure What You Think: A Benchmark Audit and the Case for Open-Book Evaluation
by: Krohn-Grimberghe, Artus
Published: (2026)
Testing Autonomous Driving Systems -- What Really Matters and What Doesn't
by: Li, Changwen, et al.
Published: (2025)
Revisiting Intermediate-Layer Matching in Knowledge Distillation: Layer-Selection Strategy Doesn't Matter (Much)
by: Yu, Zony, et al.
Published: (2025)
Semantics at an Angle: When Cosine Similarity Works Until It Doesn't
by: You, Kisung
Published: (2025)
Privacy-preserving data release leveraging optimal transport and particle gradient descent
by: Donhauser, Konstantin, et al.
Published: (2024)
Infinite Width Models That Work: Why Feature Learning Doesn't Matter as Much as You Think
by: Sernau, Luke
Published: (2024)
Recurrent Off-Policy Deep Reinforcement Learning Doesn't Have to be Slow
by: Clark, Tyler, et al.
Published: (2025)
When More Data Doesn't Help: Limits of Adaptation in Multitask Learning
by: Hanneke, Steve, et al.
Published: (2026)
Teach AI What It Doesn't Know
by: Sean Du
Published: (2026)
Library Designs Revisited: What Works--What Doesn't.
by: Metz, T. John, et al.
Published: (1987)
Library Learning Doesn't: The Curious Case of the Single-Use "Library"
by: Berlot-Attwell, Ian, et al.
Published: (2024)
Explorations of the Softmax Space: Knowing When the Neural Network Doesn't Know
by: Sikar, Daniel, et al.
Published: (2025)
Continuous Language Model Interpolation for Dynamic and Controllable Text Generation
by: Kangaslahti, Sara, et al.
Published: (2024)
Understanding the Role of Functional Diversity in Weight-Ensembling with Ingredient Selection and Multidimensional Scaling
by: Rojas, Alex, et al.
Published: (2024)
In-Service and the School Library Media Specialist: What Works and What Doesn't.
by: Turner, Philip M.
Published: (1988)
When Structure Doesn't Help: LLMs Do Not Read Text-Attributed Graphs as Effectively as We Expected
by: Xu, Haotian, et al.
Published: (2025)
Learning to Generate Instruction Tuning Datasets for Zero-Shot Task Adaptation
by: Nayak, Nihal V., et al.
Published: (2024)
One LR Doesn't Fit All: Heavy-Tail Guided Layerwise Learning Rates for LLMs
by: He, Di, et al.
Published: (2026)
DataS^3: Dataset Subset Selection for Specialization
by: Hulkund, Neha, et al.
Published: (2025)
Curated Synthetic Data Doesn't Have to Collapse: A Theoretical Study of Generative Retraining with Pluralistic Preferences
by: Falahati, Ali, et al.
Published: (2026)
Why Prompt Optimization Works, and Why It Sometimes Doesn't: A Causal-Inspired Edit-Level Analysis
by: Gong, Shuzhi, et al.
Published: (2026)
Strongly Isomorphic Neural Optimal Transport Across Incomparable Spaces
by: Sotiropoulou, Athina, et al.
Published: (2024)
Self-Training Doesn't Flatten Language -- It Restructures It: Surface Markers Amplify While Deep Syntax Dies
by: Liu, Ming
Published: (2026)
Decomposing Elements of Problem Solving: What "Math" Does RL Teach?
by: Qin, Tian, et al.
Published: (2025)
Learning What Matters: Probabilistic Task Selection via Mutual Information for Model Finetuning
by: Chanda, Prateek, et al.
Published: (2025)
Consensus-Driven Active Model Selection
by: Kay, Justin, et al.
Published: (2025)
Consciousness Doesn't Do That
by: Matthias Michel
Published: (2026)
When Online Instruction Doesn't Measure Up: How Can You Tell, and What Should You Do?
by: Rapchak, Marcia
Published: (2019)
"Something Comes through or It Doesn't": Intensive Reading in Post-Qualitative Inquiry
by: Maggie MacLure
Published: (2024)
Do Large Language Model Benchmarks Test Reliability?
by: Vendrow, Joshua, et al.
Published: (2025)
What Matters in Data for DPO?
by: Pan, Yu, et al.
Published: (2025)
AVEX: What Matters for Animal Vocalization Encoding
by: Miron, Marius, et al.
Published: (2025)
How to Connect Speech Foundation Models and Large Language Models? What Matters and What Does Not
by: Verdini, Francesco, et al.
Published: (2024)
Distributional Dataset Distillation with Subtask Decomposition
by: Qin, Tian, et al.
Published: (2024)
Forget What Matters, Keep the Rest: Selective Unlearning of Informative Tokens
by: Koh, Seunghee, et al.
Published: (2026)
Aggregation Hides Out-of-Distribution Generalization Failures from Spurious Correlations
by: Salaudeen, Olawale, et al.
Published: (2025)