| Main Authors: | Osooli, Hamid, Batool, Kareema, Gentry, Rick, Roy, Tiasa Singha, Gupta, Ashwin, Ramesh, Anirudha |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.25077 |
Similar Items
Deceptive Risk Minimization: Out-of-Distribution Generalization by Deceiving Distribution Shift Detectors
by: Majumdar, Anirudha
Published: (2025)
Zero-Shot Coordination in Ad Hoc Teams with Generalized Policy Improvement and Difference Rewards
by: Nigam, Rupal, et al.
Published: (2025)
How Ensemble Learning Balances Accuracy and Overfitting: A Bias-Variance Perspective on Tabular Data
by: Mohammad, Zubair Ahmed
Published: (2025)
Weak-for-Strong: Training Weak Meta-Agent to Harness Strong Executors
by: Nie, Fan, et al.
Published: (2025)
Concept Distillation from Strong to Weak Models via Hypotheses-to-Theories Prompting
by: Boateng, Emmanuel Aboah, et al.
Published: (2024)
MACPO: Weak-to-Strong Alignment via Multi-Agent Contrastive Preference Optimization
by: Lyu, Yougang, et al.
Published: (2024)
DemoBias: An Empirical Study to Trace Demographic Biases in Vision Foundation Models
by: Sufian, Abu, et al.
Published: (2025)
Well Begun is Half Done: Low-resource Preference Alignment by Weak-to-Strong Decoding
by: Song, Feifan, et al.
Published: (2025)
Weak-to-Strong Reasoning
by: Yang, Yuqing, et al.
Published: (2024)
On the Emergence of Weak-to-Strong Generalization: A Bias-Variance Perspective
by: Xu, Gengze, et al.
Published: (2025)
Interpreting and Mitigating Unwanted Uncertainty in LLMs
by: Roy, Tiasa Singha, et al.
Published: (2025)
Selective Weak-to-Strong Generalization
by: Lang, Hao, et al.
Published: (2025)
Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization
by: Yang, Wenkai, et al.
Published: (2024)
Weak-Driven Learning: How Weak Agents make Strong Agents Stronger
by: Chen, Zehao, et al.
Published: (2026)
Synergistic Weak-Strong Collaboration by Aligning Preferences
by: Jiao, Yizhu, et al.
Published: (2025)
Quantifying Variance in Evaluation Benchmarks
by: Madaan, Lovish, et al.
Published: (2024)
Mixture of Weak & Strong Experts on Graphs
by: Zeng, Hanqing, et al.
Published: (2023)
Debate Helps Weak-to-Strong Generalization
by: Lang, Hao, et al.
Published: (2025)
Quantifying the Gain in Weak-to-Strong Generalization
by: Charikar, Moses, et al.
Published: (2024)
Learning Under Laws: A Constraint-Projected Neural PDE Solver that Eliminates Hallucinations
by: Singha, Mainak
Published: (2025)
Prototypicality Bias Reveals Blindspots in Multimodal Evaluation Metrics
by: Roy, Subhadeep, et al.
Published: (2026)
An Information-Theoretic Perspective on Variance-Invariance-Covariance Regularization
by: Shwartz-Ziv, Ravid, et al.
Published: (2023)
Weak-to-Strong Generalization under Distribution Shifts
by: Jeon, Myeongho, et al.
Published: (2025)
Incentivizing Strong Reasoning from Weak Supervision
by: Yuan, Yige, et al.
Published: (2025)
The Blessing of Dimensionality in LLM Fine-tuning: A Variance-Curvature Perspective
by: Liang, Qiyao, et al.
Published: (2026)
Evaluating LLM Alignment With Human Trust Models
by: Debnath, Anushka, et al.
Published: (2026)
Mental Health Equity in LLMs: Leveraging Multi-Hop Question Answering to Detect Amplified and Silenced Perspectives
by: Haider, Batool, et al.
Published: (2025)
On Strong and Weak Admissibility in Non-Flat Assumption-Based Argumentation
by: Berthold, Matti, et al.
Published: (2025)
Thinking Forward and Backward: Effective Backward Planning with Large Language Models
by: Ren, Allen Z., et al.
Published: (2024)
Detecting Prefix Bias in LLM-based Reward Models
by: Kumar, Ashwin, et al.
Published: (2025)
AI Alignment Strategies from a Risk Perspective: Independent Safety Mechanisms or Shared Failures?
by: Dung, Leonard, et al.
Published: (2025)
A Compression Perspective on Simplicity Bias
by: Marty, Tom, et al.
Published: (2026)
BioDiffusion: A Versatile Diffusion Model for Biomedical Signal Synthesis
by: Li, Xiaomin, et al.
Published: (2024)
Evaluate Bias without Manual Test Sets: A Concept Representation Perspective for LLMs
by: Gao, Lang, et al.
Published: (2025)
Resource-Constrained Heuristic for Max-SAT
by: Matejek, Brian, et al.
Published: (2024)
On the Convergence of Experience Replay in Policy Optimization: Characterizing Bias, Variance, and Finite-Time Convergence
by: Zheng, Hua, et al.
Published: (2021)
How Sharp and Bias-Robust is a Model? Dual Evaluation Perspectives on Knowledge Graph Completion
by: Moon, Sooho, et al.
Published: (2025)
Domain Generalization In Robust Invariant Representation
by: Gupta, Gauri, et al.
Published: (2023)
WESE: Weak Exploration to Strong Exploitation for LLM Agents
by: Huang, Xu, et al.
Published: (2024)
Generalizing Trust: Weak-to-Strong Trustworthiness in Language Models
by: Pawelczyk, Martin, et al.
Published: (2024)