:: Library Catalog

Bibliographic Details
Main Authors:	Osooli, Hamid; Batool, Kareema; Gentry, Rick; Roy, Tiasa Singha; Gupta, Ashwin; Ramesh, Anirudha
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2604.25077

Similar Items

Deceptive Risk Minimization: Out-of-Distribution Generalization by Deceiving Distribution Shift Detectors
by: Majumdar, Anirudha
Published: (2025)

Zero-Shot Coordination in Ad Hoc Teams with Generalized Policy Improvement and Difference Rewards
by: Nigam, Rupal, et al.
Published: (2025)

How Ensemble Learning Balances Accuracy and Overfitting: A Bias-Variance Perspective on Tabular Data
by: Mohammad, Zubair Ahmed
Published: (2025)

Weak-for-Strong: Training Weak Meta-Agent to Harness Strong Executors
by: Nie, Fan, et al.
Published: (2025)

Concept Distillation from Strong to Weak Models via Hypotheses-to-Theories Prompting
by: Boateng, Emmanuel Aboah, et al.
Published: (2024)

MACPO: Weak-to-Strong Alignment via Multi-Agent Contrastive Preference Optimization
by: Lyu, Yougang, et al.
Published: (2024)

DemoBias: An Empirical Study to Trace Demographic Biases in Vision Foundation Models
by: Sufian, Abu, et al.
Published: (2025)

Well Begun is Half Done: Low-resource Preference Alignment by Weak-to-Strong Decoding
by: Song, Feifan, et al.
Published: (2025)

Weak-to-Strong Reasoning
by: Yang, Yuqing, et al.
Published: (2024)

On the Emergence of Weak-to-Strong Generalization: A Bias-Variance Perspective
by: Xu, Gengze, et al.
Published: (2025)

Interpreting and Mitigating Unwanted Uncertainty in LLMs
by: Roy, Tiasa Singha, et al.
Published: (2025)

Selective Weak-to-Strong Generalization
by: Lang, Hao, et al.
Published: (2025)

Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization
by: Yang, Wenkai, et al.
Published: (2024)

Weak-Driven Learning: How Weak Agents make Strong Agents Stronger
by: Chen, Zehao, et al.
Published: (2026)

Synergistic Weak-Strong Collaboration by Aligning Preferences
by: Jiao, Yizhu, et al.
Published: (2025)

Quantifying Variance in Evaluation Benchmarks
by: Madaan, Lovish, et al.
Published: (2024)

Mixture of Weak & Strong Experts on Graphs
by: Zeng, Hanqing, et al.
Published: (2023)

Debate Helps Weak-to-Strong Generalization
by: Lang, Hao, et al.
Published: (2025)

Quantifying the Gain in Weak-to-Strong Generalization
by: Charikar, Moses, et al.
Published: (2024)

Learning Under Laws: A Constraint-Projected Neural PDE Solver that Eliminates Hallucinations
by: Singha, Mainak
Published: (2025)

Prototypicality Bias Reveals Blindspots in Multimodal Evaluation Metrics
by: Roy, Subhadeep, et al.
Published: (2026)

An Information-Theoretic Perspective on Variance-Invariance-Covariance Regularization
by: Shwartz-Ziv, Ravid, et al.
Published: (2023)

Weak-to-Strong Generalization under Distribution Shifts
by: Jeon, Myeongho, et al.
Published: (2025)

Incentivizing Strong Reasoning from Weak Supervision
by: Yuan, Yige, et al.
Published: (2025)

The Blessing of Dimensionality in LLM Fine-tuning: A Variance-Curvature Perspective
by: Liang, Qiyao, et al.
Published: (2026)

Evaluating LLM Alignment With Human Trust Models
by: Debnath, Anushka, et al.
Published: (2026)

Mental Health Equity in LLMs: Leveraging Multi-Hop Question Answering to Detect Amplified and Silenced Perspectives
by: Haider, Batool, et al.
Published: (2025)

On Strong and Weak Admissibility in Non-Flat Assumption-Based Argumentation
by: Berthold, Matti, et al.
Published: (2025)

Thinking Forward and Backward: Effective Backward Planning with Large Language Models
by: Ren, Allen Z., et al.
Published: (2024)

Detecting Prefix Bias in LLM-based Reward Models
by: Kumar, Ashwin, et al.
Published: (2025)

AI Alignment Strategies from a Risk Perspective: Independent Safety Mechanisms or Shared Failures?
by: Dung, Leonard, et al.
Published: (2025)

A Compression Perspective on Simplicity Bias
by: Marty, Tom, et al.
Published: (2026)

BioDiffusion: A Versatile Diffusion Model for Biomedical Signal Synthesis
by: Li, Xiaomin, et al.
Published: (2024)

Evaluate Bias without Manual Test Sets: A Concept Representation Perspective for LLMs
by: Gao, Lang, et al.
Published: (2025)

Resource-Constrained Heuristic for Max-SAT
by: Matejek, Brian, et al.
Published: (2024)

On the Convergence of Experience Replay in Policy Optimization: Characterizing Bias, Variance, and Finite-Time Convergence
by: Zheng, Hua, et al.
Published: (2021)

How Sharp and Bias-Robust is a Model? Dual Evaluation Perspectives on Knowledge Graph Completion
by: Moon, Sooho, et al.
Published: (2025)

Domain Generalization In Robust Invariant Representation
by: Gupta, Gauri, et al.
Published: (2023)

WESE: Weak Exploration to Strong Exploitation for LLM Agents
by: Huang, Xu, et al.
Published: (2024)

Generalizing Trust: Weak-to-Strong Trustworthiness in Language Models
by: Pawelczyk, Martin, et al.
Published: (2024)