| Main Authors: | Dorner, Florian E., Chen, Yatong, Cruz, André F., Yang, Fanny |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2507.12399 |
Similar Items
How Benchmark Prediction from Fewer Data Misses the Mark
by: Zhang, Guanhua, et al.
Published: (2025)
Don't Label Twice: Quantity Beats Quality when Comparing Binary Classifiers on a Budget
by: Dorner, Florian E., et al.
Published: (2024)
Conditional Prediction ROC Bands for Graph Classification
by: Wu, Yujia, et al.
Published: (2024)
Multiclass ROC
by: Wang, Liang, et al.
Published: (2024)
Limits to scalable evaluation at the frontier: LLM as Judge won't beat twice the data
by: Dorner, Florian E., et al.
Published: (2024)
Knee or ROC
by: Wendt, Veronica, et al.
Published: (2024)
Interactive proofs for verifying (quantum) learning and testing
by: Caro, Matthias C., et al.
Published: (2024)
Strategic Hypothesis Testing
by: Hossain, Safwan, et al.
Published: (2025)
SubROC: AUC-Based Discovery of Exceptional Subgroup Performance for Binary Classifiers
by: Siegl, Tom, et al.
Published: (2025)
A Multiclass ROC Curve
by: Giudici, Paolo, et al.
Published: (2025)
Training on the Test Task Confounds Evaluation and Emergence
by: Dominguez-Olmedo, Ricardo, et al.
Published: (2024)
Performative Prediction with Bandit Feedback: Learning through Reparameterization
by: Chen, Yatong, et al.
Published: (2023)
Federated Computation of ROC and PR Curves
by: Xu, Xuefeng, et al.
Published: (2025)
Incentivizing Honesty among Competitors in Collaborative Learning and Optimization
by: Dorner, Florian E., et al.
Published: (2023)
To Give or Not to Give? The Impacts of Strategically Withheld Recourse
by: Chen, Yatong, et al.
Published: (2025)
Leaderboard Incentives: Model Rankings under Strategic Post-Training
by: Chen, Yatong, et al.
Published: (2026)
s1: Simple test-time scaling
by: Muennighoff, Niklas, et al.
Published: (2025)
Scaling Up ROC-Optimizing Support Vector Machines
by: Bae, Gimun, et al.
Published: (2025)
FROC: Building Fair ROC from a Trained Classifier
by: Vummintala, Avyukta Manjunatha, et al.
Published: (2024)
Learning Pareto manifolds in high dimensions: How can regularization help?
by: Wegel, Tobias, et al.
Published: (2025)
Product distribution learning with imperfect advice
by: Bhattacharyya, Arnab, et al.
Published: (2025)
FACROC: a fairness measure for FAir Clustering through ROC curves
by: Quy, Tai Le, et al.
Published: (2025)
Artificial intelligence for methane detection: from continuous monitoring to verified mitigation
by: Mateo-Garcia, Gonzalo, et al.
Published: (2025)
Whose Preferences? Differences in Fairness Preferences and Their Impact on the Fairness of AI Utilizing Human Feedback
by: Lerner, Emilia Agis, et al.
Published: (2024)
Predicting Blood Type: Assessing Model Performance with ROC Analysis
by: Altayar, Malik A., et al.
Published: (2025)
Area under the ROC Curve has the Most Consistent Evaluation for Binary Classification
by: Li, Jing
Published: (2024)
Delay, Plateau, or Collapse: Evaluating the Impact of Systematic Verification Error on RLVR
by: Egashira, Kazuki, et al.
Published: (2026)
Thought calibration: Efficient and confident test-time scaling
by: Wu, Menghua, et al.
Published: (2025)
Towards a unified and verified understanding of group-operation networks
by: Wu, Wilson, et al.
Published: (2024)
Interval-Based AUC (iAUC): Extending ROC Analysis to Uncertainty-Aware Classification
by: Li, Yuqi, et al.
Published: (2026)
Tournament Leave-pair-out Cross-validation for Receiver Operating Characteristic (ROC) Analysis
by: Perez, Ileana Montoya, et al.
Published: (2018)
Position: Stop Evaluating AI with Human Tests, Develop Principled, AI-specific Tests instead
by: Sühr, Tom, et al.
Published: (2025)
Efficient line search for optimizing Area Under the ROC Curve in gradient descent
by: Fowler, Jadon, et al.
Published: (2024)
Learning multivariate Gaussians with imperfect advice
by: Bhattacharyya, Arnab, et al.
Published: (2024)
Online bipartite matching with imperfect advice
by: Choo, Davin, et al.
Published: (2024)
What should post-training optimize? A test-time scaling law perspective
by: Li, Muheng, et al.
Published: (2026)
Collapsing ROC approach for risk prediction research on both common and rare variants
by: Wei, Changshuai, et al.
Published: (2025)
Imitating from auxiliary imperfect demonstrations via Adversarial Density Weighted Regression
by: Zhang, Ziqi, et al.
Published: (2024)
Connections between reinforcement learning with feedback, test-time scaling, and diffusion guidance: An anthology
by: Jiao, Yuchen, et al.
Published: (2025)
How does over-squashing affect the power of GNNs?
by: Di Giovanni, Francesco, et al.
Published: (2023)