| Main Authors: | Wang, Clinton J., Lee, Dean, Menghini, Cristina, Mols, Johannes, Doughty, Jack, Khoja, Adam, Lynch, Jayson, Hendryx, Sean, Yue, Summer, Hendrycks, Dan |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2502.08859 |
Similar Items
The MASK Benchmark: Disentangling Honesty From Accuracy in AI Systems
by: Ren, Richard, et al.
Published: (2025)
MultiChallenge: A Realistic Multi-Turn Conversation Evaluation Benchmark Challenging to Frontier LLMs
by: Sirdeshmukh, Ved, et al.
Published: (2025)
Reducing Political Manipulation with Consistency Training
by: Phan, Long, et al.
Published: (2026)
ToolComp: A Multi-Tool Reasoning & Process Supervision Benchmark
by: Nath, Vaskar, et al.
Published: (2025)
Slant/Gokigen Naname is NP-complete, and Some Variations are in P
by: Lynch, Jayson, et al.
Published: (2025)
Subquadratic Approximation Algorithms for Separating Two Points with Objects in the Plane
by: Lynch, Jayson, et al.
Published: (2025)
Pre-Training Multimodal Hallucination Detectors with Corrupted Grounding Data
by: Whitehead, Spencer, et al.
Published: (2024)
Progress over Points: Reframing LM Benchmarks Around Scientific Objectives
by: Jin, Alwin, et al.
Published: (2025)
Introduction to AI Safety, Ethics, and Society
by: Hendrycks, Dan
Published: (2024)
Agent-RLVR: Training Software Engineering Agents via Guidance and Environment Rewards
by: Da, Jeff, et al.
Published: (2025)
EvilGenie: A Reward Hacking Benchmark
by: Gabor, Jonathan, et al.
Published: (2025)
Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?
by: Ren, Richard, et al.
Published: (2024)
Análisis. ¿Del siglo norteamericano al siglo del Pacífico asiático?
by: Manfred Mols
Published: (2010)
Posibilidades de México para mejorar su sistema político / Manfred Mols
by: Mols, Manfred
Published: (1940)
The Asian Crisis and the Roles and Future of APEC and ASEAN as Instruments of Crisis Management
by: Manfred Mols
Published: (2000)
Revisiting the Superficial Alignment Hypothesis
by: Raghavendra, Mohit, et al.
Published: (2024)
Stimulus control of punishment effects: determining the controlling variables
by: Adam H. Doughty
Published: (2007)
Adaptive Guidance Accelerates Reinforcement Learning of Reasoning Models
by: Nath, Vaskar, et al.
Published: (2025)
Weaver: Interweaving SQL and LLM for Table Reasoning
by: Khoja, Rohit, et al.
Published: (2025)
Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs
by: Mazeika, Mantas, et al.
Published: (2025)
Going 3D with Technology: An Overarching Approach for Language Teachers
by: Jason D. Hendryx
Published: (2016)
A Baseline Analysis of Reward Models' Ability To Accurately Analyze Foundation Models Under Distribution Shift
by: LeVine, Will, et al.
Published: (2023)
Private equity acquisitions and product market decisions: Evidence from trademarks
by: Moazzam Khoja
Published: (2025)
Planning In Natural Language Improves LLM Search For Code Generation
by: Wang, Evan, et al.
Published: (2024)
Western Arabia in the Leiden Collections. Traces of a Colourful Past
by: Mols, Luitgard, et al.
Published: (2017)
Daily NOx emissions and lifetimes from Paris between May 2018 and July 2023 inferred from TROPOMI
by: Mols, Alba, et al.
Published: (2025)
Meek Models Shall Inherit the Earth
by: Gundlach, Hans, et al.
Published: (2025)
Superintelligence Strategy: Expert Version
by: Hendrycks, Dan, et al.
Published: (2025)
SuiteEval: Simplifying Retrieval Benchmarks
by: Parry, Andrew, et al.
Published: (2026)
Presentación
by: Raúl A. Menghini
Published: (2022)
Pini, M., Más Rocha, S., Gorostiaga, J., Tello, C., y Asprella, G. (coords.) La Educación Secundaria, ¿Modelo en (re)construcción?. Buenos Aires: Editorial Aique, pags. 252.
by: Raúl A. Menghini
Published: (2016)
If CLIP Could Talk: Understanding Vision-Language Model Representations Through Their Preferred Concept Descriptions
by: Esfandiarpoor, Reza, et al.
Published: (2024)
Enhancing CLIP with CLIP: Exploring Pseudolabeling for Limited-Label Prompt Tuning
by: Menghini, Cristina, et al.
Published: (2023)
LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet
by: Li, Nathaniel, et al.
Published: (2024)
Virology Capabilities Test (VCT): A Multimodal Virology Q&A Benchmark
by: Götting, Jasper, et al.
Published: (2025)
The discrete empirical interpolation method in class identification and data summarization
by: Emily P. Hendryx Lyons
Published: (2024)
Forging the Ideal Educated Girl (Volume 1.0)
by: Khoja-Moolji, Shenila
Published: (2020)
Forging the Ideal Educated Girl
by: Khoja-Moolji, Shenila
Published: (2018)
D-REX: A Benchmark for Detecting Deceptive Reasoning in Large Language Models
by: Krishna, Satyapriya, et al.
Published: (2025)