:: Library Catalog

Bibliographic Details
Main Authors:	Wang, Clinton J., Lee, Dean, Menghini, Cristina, Mols, Johannes, Doughty, Jack, Khoja, Adam, Lynch, Jayson, Hendryx, Sean, Yue, Summer, Hendrycks, Dan
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence; Computation and Language
Online Access:	https://arxiv.org/abs/2502.08859

Similar Items

The MASK Benchmark: Disentangling Honesty From Accuracy in AI Systems
by: Ren, Richard, et al.
Published: (2025)

MultiChallenge: A Realistic Multi-Turn Conversation Evaluation Benchmark Challenging to Frontier LLMs
by: Sirdeshmukh, Ved, et al.
Published: (2025)

Reducing Political Manipulation with Consistency Training
by: Phan, Long, et al.
Published: (2026)

ToolComp: A Multi-Tool Reasoning & Process Supervision Benchmark
by: Nath, Vaskar, et al.
Published: (2025)

Slant/Gokigen Naname is NP-complete, and Some Variations are in P
by: Lynch, Jayson, et al.
Published: (2025)

Subquadratic Approximation Algorithms for Separating Two Points with Objects in the Plane
by: Lynch, Jayson, et al.
Published: (2025)

Pre-Training Multimodal Hallucination Detectors with Corrupted Grounding Data
by: Whitehead, Spencer, et al.
Published: (2024)

Progress over Points: Reframing LM Benchmarks Around Scientific Objectives
by: Jin, Alwin, et al.
Published: (2025)

Introduction to AI Safety, Ethics, and Society
by: Hendrycks, Dan
Published: (2024)

Agent-RLVR: Training Software Engineering Agents via Guidance and Environment Rewards
by: Da, Jeff, et al.
Published: (2025)

EvilGenie: A Reward Hacking Benchmark
by: Gabor, Jonathan, et al.
Published: (2025)

Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?
by: Ren, Richard, et al.
Published: (2024)

Análisis. ¿Del siglo norteamericano al siglo del Pacífico asiático?
by: Manfred Mols
Published: (2010)

Posibilidades de México para mejorar su sistema político / Manfred Mols
by: Mols, Manfred
Published: (1940)

The Asian Crisis and the Roles and Future of APEC and ASEAN as Instruments of Crisis Management
by: Manfred Mols
Published: (2000)

Revisiting the Superficial Alignment Hypothesis
by: Raghavendra, Mohit, et al.
Published: (2024)

Stimulus control of punishment effects: determining the controlling variables
by: Adam H. Doughty
Published: (2007)

Adaptive Guidance Accelerates Reinforcement Learning of Reasoning Models
by: Nath, Vaskar, et al.
Published: (2025)

Weaver: Interweaving SQL and LLM for Table Reasoning
by: Khoja, Rohit, et al.
Published: (2025)

Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs
by: Mazeika, Mantas, et al.
Published: (2025)

Going 3D with Technology: An Overarching Approach for Language Teachers
by: Jason D. Hendryx
Published: (2016)

A Baseline Analysis of Reward Models' Ability To Accurately Analyze Foundation Models Under Distribution Shift
by: LeVine, Will, et al.
Published: (2023)

Private equity acquisitions and product market decisions: Evidence from trademarks
by: Moazzam Khoja
Published: (2025)

Planning In Natural Language Improves LLM Search For Code Generation
by: Wang, Evan, et al.
Published: (2024)

Western Arabia in the Leiden Collections. Traces of a Colourful Past
by: Mols, Luitgard, et al.
Published: (2017)

Daily NOx emissions and lifetimes from Paris between May 2018 and July 2023 inferred from TROPOMI
by: Mols, Alba, et al.
Published: (2025)

Meek Models Shall Inherit the Earth
by: Gundlach, Hans, et al.
Published: (2025)

Superintelligence Strategy: Expert Version
by: Hendrycks, Dan, et al.
Published: (2025)

SuiteEval: Simplifying Retrieval Benchmarks
by: Parry, Andrew, et al.
Published: (2026)

Presentación
by: Raúl A. Menghini
Published: (2022)

Pini, M., Más Rocha, S., Gorostiaga, J., Tello, C., y Asprella, G. (coords.) La Educación Secundaria, ¿Modelo en (re)construcción?. Buenos Aires: Editorial Aique, pags. 252.
by: Raúl A. Menghini
Published: (2016)

If CLIP Could Talk: Understanding Vision-Language Model Representations Through Their Preferred Concept Descriptions
by: Esfandiarpoor, Reza, et al.
Published: (2024)

Enhancing CLIP with CLIP: Exploring Pseudolabeling for Limited-Label Prompt Tuning
by: Menghini, Cristina, et al.
Published: (2023)

LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet
by: Li, Nathaniel, et al.
Published: (2024)

Virology Capabilities Test (VCT): A Multimodal Virology Q&A Benchmark
by: Götting, Jasper, et al.
Published: (2025)

The discrete empirical interpolation method in class identification and data summarization
by: Emily P. Hendryx Lyons
Published: (2024)

Forging the Ideal Educated Girl (Volume 1.0)
by: Khoja-Moolji, Shenila
Published: (2020)

Forging the Ideal Educated Girl
by: Khoja-Moolji, Shenila
Published: (2018)

D-REX: A Benchmark for Detecting Deceptive Reasoning in Large Language Models
by: Krishna, Satyapriya, et al.
Published: (2025)