Library Catalog

Bibliographic Details
Main Authors:	Freeman, Joshua; Rippe, Chloe; Debenedetti, Edoardo; Andriushchenko, Maksym
Format:	Preprint
Published:	2024
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2412.06370

Similar Items

Does Refusal Training in LLMs Generalize to the Past Tense?
by: Andriushchenko, Maksym, et al.
Published: (2024)

Strategic Dishonesty Can Undermine AI Safety Evaluations of Frontier LLMs
by: Panfilov, Alexander, et al.
Published: (2025)

Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks
by: Andriushchenko, Maksym, et al.
Published: (2024)

QuantSightBench: Evaluating LLM Quantitative Forecasting with Prediction Intervals
by: Qin, Jeremy, et al.
Published: (2026)

Is In-Context Learning Sufficient for Instruction Following in LLMs?
by: Zhao, Hao, et al.
Published: (2024)

LLMs and Memorization: On Quality and Specificity of Copyright Compliance
by: Mueller, Felix B, et al.
Published: (2024)

Characterizing the Consistency of the Emergent Misalignment Persona
by: Weckauff, Anietta, et al.
Published: (2026)

OpenAI o1 System Card
by: OpenAI, et al.
Published: (2024)

An Empirical Study of OpenAI API Discussions on Stack Overflow
by: Chen, Xiang, et al.
Published: (2025)

OpenAI GPT-5 System Card
by: Singh, Aaditya, et al.
Published: (2025)

Privacy and Security Threat for OpenAI GPTs
by: Wenying, Wei, et al.
Published: (2025)

Instrumental Choices: Measuring the Propensity of LLM Agents to Pursue Instrumental Behaviors
by: Wiedermann-Möller, Jonas, et al.
Published: (2026)

Monitoring Decomposition Attacks in LLMs with Lightweight Sequential Monitors
by: Yueh-Han, Chen, et al.
Published: (2025)

HalluHard: A Hard Multi-Turn Hallucination Benchmark
by: Fan, Dongyang, et al.
Published: (2026)

A Case Study of Web App Coding with OpenAI Reasoning Models
by: Cui, Yi
Published: (2024)

Capability-Based Scaling Trends for LLM-Based Red-Teaming
by: Panfilov, Alexander, et al.
Published: (2025)

MTUncertainty: Assessing the Need for Post-editing of Machine Translation Outputs by Fine-tuning OpenAI LLMs
by: Gladkoff, Serge, et al.
Published: (2023)

Claudini: Autoresearch Discovers State-of-the-Art Adversarial Attack Algorithms for LLMs
by: Panfilov, Alexander, et al.
Published: (2026)

Understanding and Benchmarking Artificial Intelligence: OpenAI's o3 Is Not AGI
by: Pfister, Rolf, et al.
Published: (2025)

OpenAI's Approach to External Red Teaming for AI Models and Systems
by: Ahmad, Lama, et al.
Published: (2025)

Competition Report: Finding Universal Jailbreak Backdoors in Aligned LLMs
by: Rando, Javier, et al.
Published: (2024)

JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models
by: Chao, Patrick, et al.
Published: (2024)

Dominion: A New Frontier for AI Research
by: Halawi, Danny, et al.
Published: (2024)

AutoAdvExBench: Benchmarking autonomous exploitation of adversarial example defenses
by: Carlini, Nicholas, et al.
Published: (2025)

Adaptive Attacks on Trusted Monitors Subvert AI Control Protocols
by: Terekhov, Mikhail, et al.
Published: (2025)

LLMs unlock new paths to monetizing exploits
by: Carlini, Nicholas, et al.
Published: (2025)

LLMs Still Can't Plan; Can LRMs? A Preliminary Evaluation of OpenAI's o1 on PlanBench
by: Valmeekam, Karthik, et al.
Published: (2024)

We Should Separate Memorization from Copyright
by: Haviv, Adi, et al.
Published: (2026)

On The Planning Abilities of OpenAI's o1 Models: Feasibility, Optimality, and Generalizability
by: Wang, Kevin, et al.
Published: (2024)

Depictions of Depression in Generative AI Video Models: A Preliminary Study of OpenAI's Sora 2
by: Flathers, Matthew, et al.
Published: (2026)

HDDLGym: A Tool for Studying Multi-Agent Hierarchical Problems Defined in HDDL with OpenAI Gym
by: La, Ngoc, et al.
Published: (2025)

Agent Skills Enable a New Class of Realistic and Trivially Simple Prompt Injections
by: Schmotz, David, et al.
Published: (2025)

Can OpenAI o1 outperform humans in higher-order cognitive thinking?
by: Latif, Ehsan, et al.
Published: (2024)

Lawsuit
Published: (2022)

Incentives or Ontology? A Structural Rebuttal to OpenAI's Hallucination Thesis
by: Ackermann, Richard, et al.
Published: (2025)

Evaluating Text Summaries Generated by Large Language Models Using OpenAI's GPT
by: Shakil, Hassan, et al.
Published: (2024)

Voices from the Frontier: A Comprehensive Analysis of the OpenAI Developer Forum
by: Hou, Xinyi, et al.
Published: (2024)

Speech Emotion Recognition Leveraging OpenAI's Whisper Representations and Attentive Pooling Methods
by: Shendabadi, Ali, et al.
Published: (2026)

A Systematic Assessment of OpenAI o1-Preview for Higher Order Thinking in Education
by: Latif, Ehsan, et al.
Published: (2024)

Benchmarking Floworks against OpenAI & Anthropic: A Novel Framework for Enhanced LLM Function Calling
by: Bhan, Nirav, et al.
Published: (2024)