| Main Authors: | Freeman, Joshua, Rippe, Chloe, Debenedetti, Edoardo, Andriushchenko, Maksym |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2412.06370 |
Similar Items
Does Refusal Training in LLMs Generalize to the Past Tense?
by: Andriushchenko, Maksym, et al.
Published: (2024)
Strategic Dishonesty Can Undermine AI Safety Evaluations of Frontier LLMs
by: Panfilov, Alexander, et al.
Published: (2025)
Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks
by: Andriushchenko, Maksym, et al.
Published: (2024)
QuantSightBench: Evaluating LLM Quantitative Forecasting with Prediction Intervals
by: Qin, Jeremy, et al.
Published: (2026)
Is In-Context Learning Sufficient for Instruction Following in LLMs?
by: Zhao, Hao, et al.
Published: (2024)
LLMs and Memorization: On Quality and Specificity of Copyright Compliance
by: Mueller, Felix B, et al.
Published: (2024)
Characterizing the Consistency of the Emergent Misalignment Persona
by: Weckauff, Anietta, et al.
Published: (2026)
OpenAI o1 System Card
by: OpenAI, et al.
Published: (2024)
An Empirical Study of OpenAI API Discussions on Stack Overflow
by: Chen, Xiang, et al.
Published: (2025)
OpenAI GPT-5 System Card
by: Singh, Aaditya, et al.
Published: (2025)
Privacy and Security Threat for OpenAI GPTs
by: Wenying, Wei, et al.
Published: (2025)
Instrumental Choices: Measuring the Propensity of LLM Agents to Pursue Instrumental Behaviors
by: Wiedermann-Möller, Jonas, et al.
Published: (2026)
Monitoring Decomposition Attacks in LLMs with Lightweight Sequential Monitors
by: Yueh-Han, Chen, et al.
Published: (2025)
HalluHard: A Hard Multi-Turn Hallucination Benchmark
by: Fan, Dongyang, et al.
Published: (2026)
A Case Study of Web App Coding with OpenAI Reasoning Models
by: Cui, Yi
Published: (2024)
Capability-Based Scaling Trends for LLM-Based Red-Teaming
by: Panfilov, Alexander, et al.
Published: (2025)
MTUncertainty: Assessing the Need for Post-editing of Machine Translation Outputs by Fine-tuning OpenAI LLMs
by: Gladkoff, Serge, et al.
Published: (2023)
Claudini: Autoresearch Discovers State-of-the-Art Adversarial Attack Algorithms for LLMs
by: Panfilov, Alexander, et al.
Published: (2026)
Understanding and Benchmarking Artificial Intelligence: OpenAI's o3 Is Not AGI
by: Pfister, Rolf, et al.
Published: (2025)
OpenAI's Approach to External Red Teaming for AI Models and Systems
by: Ahmad, Lama, et al.
Published: (2025)
Competition Report: Finding Universal Jailbreak Backdoors in Aligned LLMs
by: Rando, Javier, et al.
Published: (2024)
JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models
by: Chao, Patrick, et al.
Published: (2024)
Dominion: A New Frontier for AI Research
by: Halawi, Danny, et al.
Published: (2024)
AutoAdvExBench: Benchmarking autonomous exploitation of adversarial example defenses
by: Carlini, Nicholas, et al.
Published: (2025)
Adaptive Attacks on Trusted Monitors Subvert AI Control Protocols
by: Terekhov, Mikhail, et al.
Published: (2025)
LLMs unlock new paths to monetizing exploits
by: Carlini, Nicholas, et al.
Published: (2025)
LLMs Still Can't Plan; Can LRMs? A Preliminary Evaluation of OpenAI's o1 on PlanBench
by: Valmeekam, Karthik, et al.
Published: (2024)
We Should Separate Memorization from Copyright
by: Haviv, Adi, et al.
Published: (2026)
On The Planning Abilities of OpenAI's o1 Models: Feasibility, Optimality, and Generalizability
by: Wang, Kevin, et al.
Published: (2024)
Depictions of Depression in Generative AI Video Models: A Preliminary Study of OpenAI's Sora 2
by: Flathers, Matthew, et al.
Published: (2026)
HDDLGym: A Tool for Studying Multi-Agent Hierarchical Problems Defined in HDDL with OpenAI Gym
by: La, Ngoc, et al.
Published: (2025)
Agent Skills Enable a New Class of Realistic and Trivially Simple Prompt Injections
by: Schmotz, David, et al.
Published: (2025)
Can OpenAI o1 outperform humans in higher-order cognitive thinking?
by: Latif, Ehsan, et al.
Published: (2024)
Lawsuit
Published: (2022)
Incentives or Ontology? A Structural Rebuttal to OpenAI's Hallucination Thesis
by: Ackermann, Richard, et al.
Published: (2025)
Evaluating Text Summaries Generated by Large Language Models Using OpenAI's GPT
by: Shakil, Hassan, et al.
Published: (2024)
Voices from the Frontier: A Comprehensive Analysis of the OpenAI Developer Forum
by: Hou, Xinyi, et al.
Published: (2024)
Speech Emotion Recognition Leveraging OpenAI's Whisper Representations and Attentive Pooling Methods
by: Shendabadi, Ali, et al.
Published: (2026)
A Systematic Assessment of OpenAI o1-Preview for Higher Order Thinking in Education
by: Latif, Ehsan, et al.
Published: (2024)
Benchmarking Floworks against OpenAI & Anthropic: A Novel Framework for Enhanced LLM Function Calling
by: Bhan, Nirav, et al.
Published: (2024)