:: Library Catalog

Bibliographic Details
Main Authors:	Zhou, Xuhui; Kim, Hyunwoo; Brahman, Faeze; Jiang, Liwei; Zhu, Hao; Lu, Ximing; Xu, Frank; Lin, Bill Yuchen; Choi, Yejin; Mireshghallah, Niloofar; Le Bras, Ronan; Sap, Maarten
Format:	Preprint
Published:	2024
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2409.16427

Similar Items

Leftover Lunch: Advantage-based Offline Reinforcement Learning for Language Models
by: Baheti, Ashutosh, et al.
Published: (2023)

Multi-Attribute Constraint Satisfaction via Language Model Rewriting
by: Baheti, Ashutosh, et al.
Published: (2024)

WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models
by: Jiang, Liwei, et al.
Published: (2024)

Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory
by: Mireshghallah, Niloofar, et al.
Published: (2023)

From Dogwhistles to Bullhorns: Unveiling Coded Rhetoric with Language Models
by: Mendelsohn, Julia, et al.
Published: (2023)

WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild
by: Lin, Bill Yuchen, et al.
Published: (2024)

Trust or Escalate: LLM Judges with Provable Guarantees for Human Agreement
by: Jung, Jaehun, et al.
Published: (2024)

Information-Theoretic Distillation for Reference-less Summarization
by: Jung, Jaehun, et al.
Published: (2024)

AI-LieDar: Examine the Trade-off Between Utility and Truthfulness in LLM Agents
by: Su, Zhe, et al.
Published: (2024)

Impossible Distillation: from Low-Quality Model to High-Quality Dataset & Model for Summarization and Paraphrasing
by: Jung, Jaehun, et al.
Published: (2023)

MacGyver: Are Large Language Models Creative Problem Solvers?
by: Tian, Yufei, et al.
Published: (2023)

Agent Lumos: Unified and Modular Training for Open-Source Language Agents
by: Yin, Da, et al.
Published: (2023)

Is this the real life? Is this just fantasy? The Misleading Success of Simulating Social Interactions With LLMs
by: Zhou, Xuhui, et al.
Published: (2024)

Trust No Bot: Discovering Personal Disclosures in Human-LLM Conversations in the Wild
by: Mireshghallah, Niloofar, et al.
Published: (2024)

ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning
by: Lin, Bill Yuchen, et al.
Published: (2025)

Information-Guided Identification of Training Data Imprint in (Proprietary) Large Language Models
by: Ravichander, Abhilasha, et al.
Published: (2025)

AI as Humanity's Salieri: Quantifying Linguistic Creativity of Language Models via Systematic Attribution of Machine Text against Web Text
by: Lu, Ximing, et al.
Published: (2024)

How to Train Your Fact Verifier: Knowledge Transfer with Multimodal Open Models
by: Lee, Jaeyoung, et al.
Published: (2024)

ParaPO: Aligning Language Models to Reduce Verbatim Reproduction of Pre-training Data
by: Chen, Tong, et al.
Published: (2025)

SimpleToM: Exposing the Gap between Explicit ToM Inference and Implicit ToM Application in LLMs
by: Gu, Yuling, et al.
Published: (2024)

GoodPoint: Learning Constructive Scientific Paper Feedback from Author Responses
by: Mun, Jimin, et al.
Published: (2026)

Social World Models
by: Zhou, Xuhui, et al.
Published: (2025)

What Makes it Ok to Set a Fire? Iterative Self-distillation of Contexts and Rationales for Disambiguating Defeasible Social and Moral Situations
by: Rao, Kavel, et al.
Published: (2023)

RESTOR: Knowledge Recovery in Machine Unlearning
by: Rezaei, Keivan, et al.
Published: (2024)

Spectrum Tuning: Post-Training for Distributional Coverage and In-Context Steerability
by: Sorensen, Taylor, et al.
Published: (2025)

WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs
by: Han, Seungju, et al.
Published: (2024)

Alpaca against Vicuna: Using LLMs to Uncover Memorization of LLMs
by: Kassem, Aly M., et al.
Published: (2024)

PPMI: Privacy-Preserving LLM Interaction with Socratic Chain-of-Thought Reasoning and Homomorphically Encrypted Vector Databases
by: Bae, Yubeen, et al.
Published: (2025)

Let Them Down Easy! Contextual Effects of LLM Guardrails on User Perceptions and Preferences
by: Zheng, Mingqian, et al.
Published: (2025)

Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties
by: Sorensen, Taylor, et al.
Published: (2023)

In Search of the Long-Tail: Systematic Generation of Long-Tail Inferential Knowledge via Logical Rule Guided Search
by: Li, Huihan, et al.
Published: (2023)

A Roadmap to Pluralistic Alignment
by: Sorensen, Taylor, et al.
Published: (2024)

Position: Privacy Is Not Just Memorization!
by: Mireshghallah, Niloofar, et al.
Published: (2025)

Synthetic Data Can Mislead Evaluations: Membership Inference as Machine Text Detection
by: Naseh, Ali, et al.
Published: (2025)

Ambig-SWE: Interactive Agents to Overcome Underspecificity in Software Engineering
by: Vijayvargiya, Sanidhya, et al.
Published: (2025)

A False Sense of Privacy: Evaluating Textual Data Sanitization Beyond Surface-level Privacy Leakage
by: Xin, Rui, et al.
Published: (2025)

The Surprising Effectiveness of Membership Inference with Simple N-Gram Coverage
by: Hallinan, Skyler, et al.
Published: (2025)

SafetyAnalyst: Interpretable, Transparent, and Steerable Safety Moderation for AI Behavior
by: Li, Jing-Jing, et al.
Published: (2024)

A Call for Clarity in Beam Search: How It Works and When It Stops
by: Kasai, Jungo, et al.
Published: (2022)

Particip-AI: A Democratic Surveying Framework for Anticipating Future AI Use Cases, Harms and Benefits
by: Mun, Jimin, et al.
Published: (2024)