:: Library Catalog

Bibliographic Details
Main Authors:	Ravichander, Abhilasha; Fisher, Jillian; Sorensen, Taylor; Lu, Ximing; Lin, Yuchen; Antoniak, Maria; Mireshghallah, Niloofar; Bhagavatula, Chandra; Choi, Yejin
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2503.12072

Similar Items

Spectrum Tuning: Post-Training for Distributional Coverage and In-Context Steerability
by: Sorensen, Taylor, et al.
Published: (2025)

The Surprising Effectiveness of Membership Inference with Simple N-Gram Coverage
by: Hallinan, Skyler, et al.
Published: (2025)

Trust No Bot: Discovering Personal Disclosures in Human-LLM Conversations in the Wild
by: Mireshghallah, Niloofar, et al.
Published: (2024)

A Roadmap to Pluralistic Alignment
by: Sorensen, Taylor, et al.
Published: (2024)

HALoGEN: Fantastic LLM Hallucinations and Where to Find Them
by: Ravichander, Abhilasha, et al.
Published: (2025)

Agent Lumos: Unified and Modular Training for Open-Source Language Agents
by: Yin, Da, et al.
Published: (2023)

Impossible Distillation: from Low-Quality Model to High-Quality Dataset & Model for Summarization and Paraphrasing
by: Jung, Jaehun, et al.
Published: (2023)

JAMDEC: Unsupervised Authorship Obfuscation using Constrained Decoding over Small Language Models
by: Fisher, Jillian, et al.
Published: (2024)

StyleRemix: Interpretable Authorship Obfuscation via Distillation and Perturbation of Style Elements
by: Fisher, Jillian, et al.
Published: (2024)

RESTOR: Knowledge Recovery in Machine Unlearning
by: Rezaei, Keivan, et al.
Published: (2024)

Why and How LLMs Hallucinate: Connecting the Dots with Subsequence Associations
by: Sun, Yiyou, et al.
Published: (2025)

Synthetic Data Can Mislead Evaluations: Membership Inference as Machine Text Detection
by: Naseh, Ali, et al.
Published: (2025)

Opt-ICL at LeWiDi-2025: Maximizing In-Context Signal from Rater Examples via Meta-Learning
by: Sorensen, Taylor, et al.
Published: (2025)

WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models
by: Jiang, Liwei, et al.
Published: (2024)

Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties
by: Sorensen, Taylor, et al.
Published: (2023)

MacGyver: Are Large Language Models Creative Problem Solvers?
by: Tian, Yufei, et al.
Published: (2023)

AI as Humanity's Salieri: Quantifying Linguistic Creativity of Language Models via Systematic Attribution of Machine Text against Web Text
by: Lu, Ximing, et al.
Published: (2024)

What Has Been Lost with Synthetic Evaluation?
by: Gill, Alexander, et al.
Published: (2025)

Artifacts or Abduction: How Do LLMs Answer Multiple-Choice Questions Without the Question?
by: Balepur, Nishant, et al.
Published: (2024)

Revisiting the Past: Data Unlearning with Model State History
by: Rezaei, Keivan, et al.
Published: (2025)

Modular Pluralism: Pluralistic Alignment via Multi-LLM Collaboration
by: Feng, Shangbin, et al.
Published: (2024)

Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory
by: Mireshghallah, Niloofar, et al.
Published: (2023)

Phenomenal Yet Puzzling: Testing Inductive Reasoning Capabilities of Language Models with Hypothesis Refinement
by: Qiu, Linlu, et al.
Published: (2023)

Position: Privacy Is Not Just Memorization!
by: Mireshghallah, Niloofar, et al.
Published: (2025)

WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild
by: Lin, Bill Yuchen, et al.
Published: (2024)

HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Human-AI Interactions
by: Zhou, Xuhui, et al.
Published: (2024)

Operationalizing Data Minimization for Privacy-Preserving LLM Prompting
by: Zhou, Jijie, et al.
Published: (2025)

Do Membership Inference Attacks Work on Large Language Models?
by: Duan, Michael, et al.
Published: (2024)

PPMI: Privacy-Preserving LLM Interaction with Socratic Chain-of-Thought Reasoning and Homomorphically Encrypted Vector Databases
by: Bae, Yubeen, et al.
Published: (2025)

The Curious Case of Factuality Finetuning: Models' Internal Beliefs Can Improve Factuality
by: Newman, Benjamin, et al.
Published: (2025)

Can Language Models Reason about Individualistic Human Values and Preferences?
by: Jiang, Liwei, et al.
Published: (2024)

Alignment Whack-a-Mole: Finetuning Activates Verbatim Recall of Copyrighted Books in Large Language Models
by: Liu, Xinyue, et al.
Published: (2026)

Smaller Language Models are Better Black-box Machine-Generated Text Detectors
by: Mireshghallah, Niloofar, et al.
Published: (2023)

Reinforcement Learning Improves Traversal of Hierarchical Knowledge in LLMs
by: Zhang, Renfei, et al.
Published: (2025)

CopyBench: Measuring Literal and Non-Literal Reproduction of Copyright-Protected Text in Language Model Generation
by: Chen, Tong, et al.
Published: (2024)

Privacy Ripple Effects from Adding or Removing Personal Information in Language Model Training
by: Borkar, Jaydeep, et al.
Published: (2025)

Alpaca against Vicuna: Using LLMs to Uncover Memorization of LLMs
by: Kassem, Aly M., et al.
Published: (2024)

A False Sense of Privacy: Evaluating Textual Data Sanitization Beyond Surface-level Privacy Leakage
by: Xin, Rui, et al.
Published: (2025)

Fractional Rotation, Full Potential? Investigating Performance and Convergence of Partial RoPE
by: Khan, Mohammad Aflah, et al.
Published: (2026)

Can Large Language Models Really Recognize Your Name?
by: Pham, Dzung, et al.
Published: (2025)