| Main Authors: | Ravichander, Abhilasha, Fisher, Jillian, Sorensen, Taylor, Lu, Ximing, Lin, Yuchen, Antoniak, Maria, Mireshghallah, Niloofar, Bhagavatula, Chandra, Choi, Yejin |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2503.12072 |
Similar Items
Spectrum Tuning: Post-Training for Distributional Coverage and In-Context Steerability
by: Sorensen, Taylor, et al.
Published: (2025)
The Surprising Effectiveness of Membership Inference with Simple N-Gram Coverage
by: Hallinan, Skyler, et al.
Published: (2025)
Trust No Bot: Discovering Personal Disclosures in Human-LLM Conversations in the Wild
by: Mireshghallah, Niloofar, et al.
Published: (2024)
A Roadmap to Pluralistic Alignment
by: Sorensen, Taylor, et al.
Published: (2024)
HALoGEN: Fantastic LLM Hallucinations and Where to Find Them
by: Ravichander, Abhilasha, et al.
Published: (2025)
Agent Lumos: Unified and Modular Training for Open-Source Language Agents
by: Yin, Da, et al.
Published: (2023)
Impossible Distillation: from Low-Quality Model to High-Quality Dataset & Model for Summarization and Paraphrasing
by: Jung, Jaehun, et al.
Published: (2023)
JAMDEC: Unsupervised Authorship Obfuscation using Constrained Decoding over Small Language Models
by: Fisher, Jillian, et al.
Published: (2024)
StyleRemix: Interpretable Authorship Obfuscation via Distillation and Perturbation of Style Elements
by: Fisher, Jillian, et al.
Published: (2024)
RESTOR: Knowledge Recovery in Machine Unlearning
by: Rezaei, Keivan, et al.
Published: (2024)
Why and How LLMs Hallucinate: Connecting the Dots with Subsequence Associations
by: Sun, Yiyou, et al.
Published: (2025)
Synthetic Data Can Mislead Evaluations: Membership Inference as Machine Text Detection
by: Naseh, Ali, et al.
Published: (2025)
Opt-ICL at LeWiDi-2025: Maximizing In-Context Signal from Rater Examples via Meta-Learning
by: Sorensen, Taylor, et al.
Published: (2025)
WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models
by: Jiang, Liwei, et al.
Published: (2024)
Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties
by: Sorensen, Taylor, et al.
Published: (2023)
MacGyver: Are Large Language Models Creative Problem Solvers?
by: Tian, Yufei, et al.
Published: (2023)
AI as Humanity's Salieri: Quantifying Linguistic Creativity of Language Models via Systematic Attribution of Machine Text against Web Text
by: Lu, Ximing, et al.
Published: (2024)
What Has Been Lost with Synthetic Evaluation?
by: Gill, Alexander, et al.
Published: (2025)
Artifacts or Abduction: How Do LLMs Answer Multiple-Choice Questions Without the Question?
by: Balepur, Nishant, et al.
Published: (2024)
Revisiting the Past: Data Unlearning with Model State History
by: Rezaei, Keivan, et al.
Published: (2025)
Modular Pluralism: Pluralistic Alignment via Multi-LLM Collaboration
by: Feng, Shangbin, et al.
Published: (2024)
Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory
by: Mireshghallah, Niloofar, et al.
Published: (2023)
Phenomenal Yet Puzzling: Testing Inductive Reasoning Capabilities of Language Models with Hypothesis Refinement
by: Qiu, Linlu, et al.
Published: (2023)
Position: Privacy Is Not Just Memorization!
by: Mireshghallah, Niloofar, et al.
Published: (2025)
WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild
by: Lin, Bill Yuchen, et al.
Published: (2024)
HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Human-AI Interactions
by: Zhou, Xuhui, et al.
Published: (2024)
Operationalizing Data Minimization for Privacy-Preserving LLM Prompting
by: Zhou, Jijie, et al.
Published: (2025)
Do Membership Inference Attacks Work on Large Language Models?
by: Duan, Michael, et al.
Published: (2024)
PPMI: Privacy-Preserving LLM Interaction with Socratic Chain-of-Thought Reasoning and Homomorphically Encrypted Vector Databases
by: Bae, Yubeen, et al.
Published: (2025)
The Curious Case of Factuality Finetuning: Models' Internal Beliefs Can Improve Factuality
by: Newman, Benjamin, et al.
Published: (2025)
Can Language Models Reason about Individualistic Human Values and Preferences?
by: Jiang, Liwei, et al.
Published: (2024)
Alignment Whack-a-Mole: Finetuning Activates Verbatim Recall of Copyrighted Books in Large Language Models
by: Liu, Xinyue, et al.
Published: (2026)
Smaller Language Models are Better Black-box Machine-Generated Text Detectors
by: Mireshghallah, Niloofar, et al.
Published: (2023)
Reinforcement Learning Improves Traversal of Hierarchical Knowledge in LLMs
by: Zhang, Renfei, et al.
Published: (2025)
CopyBench: Measuring Literal and Non-Literal Reproduction of Copyright-Protected Text in Language Model Generation
by: Chen, Tong, et al.
Published: (2024)
Privacy Ripple Effects from Adding or Removing Personal Information in Language Model Training
by: Borkar, Jaydeep, et al.
Published: (2025)
Alpaca against Vicuna: Using LLMs to Uncover Memorization of LLMs
by: Kassem, Aly M., et al.
Published: (2024)
A False Sense of Privacy: Evaluating Textual Data Sanitization Beyond Surface-level Privacy Leakage
by: Xin, Rui, et al.
Published: (2025)
Fractional Rotation, Full Potential? Investigating Performance and Convergence of Partial RoPE
by: Khan, Mohammad Aflah, et al.
Published: (2026)
Can Large Language Models Really Recognize Your Name?
by: Pham, Dzung, et al.
Published: (2025)