Library Catalog

Bibliographic Details
Main Authors:	Li, Bryan; Luo, Fiona; Haider, Samar; Agashe, Adwait; Li, Tammy; Liu, Runqi; Miao, Muqing; Ramakrishnan, Shriya; Yuan, Yuan; Callison-Burch, Chris
Format:	Preprint
Published:	2024
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2410.01171

Similar Items

This Land is {Your, My} Land: Evaluating Geopolitical Biases in Language Models
by: Li, Bryan, et al.
Published: (2023)

Uncovering Differences in Persuasive Language in Russian versus English Wikipedia
by: Li, Bryan, et al.
Published: (2024)

Overhearing LLM Agents: A Survey, Taxonomy, and Roadmap
by: Zhu, Andrew, et al.
Published: (2025)

Autorubric: Unifying Rubric-based LLM Evaluation
by: Rao, Delip, et al.
Published: (2026)

Agreement Metrics for LLM-as-Judge Evaluation: What to Report and Why
by: Rao, Delip, et al.
Published: (2026)

BibTeX Citation Hallucinations in Scientific Publishing Agents: Evaluation and Mitigation
by: Rao, Delip, et al.
Published: (2026)

What Do Claim Verification Datasets Actually Test? A Reasoning Trace Analysis
by: Rao, Delip, et al.
Published: (2026)

The Media Bias Detector: A Framework for Annotating and Analyzing the News at Scale
by: Haider, Samar, et al.
Published: (2025)

mStyleDistance: Multilingual Style Embeddings and their Evaluation
by: Qiu, Justin, et al.
Published: (2025)

Media Bias Detector: Designing and Implementing a Tool for Real-Time Selection and Framing Bias Analysis in News Coverage
by: Wang, Jenny S, et al.
Published: (2025)

Choice-75: A Dataset on Decision Branching in Script Learning
by: Hou, Zhaoyi Joey, et al.
Published: (2023)

FanOutQA: A Multi-Hop, Multi-Document Question Answering Benchmark for Large Language Models
by: Zhu, Andrew, et al.
Published: (2024)

FineInstructions: Scaling Synthetic Instructions to Pre-Training Scale
by: Patel, Ajay, et al.
Published: (2026)

ThinknCheck: Grounded Claim Verification with Compact, Reasoning-Driven, and Interpretable Models
by: Rao, Delip, et al.
Published: (2026)

You Have Thirteen Hours in Which to Solve the Labyrinth: Enhancing AI Game Masters with Function Calling
by: Song, Jaewoo, et al.
Published: (2024)

ReDel: A Toolkit for LLM-Powered Recursive Multi-Agent Systems
by: Zhu, Andrew, et al.
Published: (2024)

Towards Faithful Model Explanation in NLP: A Survey
by: Lyu, Qing, et al.
Published: (2022)

Toward Beginner-Friendly LLMs for Language Learning: Controlling Difficulty in Conversation
by: Jin, Meiqing, et al.
Published: (2025)

Detecting and Correcting Reference Hallucinations in Commercial LLMs and Deep Research Agents
by: Rao, Delip, et al.
Published: (2026)

DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows
by: Patel, Ajay, et al.
Published: (2024)

First Steps Towards Overhearing LLM Agents: A Case Study With Dungeons & Dragons Gameplay
by: Zhu, Andrew, et al.
Published: (2025)

Evaluating Vision-Language Models on Bistable Images
by: Panagopoulou, Artemis, et al.
Published: (2024)

Low-Resource Authorship Style Transfer: Can Non-Famous Authors Be Imitated?
by: Patel, Ajay, et al.
Published: (2022)

CtrlRAG: Black-box Document Poisoning Attacks for Retrieval-Augmented Generation of Large Language Models
by: Sui, Runqi
Published: (2025)

OpenPI2.0: An Improved Dataset for Entity Tracking in Texts
by: Zhang, Li, et al.
Published: (2023)

WHAT-IF: Exploring Branching Narratives by Meta-Prompting Large Language Models
by: Huang, Runsheng "Anson", et al.
Published: (2024)

When Verification Fails: How Compositionally Infeasible Claims Escape Rejection
by: Liu, Muxin, et al.
Published: (2026)

MiRAGeNews: Multimodal Realistic AI-Generated News Detection
by: Huang, Runsheng, et al.
Published: (2024)

WithdrarXiv: A Large-Scale Dataset for Retraction Study
by: Rao, Delip, et al.
Published: (2024)

HOLODECK 2.0: Vision-Language-Guided 3D World Generation with Editing
by: Bian, Zixuan, et al.
Published: (2025)

NSF-SciFy: Mining the NSF Awards Database for Scientific Claims
by: Rao, Delip, et al.
Published: (2025)

GenAI Content Detection Task 3: Cross-Domain Machine-Generated Text Detection Challenge
by: Dugan, Liam, et al.
Published: (2025)

Large Language Models Can Self-Improve At Web Agent Tasks
by: Patel, Ajay, et al.
Published: (2024)

CALYPSO: LLMs as Dungeon Masters' Assistants
by: Zhu, Andrew, et al.
Published: (2023)

RAVENEA: A Benchmark for Multimodal Retrieval-Augmented Visual Culture Understanding
by: Li, Jiaang, et al.
Published: (2025)

Oreo: A Plug-in Context Reconstructor to Enhance Retrieval-Augmented Generation
by: Li, Sha, et al.
Published: (2025)

PDDLEGO: Iterative Planning in Textual Environments
by: Zhang, Li, et al.
Published: (2024)

Machine Text Detectors are Membership Inference Attacks
by: Koike, Ryuto, et al.
Published: (2025)

Multilingual Generative Retrieval via Cross-lingual Semantic Compression
by: Huang, Yuxin, et al.
Published: (2025)

XRAG: Cross-lingual Retrieval-Augmented Generation
by: Liu, Wei, et al.
Published: (2025)