| Main Authors: | Li, Bryan, Luo, Fiona, Haider, Samar, Agashe, Adwait, Li, Tammy, Liu, Runqi, Miao, Muqing, Ramakrishnan, Shriya, Yuan, Yuan, Callison-Burch, Chris |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2410.01171 |
Similar Items
This Land is {Your, My} Land: Evaluating Geopolitical Biases in Language Models
by: Li, Bryan, et al.
Published: (2023)
Uncovering Differences in Persuasive Language in Russian versus English Wikipedia
by: Li, Bryan, et al.
Published: (2024)
Overhearing LLM Agents: A Survey, Taxonomy, and Roadmap
by: Zhu, Andrew, et al.
Published: (2025)
Autorubric: Unifying Rubric-based LLM Evaluation
by: Rao, Delip, et al.
Published: (2026)
Agreement Metrics for LLM-as-Judge Evaluation: What to Report and Why
by: Rao, Delip, et al.
Published: (2026)
BibTeX Citation Hallucinations in Scientific Publishing Agents: Evaluation and Mitigation
by: Rao, Delip, et al.
Published: (2026)
What Do Claim Verification Datasets Actually Test? A Reasoning Trace Analysis
by: Rao, Delip, et al.
Published: (2026)
The Media Bias Detector: A Framework for Annotating and Analyzing the News at Scale
by: Haider, Samar, et al.
Published: (2025)
mStyleDistance: Multilingual Style Embeddings and their Evaluation
by: Qiu, Justin, et al.
Published: (2025)
Media Bias Detector: Designing and Implementing a Tool for Real-Time Selection and Framing Bias Analysis in News Coverage
by: Wang, Jenny S, et al.
Published: (2025)
Choice-75: A Dataset on Decision Branching in Script Learning
by: Hou, Zhaoyi Joey, et al.
Published: (2023)
FanOutQA: A Multi-Hop, Multi-Document Question Answering Benchmark for Large Language Models
by: Zhu, Andrew, et al.
Published: (2024)
FineInstructions: Scaling Synthetic Instructions to Pre-Training Scale
by: Patel, Ajay, et al.
Published: (2026)
ThinknCheck: Grounded Claim Verification with Compact, Reasoning-Driven, and Interpretable Models
by: Rao, Delip, et al.
Published: (2026)
You Have Thirteen Hours in Which to Solve the Labyrinth: Enhancing AI Game Masters with Function Calling
by: Song, Jaewoo, et al.
Published: (2024)
ReDel: A Toolkit for LLM-Powered Recursive Multi-Agent Systems
by: Zhu, Andrew, et al.
Published: (2024)
Towards Faithful Model Explanation in NLP: A Survey
by: Lyu, Qing, et al.
Published: (2022)
Toward Beginner-Friendly LLMs for Language Learning: Controlling Difficulty in Conversation
by: Jin, Meiqing, et al.
Published: (2025)
Detecting and Correcting Reference Hallucinations in Commercial LLMs and Deep Research Agents
by: Rao, Delip, et al.
Published: (2026)
DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows
by: Patel, Ajay, et al.
Published: (2024)
First Steps Towards Overhearing LLM Agents: A Case Study With Dungeons & Dragons Gameplay
by: Zhu, Andrew, et al.
Published: (2025)
Evaluating Vision-Language Models on Bistable Images
by: Panagopoulou, Artemis, et al.
Published: (2024)
Low-Resource Authorship Style Transfer: Can Non-Famous Authors Be Imitated?
by: Patel, Ajay, et al.
Published: (2022)
CtrlRAG: Black-box Document Poisoning Attacks for Retrieval-Augmented Generation of Large Language Models
by: Sui, Runqi
Published: (2025)
OpenPI2.0: An Improved Dataset for Entity Tracking in Texts
by: Zhang, Li, et al.
Published: (2023)
WHAT-IF: Exploring Branching Narratives by Meta-Prompting Large Language Models
by: Huang, Runsheng "Anson", et al.
Published: (2024)
When Verification Fails: How Compositionally Infeasible Claims Escape Rejection
by: Liu, Muxin, et al.
Published: (2026)
MiRAGeNews: Multimodal Realistic AI-Generated News Detection
by: Huang, Runsheng, et al.
Published: (2024)
WithdrarXiv: A Large-Scale Dataset for Retraction Study
by: Rao, Delip, et al.
Published: (2024)
HOLODECK 2.0: Vision-Language-Guided 3D World Generation with Editing
by: Bian, Zixuan, et al.
Published: (2025)
NSF-SciFy: Mining the NSF Awards Database for Scientific Claims
by: Rao, Delip, et al.
Published: (2025)
GenAI Content Detection Task 3: Cross-Domain Machine-Generated Text Detection Challenge
by: Dugan, Liam, et al.
Published: (2025)
Large Language Models Can Self-Improve At Web Agent Tasks
by: Patel, Ajay, et al.
Published: (2024)
CALYPSO: LLMs as Dungeon Masters' Assistants
by: Zhu, Andrew, et al.
Published: (2023)
RAVENEA: A Benchmark for Multimodal Retrieval-Augmented Visual Culture Understanding
by: Li, Jiaang, et al.
Published: (2025)
Oreo: A Plug-in Context Reconstructor to Enhance Retrieval-Augmented Generation
by: Li, Sha, et al.
Published: (2025)
PDDLEGO: Iterative Planning in Textual Environments
by: Zhang, Li, et al.
Published: (2024)
Machine Text Detectors are Membership Inference Attacks
by: Koike, Ryuto, et al.
Published: (2025)
Multilingual Generative Retrieval via Cross-lingual Semantic Compression
by: Huang, Yuxin, et al.
Published: (2025)
XRAG: Cross-lingual Retrieval-Augmented Generation
by: Liu, Wei, et al.
Published: (2025)