:: Library Catalog

Bibliographic Details
Main Authors:	Jiang, Yuru; Ding, Wenxuan; Feng, Shangbin; Durrett, Greg; Tsvetkov, Yulia
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2506.04721

Similar Items

Among Us: Measuring and Mitigating Malicious Contributions in Model Collaboration Systems
by: Yang, Ziyuan, et al.
Published: (2026)

Varying Shades of Wrong: Aligning LLMs with Wrong Answers Only
by: Yao, Jihan, et al.
Published: (2024)

RankAlign: A Ranking View of the Generator-Validator Gap in Large Language Models
by: Rodriguez, Juan Diego, et al.
Published: (2025)

ScienceMeter: Tracking Scientific Knowledge Updates in Language Models
by: Wang, Yike, et al.
Published: (2025)

Knowledge Crosswords: Geometric Knowledge Reasoning with Large Language Models
by: Ding, Wenxuan, et al.
Published: (2023)

Don't Hallucinate, Abstain: Identifying LLM Knowledge Gaps via Multi-LLM Collaboration
by: Feng, Shangbin, et al.
Published: (2024)

The Single-Multi Evolution Loop for Self-Improving Model Collaboration Systems
by: Feng, Shangbin, et al.
Published: (2026)

Data Swarms: Optimizable Generation of Synthetic Evaluation Data
by: Feng, Shangbin, et al.
Published: (2025)

Calibrate-Then-Act: Cost-Aware Exploration in LLM Agents
by: Ding, Wenxuan, et al.
Published: (2026)

Can Language Models Solve Graph Problems in Natural Language?
by: Wang, Heng, et al.
Published: (2023)

Knowledge Card: Filling LLMs' Knowledge Gaps with Plug-in Specialized Language Models
by: Feng, Shangbin, et al.
Published: (2023)

What Does the Bot Say? Opportunities and Risks of Large Language Models in Social Media Bot Detection
by: Feng, Shangbin, et al.
Published: (2024)

Teaching LLMs to Abstain across Languages via Multilingual Feedback
by: Feng, Shangbin, et al.
Published: (2024)

Resolving Knowledge Conflicts in Large Language Models
by: Wang, Yike, et al.
Published: (2023)

MentorCollab: Selective Large-to-Small Inference-Time Guidance for Efficient Reasoning
by: Wang, Haojin, et al.
Published: (2026)

KGQuiz: Evaluating the Generalization of Encoded Knowledge in Large Language Models
by: Bai, Yuyang, et al.
Published: (2023)

Don't Throw Away Your Pretrained Model
by: Feng, Shangbin, et al.
Published: (2025)

Small Reward Models via Backward Inference
by: Wang, Yike, et al.
Published: (2026)

Know Your Limits: A Survey of Abstention in Large Language Models
by: Wen, Bingbing, et al.
Published: (2024)

GuessBench: Sensemaking Multimodal Creativity in the Wild
by: Zhu, Zifeng, et al.
Published: (2025)

DELL: Generating Reactions and Explanations for LLM-Based Misinformation Detection
by: Wan, Herun, et al.
Published: (2024)

P^3SUM: Preserving Author's Perspective in News Summarization with Diffusion Language Models
by: Liu, Yuhan, et al.
Published: (2023)

FACTS&EVIDENCE: An Interactive Tool for Transparent Fine-Grained Factual Verification of Machine-Generated Text
by: Boonsanong, Varich, et al.
Published: (2025)

Generalizable LLM Learning of Graph Synthetic Data with Post-training Alignment
by: Zhang, Yizhuo, et al.
Published: (2025)

From Distributional to Overton Pluralism: Investigating Large Language Model Alignment
by: Lake, Thom, et al.
Published: (2024)

Modular Pluralism: Pluralistic Alignment via Multi-LLM Collaboration
by: Feng, Shangbin, et al.
Published: (2024)

Finding Flawed Fictions: Evaluating Complex Reasoning in Language Models via Plot Hole Detection
by: Ahuja, Kabir, et al.
Published: (2025)

Can LLM Graph Reasoning Generalize beyond Pattern Memorization?
by: Zhang, Yizhuo, et al.
Published: (2024)

MoCo: A One-Stop Shop for Model Collaboration Research
by: Feng, Shangbin, et al.
Published: (2026)

Molecular Facts: Desiderata for Decontextualization in LLM Fact Verification
by: Gunjal, Anisha, et al.
Published: (2024)

Stumbling Blocks: Stress Testing the Robustness of Machine-Generated Text Detectors Under Attacks
by: Wang, Yichen, et al.
Published: (2024)

MediQ: Question-Asking LLMs and a Benchmark for Reliable Interactive Clinical Reasoning
by: Li, Shuyue Stella, et al.
Published: (2024)

When One LLM Drools, Multi-LLM Collaboration Rules
by: Feng, Shangbin, et al.
Published: (2025)

SynthesizRR: Generating Diverse Datasets with Retrieval Augmentation
by: Divekar, Abhishek, et al.
Published: (2024)

Understanding Synthetic Context Extension via Retrieval Heads
by: Zhao, Xinyu, et al.
Published: (2024)

LoFiT: Localized Fine-tuning on LLM Representations
by: Yin, Fangcong, et al.
Published: (2024)

Using Natural Language Explanations to Rescale Human Judgments
by: Wadhwa, Manya, et al.
Published: (2023)

Learning to Refine with Fine-Grained Natural Language Feedback
by: Wadhwa, Manya, et al.
Published: (2024)

MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents
by: Tang, Liyan, et al.
Published: (2024)

Heterogeneous Swarms: Jointly Optimizing Model Roles and Weights for Multi-LLM Systems
by: Feng, Shangbin, et al.
Published: (2025)