:: Library Catalog

Bibliographic Details
Main Authors:	Shaib, Chantal, Govindarajan, Venkata S., Barrow, Joe, Sun, Jiuding, Siu, Alexa F., Wallace, Byron C., Nenkova, Ani
Format:	Preprint
Published:	2024
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2403.00553

Similar Items

How Much Annotation is Needed to Compare Summarization Models?
by: Shaib, Chantal, et al.
Published: (2024)

Measuring Lexical Diversity of Synthetic Data Generated through Fine-Grained Persona Prompting
by: Kambhatla, Gauri, et al.
Published: (2025)

Measuring AI "Slop" in Text
by: Shaib, Chantal, et al.
Published: (2025)

Detection and Measurement of Syntactic Templates in Generated Text
by: Shaib, Chantal, et al.
Published: (2024)

Who Taught You That? Tracing Teachers in Model Distillation
by: Wadhwa, Somin, et al.
Published: (2025)

Learning the Wrong Lessons: Syntactic-Domain Spurious Correlations in Language Models
by: Shaib, Chantal, et al.
Published: (2025)

SQLSpace: A Representation Space for Text-to-SQL to Discover and Mitigate Robustness Gaps
by: Srikanth, Neha, et al.
Published: (2025)

Faithfulness vs. Safety: Evaluating LLM Behavior Under Counterfactual Medical Evidence
by: Mo, Kaijie, et al.
Published: (2026)

Future Lens: Anticipating Subsequent Tokens from a Single Hidden State
by: Pal, Koyena, et al.
Published: (2023)

Dark & Stormy: Modeling Humor in Sentences from the Bulwer-Lytton Fiction Contest
by: Govindarajan, Venkata S, et al.
Published: (2025)

Do Automatic Factuality Metrics Measure Factuality? A Critical Evaluation
by: Ramprasad, Sanjana, et al.
Published: (2024)

Open (Clinical) LLMs are Sensitive to Instruction Phrasings
by: Arroyo, Alberto Mario Ceballos, et al.
Published: (2024)

Compared to What? Baselines and Metrics for Counterfactual Prompting
by: Yang, Zihao, et al.
Published: (2026)

Can SAEs reveal and mitigate racial biases of LLMs in healthcare?
by: Ahsan, Hiba, et al.
Published: (2025)

Few-Shot Dialogue Summarization via Skeleton-Assisted Prompt Transfer in Prompt Tuning
by: Xie, Kaige, et al.
Published: (2023)

Lil-Bevo: Explorations of Strategies for Training Language Models in More Humanlike Ways
by: Govindarajan, Venkata S, et al.
Published: (2023)

Decomposing Generalization: Models of Generic, Habitual, and Episodic Statements
by: Govindarajan, Venkata Subrahmanyan, et al.
Published: (2019)

Measuring the inhomogeneous obscuration of AGN with mid-infrared observations
by: M. Nenkova
Published: (2007)

Leveraging ChatGPT in Pharmacovigilance Event Extraction: An Empirical Study
by: Sun, Zhaoyue, et al.
Published: (2024)

Circuit Distillation
by: Wadhwa, Somin, et al.
Published: (2025)

Revisiting Relation Extraction in the era of Large Language Models
by: Wadhwa, Somin, et al.
Published: (2023)

Investigating Mysteries of CoT-Augmented Distillation
by: Wadhwa, Somin, et al.
Published: (2024)

Augmenting Rating-Scale Measures with Text-Derived Items Using the Information-Determined Scoring (IDS) Framework
by: Watson, Joe, et al.
Published: (2025)

Counterfactual Probing for the Influence of Affect and Specificity on Intergroup Bias
by: Govindarajan, Venkata S, et al.
Published: (2023)

VickreyFeedback: Cost-efficient Data Construction for Reinforcement Learning from Human Feedback
by: Zhang, Guoxi, et al.
Published: (2024)

Do they mean 'us'? Interpreting Referring Expressions in Intergroup Bias
by: Govindarajan, Venkata S, et al.
Published: (2024)

CommonForms: A Large, Diverse Dataset for Form Field Detection
by: Barrow, Joe
Published: (2025)

Vector Arithmetic in Concept and Token Subspaces
by: Feucht, Sheridan, et al.
Published: (2025)

Help! Need Advice on Identifying Advice
by: Govindarajan, Venkata Subrahmanyan, et al.
Published: (2020)

Multimodal QUD: Inquisitive Questions from Scientific Figures
by: Wu, Yating, et al.
Published: (2026)

Chain of Logic: Rule-Based Reasoning with Large Language Models
by: Servantez, Sergio, et al.
Published: (2024)

Automatically Extracting Numerical Results from Randomized Controlled Trials with Large Language Models
by: Yun, Hye Sun, et al.
Published: (2024)

InfoLossQA: Characterizing and Recovering Information Loss in Text Simplification
by: Trienes, Jan, et al.
Published: (2024)

Don't Pay Attention, PLANT It: Pretraining Attention via Learning-to-Rank
by: Roy, Debjyoti Saha, et al.
Published: (2024)

How people talk about each other: Modeling Generalized Intergroup Bias and Emotion
by: Govindarajan, Venkata S, et al.
Published: (2022)

Evaluating the Factuality of Zero-shot Summarizers Across Varied Domains
by: Ramprasad, Sanjana, et al.
Published: (2024)

Learning from Natural Language Explanations for Generalizable Entity Matching
by: Wadhwa, Somin, et al.
Published: (2024)

Do Multi-Document Summarization Models Synthesize?
by: DeYoung, Jay, et al.
Published: (2023)

ttta: Tools for Temporal Text Analysis
by: Lange, Kai-Robin, et al.
Published: (2025)

SafePassage: High-Fidelity Information Extraction with Black Box LLMs
by: Barrow, Joe, et al.
Published: (2025)