| Main Authors: | Bai, Fan, Harrigian, Keith, Stremmel, Joel, Hassanzadeh, Hamid, Saeedi, Ardavan, Dredze, Mark |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2412.04573 |
Similar Items
LLMs are Better Than You Think: Label-Guided In-Context Learning for Named Entity Recognition
by: Bai, Fan, et al.
Published: (2025)
Are Clinical T5 Models Better for Clinical Text?
by: Li, Yahan, et al.
Published: (2024)
Generative Active Testing: Efficient LLM Evaluation via Proxy Task Adaptation
by: Ramakrishnan, Aashish Anantha, et al.
Published: (2026)
Task Matters: Knowledge Requirements Shape LLM Responses to Context-Memory Conflict
by: Sun, Kaiser, et al.
Published: (2025)
Consistency Training by Synthetic Question Generation for Conversational Question Answering
by: Hemati, Hamed Hematian, et al.
Published: (2024)
Benchmarking Large Language Models on Answering and Explaining Challenging Medical Questions
by: Chen, Hanjie, et al.
Published: (2024)
RAG LLMs are Not Safer: A Safety Analysis of Retrieval-Augmented Generation for Large Language Models
by: An, Bang, et al.
Published: (2025)
Amuro and Char: Analyzing the Relationship between Pre-Training and Fine-Tuning of Large Language Models
by: Sun, Kaiser, et al.
Published: (2024)
DnDScore: Decontextualization and Decomposition for Factuality Verification in Long-Form Text Generation
by: Wanner, Miriam, et al.
Published: (2024)
Schema-Driven Information Extraction from Heterogeneous Tables
by: Bai, Fan, et al.
Published: (2023)
Evaluating Biases in Context-Dependent Health Questions
by: Levy, Sharon, et al.
Published: (2024)
Syn-QA2: Evaluating False Assumptions in Long-tail Questions with Synthetic QA Datasets
by: Daswani, Ashwin, et al.
Published: (2024)
Can one size fit all?: Measuring Failure in Multi-Document Summarization Domain Transfer
by: DeLucia, Alexandra, et al.
Published: (2025)
Evaluating the Evaluators: Are readability metrics good measures of readability?
by: Cachola, Isabel, et al.
Published: (2025)
Evaluating Implicit Biases in LLM Reasoning through Logic Grid Puzzles
by: Jahara, Fatima, et al.
Published: (2025)
Reading, Not Thinking: Understanding and Bridging the Modality Gap When Text Becomes Pixels in Multimodal LLMs
by: Sun, Kaiser, et al.
Published: (2026)
Generalizing Visual Question Answering from Synthetic to Human-Written Questions via a Chain of QA with a Large Language Model
by: Kim, Taehee, et al.
Published: (2024)
LLMs in Biomedicine: A study on clinical Named Entity Recognition
by: Monajatipoor, Masoud, et al.
Published: (2024)
From Policy to Logic for Efficient and Interpretable Coverage Assessment
by: Pokharel, Rhitabrat, et al.
Published: (2026)
ExpertQA: Expert-Curated Questions and Attributed Answers
by: Malaviya, Chaitanya, et al.
Published: (2023)
Towards Better Question Generation in QA-based Event Extraction
by: Hong, Zijin, et al.
Published: (2024)
Weird Generalization is Weirdly Brittle
by: Wanner, Miriam, et al.
Published: (2026)
NeoQA: Evidence-based Question Answering with Generated News Events
by: Glockner, Max, et al.
Published: (2025)
Prompting-based Synthetic Data Generation for Few-Shot Question Answering
by: Schmidt, Maximilian, et al.
Published: (2024)
SciFaultyQA: Benchmarking LLMs on Faulty Science Question Detection with a GAN-Inspired Approach to Synthetic Dataset Generation
by: Kundu, Debarshi
Published: (2024)
ResearchQA: Evaluating Scholarly Question Answering at Scale Across 75 Fields with Survey-Mined Questions and Rubrics
by: Yifei, Li S., et al.
Published: (2025)
PolQA: Polish Question Answering Dataset
by: Rybak, Piotr, et al.
Published: (2022)
Building Open-Retrieval Conversational Question Answering Systems by Generating Synthetic Data and Decontextualizing User Questions
by: Vlachos, Christos, et al.
Published: (2025)
Synthetic Context Generation for Question Generation
by: Liu, Naiming, et al.
Published: (2024)
Making FETCH! Happen: Finding Emergent Dog Whistles Through Common Habitats
by: Sasse, Kuleen, et al.
Published: (2024)
MedScore: Generalizable Factuality Evaluation of Free-Form Medical Answers by Domain-adapted Claim Decomposition and Verification
by: Huang, Heyuan, et al.
Published: (2025)
JDocQA: Japanese Document Question Answering Dataset for Generative Language Models
by: Onami, Eri, et al.
Published: (2024)
On the Failure of Latent State Persistence in Large Language Models
by: Huang, Jen-tse, et al.
Published: (2025)
Assessing The Potential Of Mid-Sized Language Models For Clinical QA
by: Bolton, Elliot, et al.
Published: (2024)
pdfQA: Diverse, Challenging, and Realistic Question Answering over PDFs
by: Schimanski, Tobias, et al.
Published: (2026)
DebateQA: Evaluating Question Answering on Debatable Knowledge
by: Xu, Rongwu, et al.
Published: (2024)
A Closer Look at Claim Decomposition
by: Wanner, Miriam, et al.
Published: (2024)
Give me a hint: Can LLMs take a hint to solve math problems?
by: Agrawal, Vansh, et al.
Published: (2024)
Synthetic Multimodal Question Generation
by: Wu, Ian, et al.
Published: (2024)
Improving Clinical NLP Performance through Language Model-Generated Synthetic Clinical Data
by: Chen, Shan, et al.
Published: (2024)