| Main Authors: | Glockner, Max, Jiang, Xiang, Ribeiro, Leonardo F. R., Gurevych, Iryna, Dreyer, Markus |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2505.05949 |
Similar Items
Grounding Fallacies Misrepresenting Scientific Publications in Evidence
by: Glockner, Max, et al.
Published: (2024)
PeerQA: A Scientific Question Answering Dataset from Peer Reviews
by: Baumgärtner, Tim, et al.
Published: (2025)
ConspirED: A Dataset for Cognitive Traits of Conspiracy Theories and Large Language Model Safety
by: Bates, Luke, et al.
Published: (2025)
Self-Rationalization in the Wild: A Large Scale Out-of-Distribution Evaluation on NLI-related tasks
by: Yang, Jing, et al.
Published: (2025)
Missci: Reconstructing Fallacies in Misrepresented Science
by: Glockner, Max, et al.
Published: (2024)
M2QA: Multi-domain Multilingual Question Answering
by: Engländer, Leon, et al.
Published: (2024)
Localizing and Mitigating Errors in Long-form Question Answering
by: Sachdeva, Rachneet, et al.
Published: (2024)
DARA: Decomposition-Alignment-Reasoning Autonomous Language Agent for Question Answering over Knowledge Graphs
by: Fang, Haishuo, et al.
Published: (2024)
SciCoQA: Quality Assurance for Scientific Paper--Code Alignment
by: Baumgärtner, Tim, et al.
Published: (2026)
Automatic Reviewers Fail to Detect Faulty Reasoning in Research Papers: A New Counterfactual Evaluation Framework
by: Dycke, Nils, et al.
Published: (2025)
Turning Logic Against Itself: Probing Model Defenses Through Contrastive Questions
by: Sachdeva, Rachneet, et al.
Published: (2025)
Expert Preference-based Evaluation of Automated Related Work Generation
by: Şahinuç, Furkan, et al.
Published: (2025)
HistoryBankQA: Multilingual Temporal Question Answering on Historical Events
by: Mandal, Biswadip, et al.
Published: (2025)
Towards Better Question Generation in QA-based Event Extraction
by: Hong, Zijin, et al.
Published: (2024)
Enhancing Depression Detection via Question-wise Modality Fusion
by: Mandal, Aishik, et al.
Published: (2025)
Citation Failure: Definition, Analysis and Efficient Mitigation
by: Buchmann, Jan, et al.
Published: (2025)
Like a Good Nearest Neighbor: Practical Content Moderation and Text Classification
by: Bates, Luke, et al.
Published: (2023)
IRCoder: Intermediate Representations Make Language Models Robust Multilingual Code Generators
by: Paul, Indraneil, et al.
Published: (2024)
Dive into the Chasm: Probing the Gap between In- and Cross-Topic Generalization
by: Waldis, Andreas, et al.
Published: (2024)
PolQA: Polish Question Answering Dataset
by: Rybak, Piotr, et al.
Published: (2022)
NewsRECON: News article REtrieval for image CONtextualization
by: Tonglet, Jonathan, et al.
Published: (2026)
Measuring Retrieval Complexity in Question Answering Systems
by: Gabburo, Matteo, et al.
Published: (2024)
Hierarchical Latent Structures in Data Generation Process Unify Mechanistic Phenomena across Scale
by: Rohweder, Jonas, et al.
Published: (2026)
MAGneT: Coordinated Multi-Agent Generation of Synthetic Multi-Turn Mental Health Counseling Sessions
by: Mandal, Aishik, et al.
Published: (2025)
pdfQA: Diverse, Challenging, and Realistic Question Answering over PDFs
by: Schimanski, Tobias, et al.
Published: (2026)
RefusalBench: Generative Evaluation of Selective Refusal in Grounded Language Models
by: Muhamed, Aashiq, et al.
Published: (2025)
DOCE: Finding the Sweet Spot for Execution-Based Code Generation
by: Li, Haau-Sing, et al.
Published: (2024)
Commitment Checklist: Auditing Author Commitments in Peer Review
by: Chen, Chung-Chi, et al.
Published: (2026)
DebateQA: Evaluating Question Answering on Debatable Knowledge
by: Xu, Rongwu, et al.
Published: (2024)
Constrained C-Test Generation via Mixed-Integer Programming
by: Lee, Ji-Ung, et al.
Published: (2024)
JDocQA: Japanese Document Question Answering Dataset for Generative Language Models
by: Onami, Eri, et al.
Published: (2024)
Systematic Task Exploration with LLMs: A Study in Citation Text Generation
by: Şahinuç, Furkan, et al.
Published: (2024)
Identifying Aspects in Peer Reviews
by: Lu, Sheng, et al.
Published: (2025)
Token Weighting for Long-Range Language Modeling
by: Helm, Falko, et al.
Published: (2025)
COVE: COntext and VEracity prediction for out-of-context images
by: Tonglet, Jonathan, et al.
Published: (2025)
M4FC: a Multimodal, Multilingual, Multicultural, Multitask Real-World Fact-Checking Dataset
by: Geng, Jiahui, et al.
Published: (2025)
How to Handle Different Types of Out-of-Distribution Scenarios in Computational Argumentation? A Comprehensive and Fine-Grained Field Study
by: Waldis, Andreas, et al.
Published: (2023)
Robust Utility-Preserving Text Anonymization Based on Large Language Models
by: Yang, Tianyu, et al.
Published: (2024)
Re3: A Holistic Framework and Dataset for Modeling Collaborative Document Revision
by: Ruan, Qian, et al.
Published: (2024)
Overview of PerpectiveArg2024: The First Shared Task on Perspective Argument Retrieval
by: Falk, Neele, et al.
Published: (2024)