Description: :: Library Catalog

Bibliographic Details
Main Author:	Singh, Rajshree
Format:	Digital resource
Language:
Published:	Zenodo 2026
Subjects:	legal QA, faithfulness, NLP
Online Access:	https://doi.org/10.5281/zenodo.19202908

Description:

This dataset supports research on faithfulness in citation-grounded legal question answering (QA). It integrates and extends two publicly available sources to construct a grounded benchmark for Indian Supreme Court judgments.

The first source is the IndicLegalQA dataset (Veningston & Mishra, 2024), which contains 10,002 question–answer pairs derived from 1,256 Supreme Court cases. Each QA pair captures key legal facts, issues, or principles along with metadata such as case names and judgment dates. The second source is a large-scale Indian Supreme Court judgments dataset from Kaggle, comprising approximately 47,000 cases with structured metadata and links to judgment PDFs.

We align these two datasets through a multi-stage pipeline involving:
- normalization of case names and dates,
- fuzzy matching between QA entries and judgment metadata, and
- resolution of metadata links to actual judgment PDF files.

This results in a grounded dataset where each QA instance is linked to its source judgment document.

Dataset Statistics
- Total QA pairs: 10,002
- Grounded QA pairs: 8,337
- Unique judgment documents: 1,003
- Chunked retrieval corpus: 23,577 text chunks

Included Files
- qa_judgment_master_resolved.csv → Grounded QA–judgment mapping dataset
- faithfulness_annotation_labeled_batch30.csv → Human-annotated subset for faithfulness evaluation
- retrieval and chunk corpora files

Purpose
This dataset enables:
- evaluation of citation-aware legal QA systems,
- analysis of faithfulness vs. grounding, and
- development of retrieval + generation pipelines for legal AI.

Our experiments show that even perfectly cited answers can be unfaithful, highlighting the need for faithfulness-aware evaluation frameworks.

Data Sources
- Veningston, K., & Mishra, A. (2024). IndicLegalQA Dataset. Mendeley Data. https://doi.org/10.17632/gf8n8cnmvc.2
- Indian Supreme Court Judgments Dataset (Kaggle): https://www.kaggle.com/datasets/vangap/indian-supreme-court-judgments
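
The first two alignment steps named in the description (normalization of case names and dates, then fuzzy matching) might look roughly like the sketch below. This is an illustration only, not the dataset authors' implementation: the record does not state which matcher, threshold, or date formats were used. The function names, the 0.85 threshold, the list of date formats, and the choice of Python's standard-library difflib are all assumptions made for the example.

```python
import re
from datetime import datetime
from difflib import SequenceMatcher


def normalize_case_name(name):
    """Lowercase, unify 'versus'/'vs.'/'v.' to 'v', drop punctuation, collapse spaces."""
    name = name.lower()
    name = re.sub(r"\b(versus|vs|v)\b\.?", " v ", name)
    name = re.sub(r"[^a-z0-9 ]+", " ", name)
    return re.sub(r"\s+", " ", name).strip()


def normalize_date(text, formats=("%d-%m-%Y", "%d/%m/%Y", "%Y-%m-%d", "%d %B %Y")):
    """Return an ISO date string, or None if no assumed format parses the text."""
    for fmt in formats:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def best_match(qa_case_name, judgment_names, threshold=0.85):
    """Return (judgment_name, score) for the most similar name, or None below threshold.

    The threshold is a hypothetical value chosen for illustration.
    """
    target = normalize_case_name(qa_case_name)
    best_name, best_score = None, 0.0
    for candidate in judgment_names:
        score = SequenceMatcher(None, target, normalize_case_name(candidate)).ratio()
        if score > best_score:
            best_name, best_score = candidate, score
    if best_score >= threshold:
        return best_name, best_score
    return None
```

In a setup like this, a QA entry whose case name finds no candidate above the threshold would stay ungrounded, which is one plausible reason the grounded count (8,337) is lower than the total (10,002). Comparing normalized dates alongside names would be a natural way to break ties between similarly named cases.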
