| Main Authors: | Xie, Huiyuan, Steffek, Felix, de Faria, Joana Ribeiro, Carter, Christine, Rutherford, Jonathan |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2409.08098 |
Similar Items
Automatic Information Extraction From Employment Tribunal Judgements Using Large Language Models
by: de Faria, Joana Ribeiro, et al.
Published: (2024)
Topic Classification of Case Law Using a Large Language Model and a New Taxonomy for UK Law: AI Insights into Summary Judgment
by: Sargeant, Holli, et al.
Published: (2024)
Large Language Models' Complicit Responses to Illicit Instructions across Socio-Legal Contexts
by: Wang, Xing, et al.
Published: (2025)
LLM vs. Lawyers: Identifying a Subset of Summary Judgments in a Large UK Case Law Dataset
by: Izzidien, Ahmed, et al.
Published: (2024)
AnnoCaseLaw: A Richly-Annotated Dataset For Benchmarking Explainable Legal Judgment Prediction
by: Sesodia, Magnus, et al.
Published: (2025)
Representing the Under-Represented: Cultural and Core Capability Benchmarks for Developing Thai Large Language Models
by: Kim, Dahyun, et al.
Published: (2024)
TACLer: Tailored Curriculum Reinforcement Learning for Efficient Reasoning
by: Lai, Huiyuan, et al.
Published: (2026)
The Cambridge Law Corpus: A Dataset for Legal AI Research
by: Östling, Andreas, et al.
Published: (2023)
CaseReportBench: An LLM Benchmark Dataset for Dense Information Extraction in Clinical Case Reports
by: Zhang, Xiao Yu Cindy, et al.
Published: (2025)
AyutthayaAlpha: A Thai-Latin Script Transliteration Transformer
by: Lauc, Davor, et al.
Published: (2024)
Evaluating the Quality of Benchmark Datasets for Low-Resource Languages: A Case Study on Turkish
by: Cengiz, Ayşe Aysu, et al.
Published: (2025)
CliniBench: A Clinical Outcome Prediction Benchmark for Generative and Encoder-Based Language Models
by: Grundmann, Paul, et al.
Published: (2025)
PhayaThaiBERT: Enhancing a Pretrained Thai Language Model with Unassimilated Loanwords
by: Sriwirote, Panyut, et al.
Published: (2023)
MASSW: A New Dataset and Benchmark Tasks for AI-Assisted Scientific Workflows
by: Zhang, Xingjian, et al.
Published: (2024)
DatasetResearch: Benchmarking Agent Systems for Demand-Driven Dataset Discovery
by: Li, Keyu, et al.
Published: (2025)
OpenJAI-v1.0: An Open Thai Large Language Model
by: Trakuekul, Pontakorn, et al.
Published: (2025)
Topic-Conversation Relevance (TCR) Dataset and Benchmarks
by: Fan, Yaran, et al.
Published: (2024)
C2RUST-BENCH: A Minimized, Representative Dataset for C-to-Rust Transpilation Evaluation
by: Sirlanci, Melih, et al.
Published: (2025)
GaRAGe: A Benchmark with Grounding Annotations for RAG Evaluation
by: Sorodoc, Ionut-Teodor, et al.
Published: (2025)
Who Benchmarks the Benchmarks? A Case Study of LLM Evaluation in Icelandic
by: Ingimundarson, Finnur Ágúst, et al.
Published: (2026)
Towards Explainability in Legal Outcome Prediction Models
by: Valvoda, Josef, et al.
Published: (2024)
MiroEval: Benchmarking Multimodal Deep Research Agents in Process and Outcome
by: Ye, Fangda, et al.
Published: (2026)
Jawaher: A Multidialectal Dataset of Arabic Proverbs for LLM Benchmarking
by: Magdy, Samar M., et al.
Published: (2025)
SwaQuAD-24: QA Benchmark Dataset in Swahili
by: Kondoro, Alfred Malengo
Published: (2024)
BR-TaxQA-R: A Dataset for Question Answering with References for Brazilian Personal Income Tax Law, including case law
by: Júnior, Juvenal Domingos, et al.
Published: (2025)
Exposing Assumptions in AI Benchmarks through Cognitive Modelling
by: Rystrøm, Jonathan H., et al.
Published: (2024)
TEG-DB: A Comprehensive Dataset and Benchmark of Textual-Edge Graphs
by: Li, Zhuofeng, et al.
Published: (2024)
PulseLM: A Foundation Dataset and Benchmark for PPG-Text Learning
by: Pham, Hung Manh, et al.
Published: (2026)
Emotion and Intent Joint Understanding in Multimodal Conversation: A Benchmarking Dataset
by: Liu, Rui, et al.
Published: (2024)
LitBench: A Benchmark and Dataset for Reliable Evaluation of Creative Writing
by: Fein, Daniel, et al.
Published: (2025)
Ax-to-Grind Urdu: Benchmark Dataset for Urdu Fake News Detection
by: Harris, Sheetal, et al.
Published: (2024)
BLUCK: A Benchmark Dataset for Bengali Linguistic Understanding and Cultural Knowledge
by: Kabir, Daeen, et al.
Published: (2025)
Blind Men and the Elephant: Diverse Perspectives on Gender Stereotypes in Benchmark Datasets
by: Zakizadeh, Mahdi, et al.
Published: (2025)
Breaking the Silence: A Dataset and Benchmark for Bangla Text-to-Gloss Translation
by: Abdullah, Sharif Mohammad, et al.
Published: (2025)
Order Matters in Hallucination: Reasoning Order as Benchmark and Reflexive Prompting for Large-Language-Models
by: Xie, Zikai
Published: (2024)
Can Large Language Models Predict the Outcome of Judicial Decisions?
by: Kmainasi, Mohamed Bayan, et al.
Published: (2025)
HalluVerse25: Fine-grained Multilingual Benchmark Dataset for LLM Hallucinations
by: Abdaljalil, Samir, et al.
Published: (2025)
LaMPilot: An Open Benchmark Dataset for Autonomous Driving with Language Model Programs
by: Ma, Yunsheng, et al.
Published: (2023)
MCFEND: A Multi-source Benchmark Dataset for Chinese Fake News Detection
by: Li, Yupeng, et al.
Published: (2024)
LLM-Generated Negative News Headlines Dataset: Creation and Benchmarking Against Real Journalism
by: Babalola, Olusola, et al.
Published: (2025)