:: Library Catalog

Bibliographic Details
Main Authors:	Schmoll, Jonathan; Jatowt, Adam
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2512.24289

Similar Items

Navigating Tomorrow: Reliably Assessing Large Language Models Performance on Future Event Prediction
by: Nako, Petraq, et al.
Published: (2025)

Analyzing the Role of Context in Forecasting with Large Language Models
by: Mutschlechner, Gerrit, et al.
Published: (2025)

Temporal Blind Spots in Large Language Models
by: Wallat, Jonas, et al.
Published: (2024)

Pretraining Exposure Explains Popularity Judgments in Large Language Models
by: Mozafari, Jamshid, et al.
Published: (2026)

Statements: Universal Information Extraction from Tables with Large Language Models for ESG KPIs
by: Mishra, Lokesh, et al.
Published: (2024)

Question Difficulty Estimation for Large Language Models via Answer Plausibility Scoring
by: Mozafari, Jamshid, et al.
Published: (2026)

SustainableQA: A Comprehensive Question Answering Dataset for Corporate Sustainability and EU Taxonomy Reporting
by: Ali, Mohammed, et al.
Published: (2025)

Towards Effective Time-Aware Language Representation: Exploring Enhanced Temporal Understanding in Language Models
by: Wang, Jiexin, et al.
Published: (2024)

Evaluating List Construction and Temporal Understanding capabilities of Large Language Models
by: Dumitru, Alexandru, et al.
Published: (2025)

Transformers and Language Models in Form Understanding: A Comprehensive Review of Scanned Document Analysis
by: Abdallah, Abdelrahman, et al.
Published: (2024)

Wisdom of the Crowds in Forecasting: Forecast Summarization for Supporting Future Event Prediction
by: Saha, Anisha, et al.
Published: (2025)

Temporal Validity Change Prediction
by: Wenzel, Georg, et al.
Published: (2024)

ChroniclingAmericaQA: A Large-scale Question Answering Dataset based on Historical American Newspaper Pages
by: Piryani, Bhawna, et al.
Published: (2024)

Exploring NLP Benchmarks in an Extremely Low-Resource Setting
by: Nuha, Ulin, et al.
Published: (2025)

Generator-Retriever-Generator Approach for Open-Domain Question Answering
by: Abdallah, Abdelrahman, et al.
Published: (2023)

AMuRD: Annotated Arabic-English Receipt Dataset for Key Information Extraction and Classification
by: Abdallah, Abdelrahman, et al.
Published: (2023)

Enriching Taxonomies Using Large Language Models
by: Ghamlouch, Zeinab, et al.
Published: (2025)

TriviaHG: A Dataset for Automatic Hint Generation from Factoid Questions
by: Mozafari, Jamshid, et al.
Published: (2024)

How often do Answers Change? Estimating Recency Requirements in Question Answering
by: Piryani, Bhawna, et al.
Published: (2026)

How Good are LLM-based Rerankers? An Empirical Analysis of State-of-the-Art Reranking Models
by: Abdallah, Abdelrahman, et al.
Published: (2025)

Wrong Answers Can Also Be Useful: PlausibleQA -- A Large-Scale QA Dataset with Answer Plausibility Scores
by: Mozafari, Jamshid, et al.
Published: (2025)

Evaluating Answer Reranking Strategies in Time-sensitive Question Answering
by: Kardan, Mehmet, et al.
Published: (2025)

Context Convergence Improves Answering Inferential Questions
by: Mozafari, Jamshid, et al.
Published: (2026)

WikiHint: A Human-Annotated Dataset for Hint Ranking and Generation
by: Mozafari, Jamshid, et al.
Published: (2024)

ASRank: Zero-Shot Re-Ranking with Answer Scent for Document Retrieval
by: Abdallah, Abdelrahman, et al.
Published: (2025)

Detecting Temporal Ambiguity in Questions
by: Piryani, Bhawna, et al.
Published: (2024)

ComplexTempQA: A 100m Dataset for Complex Temporal Question Answering
by: Gruber, Raphael, et al.
Published: (2024)

Multi-hop Question Answering
by: Mavi, Vaibhav, et al.
Published: (2022)

Is Your AI-Generated Code Really Safe? Evaluating Large Language Models on Secure Code Generation with CodeSecEval
by: Wang, Jiexin, et al.
Published: (2024)

Detecting Future-related Contexts of Entity Mentions
by: Prashar, Puneet, et al.
Published: (2025)

PARSE: An Open-Domain Reasoning Question Answering Benchmark for Persian
by: Mozafari, Jamshid, et al.
Published: (2026)

Taxonomy Inference for Tabular Data Using Large Language Models
by: Wu, Zhenyu, et al.
Published: (2025)

A Taxonomy for Data Contamination in Large Language Models
by: Palavalli, Medha, et al.
Published: (2024)

Navigating the Landscape of Hint Generation Research: From the Past to the Future
by: Jangra, Anubhav, et al.
Published: (2024)

Taxonomy-based CheckList for Large Language Model Evaluation
by: Zhang, Damin
Published: (2023)

TaxoAlign: Scholarly Taxonomy Generation Using Language Models
by: Lahiri, Avishek, et al.
Published: (2025)

DeAR: Dual-Stage Document Reranking with Reasoning Agents via LLM Distillation
by: Abdallah, Abdelrahman, et al.
Published: (2025)

HintEval: A Comprehensive Framework for Hint Generation and Evaluation for Questions
by: Mozafari, Jamshid, et al.
Published: (2025)

Inferential Question Answering
by: Mozafari, Jamshid, et al.
Published: (2026)

Exploring Hint Generation Approaches in Open-Domain Question Answering
by: Mozafari, Jamshid, et al.
Published: (2024)