:: Library Catalog

Bibliographic Details
Main Authors:	Wallat, Jonas; Jatowt, Adam; Anand, Avishek
Format:	Preprint
Published:	2024
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2401.12078

Similar Items

A Study into Investigating Temporal Robustness of LLMs
by: Wallat, Jonas, et al.
Published: (2025)

TempRetriever: Fusion-based Temporal Dense Passage Retrieval for Time-Sensitive Questions
by: Abdallah, Abdelrahman, et al.
Published: (2025)

Evaluating List Construction and Temporal Understanding capabilities of Large Language Models
by: Dumitru, Alexandru, et al.
Published: (2025)

Correctness is not Faithfulness in RAG Attributions
by: Wallat, Jonas, et al.
Published: (2024)

It's High Time: A Survey of Temporal Question Answering
by: Piryani, Bhawna, et al.
Published: (2025)

Analyzing the Role of Context in Forecasting with Large Language Models
by: Mutschlechner, Gerrit, et al.
Published: (2025)

Towards Effective Time-Aware Language Representation: Exploring Enhanced Temporal Understanding in Language Models
by: Wang, Jiexin, et al.
Published: (2024)

Automated Analysis of Sustainability Reports: Using Large Language Models for the Extraction and Prediction of EU Taxonomy-Compliant KPIs
by: Schmoll, Jonathan, et al.
Published: (2025)

Navigating Tomorrow: Reliably Assessing Large Language Models Performance on Future Event Prediction
by: Nako, Petraq, et al.
Published: (2025)

Pretraining Exposure Explains Popularity Judgments in Large Language Models
by: Mozafari, Jamshid, et al.
Published: (2026)

Question Difficulty Estimation for Large Language Models via Answer Plausibility Scoring
by: Mozafari, Jamshid, et al.
Published: (2026)

Linguistic Blind Spots of Large Language Models
by: Cheng, Jiali, et al.
Published: (2025)

Temporal Validity Change Prediction
by: Wenzel, Georg, et al.
Published: (2024)

Seeing Isn't Believing: Uncovering Blind Spots in Evaluator Vision-Language Models
by: Khan, Mohammed Safi Ur Rahman, et al.
Published: (2026)

Simple Linguistic Inferences of Large Language Models (LLMs): Blind Spots and Blinds
by: Basmov, Victoria, et al.
Published: (2023)

Detecting Temporal Ambiguity in Questions
by: Piryani, Bhawna, et al.
Published: (2024)

Blind Spot Navigation in Large Language Model Reasoning with Thought Space Explorer
by: Zhang, Jinghan, et al.
Published: (2024)

ComplexTempQA: A 100m Dataset for Complex Temporal Question Answering
by: Gruber, Raphael, et al.
Published: (2024)

ChroniclingAmericaQA: A Large-scale Question Answering Dataset based on Historical American Newspaper Pages
by: Piryani, Bhawna, et al.
Published: (2024)

Unchecked and Overlooked: Addressing the Checkbox Blind Spot in Large Language Models with CheckboxQA
by: Turski, Michał, et al.
Published: (2025)

Generator-Retriever-Generator Approach for Open-Domain Question Answering
by: Abdallah, Abdelrahman, et al.
Published: (2023)

Exploring NLP Benchmarks in an Extremely Low-Resource Setting
by: Nuha, Ulin, et al.
Published: (2025)

Fluent but Unfeeling: The Emotional Blind Spots of Language Models
by: Shu, Bangzhao, et al.
Published: (2025)

Transformers and Language Models in Form Understanding: A Comprehensive Review of Scanned Document Analysis
by: Abdallah, Abdelrahman, et al.
Published: (2024)

Belief in the Machine: Investigating Epistemological Blind Spots of Language Models
by: Suzgun, Mirac, et al.
Published: (2024)

Illuminating Blind Spots of Language Models with Targeted Agent-in-the-Loop Synthetic Data
by: Lippmann, Philip, et al.
Published: (2024)

Wisdom of the Crowds in Forecasting: Forecast Summarization for Supporting Future Event Prediction
by: Saha, Anisha, et al.
Published: (2025)

TriviaHG: A Dataset for Automatic Hint Generation from Factoid Questions
by: Mozafari, Jamshid, et al.
Published: (2024)

How often do Answers Change? Estimating Recency Requirements in Question Answering
by: Piryani, Bhawna, et al.
Published: (2026)

Self-Correction Bench: Uncovering and Addressing the Self-Correction Blind Spot in Large Language Models
by: Tsui, Ken
Published: (2025)

Wrong Answers Can Also Be Useful: PlausibleQA -- A Large-Scale QA Dataset with Answer Plausibility Scores
by: Mozafari, Jamshid, et al.
Published: (2025)

WikiHint: A Human-Annotated Dataset for Hint Ranking and Generation
by: Mozafari, Jamshid, et al.
Published: (2024)

Context Convergence Improves Answering Inferential Questions
by: Mozafari, Jamshid, et al.
Published: (2026)

Evaluating Answer Reranking Strategies in Time-sensitive Question Answering
by: Kardan, Mehmet, et al.
Published: (2025)

QuanTemp: A real-world open-domain benchmark for fact-checking numerical claims
by: V, Venktesh, et al.
Published: (2024)

Large Language Models are Algorithmically Blind
by: Venkatesh, Sohan, et al.
Published: (2026)

DEXTER: A Benchmark for open-domain Complex Question Answering using LLMs
by: Prabhu, Venktesh V. Deepali, et al.
Published: (2024)

A Benchmark for Open-Domain Numerical Fact-Checking Enhanced by Claim Decomposition
by: Venktesh, V, et al.
Published: (2025)

Trust but Verify! A Survey on Verification Design for Test-time Scaling
by: Venktesh, V, et al.
Published: (2025)

Finding Blind Spots in Evaluator LLMs with Interpretable Checklists
by: Doddapaneni, Sumanth, et al.
Published: (2024)