| Main Authors: | Wallat, Jonas; Jatowt, Adam; Anand, Avishek |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2401.12078 |
Similar Items
A Study into Investigating Temporal Robustness of LLMs
by: Wallat, Jonas, et al.
Published: (2025)
TempRetriever: Fusion-based Temporal Dense Passage Retrieval for Time-Sensitive Questions
by: Abdallah, Abdelrahman, et al.
Published: (2025)
Evaluating List Construction and Temporal Understanding capabilities of Large Language Models
by: Dumitru, Alexandru, et al.
Published: (2025)
Correctness is not Faithfulness in RAG Attributions
by: Wallat, Jonas, et al.
Published: (2024)
It's High Time: A Survey of Temporal Question Answering
by: Piryani, Bhawna, et al.
Published: (2025)
Analyzing the Role of Context in Forecasting with Large Language Models
by: Mutschlechner, Gerrit, et al.
Published: (2025)
Towards Effective Time-Aware Language Representation: Exploring Enhanced Temporal Understanding in Language Models
by: Wang, Jiexin, et al.
Published: (2024)
Automated Analysis of Sustainability Reports: Using Large Language Models for the Extraction and Prediction of EU Taxonomy-Compliant KPIs
by: Schmoll, Jonathan, et al.
Published: (2025)
Navigating Tomorrow: Reliably Assessing Large Language Models Performance on Future Event Prediction
by: Nako, Petraq, et al.
Published: (2025)
Pretraining Exposure Explains Popularity Judgments in Large Language Models
by: Mozafari, Jamshid, et al.
Published: (2026)
Question Difficulty Estimation for Large Language Models via Answer Plausibility Scoring
by: Mozafari, Jamshid, et al.
Published: (2026)
Linguistic Blind Spots of Large Language Models
by: Cheng, Jiali, et al.
Published: (2025)
Temporal Validity Change Prediction
by: Wenzel, Georg, et al.
Published: (2024)
Seeing Isn't Believing: Uncovering Blind Spots in Evaluator Vision-Language Models
by: Khan, Mohammed Safi Ur Rahman, et al.
Published: (2026)
Simple Linguistic Inferences of Large Language Models (LLMs): Blind Spots and Blinds
by: Basmov, Victoria, et al.
Published: (2023)
Detecting Temporal Ambiguity in Questions
by: Piryani, Bhawna, et al.
Published: (2024)
Blind Spot Navigation in Large Language Model Reasoning with Thought Space Explorer
by: Zhang, Jinghan, et al.
Published: (2024)
ComplexTempQA: A 100m Dataset for Complex Temporal Question Answering
by: Gruber, Raphael, et al.
Published: (2024)
ChroniclingAmericaQA: A Large-scale Question Answering Dataset based on Historical American Newspaper Pages
by: Piryani, Bhawna, et al.
Published: (2024)
Unchecked and Overlooked: Addressing the Checkbox Blind Spot in Large Language Models with CheckboxQA
by: Turski, Michał, et al.
Published: (2025)
Generator-Retriever-Generator Approach for Open-Domain Question Answering
by: Abdallah, Abdelrahman, et al.
Published: (2023)
Exploring NLP Benchmarks in an Extremely Low-Resource Setting
by: Nuha, Ulin, et al.
Published: (2025)
Fluent but Unfeeling: The Emotional Blind Spots of Language Models
by: Shu, Bangzhao, et al.
Published: (2025)
Transformers and Language Models in Form Understanding: A Comprehensive Review of Scanned Document Analysis
by: Abdallah, Abdelrahman, et al.
Published: (2024)
Belief in the Machine: Investigating Epistemological Blind Spots of Language Models
by: Suzgun, Mirac, et al.
Published: (2024)
Illuminating Blind Spots of Language Models with Targeted Agent-in-the-Loop Synthetic Data
by: Lippmann, Philip, et al.
Published: (2024)
Wisdom of the Crowds in Forecasting: Forecast Summarization for Supporting Future Event Prediction
by: Saha, Anisha, et al.
Published: (2025)
TriviaHG: A Dataset for Automatic Hint Generation from Factoid Questions
by: Mozafari, Jamshid, et al.
Published: (2024)
How often do Answers Change? Estimating Recency Requirements in Question Answering
by: Piryani, Bhawna, et al.
Published: (2026)
Self-Correction Bench: Uncovering and Addressing the Self-Correction Blind Spot in Large Language Models
by: Tsui, Ken
Published: (2025)
Wrong Answers Can Also Be Useful: PlausibleQA -- A Large-Scale QA Dataset with Answer Plausibility Scores
by: Mozafari, Jamshid, et al.
Published: (2025)
WikiHint: A Human-Annotated Dataset for Hint Ranking and Generation
by: Mozafari, Jamshid, et al.
Published: (2024)
Context Convergence Improves Answering Inferential Questions
by: Mozafari, Jamshid, et al.
Published: (2026)
Evaluating Answer Reranking Strategies in Time-sensitive Question Answering
by: Kardan, Mehmet, et al.
Published: (2025)
QuanTemp: A real-world open-domain benchmark for fact-checking numerical claims
by: V, Venktesh, et al.
Published: (2024)
Large Language Models are Algorithmically Blind
by: Venkatesh, Sohan, et al.
Published: (2026)
DEXTER: A Benchmark for open-domain Complex Question Answering using LLMs
by: Prabhu, Venktesh V. Deepali, et al.
Published: (2024)
A Benchmark for Open-Domain Numerical Fact-Checking Enhanced by Claim Decomposition
by: Venktesh, V, et al.
Published: (2025)
Trust but Verify! A Survey on Verification Design for Test-time Scaling
by: Venktesh, V, et al.
Published: (2025)
Finding Blind Spots in Evaluator LLMs with Interpretable Checklists
by: Doddapaneni, Sumanth, et al.
Published: (2024)