| Main Authors: | Martins, Bruno; Szymański, Piotr; Gramacki, Piotr |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2506.14345 |
Similar Items
Evaluation of Code LLMs on Geospatial Code Generation
by: Gramacki, Piotr, et al.
Published: (2024)
Temporal Fact Conflicts in LLMs: Reproducibility Insights from Unifying DYNAMICQA and MULAN
by: Dey, Ritajit, et al.
Published: (2026)
Silver Retriever: Advancing Neural Passage Retrieval for Polish Question Answering
by: Rybak, Piotr, et al.
Published: (2023)
A Comprehensive Evaluation of Large Language Models on Temporal Event Forecasting
by: Chang, He, et al.
Published: (2024)
DeepResearchGym: A Free, Transparent, and Reproducible Evaluation Sandbox for Deep Research
by: Coelho, João, et al.
Published: (2025)
DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents
by: Du, Mingxuan, et al.
Published: (2025)
BrowseComp-Plus: A More Fair and Transparent Evaluation Benchmark of Deep-Research Agent
by: Chen, Zijian, et al.
Published: (2025)
Reproducing Complex Set-Compositional Information Retrieval
by: Degenhart, Vincent, et al.
Published: (2026)
ORBIT -- Open Recommendation Benchmark for Reproducible Research with Hidden Tests
by: He, Jingyuan, et al.
Published: (2025)
PyTorch-IE: Fast and Reproducible Prototyping for Information Extraction
by: Binder, Arne, et al.
Published: (2024)
SCTc-TE: A Comprehensive Formulation and Benchmark for Temporal Event Forecasting
by: Ma, Yunshan, et al.
Published: (2023)
A Reproducibility Study of PLAID
by: MacAvaney, Sean, et al.
Published: (2024)
It's High Time: A Survey of Temporal Question Answering
by: Piryani, Bhawna, et al.
Published: (2025)
SparseCL: Sparse Contrastive Learning for Contradiction Retrieval
by: Xu, Haike, et al.
Published: (2024)
Doc-Researcher: A Unified System for Multimodal Document Parsing and Deep Research
by: Dong, Kuicai, et al.
Published: (2025)
TComQA: Extracting Temporal Commonsense from Text
by: Nair, Lekshmi R, et al.
Published: (2025)
Retrieval of Temporal Event Sequences from Textual Descriptions
by: Liu, Zefang, et al.
Published: (2024)
Faithful Temporal Question Answering over Heterogeneous Sources
by: Jia, Zhen, et al.
Published: (2024)
A Reproducibility Study of LLM-Based Query Reformulation
by: Bigdeli, Amin, et al.
Published: (2026)
Set-Aligning Framework for Auto-Regressive Event Temporal Graph Generation
by: Tan, Xingwei, et al.
Published: (2024)
Subtopic-aware View Sampling and Temporal Aggregation for Long-form Document Matching
by: Zhou, Youchao, et al.
Published: (2024)
QueryGym: A Toolkit for Reproducible LLM-Based Query Reformulation
by: Bigdeli, Amin, et al.
Published: (2025)
TempRetriever: Fusion-based Temporal Dense Passage Retrieval for Time-Sensitive Questions
by: Abdallah, Abdelrahman, et al.
Published: (2025)
Towards Personalized Deep Research: Benchmarks and Evaluations
by: Liang, Yuan, et al.
Published: (2025)
A Hybrid RAG System with Comprehensive Enhancement on Complex Reasoning
by: Yuan, Ye, et al.
Published: (2024)
Reproducing HotFlip for Corpus Poisoning Attacks in Dense Retrieval
by: Li, Yongkang, et al.
Published: (2025)
CoIR: A Comprehensive Benchmark for Code Information Retrieval Models
by: Li, Xiangyang, et al.
Published: (2024)
Dwell in the Beginning: How Language Models Embed Long Documents for Dense Retrieval
by: Coelho, João, et al.
Published: (2024)
GDLLM: A Global Distance-aware Modeling Approach Based on Large Language Models for Event Temporal Relation Extraction
by: Zhao, Jie, et al.
Published: (2025)
OpenResearcher: A Fully Open Pipeline for Long-Horizon Deep Research Trajectory Synthesis
by: Li, Zhuofeng, et al.
Published: (2026)
A Question Answering Based Pipeline for Comprehensive Chinese EHR Information Extraction
by: Ying, Huaiyuan, et al.
Published: (2024)
Cocktail: A Comprehensive Information Retrieval Benchmark with LLM-Generated Documents Integration
by: Dai, Sunhao, et al.
Published: (2024)
Zep: A Temporal Knowledge Graph Architecture for Agent Memory
by: Rasmussen, Preston, et al.
Published: (2025)
Pre-training vs. Fine-tuning: A Reproducibility Study on Dense Retrieval Knowledge Acquisition
by: Yao, Zheng, et al.
Published: (2025)
A Benchmark for Deep Information Synthesis
by: Paul, Debjit, et al.
Published: (2026)
IFIR: A Comprehensive Benchmark for Evaluating Instruction-Following in Expert-Domain Information Retrieval
by: Song, Tingyu, et al.
Published: (2025)
Hypencoder Revisited: Reproducibility and Analysis of Non-Linear Scoring for First-Stage Retrieval
by: Eichholtz, Arne, et al.
Published: (2026)
Rethinking the Privacy of Text Embeddings: A Reproducibility Study of "Text Embeddings Reveal (Almost) As Much As Text"
by: Seputis, Dominykas, et al.
Published: (2025)
Evidence-Guided Schema Normalization for Temporal Tabular Reasoning
by: Thanga, Ashish, et al.
Published: (2025)
SimpleDeepSearcher: Deep Information Seeking via Web-Powered Reasoning Trajectory Synthesis
by: Sun, Shuang, et al.
Published: (2025)