| Main Authors: | Wang, Hexuan, Ren, Yaxuan, Bommireddypalli, Srikar, Chen, Shuxian, Prabhudesai, Adarsh, Zhou, Rongkun, Baral, Elina, Koehn, Philipp |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2603.08910 |
Similar Items
CRAFT: Training-Free Cascaded Retrieval for Tabular QA
by: Singh, Adarsh, et al.
Published: (2025)
M3SciQA: A Multi-Modal Multi-Document Scientific QA Benchmark for Evaluating Foundation Models
by: Li, Chuhan, et al.
Published: (2024)
SciCoQA: Quality Assurance for Scientific Paper--Code Alignment
by: Baumgärtner, Tim, et al.
Published: (2026)
VeriSciQA: An Auto-Verified Dataset for Scientific Visual Question Answering
by: Li, Yuyi, et al.
Published: (2025)
SciVQR: A Multidisciplinary Multimodal Benchmark for Advanced Scientific Reasoning Evaluation
by: Guo, Longteng, et al.
Published: (2026)
SciVideoBench: Benchmarking Scientific Video Reasoning in Large Multimodal Models
by: Deng, Andong, et al.
Published: (2025)
SciReasoner: Laying the Scientific Reasoning Ground Across Disciplines
by: Wang, Yizhou, et al.
Published: (2025)
Text Style Transfer with Parameter-efficient LLM Finetuning and Round-trip Translation
by: Liu, Ruoxi, et al.
Published: (2026)
Speech Vecalign: an Embedding-based Method for Aligning Parallel Speech Documents
by: Meng, Chutong, et al.
Published: (2025)
Learn and Unlearn: Addressing Misinformation in Multilingual LLMs
by: Lu, Taiming, et al.
Published: (2024)
Geometry Matters: Benchmarking Scientific ML Approaches for Flow Prediction around Complex Geometries
by: Rabeh, Ali, et al.
Published: (2024)
SciIF: Benchmarking Scientific Instruction Following Towards Rigorous Scientific Intelligence
by: Su, Encheng, et al.
Published: (2026)
MulTaBench: Benchmarking Multimodal Tabular Learning with Text and Image
by: Arazi, Alan, et al.
Published: (2026)
How Robust are the Tabular QA Models for Scientific Tables? A Study using Customized Dataset
by: Ghosh, Akash, et al.
Published: (2024)
SciMDR: Advancing Scientific Multimodal Document Reasoning
by: Chen, Ziyu, et al.
Published: (2026)
Metal-Sci: A Scientific Compute Benchmark for Evolutionary LLM Kernel Search on Apple Silicon
by: Gallego, Víctor
Published: (2026)
Quantifying Reproducibility Gaps in Publicly Available Plant Nuclear Bioimaging Datasets: The Reproducibility Risk Assessment Framework (RRAF)
by: Joardar, Sudipta, et al.
Published: (2026)
Measuring Visual Understanding in Telecom domain: Performance Metrics for Image-to-UML conversion using VLMs
by: Ranjani, HG, et al.
Published: (2025)
Dynamics of Dissociative Electron Attachment to Aliphatic Thiols
by: Das, Sukanta, et al.
Published: (2023)
Breakthrough Asymmetries across Disciplines and Countries: A Network approach to Structural Complexity of Scientific Progress
by: Raghuvanshi, Adarsh, et al.
Published: (2025)
SciMMIR: Benchmarking Scientific Multi-modal Information Retrieval
by: Wu, Siwei, et al.
Published: (2024)
SciAssess: Benchmarking LLM Proficiency in Scientific Literature Analysis
by: Cai, Hengxing, et al.
Published: (2024)
SciEvent: Benchmarking Multi-domain Scientific Event Extraction
by: Dong, Bofu, et al.
Published: (2025)
SciAgent: Tool-augmented Language Models for Scientific Reasoning
by: Ma, Yubo, et al.
Published: (2024)
WildSci: Advancing Scientific Reasoning from In-the-Wild Literature
by: Liu, Tengxiao, et al.
Published: (2026)
Pointer-Generator Networks for Low-Resource Machine Translation: Don't Copy That!
by: Bafna, Niyati, et al.
Published: (2024)
Recovering document annotations for sentence-level bitext
by: Wicks, Rachel, et al.
Published: (2024)
TaTToo: Tool-Grounded Thinking PRM for Test-Time Scaling in Tabular Reasoning
by: Zou, Jiaru, et al.
Published: (2025)
Justinian und die Armee des frühen Byzanz
by: Koehn, Clemens
Published: (2021)
SciFIBench: Benchmarking Large Multimodal Models for Scientific Figure Interpretation
by: Roberts, Jonathan, et al.
Published: (2024)
SciResearcher: Scaling Deep Research Agents for Frontier Scientific Reasoning
by: Zheng, Tianshi, et al.
Published: (2026)
PaperSearchQA: Learning to Search and Reason over Scientific Papers with RLVR
by: Burgess, James, et al.
Published: (2026)
Maximizing Confidence Alone Improves Reasoning
by: Prabhudesai, Mihir, et al.
Published: (2025)
Moneyball with LLMs: Analyzing Tabular Summarization in Sports Narratives
by: Upadhyay, Ritam, et al.
Published: (2025)
RelationalFactQA: A Benchmark for Evaluating Tabular Fact Retrieval from Large Language Models
by: Satriani, Dario, et al.
Published: (2025)
GeoRC: A Benchmark for Geolocation Reasoning Chains
by: Talreja, Mohit, et al.
Published: (2026)
Online Ramsey numbers of the claw versus cycles
by: Zhi, Hexuan, et al.
Published: (2026)
Three-color online Ramsey numbers $\tilde{r}(P_3,P_3,P_{\ell})$ and $\tilde{r}(P_3, P_3, C_{\ell})$
by: Zhi, Hexuan, et al.
Published: (2025)
QA-LIGN: Aligning LLMs through Constitutionally Decomposed QA
by: Dineen, Jacob, et al.
Published: (2025)
SciFigDetect: A Benchmark for AI-Generated Scientific Figure Detection
by: Hu, You, et al.
Published: (2026)