Library Catalog

Bibliographic Details
Main Authors:	Wang, Hexuan; Ren, Yaxuan; Bommireddypalli, Srikar; Chen, Shuxian; Prabhudesai, Adarsh; Zhou, Rongkun; Baral, Elina; Koehn, Philipp
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2603.08910

Similar Items

CRAFT: Training-Free Cascaded Retrieval for Tabular QA
by: Singh, Adarsh, et al.
Published: (2025)

M3SciQA: A Multi-Modal Multi-Document Scientific QA Benchmark for Evaluating Foundation Models
by: Li, Chuhan, et al.
Published: (2024)

SciCoQA: Quality Assurance for Scientific Paper--Code Alignment
by: Baumgärtner, Tim, et al.
Published: (2026)

VeriSciQA: An Auto-Verified Dataset for Scientific Visual Question Answering
by: Li, Yuyi, et al.
Published: (2025)

SciVQR: A Multidisciplinary Multimodal Benchmark for Advanced Scientific Reasoning Evaluation
by: Guo, Longteng, et al.
Published: (2026)

SciVideoBench: Benchmarking Scientific Video Reasoning in Large Multimodal Models
by: Deng, Andong, et al.
Published: (2025)

SciReasoner: Laying the Scientific Reasoning Ground Across Disciplines
by: Wang, Yizhou, et al.
Published: (2025)

Text Style Transfer with Parameter-efficient LLM Finetuning and Round-trip Translation
by: Liu, Ruoxi, et al.
Published: (2026)

Speech Vecalign: an Embedding-based Method for Aligning Parallel Speech Documents
by: Meng, Chutong, et al.
Published: (2025)

Learn and Unlearn: Addressing Misinformation in Multilingual LLMs
by: Lu, Taiming, et al.
Published: (2024)

Geometry Matters: Benchmarking Scientific ML Approaches for Flow Prediction around Complex Geometries
by: Rabeh, Ali, et al.
Published: (2024)

SciIF: Benchmarking Scientific Instruction Following Towards Rigorous Scientific Intelligence
by: Su, Encheng, et al.
Published: (2026)

MulTaBench: Benchmarking Multimodal Tabular Learning with Text and Image
by: Arazi, Alan, et al.
Published: (2026)

How Robust are the Tabular QA Models for Scientific Tables? A Study using Customized Dataset
by: Ghosh, Akash, et al.
Published: (2024)

SciMDR: Advancing Scientific Multimodal Document Reasoning
by: Chen, Ziyu, et al.
Published: (2026)

Metal-Sci: A Scientific Compute Benchmark for Evolutionary LLM Kernel Search on Apple Silicon
by: Gallego, Víctor
Published: (2026)

Quantifying Reproducibility Gaps in Publicly Available Plant Nuclear Bioimaging Datasets: The Reproducibility Risk Assessment Framework (RRAF)
by: Joardar, Sudipta, et al.
Published: (2026)

Measuring Visual Understanding in Telecom domain: Performance Metrics for Image-to-UML conversion using VLMs
by: Ranjani, HG, et al.
Published: (2025)

Dynamics of Dissociative Electron Attachment to Aliphatic Thiols
by: Das, Sukanta, et al.
Published: (2023)

Breakthrough Asymmetries across Disciplines and Countries: A Network approach to Structural Complexity of Scientific Progress
by: Raghuvanshi, Adarsh, et al.
Published: (2025)

SciMMIR: Benchmarking Scientific Multi-modal Information Retrieval
by: Wu, Siwei, et al.
Published: (2024)

SciAssess: Benchmarking LLM Proficiency in Scientific Literature Analysis
by: Cai, Hengxing, et al.
Published: (2024)

SciEvent: Benchmarking Multi-domain Scientific Event Extraction
by: Dong, Bofu, et al.
Published: (2025)

SciAgent: Tool-augmented Language Models for Scientific Reasoning
by: Ma, Yubo, et al.
Published: (2024)

WildSci: Advancing Scientific Reasoning from In-the-Wild Literature
by: Liu, Tengxiao, et al.
Published: (2026)

Pointer-Generator Networks for Low-Resource Machine Translation: Don't Copy That!
by: Bafna, Niyati, et al.
Published: (2024)

Recovering document annotations for sentence-level bitext
by: Wicks, Rachel, et al.
Published: (2024)

TaTToo: Tool-Grounded Thinking PRM for Test-Time Scaling in Tabular Reasoning
by: Zou, Jiaru, et al.
Published: (2025)

Justinian und die Armee des frühen Byzanz
by: Koehn, Clemens
Published: (2021)

SciFIBench: Benchmarking Large Multimodal Models for Scientific Figure Interpretation
by: Roberts, Jonathan, et al.
Published: (2024)

SciResearcher: Scaling Deep Research Agents for Frontier Scientific Reasoning
by: Zheng, Tianshi, et al.
Published: (2026)

PaperSearchQA: Learning to Search and Reason over Scientific Papers with RLVR
by: Burgess, James, et al.
Published: (2026)

Maximizing Confidence Alone Improves Reasoning
by: Prabhudesai, Mihir, et al.
Published: (2025)

Moneyball with LLMs: Analyzing Tabular Summarization in Sports Narratives
by: Upadhyay, Ritam, et al.
Published: (2025)

RelationalFactQA: A Benchmark for Evaluating Tabular Fact Retrieval from Large Language Models
by: Satriani, Dario, et al.
Published: (2025)

GeoRC: A Benchmark for Geolocation Reasoning Chains
by: Talreja, Mohit, et al.
Published: (2026)

Online Ramsey numbers of the claw versus cycles
by: Zhi, Hexuan, et al.
Published: (2026)

Three-color online Ramsey numbers $\tilde{r}(P_3,P_3,P_{\ell})$ and $\tilde{r}(P_3, P_3, C_{\ell})$
by: Zhi, Hexuan, et al.
Published: (2025)

QA-LIGN: Aligning LLMs through Constitutionally Decomposed QA
by: Dineen, Jacob, et al.
Published: (2025)

SciFigDetect: A Benchmark for AI-Generated Scientific Figure Detection
by: Hu, You, et al.
Published: (2026)