Library Catalog

Bibliographic Details
Main Authors:	Elhady, Ahmed; Agirre, Eneko; Artetxe, Mikel
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2502.18316

Similar Items

Cross-lingual Self-Consistency for Multilingual Reasoning with Language Models
by: Elhady, Ahmed, et al.
Published: (2026)

Emergent Abilities of Large Language Models under Continued Pretraining for Language Adaptation
by: Elhady, Ahmed, et al.
Published: (2025)

Latxa: An Open Language Model and Evaluation Suite for Basque
by: Etxaniz, Julen, et al.
Published: (2024)

Automatic Logical Forms improve fidelity in Table-to-Text generation
by: Alonso, Iñigo, et al.
Published: (2023)

Do not be greedy, Think Twice: Sampling and Selection for Document-level Information Extraction
by: Zubillaga, Mikel, et al.
Published: (2026)

PixT3: Pixel-based Table-To-Text Generation
by: Alonso, Iñigo, et al.
Published: (2023)

Grounding Spatial Relations in Text-Only Language Models
by: Azkune, Gorka, et al.
Published: (2024)

Event Extraction in Basque: Typologically motivated Cross-Lingual Transfer-Learning Analysis
by: Zubillaga, Mikel, et al.
Published: (2024)

Instructing Large Language Models for Low-Resource Languages: A Systematic Study for Basque
by: Sainz, Oscar, et al.
Published: (2025)

TABLET: A Large-Scale Dataset for Robust Visual Table Understanding
by: Alonso, Iñigo, et al.
Published: (2025)

Adding simple structure at inference improves Vision-Language Compositionality
by: Miranda, Imanol, et al.
Published: (2025)

BiVLC: Extending Vision-Language Compositionality Evaluation with Text-to-Image Retrieval
by: Miranda, Imanol, et al.
Published: (2024)

Revisiting Compositionality in Dual-Encoder Vision-Language Models: The Role of Inference
by: Miranda, Imanol, et al.
Published: (2026)

GuideX: Guided Synthetic Data Generation for Zero-Shot Information Extraction
by: De La Fuente, Neil, et al.
Published: (2025)

Gender-specific Machine Translation with Large Language Models
by: Sánchez, Eduardo, et al.
Published: (2023)

Linguini: A benchmark for language-agnostic linguistic reasoning
by: Sánchez, Eduardo, et al.
Published: (2024)

GeoChallenge: A Multi-Answer Multiple-Choice Benchmark for Geometric Reasoning with Diagrams
by: Zhang, Yushun, et al.
Published: (2026)

GoLLIE: Annotation Guidelines improve Zero-Shot Information-Extraction
by: Sainz, Oscar, et al.
Published: (2023)

BertaQA: How Much Do Language Models Know About Local Culture?
by: Etxaniz, Julen, et al.
Published: (2024)

MCR for CLIR
by: Eneko Agirre
Published: (2007)

Lexical semantics, Basque and Spanish in QTLeap: Quality Translation by Deep Language Engineering Approaches
by: Eneko Agirre
Published: (2015)

KNOW2: Language understanding technologies for multilingual domain-oriented information access
by: Eneko Agirre
Published: (2010)

Exploring feature set combinations for WSD
by: Eneko Agirre
Published: (2006)

KNOW: Developing large-scale multilingual technologies for language understanding
by: Eneko Agirre
Published: (2009)

Translate, then Detect: Leveraging Machine Translation for Cross-Lingual Toxicity Classification
by: Bell, Samuel J., et al.
Published: (2025)

Multiple Choice Questions: Reasoning Makes Large Language Models (LLMs) More Self-Confident, Especially When They are Wrong
by: Fu, Tairan, et al.
Published: (2025)

ChartQAPro: A More Diverse and Challenging Benchmark for Chart Question Answering
by: Masry, Ahmed, et al.
Published: (2025)

Distractor Generation in Multiple-Choice Tasks: A Survey of Methods, Datasets, and Evaluation
by: Alhazmi, Elaf, et al.
Published: (2024)

Improving Language Plasticity via Pretraining with Active Forgetting
by: Chen, Yihong, et al.
Published: (2023)

More Bias, Less Bias: BiasPrompting for Enhanced Multiple-Choice Question Answering
by: Vu, Duc Anh, et al.
Published: (2025)

Improving Score Reliability of Multiple Choice Benchmarks with Consistency Evaluation and Altered Answer Choices
by: Cavalin, Paulo, et al.
Published: (2025)

Simulating Training Data Leakage in Multiple-Choice Benchmarks for LLM Evaluation
by: Hidayat, Naila Shafirni, et al.
Published: (2025)

UBench: Benchmarking Uncertainty in Large Language Models with Multiple Choice Questions
by: Wang, Xunzhi, et al.
Published: (2024)

Strengthened Symbol Binding Makes Large Language Models Reliable Multiple-Choice Selectors
by: Xue, Mengge, et al.
Published: (2024)

The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants
by: Bandarkar, Lucas, et al.
Published: (2023)

ViMultiChoice: Toward a Method That Gives Explanation for Multiple-Choice Reading Comprehension in Vietnamese
by: Cao, Trung Tien, et al.
Published: (2026)

Plausibly Problematic Questions in Multiple-Choice Benchmarks for Commonsense Reasoning
by: Palta, Shramay, et al.
Published: (2024)

BrowseComp: A Simple Yet Challenging Benchmark for Browsing Agents
by: Wei, Jason, et al.
Published: (2025)

BenchMarker: An Education-Inspired Toolkit for Highlighting Flaws in Multiple-Choice Benchmarks
by: Balepur, Nishant, et al.
Published: (2026)

More Documents, Same Length: Isolating the Challenge of Multiple Documents in RAG
by: Levy, Shahar, et al.
Published: (2025)