| Main Authors: | Elhady, Ahmed, Agirre, Eneko, Artetxe, Mikel |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2502.18316 |
Similar Items
Cross-lingual Self-Consistency for Multilingual Reasoning with Language Models
by: Elhady, Ahmed, et al.
Published: (2026)
Emergent Abilities of Large Language Models under Continued Pretraining for Language Adaptation
by: Elhady, Ahmed, et al.
Published: (2025)
Latxa: An Open Language Model and Evaluation Suite for Basque
by: Etxaniz, Julen, et al.
Published: (2024)
Automatic Logical Forms improve fidelity in Table-to-Text generation
by: Alonso, Iñigo, et al.
Published: (2023)
Do not be greedy, Think Twice: Sampling and Selection for Document-level Information Extraction
by: Zubillaga, Mikel, et al.
Published: (2026)
PixT3: Pixel-based Table-To-Text Generation
by: Alonso, Iñigo, et al.
Published: (2023)
Grounding Spatial Relations in Text-Only Language Models
by: Azkune, Gorka, et al.
Published: (2024)
Event Extraction in Basque: Typologically motivated Cross-Lingual Transfer-Learning Analysis
by: Zubillaga, Mikel, et al.
Published: (2024)
Instructing Large Language Models for Low-Resource Languages: A Systematic Study for Basque
by: Sainz, Oscar, et al.
Published: (2025)
TABLET: A Large-Scale Dataset for Robust Visual Table Understanding
by: Alonso, Iñigo, et al.
Published: (2025)
Adding simple structure at inference improves Vision-Language Compositionality
by: Miranda, Imanol, et al.
Published: (2025)
BiVLC: Extending Vision-Language Compositionality Evaluation with Text-to-Image Retrieval
by: Miranda, Imanol, et al.
Published: (2024)
Revisiting Compositionality in Dual-Encoder Vision-Language Models: The Role of Inference
by: Miranda, Imanol, et al.
Published: (2026)
GuideX: Guided Synthetic Data Generation for Zero-Shot Information Extraction
by: De La Fuente, Neil, et al.
Published: (2025)
Gender-specific Machine Translation with Large Language Models
by: Sánchez, Eduardo, et al.
Published: (2023)
Linguini: A benchmark for language-agnostic linguistic reasoning
by: Sánchez, Eduardo, et al.
Published: (2024)
GeoChallenge: A Multi-Answer Multiple-Choice Benchmark for Geometric Reasoning with Diagrams
by: Zhang, Yushun, et al.
Published: (2026)
GoLLIE: Annotation Guidelines improve Zero-Shot Information-Extraction
by: Sainz, Oscar, et al.
Published: (2023)
BertaQA: How Much Do Language Models Know About Local Culture?
by: Etxaniz, Julen, et al.
Published: (2024)
MCR for CLIR
by: Eneko Agirre
Published: (2007)
Lexical semantics, Basque and Spanish in QTLeap: Quality Translation by Deep Language Engineering Approaches
by: Eneko Agirre
Published: (2015)
KNOW2: Language understanding technologies for multilingual domain-oriented information access
by: Eneko Agirre
Published: (2010)
Exploring feature set combinations for WSD
by: Eneko Agirre
Published: (2006)
KNOW: Developing large-scale multilingual technologies for language understanding
by: Eneko Agirre
Published: (2009)
Translate, then Detect: Leveraging Machine Translation for Cross-Lingual Toxicity Classification
by: Bell, Samuel J., et al.
Published: (2025)
Multiple Choice Questions: Reasoning Makes Large Language Models (LLMs) More Self-Confident, Especially When They are Wrong
by: Fu, Tairan, et al.
Published: (2025)
ChartQAPro: A More Diverse and Challenging Benchmark for Chart Question Answering
by: Masry, Ahmed, et al.
Published: (2025)
Distractor Generation in Multiple-Choice Tasks: A Survey of Methods, Datasets, and Evaluation
by: Alhazmi, Elaf, et al.
Published: (2024)
Improving Language Plasticity via Pretraining with Active Forgetting
by: Chen, Yihong, et al.
Published: (2023)
More Bias, Less Bias: BiasPrompting for Enhanced Multiple-Choice Question Answering
by: Vu, Duc Anh, et al.
Published: (2025)
Improving Score Reliability of Multiple Choice Benchmarks with Consistency Evaluation and Altered Answer Choices
by: Cavalin, Paulo, et al.
Published: (2025)
Simulating Training Data Leakage in Multiple-Choice Benchmarks for LLM Evaluation
by: Hidayat, Naila Shafirni, et al.
Published: (2025)
UBench: Benchmarking Uncertainty in Large Language Models with Multiple Choice Questions
by: Wang, Xunzhi, et al.
Published: (2024)
Strengthened Symbol Binding Makes Large Language Models Reliable Multiple-Choice Selectors
by: Xue, Mengge, et al.
Published: (2024)
The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants
by: Bandarkar, Lucas, et al.
Published: (2023)
ViMultiChoice: Toward a Method That Gives Explanation for Multiple-Choice Reading Comprehension in Vietnamese
by: Cao, Trung Tien, et al.
Published: (2026)
Plausibly Problematic Questions in Multiple-Choice Benchmarks for Commonsense Reasoning
by: Palta, Shramay, et al.
Published: (2024)
BrowseComp: A Simple Yet Challenging Benchmark for Browsing Agents
by: Wei, Jason, et al.
Published: (2025)
BenchMarker: An Education-Inspired Toolkit for Highlighting Flaws in Multiple-Choice Benchmarks
by: Balepur, Nishant, et al.
Published: (2026)
More Documents, Same Length: Isolating the Challenge of Multiple Documents in RAG
by: Levy, Shahar, et al.
Published: (2025)