:: Library Catalog

Bibliographic Details
Main Authors:	Song, Yueqi; Ou, Tianyue; Kong, Yibo; Li, Zecheng; Neubig, Graham; Yue, Xiang
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2504.10342

Similar Items

What Is Missing in Multilingual Visual Reasoning and How to Fix It
by: Song, Yueqi, et al.
Published: (2024)

Grounding Multilingual Multimodal LLMs With Cultural Knowledge
by: Nyandwi, Jean de Dieu, et al.
Published: (2025)

Harnessing Webpage UIs for Text-Rich Visual Understanding
by: Liu, Junpeng, et al.
Published: (2024)

On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models
by: Zhang, Charlie, et al.
Published: (2025)

An image speaks a thousand words, but can everyone listen? On image transcreation for cultural relevance
by: Khanuja, Simran, et al.
Published: (2024)

Beyond Browsing: API-Based Web Agents
by: Song, Yueqi, et al.
Published: (2024)

VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?
by: Liu, Junpeng, et al.
Published: (2024)

Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages
by: Yue, Xiang, et al.
Published: (2024)

Demystifying Long Chain-of-Thought Reasoning in LLMs
by: Yeo, Edward, et al.
Published: (2025)

MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale
by: Guo, Jarvis, et al.
Published: (2024)

Scaling Evaluation-time Compute with Reasoning Models as Evaluators
by: Kim, Seungone, et al.
Published: (2025)

CowPilot: A Framework for Autonomous and Human-Agent Collaborative Web Navigation
by: Huq, Faria, et al.
Published: (2025)

JMMMU: A Japanese Massive Multi-discipline Multimodal Understanding Benchmark for Culture-aware Evaluation
by: Onohara, Shota, et al.
Published: (2024)

Oolong: Evaluating Long Context Reasoning and Aggregation Capabilities
by: Bertsch, Amanda, et al.
Published: (2025)

Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning
by: Huan, Maggie, et al.
Published: (2025)

Go-Browse: Training Web Agents with Structured Exploration
by: Gandhi, Apurva, et al.
Published: (2025)

BehaviorBox: Automated Discovery of Fine-Grained Performance Differences Between Language Models
by: Tjuatja, Lindia, et al.
Published: (2025)

Modeling Distinct Human Interaction in Web Agents
by: Huq, Faria, et al.
Published: (2026)

TangramPuzzle: Evaluating Multimodal Large Language Models with Compositional Spatial Reasoning
by: Liu, Daixian, et al.
Published: (2026)

VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks
by: Koh, Jing Yu, et al.
Published: (2024)

Coding Agents with Multimodal Browsing are Generalist Problem Solvers
by: Soni, Aditya Bharat, et al.
Published: (2025)

Can Large Language Models be Trusted for Evaluation? Scalable Meta-Evaluation of LLMs as Evaluators via Agent Debate
by: Chern, Steffi, et al.
Published: (2024)

ZINA: Multimodal Fine-grained Hallucination Detection and Editing
by: Wada, Yuiga, et al.
Published: (2025)

Analyzing Information Sharing and Coordination in Multi-Agent Planning
by: Ou, Tianyue, et al.
Published: (2025)

Effective Strategies for Asynchronous Software Engineering Agents
by: Geng, Jiayi, et al.
Published: (2026)

Towards Automatic Evaluation for Image Transcreation
by: Khanuja, Simran, et al.
Published: (2024)

Agent Data Protocol: Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents
by: Song, Yueqi, et al.
Published: (2025)

Checklists Are Better Than Reward Models For Aligning Language Models
by: Viswanathan, Vijay, et al.
Published: (2025)

PuzzleWorld: A Benchmark for Multimodal, Open-Ended Reasoning in Puzzlehunts
by: Li, Hengzhi, et al.
Published: (2025)

Gained in Translation: Privileged Pairwise Judges Enhance Multilingual Reasoning
by: Sutawika, Lintang, et al.
Published: (2026)

Evaluating Language Models as Synthetic Data Generators
by: Kim, Seungone, et al.
Published: (2024)

Evaluating Text-to-Visual Generation with Image-to-Text Generation
by: Lin, Zhiqiu, et al.
Published: (2024)

Midtraining Bridges Pretraining and Posttraining Distributions
by: Liu, Emmy, et al.
Published: (2025)

Language Modeling with Editable External Knowledge
by: Li, Belinda Z., et al.
Published: (2024)

An Incomplete Loop: Instruction Inference, Instruction Following, and In-context Learning in Language Models
by: Liu, Emmy, et al.
Published: (2024)

Solving NLP Problems through Human-System Collaboration: A Discussion-based Approach
by: Kaneko, Masahiro, et al.
Published: (2023)

MMNeuron: Discovering Neuron-Level Domain-Specific Interpretation in Multimodal Large Language Model
by: Huo, Jiahao, et al.
Published: (2024)

GM-PRM: A Generative Multimodal Process Reward Model for Multimodal Mathematical Reasoning
by: Zhang, Jianghangfan, et al.
Published: (2025)

From Evidence-Based Medicine to Knowledge Graph: Retrieval-Augmented Generation for Sports Rehabilitation and a Domain Benchmark
by: Zhang, Jinning, et al.
Published: (2026)

CAIRe: Cultural Attribution of Images by Retrieval-Augmented Evaluation
by: Yayavaram, Arnav, et al.
Published: (2025)