:: Library Catalog

Bibliographic Details
Main Authors:	Lin, Zhiyu; Zhou, Zhengda; Zhao, Zhiyuan; Wan, Tianrui; Ma, Yilun; Gao, Junyu; Li, Xuelong
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2506.07818

Similar Items

WebSynthesis: World-Model-Guided MCTS for Efficient WebUI-Trajectory Synthesis
by: Gao, Yifei, et al.
Published: (2025)

Open WebUI: An Open, Extensible, and Usable Interface for AI Interaction
by: Baek, Jaeryang, et al.
Published: (2025)

Do MLLMs Really See It: Reinforcing Visual Attention in Multimodal LLMs
by: Ou, Siqu, et al.
Published: (2026)

UWBench: A Comprehensive Vision-Language Benchmark for Underwater Understanding
by: Zhang, Da, et al.
Published: (2025)

LLMs Caught in the Crossfire: Malware Requests and Jailbreak Challenges
by: Li, Haoyang, et al.
Published: (2025)

WebCompass: Towards Multimodal Web Coding Evaluation for Code Language Models
by: Lei, Xinping, et al.
Published: (2026)

StitchFusion: Weaving Any Visual Modalities to Enhance Multimodal Semantic Segmentation
by: Li, Bingyu, et al.
Published: (2024)

U3M: Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation
by: Li, Bingyu, et al.
Published: (2024)

EmbeWebAgent: Embedding Web Agents into Any Customized UI
by: Ma, Chenyang, et al.
Published: (2026)

WebVIA: A Web-based Vision-Language Agentic Framework for Interactive and Verifiable UI-to-Code Generation
by: Xu, Mingde, et al.
Published: (2025)

NWPU-MOC: A Benchmark for Fine-grained Multi-category Object Counting in Aerial Images
by: Gao, Junyu, et al.
Published: (2024)

From Captions to Rewards (CAREVL): Leveraging Large Language Model Experts for Enhanced Reward Modeling in Large Vision-Language Models
by: Dai, Muzhi, et al.
Published: (2025)

An Open-Source Benchmark and Baseline for Multi-temporal Referring Segmentation
by: Li, Bingyu, et al.
Published: (2026)

MRWeb: An Exploration of Generating Multi-Page Resource-Aware Web Code from UI Designs
by: Wan, Yuxuan, et al.
Published: (2024)

SVGen: Interpretable Vector Graphics Generation with Large Language Models
by: Wang, Feiyu, et al.
Published: (2025)

TongUI: Internet-Scale Trajectories from Multimodal Web Tutorials for Generalized GUI Agents
by: Zhang, Bofei, et al.
Published: (2025)

ReDemon UI: Reactive Synthesis by Demonstration for Web UI
by: Lee, Jay, et al.
Published: (2025)

Claim Automation using Large Language Model
by: Mo, Zhengda, et al.
Published: (2026)

Secure Tug-of-War (SecTOW): Iterative Defense-Attack Training with Reinforcement Learning for Multimodal Model Security
by: Dai, Muzhi, et al.
Published: (2025)

UIBenchKit: A unified toolkit for design-to-code model evaluation
by: Le, Chinh T., et al.
Published: (2026)

WebCoderBench: Benchmarking Web Application Generation with Comprehensive and Interpretable Evaluation Metrics
by: Liu, Chenxu, et al.
Published: (2026)

WebQuest: A Benchmark for Multimodal QA on Web Page Sequences
by: Wang, Maria, et al.
Published: (2024)

Towards Realistic Open-Vocabulary Remote Sensing Segmentation: Benchmark and Baseline
by: Li, Bingyu, et al.
Published: (2026)

Dynamic Proxy Domain Generalizes the Crowd Localization by Better Binary Segmentation
by: Gao, Junyu, et al.
Published: (2024)

FGAseg: Fine-Grained Pixel-Text Alignment for Open-Vocabulary Semantic Segmentation
by: Li, Bingyu, et al.
Published: (2025)

Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs
by: Yun, Sukmin, et al.
Published: (2024)

VSA: Visual-Structural Alignment for UI-to-Code
by: Wu, Xian, et al.
Published: (2025)

IW-Bench: Evaluating Large Multimodal Models for Converting Image-to-Web
by: Guo, Hongcheng, et al.
Published: (2024)

VideoWebArena: Evaluating Long Context Multimodal Agents with Video Understanding Web Tasks
by: Jang, Lawrence, et al.
Published: (2024)

WebDevJudge: Evaluating (M)LLMs as Critiques for Web Development Quality
by: Li, Chunyang, et al.
Published: (2025)

WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models
by: He, Hongliang, et al.
Published: (2024)

WebMMU: A Benchmark for Multimodal Multilingual Website Understanding and Code Generation
by: Awal, Rabiul, et al.
Published: (2025)

VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks
by: Koh, Jing Yu, et al.
Published: (2024)

Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks
by: Xu, Kai, et al.
Published: (2025)

Open CaptchaWorld: A Comprehensive Web-based Platform for Testing and Benchmarking Multimodal LLM Agents
by: Luo, Yaxin, et al.
Published: (2025)

Exploring Efficient Open-Vocabulary Segmentation in the Remote Sensing
by: Li, Bingyu, et al.
Published: (2025)

Exploring the Underwater World Segmentation without Extra Training
by: Li, Bingyu, et al.
Published: (2025)

MedMamba: Multi-View State Space Models with Adaptive Graph Learning for Medical Time Series Classification
by: Zhang, Da, et al.
Published: (2026)

FusAD: Time-Frequency Fusion with Adaptive Denoising for General Time Series Analysis
by: Zhang, Da, et al.
Published: (2025)

MARIS: Marine Open-Vocabulary Instance Segmentation with Geometric Enhancement and Semantic Alignment
by: Li, Bingyu, et al.
Published: (2025)