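| Main Authors: | Andreux, Mathieu; Skuk, Breno Baldas; Benchekroun, Hamza; Biré, Emilien; Bonnet, Antoine; Bordie, Riaz; Bout, Nathan; Brunel, Matthias; Cedoz, Pierre-Louis; Chassang, Antoine; Chen, Mickaël; Constantinou, Alexandra D.; d'Andigné, Antoine; de La Jonquière, Hubert; Delfosse, Aurélien; Denoyer, Ludovic; Deprez, Alexis; Derupti, Augustin; Eickenberg, Michael; Federico, Mathïs; Kantor, Charles; Koegler, Xavier; Labbé, Yann; Lee, Matthew C. H.; de Kergaradec, Erwan Le Jumeau; Mahla, Amir; Manevich, Avshalom; Maret, Adrien; Masson, Charles; Maurin, Rafaël; Mena, Arturo; Modard, Philippe; Moyal, Axel; Kerbel, Axel Nguyen; Revelle, Julien; Richter, Mats L.; Santos, María; Sifre, Laurent; Theillard, Maxime; Thibault, Marc; Thiry, Louis; Tronchon, Léo; Usunier, Nicolas; Wu, Tony |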
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | Artificial Intelligence |
| Online Access: | https://arxiv.org/abs/2506.02865 |
| _version_ | 1866916789974204416 |
|---|---|
| author | Andreux, Mathieu; Skuk, Breno Baldas; Benchekroun, Hamza; Biré, Emilien; Bonnet, Antoine; Bordie, Riaz; Bout, Nathan; Brunel, Matthias; Cedoz, Pierre-Louis; Chassang, Antoine; Chen, Mickaël; Constantinou, Alexandra D.; d'Andigné, Antoine; de La Jonquière, Hubert; Delfosse, Aurélien; Denoyer, Ludovic; Deprez, Alexis; Derupti, Augustin; Eickenberg, Michael; Federico, Mathïs; Kantor, Charles; Koegler, Xavier; Labbé, Yann; Lee, Matthew C. H.; de Kergaradec, Erwan Le Jumeau; Mahla, Amir; Manevich, Avshalom; Maret, Adrien; Masson, Charles; Maurin, Rafaël; Mena, Arturo; Modard, Philippe; Moyal, Axel; Kerbel, Axel Nguyen; Revelle, Julien; Richter, Mats L.; Santos, María; Sifre, Laurent; Theillard, Maxime; Thibault, Marc; Thiry, Louis; Tronchon, Léo; Usunier, Nicolas; Wu, Tony |
| contents | We present Surfer-H, a cost-efficient web agent that integrates Vision-Language Models (VLMs) to perform user-defined tasks on the web. We pair it with Holo1, a new open-weight collection of VLMs specialized in web navigation and information extraction. Holo1 was trained on carefully curated data sources, including open-access web content, synthetic examples, and self-produced agentic data. Holo1 tops generalist User Interface (UI) benchmarks as well as our new web UI localization benchmark, WebClick. When powered by Holo1, Surfer-H achieves state-of-the-art performance of 92.2% on WebVoyager, striking a Pareto-optimal balance between accuracy and cost-efficiency. To accelerate research in agentic systems, we are open-sourcing both our WebClick evaluation dataset and the Holo1 model weights. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2506_02865 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| title | Surfer-H Meets Holo1: Cost-Efficient Web Agent Powered by Open Weights |
| topic | Artificial Intelligence |
| url | https://arxiv.org/abs/2506.02865 |