MARC View: YOLOA: Real-Time Affordance Detection via LLM Adapter :: Library Catalog

Bibliographic Details
Main Authors:	Ji, Yuqi; Ke, Junjie; He, Lihuo; Liu, Jun; Zhang, Kaifan; Lai, Yu-Kun; Ding, Guiguang; Gao, Xinbo
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Human-Computer Interaction
Online Access:	https://arxiv.org/abs/2512.03418

_version_	1866908690505793536
author	Ji, Yuqi; Ke, Junjie; He, Lihuo; Liu, Jun; Zhang, Kaifan; Lai, Yu-Kun; Ding, Guiguang; Gao, Xinbo
author_facet	Ji, Yuqi; Ke, Junjie; He, Lihuo; Liu, Jun; Zhang, Kaifan; Lai, Yu-Kun; Ding, Guiguang; Gao, Xinbo
contents	Affordance detection aims to jointly address the fundamental "what-where-how" challenge in embodied AI by understanding "what" an object is, "where" the object is located, and "how" it can be used. However, most affordance learning methods focus solely on "how" objects can be used while neglecting the "what" and "where" aspects. Other affordance detection methods treat object detection and affordance learning as two independent tasks, lacking effective interaction and real-time capability. To overcome these limitations, we introduce YOLO Affordance (YOLOA), a real-time affordance detection model that jointly handles these two tasks via a large language model (LLM) adapter. Specifically, YOLOA employs a lightweight detector consisting of object detection and affordance learning branches refined through the LLM Adapter. During training, the LLM Adapter interacts with object and affordance preliminary predictions to refine both branches by generating more accurate class priors, box offsets, and affordance gates. Experiments on our relabeled ADG-Det and IIT-Heat benchmarks demonstrate that YOLOA achieves state-of-the-art accuracy (52.8 / 73.1 mAP on ADG-Det / IIT-Heat) while maintaining real-time performance (up to 89.77 FPS, and up to 846.24 FPS for the lightweight variant). This indicates that YOLOA achieves an excellent trade-off between accuracy and efficiency.
format	Preprint
id	arxiv_https___arxiv_org_abs_2512_03418
institution	arXiv
publishDate	2025
record_format	arxiv
title	YOLOA: Real-Time Affordance Detection via LLM Adapter
topic	Computer Vision and Pattern Recognition; Human-Computer Interaction
url	https://arxiv.org/abs/2512.03418
