Staff View :: Library Catalog

Bibliographic Details
Main Authors:	Hu, Yu; Gu, Jianyang; Liu, Hao; Cao, Yue; Hamari, Jozsef; Liu, Zheng; Zardadi, Mohsen
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2603.12659

_version_	1866908883600015360
author	Hu, Yu; Gu, Jianyang; Liu, Hao; Cao, Yue; Hamari, Jozsef; Liu, Zheng; Zardadi, Mohsen
author_facet	Hu, Yu; Gu, Jianyang; Liu, Hao; Cao, Yue; Hamari, Jozsef; Liu, Zheng; Zardadi, Mohsen
contents	Adapting vision-language models to remote sensing imagery remains challenging due to two key factors: limited semantic coverage in textual representations and insufficient adaptability of visual features. These issues are particularly significant in aerial scenes, which exhibit diverse visual appearances and fine-grained object distinctions. We propose AVION, a knowledge distillation framework tailored for remote sensing adaptation of vision-language models. The teacher module constructs semantically rich textual prototypes by collecting descriptions from a large language model and verifying their validity using remote sensing image features. The student module integrates lightweight, learnable prompts into both the vision and language encoders, guided by the teacher to align embeddings and their cross-modal relationships. Once trained, the student operates independently during inference. Experiments on six optical remote sensing benchmarks show that AVION improves few-shot classification and base-class accuracy without degrading generalization to novel categories. It also enhances mean recall for cross-modal retrieval, with minimal additional trainable parameters.
format	Preprint
id	arxiv_https___arxiv_org_abs_2603_12659
institution	arXiv
publishDate	2026
record_format	arxiv
title	AVION: Aerial Vision-Language Instruction from Offline Teacher to Prompt-Tuned Network
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2603.12659
