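Staff View: ChangeChat: An Interactive Model for Remote Sensing Change Analysis via Multimodal Instruction Tuning :: Library Catalog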


Bibliographic Details
Main Authors:	Deng, Pei; Zhou, Wenqian; Wu, Hanlin
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2409.08582

_version_	1866929499505950720
author	Deng, Pei; Zhou, Wenqian; Wu, Hanlin
author_facet	Deng, Pei; Zhou, Wenqian; Wu, Hanlin
contents	Remote sensing (RS) change analysis is vital for monitoring Earth's dynamic processes by detecting alterations in images over time. Traditional change detection excels at identifying pixel-level changes but lacks the ability to contextualize these alterations. While recent advancements in change captioning offer natural language descriptions of changes, they do not support interactive, user-specific queries. To address these limitations, we introduce ChangeChat, the first bitemporal vision-language model (VLM) designed specifically for RS change analysis. ChangeChat utilizes multimodal instruction tuning, allowing it to handle complex queries such as change captioning, category-specific quantification, and change localization. To enhance the model's performance, we developed the ChangeChat-87k dataset, which was generated using a combination of rule-based methods and GPT-assisted techniques. Experiments show that ChangeChat offers a comprehensive, interactive solution for RS change analysis, achieving performance comparable to or even better than state-of-the-art (SOTA) methods on specific tasks, and significantly surpassing the latest general-domain model, GPT-4. Code and pre-trained weights are available at https://github.com/hanlinwu/ChangeChat.
format	Preprint
id	arxiv_https___arxiv_org_abs_2409_08582
institution	arXiv
publishDate	2024
record_format	arxiv
title	ChangeChat: An Interactive Model for Remote Sensing Change Analysis via Multimodal Instruction Tuning
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2409.08582
