:: Library Catalog

Bibliographic Details
Main Authors:	Wang, Luting; Xiang, Yinghao; Huang, Hongliang; Li, Dongjun; Gao, Chen; Liu, Si
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2510.26297

Similar Items

Towards Realistic UAV Vision-Language Navigation: Platform, Benchmark, and Methodology
by: Wang, Xiangyu, et al.
Published: (2024)

ManiSoft: Towards Vision-Language Manipulation for Soft Continuum Robotics
by: Wei, Ziyu, et al.
Published: (2026)

REOBench: Benchmarking Robustness of Earth Observation Foundation Models
by: Li, Xiang, et al.
Published: (2025)

ChronoEarth-492K: A Large Scale and Long Horizon Spatiotemporal Hyperspectral Earth Observation Dataset and Benchmark
by: Si, Haozhe, et al.
Published: (2026)

RemoteSAM: Towards Segment Anything for Earth Observation
by: Yao, Liang, et al.
Published: (2025)

Towards Realistic Open-Vocabulary Remote Sensing Segmentation: Benchmark and Baseline
by: Li, Bingyu, et al.
Published: (2026)

EarthNets: Empowering AI in Earth Observation
by: Xiong, Zhitong, et al.
Published: (2022)

Image Understanding Makes for A Good Tokenizer for Image Generation
by: Wang, Luting, et al.
Published: (2024)

Towards Unified Vision Language Models for Forest Ecological Analysis in Earth Observation
by: Xue, Xizhe, et al.
Published: (2025)

Toward Realistic Camouflaged Object Detection: Benchmarks and Method
by: Xin, Zhimeng, et al.
Published: (2025)

Transfer Learning for Onboard Cloud Segmentation in Thermal Earth Observation: From Landsat to a CubeSat Constellation
by: Wölki, Niklas, et al.
Published: (2025)

Knowledge Distillation via Query Selection for Detection Transformer
by: Liu, Yi, et al.
Published: (2024)

UniTTA: Unified Benchmark and Versatile Framework Towards Realistic Test-Time Adaptation
by: Du, Chaoqun, et al.
Published: (2024)

InterAct: Capture and Modelling of Realistic, Expressive and Interactive Activities between Two Persons in Daily Scenarios
by: Huang, Yinghao, et al.
Published: (2024)

Benchmarking Composed Image Retrieval for Applied Earth Observation
by: Psomas, Bill, et al.
Published: (2026)

OmniEarth-Bench: Towards Holistic Evaluation of Earth's Six Spheres and Cross-Spheres Interactions with Multimodal Observational Earth Data
by: Wang, Fengxiang, et al.
Published: (2025)

Multi-Label Guided Soft Contrastive Learning for Efficient Earth Observation Pretraining
by: Wang, Yi, et al.
Published: (2024)

Towards Autonomous UAV Visual Object Search in City Space: Benchmark and Agentic Methodology
by: Ji, Yatai, et al.
Published: (2025)

EO-VAE: Towards A Multi-sensor Tokenizer for Earth Observation Data
by: Lehmann, Nils, et al.
Published: (2026)

Earth-Agent: Unlocking the Full Landscape of Earth Observation with Agents
by: Feng, Peilin, et al.
Published: (2025)

EarthSynth: Generating Informative Earth Observation with Diffusion Models
by: Pan, Jiancheng, et al.
Published: (2025)

Toward a Realistic Benchmark for Out-of-Distribution Detection
by: Recalcati, Pietro, et al.
Published: (2024)

REO-VLM: Transforming VLM to Meet Regression Challenges in Earth Observation
by: Xue, Xizhe, et al.
Published: (2024)

LaMI-DETR: Open-Vocabulary Detection with Language Model Instruction
by: Du, Penghui, et al.
Published: (2024)

Bridging the Gap Between End-to-End and Two-Step Text Spotting
by: Huang, Mingxin, et al.
Published: (2024)

SkySense: A Multi-Modal Remote Sensing Foundation Model Towards Universal Interpretation for Earth Observation Imagery
by: Guo, Xin, et al.
Published: (2023)

One for All: Toward Unified Foundation Models for Earth Vision
by: Xiong, Zhitong, et al.
Published: (2024)

SwinTextSpotter v2: Towards Better Synergy for Scene Text Spotting
by: Huang, Mingxin, et al.
Published: (2024)

BigEarthNet.txt: A Large-Scale Multi-Sensor Image-Text Dataset and Benchmark for Earth Observation
by: Herzog, Johann-Ludwig, et al.
Published: (2026)

Neural Plasticity-Inspired Multimodal Foundation Model for Earth Observation
by: Xiong, Zhitong, et al.
Published: (2024)

Constellation Dataset: Benchmarking High-Altitude Object Detection for an Urban Intersection
by: Turkcan, Mehmet Kerem, et al.
Published: (2024)

TerraScope: Pixel-Grounded Visual Reasoning for Earth Observation
by: Shu, Yan, et al.
Published: (2026)

Foundation Models for Remote Sensing and Earth Observation: A Survey
by: Xiao, Aoran, et al.
Published: (2024)

OpenEarth-Agent: From Tool Calling to Tool Creation for Open-Environment Earth Observation
by: Zhao, Sijie, et al.
Published: (2026)

Evaluating and Benchmarking Foundation Models for Earth Observation and Geospatial AI
by: Dionelis, Nikolaos, et al.
Published: (2024)

DOFA-CLIP: Multimodal Vision-Language Foundation Models for Earth Observation
by: Xiong, Zhitong, et al.
Published: (2025)

Learning Hazing to Dehazing: Towards Realistic Haze Generation for Real-World Image Dehazing
by: Wang, Ruiyi, et al.
Published: (2025)

Where on Earth? A Vision-Language Benchmark for Probing Model Geolocation Skills Across Scales
by: Qian, Zhaofang, et al.
Published: (2025)

READoc: A Unified Benchmark for Realistic Document Structured Extraction
by: Li, Zichao, et al.
Published: (2024)

Scientific Image Synthesis: Benchmarking, Methodologies, and Downstream Utility
by: Lin, Honglin, et al.
Published: (2026)