:: Library Catalog

Bibliographic Details
Main Authors:	Yu, Mingyang; Guo, Xiahui; Chen, Peng; Li, Zhenkai; Shu, Yang
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2507.23253

Similar Items

Masks Can Talk: Extracting Structured Text Information from Single-Modal Images for Remote Sensing Change Detection
by: Zheng, Kai, et al.
Published: (2026)

TAMMs: Change Understanding and Forecasting in Satellite Image Time Series with Temporal-Aware Multimodal Models
by: Guo, Zhongbin, et al.
Published: (2025)

SSDA: Bridging Spectral and Structural Gaps via Dual Adaptation for Vision-Based Time Series Forecasting
by: Zhang, Mingrui, et al.
Published: (2026)

Improving Denoising Diffusion Models via Simultaneous Estimation of Image and Noise
by: Zhang, Zhenkai, et al.
Published: (2023)

Towards Understanding the Working Mechanism of Text-to-Image Diffusion Model
by: Yi, Mingyang, et al.
Published: (2024)

Unsupervised Modality Adaptation with Text-to-Image Diffusion Models for Semantic Segmentation
by: Xia, Ruihao, et al.
Published: (2024)

Enhancing Text-to-Image Editing via Hybrid Mask-Informed Fusion
by: Li, Aoxue, et al.
Published: (2024)

Image-Plane Geometric Decoding for View-Invariant Indoor Scene Reconstruction
by: Li, Mingyang, et al.
Published: (2025)

Vision-Enhanced Time Series Forecasting via Latent Diffusion Models
by: Ruan, Weilin, et al.
Published: (2025)

Multi-Modal Vision Transformers for Crop Mapping from Satellite Image Time Series
by: Follath, Theresa, et al.
Published: (2024)

Self-Supervised Cross-Modal Text-Image Time Series Retrieval in Remote Sensing
by: Hoxha, Genc, et al.
Published: (2025)

Towards Classifying Histopathological Microscope Images as Time Series Data
by: Hong, Sungrae, et al.
Published: (2025)

GeoGPT4V: Towards Geometric Multi-modal Large Language Models with Geometric Image Generation
by: Cai, Shihao, et al.
Published: (2024)

Modality-Agnostic Structural Image Representation Learning for Deformable Multi-Modality Medical Image Registration
by: Mok, Tony C. W., et al.
Published: (2024)

Robust Fairness Vision-Language Learning for Medical Image Analysis
by: Bansal, Sparsh, et al.
Published: (2025)

VIFO: Visual Feature Empowered Multivariate Time Series Forecasting with Cross-Modal Fusion
by: Wang, Yanlong, et al.
Published: (2025)

Forecasting as Rendering: A 2D Gaussian Splatting Framework for Time Series Forecasting
by: Wang, Yixin, et al.
Published: (2026)

PRFusion: Toward Effective and Robust Multi-Modal Place Recognition with Image and Point Cloud Fusion
by: Wang, Sijie, et al.
Published: (2024)

Object-level Geometric Structure Preserving for Natural Image Stitching
by: Cai, Wenxiao, et al.
Published: (2024)

Zo3T: Zero-Shot 3D-Aware Trajectory-Guided Image-to-Video Generation via Test-Time Training
by: Zhang, Ruicheng, et al.
Published: (2025)

A New Benchmark and Model for Challenging Image Manipulation Detection
by: Zhang, Zhenfei, et al.
Published: (2023)

Towards Natural Image Matting in the Wild via Real-Scenario Prior
by: Xia, Ruihao, et al.
Published: (2024)

Cross-Modal Mapping: Mitigating the Modality Gap for Few-Shot Image Classification
by: Yang, Xi, et al.
Published: (2024)

Decomposing Subject-Driven Image Generation via Intermediate Structural Prediction
by: Guo, Hanzhong, et al.
Published: (2026)

StreamEQA: Towards Streaming Video Understanding for Embodied Scenarios
by: Wang, Yifei, et al.
Published: (2025)

Time-VLM: Exploring Multimodal Vision-Language Models for Augmented Time Series Forecasting
by: Zhong, Siru, et al.
Published: (2025)

GERA: Geometric Embedding for Efficient Point Registration Analysis
by: Li, Geng, et al.
Published: (2024)

RaffeSDG: Random Frequency Filtering enabled Single-source Domain Generalization for Medical Image Segmentation
by: Li, Heng, et al.
Published: (2024)

ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing
by: Xing, Long, et al.
Published: (2025)

Inpainting-Style Conditional Diffusion for Multivariable Time Series Forecasting
by: Kiani, Kourosh, et al.
Published: (2026)

Image Inpainting via Conditional Texture and Structure Dual Generation
by: Guo, Xiefan, et al.
Published: (2021)

Better with Less: Tackling Heterogeneous Multi-Modal Image Joint Pretraining via Conditioned and Degraded Masked Autoencoder
by: Peng, Bowen, et al.
Published: (2026)

Towards Training-free Open-world Segmentation via Image Prompt Foundation Models
by: Tang, Lv, et al.
Published: (2023)

MM-ISTS: Cooperating Irregularly Sampled Time Series Forecasting with Multimodal Vision-Text LLMs
by: Lei, Zhi, et al.
Published: (2026)

CMMCoT: Enhancing Complex Multi-Image Comprehension via Multi-Modal Chain-of-Thought and Memory Augmentation
by: Zhang, Guanghao, et al.
Published: (2025)

Towards Unbiased Cross-Modal Representation Learning for Food Image-to-Recipe Retrieval
by: Wang, Qing, et al.
Published: (2025)

UniGeo: Unifying Geometric Guidance for Camera-Controllable Image Editing via Video Models
by: Jiang, Hong, et al.
Published: (2026)

TimeRFT: Stimulating Generalizable Time Series Forecasting for TSFMs via Reinforcement Finetuning
by: Li, Siyang, et al.
Published: (2026)

ViTime: Foundation Model for Time Series Forecasting Powered by Vision Intelligence
by: Yang, Luoxiao, et al.
Published: (2024)

Improving Multimodal Sentiment Analysis via Modality Optimization and Dynamic Primary Modality Selection
by: Yang, Dingkang, et al.
Published: (2025)