| Main Authors: | Yu, Mingyang, Guo, Xiahui, Chen, Peng, Li, Zhenkai, Shu, Yang |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2507.23253 |
Similar Items
Masks Can Talk: Extracting Structured Text Information from Single-Modal Images for Remote Sensing Change Detection
by: Zheng, Kai, et al.
Published: (2026)
TAMMs: Change Understanding and Forecasting in Satellite Image Time Series with Temporal-Aware Multimodal Models
by: Guo, Zhongbin, et al.
Published: (2025)
SSDA: Bridging Spectral and Structural Gaps via Dual Adaptation for Vision-Based Time Series Forecasting
by: Zhang, Mingrui, et al.
Published: (2026)
Improving Denoising Diffusion Models via Simultaneous Estimation of Image and Noise
by: Zhang, Zhenkai, et al.
Published: (2023)
Towards Understanding the Working Mechanism of Text-to-Image Diffusion Model
by: Yi, Mingyang, et al.
Published: (2024)
Unsupervised Modality Adaptation with Text-to-Image Diffusion Models for Semantic Segmentation
by: Xia, Ruihao, et al.
Published: (2024)
Enhancing Text-to-Image Editing via Hybrid Mask-Informed Fusion
by: Li, Aoxue, et al.
Published: (2024)
Image-Plane Geometric Decoding for View-Invariant Indoor Scene Reconstruction
by: Li, Mingyang, et al.
Published: (2025)
Vision-Enhanced Time Series Forecasting via Latent Diffusion Models
by: Ruan, Weilin, et al.
Published: (2025)
Multi-Modal Vision Transformers for Crop Mapping from Satellite Image Time Series
by: Follath, Theresa, et al.
Published: (2024)
Self-Supervised Cross-Modal Text-Image Time Series Retrieval in Remote Sensing
by: Hoxha, Genc, et al.
Published: (2025)
Towards Classifying Histopathological Microscope Images as Time Series Data
by: Hong, Sungrae, et al.
Published: (2025)
GeoGPT4V: Towards Geometric Multi-modal Large Language Models with Geometric Image Generation
by: Cai, Shihao, et al.
Published: (2024)
Modality-Agnostic Structural Image Representation Learning for Deformable Multi-Modality Medical Image Registration
by: Mok, Tony C. W., et al.
Published: (2024)
Robust Fairness Vision-Language Learning for Medical Image Analysis
by: Bansal, Sparsh, et al.
Published: (2025)
VIFO: Visual Feature Empowered Multivariate Time Series Forecasting with Cross-Modal Fusion
by: Wang, Yanlong, et al.
Published: (2025)
Forecasting as Rendering: A 2D Gaussian Splatting Framework for Time Series Forecasting
by: Wang, Yixin, et al.
Published: (2026)
PRFusion: Toward Effective and Robust Multi-Modal Place Recognition with Image and Point Cloud Fusion
by: Wang, Sijie, et al.
Published: (2024)
Object-level Geometric Structure Preserving for Natural Image Stitching
by: Cai, Wenxiao, et al.
Published: (2024)
Zo3T: Zero-Shot 3D-Aware Trajectory-Guided Image-to-Video Generation via Test-Time Training
by: Zhang, Ruicheng, et al.
Published: (2025)
A New Benchmark and Model for Challenging Image Manipulation Detection
by: Zhang, Zhenfei, et al.
Published: (2023)
Towards Natural Image Matting in the Wild via Real-Scenario Prior
by: Xia, Ruihao, et al.
Published: (2024)
Cross-Modal Mapping: Mitigating the Modality Gap for Few-Shot Image Classification
by: Yang, Xi, et al.
Published: (2024)
Decomposing Subject-Driven Image Generation via Intermediate Structural Prediction
by: Guo, Hanzhong, et al.
Published: (2026)
StreamEQA: Towards Streaming Video Understanding for Embodied Scenarios
by: Wang, Yifei, et al.
Published: (2025)
Time-VLM: Exploring Multimodal Vision-Language Models for Augmented Time Series Forecasting
by: Zhong, Siru, et al.
Published: (2025)
GERA: Geometric Embedding for Efficient Point Registration Analysis
by: Li, Geng, et al.
Published: (2024)
RaffeSDG: Random Frequency Filtering enabled Single-source Domain Generalization for Medical Image Segmentation
by: Li, Heng, et al.
Published: (2024)
ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing
by: Xing, Long, et al.
Published: (2025)
Inpainting-Style Conditional Diffusion for Multivariable Time Series Forecasting
by: Kiani, Kourosh, et al.
Published: (2026)
Image Inpainting via Conditional Texture and Structure Dual Generation
by: Guo, Xiefan, et al.
Published: (2021)
Better with Less: Tackling Heterogeneous Multi-Modal Image Joint Pretraining via Conditioned and Degraded Masked Autoencoder
by: Peng, Bowen, et al.
Published: (2026)
Towards Training-free Open-world Segmentation via Image Prompt Foundation Models
by: Tang, Lv, et al.
Published: (2023)
MM-ISTS: Cooperating Irregularly Sampled Time Series Forecasting with Multimodal Vision-Text LLMs
by: Lei, Zhi, et al.
Published: (2026)
CMMCoT: Enhancing Complex Multi-Image Comprehension via Multi-Modal Chain-of-Thought and Memory Augmentation
by: Zhang, Guanghao, et al.
Published: (2025)
Towards Unbiased Cross-Modal Representation Learning for Food Image-to-Recipe Retrieval
by: Wang, Qing, et al.
Published: (2025)
UniGeo: Unifying Geometric Guidance for Camera-Controllable Image Editing via Video Models
by: Jiang, Hong, et al.
Published: (2026)
TimeRFT: Stimulating Generalizable Time Series Forecasting for TSFMs via Reinforcement Finetuning
by: Li, Siyang, et al.
Published: (2026)
ViTime: Foundation Model for Time Series Forecasting Powered by Vision Intelligence
by: Yang, Luoxiao, et al.
Published: (2024)
Improving Multimodal Sentiment Analysis via Modality Optimization and Dynamic Primary Modality Selection
by: Yang, Dingkang, et al.
Published: (2025)