| Main Authors: | Yuan, Junyi, Zhang, Jian, Wu, Fangyu, Lu, Dongming, Lu, Huanda, Wang, Qiufeng |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2505.10921 |
Similar Items
LLM-Driven Completeness and Consistency Evaluation for Cultural Heritage Data Augmentation in Cross-Modal Retrieval
by: Zhang, Jian, et al.
Published: (2025)
HiGarment: Cross-modal Harmony Based Diffusion Model for Flat Sketch to Realistic Garment Image
by: Guo, Junyi, et al.
Published: (2025)
EditGarment: An Instruction-Based Garment Editing Dataset Constructed with Automated MLLM Synthesis and Semantic-Aware Evaluation
by: Yin, Deqiang, et al.
Published: (2025)
Diffusion Based Augmentation for Captioning and Retrieval in Cultural Heritage
by: Cioni, Dario, et al.
Published: (2023)
DvD: Unleashing a Generative Paradigm for Document Dewarping via Coordinates-based Diffusion Model
by: Zhang, Weiguang, et al.
Published: (2025)
Enhanced Cross-modal 3D Retrieval via Tri-modal Reconstruction
by: Ren, Junlong, et al.
Published: (2025)
Multimodal LLM Enhanced Cross-lingual Cross-modal Retrieval
by: Wang, Yabing, et al.
Published: (2024)
Fusion-then-Distillation: Toward Cross-modal Positive Distillation for Domain Adaptive 3D Semantic Segmentation
by: Wu, Yao, et al.
Published: (2024)
Towards Patronizing and Condescending Language in Chinese Videos: A Multimodal Dataset and Detector
by: Wang, Hongbo, et al.
Published: (2024)
Text-guided Feature Disentanglement for Cross-modal Gait Recognition
by: Lu, Zhiyang, et al.
Published: (2026)
Enhanced Textual Feature Extraction for Visual Question Answering: A Simple Convolutional Approach
by: Zhang, Zhilin, et al.
Published: (2024)
Representation Learning for Compressed Video Action Recognition via Attentive Cross-modal Interaction with Motion Enhancement
by: Li, Bing, et al.
Published: (2022)
Unsupervised Spike Depth Estimation via Cross-modality Cross-domain Knowledge Transfer
by: Liu, Jiaming, et al.
Published: (2022)
Dynamic Adapter with Semantics Disentangling for Cross-lingual Cross-modal Retrieval
by: Cai, Rui, et al.
Published: (2024)
Towards a Universal 3D Medical Multi-modality Generalization via Learning Personalized Invariant Representation
by: Tan, Zhaorui, et al.
Published: (2024)
Enhancing Video Memorability Prediction with Text-Motion Cross-modal Contrastive Loss and Its Application in Video Summarization
by: Zhu, Zhiyi, et al.
Published: (2025)
Decentralized Vehicle Coordination: The Berkeley DeepDrive Drone Dataset and Consensus-Based Models
by: Wu, Fangyu, et al.
Published: (2022)
Cross-modal Ship Re-Identification via Optical and SAR Imagery: A Novel Dataset and Method
by: Wang, Han, et al.
Published: (2025)
CogniMap3D: Cognitive 3D Mapping and Rapid Retrieval
by: Wang, Feiran, et al.
Published: (2026)
Cross-modal Fundus Image Registration under Large FoV Disparity
by: Li, Hongyang, et al.
Published: (2025)
Cross-modal ultra-scale learning with tri-modalities of renal biopsy images for glomerular multi-disease auxiliary diagnosis
by: Long, Kaixing, et al.
Published: (2025)
DMGD: Train-Free Dataset Distillation with Semantic-Distribution Matching in Diffusion Models
by: Wang, Qichao, et al.
Published: (2026)
CLIP Multi-modal Hashing for Multimedia Retrieval
by: Zhu, Jian, et al.
Published: (2024)
VULCA-Bench: A Multicultural Vision-Language Benchmark for Evaluating Cultural Understanding
by: Yu, Haorui, et al.
Published: (2026)
ChineseVideoBench: Benchmarking Multi-modal Large Models for Chinese Video Question Answering
by: Nie, Yuxiang, et al.
Published: (2025)
Gaussian Heritage: 3D Digitization of Cultural Heritage with Integrated Object Segmentation
by: Dahaghin, Mahtab, et al.
Published: (2024)
Hands-Free Heritage: Automated 3D Scanning for Cultural Heritage Digitization
by: Ahmad, Javed, et al.
Published: (2025)
Seeing Symbols, Missing Cultures: Probing Vision-Language Models' Reasoning on Fire Imagery and Cultural Meaning
by: Yu, Haorui, et al.
Published: (2025)
Lightweight Contrastive Distilled Hashing for Online Cross-modal Retrieval
by: Li, Jiaxing, et al.
Published: (2025)
Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval
by: Li, Hao, et al.
Published: (2023)
Reviving Cultural Heritage: A Novel Approach for Comprehensive Historical Document Restoration
by: Zhang, Yuyi, et al.
Published: (2025)
SegDebias: Test-Time Bias Mitigation for ViT-Based CLIP via Segmentation
by: Wu, Fangyu, et al.
Published: (2025)
Multi-modal Semantic Understanding with Contrastive Cross-modal Feature Alignment
by: Zhang, Ming, et al.
Published: (2024)
Unveil: Unified Visual-Textual Integration and Distillation for Multi-modal Document Retrieval
by: Sun, Hao, et al.
Published: (2026)
Scope: Selective Cross-modal Orchestration of Visual Perception Experts
by: Zhang, Tianyu, et al.
Published: (2025)
Deep Reversible Consistency Learning for Cross-modal Retrieval
by: Pu, Ruitao, et al.
Published: (2025)
Single-Sample Black-Box Membership Inference Attack against Vision-Language Models via Cross-modal Semantic Alignment
by: Li, Jiaqing, et al.
Published: (2026)
Towards RGB-NIR Cross-modality Image Registration and Beyond
by: Li, Huadong, et al.
Published: (2024)
Mitigating Cross-modal Representation Bias for Multicultural Image-to-Recipe Retrieval
by: Wang, Qing, et al.
Published: (2025)
RefSAM3D: Adapting SAM with Cross-modal Reference for 3D Medical Image Segmentation
by: Gao, Xiang, et al.
Published: (2024)