| Main authors: | Zhan, Yu-Wei; Wu, Xiao-Ming; Luo, Xin; Wei, Yinwei; Xu, Xin-Shun |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online access: | https://arxiv.org/abs/2406.10776 |
Similar items
Simple but Effective Raw-Data Level Multimodal Fusion for Composed Image Retrieval
by: Wen, Haokun, et al.
Published: (2024)
Fine-grained Image Retrieval via Dual-Vision Adaptation
by: Jiang, Xin, et al.
Published: (2025)
Deep Mamba Multi-modal Learning
by: Zhu, Jian, et al.
Published: (2024)
MRD: Multi-resolution Retrieval-Detection Fusion for High-Resolution Image Understanding
by: Yang, Fan, et al.
Published: (2025)
Rethinking Vision Transformer for Large-Scale Fine-Grained Image Retrieval
by: Jiang, Xin, et al.
Published: (2025)
G4G:A Generic Framework for High Fidelity Talking Face Generation with Fine-grained Intra-modal Alignment
by: Zhang, Juan, et al.
Published: (2024)
Semantic-Aware Adversarial Training for Reliable Deep Hashing Retrieval
by: Yuan, Xu, et al.
Published: (2023)
Adaptive Multi-Agent Reasoning for Text-to-Video Retrieval
by: Wu, Jiaxin, et al.
Published: (2025)
FineBadminton: A Multi-Level Dataset for Fine-Grained Badminton Video Understanding
by: He, Xusheng, et al.
Published: (2025)
MMoFusion: Multi-modal Co-Speech Motion Generation with Diffusion Model
by: Wang, Sen, et al.
Published: (2024)
How to Cache Important Contents for Multi-modal Service in Dynamic Networks: A DRL-based Caching Scheme
by: Zhang, Zhe, et al.
Published: (2024)
Robust Multi-modal Task-oriented Communications with Redundancy-aware Representations
by: Fu, Jingwen, et al.
Published: (2025)
Multi-modal and Metadata Capture Model for Micro Video Popularity Prediction
by: Lu, Jiacheng, et al.
Published: (2025)
DVF: Advancing Robust and Accurate Fine-Grained Image Retrieval with Retrieval Guidelines
by: Jiang, Xin, et al.
Published: (2024)
EEmo-Bench: A Benchmark for Multi-modal Large Language Models on Image Evoked Emotion Assessment
by: Gao, Lancheng, et al.
Published: (2025)
Attribute-driven Disentangled Representation Learning for Multimodal Recommendation
by: Li, Zhenyang, et al.
Published: (2023)
Is One-Shot In-Context Learning Helpful for Data Selection in Task-Specific Fine-Tuning of Multimodal LLMs?
by: An, Xiao, et al.
Published: (2026)
Revolutionizing Text-to-Image Retrieval as Autoregressive Token-to-Voken Generation
by: Li, Yongqi, et al.
Published: (2024)
Tri-Ergon: Fine-grained Video-to-Audio Generation with Multi-modal Conditions and LUFS Control
by: Li, Bingliang, et al.
Published: (2024)
Robust Self-Paced Hashing for Cross-Modal Retrieval with Noisy Labels
by: Pu, Ruitao, et al.
Published: (2025)
SEPS: Semantic-enhanced Patch Slimming Framework for fine-grained cross-modal alignment
by: Mao, Xinyu, et al.
Published: (2025)
Fine-grained Textual Inversion Network for Zero-Shot Composed Image Retrieval
by: Lin, Haoqiang, et al.
Published: (2025)
Emotional Cues Extraction and Fusion for Multi-modal Emotion Prediction and Recognition in Conversation
by: Shi, Haoxiang, et al.
Published: (2024)
MDF: A Dynamic Fusion Model for Multi-modal Fake News Detection
by: Lv, Hongzhen, et al.
Published: (2024)
PromptHash: Affinity-Prompted Collaborative Cross-Modal Learning for Adaptive Hashing Retrieval
by: Zou, Qiang, et al.
Published: (2025)
Mixture-of-Prompt-Experts for Multi-modal Semantic Understanding
by: Wu, Zichen, et al.
Published: (2024)
Quantifying and Enhancing Multi-modal Robustness with Modality Preference
by: Yang, Zequn, et al.
Published: (2024)
SciMMIR: Benchmarking Scientific Multi-modal Information Retrieval
by: Wu, Siwei, et al.
Published: (2024)
Inter-Frame Coding for Dynamic Meshes via Coarse-to-Fine Anchor Mesh Generation
by: Huang, He, et al.
Published: (2024)
Spatiotemporal Graph Guided Multi-modal Network for Livestreaming Product Retrieval
by: Hu, Xiaowan, et al.
Published: (2024)
Routing Experts: Learning to Route Dynamic Experts in Multi-modal Large Language Models
by: Wu, Qiong, et al.
Published: (2024)
Fine-grained Knowledge Graph-driven Video-Language Learning for Action Recognition
by: Zhang, Rui, et al.
Published: (2024)
Towards Alleviating Text-to-Image Retrieval Hallucination for CLIP in Zero-shot Learning
by: Wang, Hanyao, et al.
Published: (2024)
Towards Multimodal Sentiment Analysis via Contrastive Cross-modal Retrieval Augmentation and Hierachical Prompts
by: Zhao, Xianbing, et al.
Published: (2025)
PixCLIP: Achieving Fine-grained Visual Language Understanding via Any-granularity Pixel-Text Alignment Learning
by: Xiao, Yicheng, et al.
Published: (2025)
M3SD: Multi-modal, Multi-scenario and Multi-language Speaker Diarization Dataset
by: Wu, Shilong
Published: (2025)
MMPKUBase: A Comprehensive and High-quality Chinese Multi-modal Knowledge Graph
by: Yi, Xuan, et al.
Published: (2024)
Unified Generative and Discriminative Training for Multi-modal Large Language Models
by: Chow, Wei, et al.
Published: (2024)
Start from Video-Music Retrieval: An Inter-Intra Modal Loss for Cross Modal Retrieval
by: Chen, Zeyu, et al.
Published: (2024)
SpeechCraft: A Fine-grained Expressive Speech Dataset with Natural Language Description
by: Jin, Zeyu, et al.
Published: (2024)