:: Library Catalog

Bibliographic Details
Main Authors:	Yan, Shuanglin; Liu, Jun; Dong, Neng; Zhang, Liyan; Tang, Jinhui
Format:	Preprint
Published:	2024
Subjects:	Multimedia
Online Access:	https://arxiv.org/abs/2409.09427

Similar Items

Embedding and Enriching Explicit Semantics for Visible-Infrared Person Re-Identification
by: Dong, Neng, et al.
Published: (2024)

Diverse Semantics-Guided Feature Alignment and Decoupling for Visible-Infrared Person Re-Identification
by: Dong, Neng, et al.
Published: (2025)

Noisy-Correspondence Learning for Text-to-Image Person Re-identification
by: Qin, Yang, et al.
Published: (2023)

Human-centered Interactive Learning via MLLMs for Text-to-Image Person Re-identification
by: Qin, Yang, et al.
Published: (2025)

Enhancing Visible-Infrared Person Re-identification with Modality- and Instance-aware Visual Prompt Learning
by: Wu, Ruiqi, et al.
Published: (2024)

Cross-modal Proxy Evolving for OOD Detection with Vision-Language Models
by: Tang, Hao, et al.
Published: (2026)

DRFormer: A Dual-Regularized Bidirectional Transformer for Person Re-identification
by: Shu, Ying, et al.
Published: (2026)

ProFD: Prompt-Guided Feature Disentangling for Occluded Person Re-Identification
by: Cui, Can, et al.
Published: (2024)

Divide-and-Conquer: Confluent Triple-Flow Network for RGB-T Salient Object Detection
by: Tang, Hao, et al.
Published: (2024)

Multi-scale Activation, Refinement, and Aggregation: Exploring Diverse Cues for Fine-Grained Bird Recognition
by: Zhang, Zhicheng, et al.
Published: (2025)

UMETTS: A Unified Framework for Emotional Text-to-Speech Synthesis with Multimodal Prompts
by: Cheng, Zhi-Qi, et al.
Published: (2024)

MambaPro: Multi-Modal Object Re-Identification with Mamba Aggregation and Synergistic Prompt
by: Wang, Yuhao, et al.
Published: (2024)

Learning Shared Sentiment Prototypes for Adaptive Multimodal Sentiment Analysis
by: Su, Chen, et al.
Published: (2026)

Towards Alleviating Text-to-Image Retrieval Hallucination for CLIP in Zero-shot Learning
by: Wang, Hanyao, et al.
Published: (2024)

DVF: Advancing Robust and Accurate Fine-Grained Image Retrieval with Retrieval Guidelines
by: Jiang, Xin, et al.
Published: (2024)

Efficient Vision Language Model Fine-tuning for Text-based Person Anomaly Search
by: He, Jiayi, et al.
Published: (2025)

TextRefiner: Internal Visual Feature as Efficient Refiner for Vision-Language Models Prompt Tuning
by: Xie, Jingjing, et al.
Published: (2024)

Controllable Text-to-Speech Synthesis with Masked-Autoencoded Style-Rich Representation
by: Wang, Yongqi, et al.
Published: (2025)

Prompt-aware of Frame Sampling for Efficient Text-Video Retrieval
by: Zhang, Deyu, et al.
Published: (2025)

TIP and Polish: Text-Image-Prototype Guided Multi-Modal Generation via Commonality-Discrepancy Modeling and Refinement
by: Ma, Zhiyong, et al.
Published: (2025)

Towards Multimodal Sentiment Analysis via Contrastive Cross-modal Retrieval Augmentation and Hierachical Prompts
by: Zhao, Xianbing, et al.
Published: (2025)

Personalized Image Generation with Large Multimodal Models
by: Xu, Yiyan, et al.
Published: (2024)

HybridVC: Efficient Voice Style Conversion with Text and Audio Prompts
by: Niu, Xinlei, et al.
Published: (2024)

Bridging the Pose-Semantic Gap: A Cascade Framework for Text-Based Person Anomaly Search
by: Xie, Zequn, et al.
Published: (2026)

Robust Duality Learning for Unsupervised Visible-Infrared Person Re-Identification
by: Li, Yongxiang, et al.
Published: (2025)

A User-Friendly Framework for Generating Model-Preferred Prompts in Text-to-Image Synthesis
by: Hei, Nailei, et al.
Published: (2024)

DeepStream: Prototyping Deep Joint Source-Channel Coding for Real-Time Multimedia Transmissions
by: Chi, Kaiyi, et al.
Published: (2025)

IDEA: Inverted Text with Cooperative Deformable Aggregation for Multi-modal Object Re-Identification
by: Wang, Yuhao, et al.
Published: (2025)

TCAN: Text-oriented Cross Attention Network for Multimodal Sentiment Analysis
by: Quan, Weize, et al.
Published: (2024)

Generating Digital Models Using Text-to-3D and Image-to-3D Prompts: Critical Case Study
by: Ziatdinov, Rushan, et al.
Published: (2025)

COPA: Efficient Vision-Language Pre-training Through Collaborative Object- and Patch-Text Alignment
by: Jiang, Chaoya, et al.
Published: (2023)

A Collaborative Extended Reality Prototype for 3D Surgical Planning and Visualization
by: Qiu, Shi, et al.
Published: (2026)

Identity-Preserving Text-to-Video Generation via Training-Free Prompt, Image, and Guidance Enhancement
by: Gao, Jiayi, et al.
Published: (2025)

ProMSC-MIS: Prompt-based Multimodal Semantic Communication for Multi-Spectral Image Segmentation
by: Zhang, Haoshuo, et al.
Published: (2025)

Beyond Walking: A Large-Scale Image-Text Benchmark for Text-based Person Anomaly Search
by: Yang, Shuyu, et al.
Published: (2024)

CustomDancer: Customized Dance Recommendation by Text-Dance Retrieval
by: Qin, Yawen, et al.
Published: (2026)

Video Streaming with Kairos: An MPC-Based ABR with Streaming-Aware Throughput Prediction
by: Zhong, Ziyu, et al.
Published: (2025)

Efficient Prompt Tuning for Hierarchical Ingredient Recognition
by: Gui, Yinxuan, et al.
Published: (2025)

Learning Switchable Priors for Neural Image Compression
by: Zhang, Haotian, et al.
Published: (2025)

FinCall-Surprise: A Large Scale Multi-modal Benchmark for Earning Surprise Prediction
by: Shu, Dong, et al.
Published: (2025)