| Main Authors: | Yan, Shuanglin, Liu, Jun, Dong, Neng, Zhang, Liyan, Tang, Jinhui |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2409.09427 |
Similar Items
Embedding and Enriching Explicit Semantics for Visible-Infrared Person Re-Identification
by: Dong, Neng, et al.
Published: (2024)
Diverse Semantics-Guided Feature Alignment and Decoupling for Visible-Infrared Person Re-Identification
by: Dong, Neng, et al.
Published: (2025)
Noisy-Correspondence Learning for Text-to-Image Person Re-identification
by: Qin, Yang, et al.
Published: (2023)
Human-centered Interactive Learning via MLLMs for Text-to-Image Person Re-identification
by: Qin, Yang, et al.
Published: (2025)
Enhancing Visible-Infrared Person Re-identification with Modality- and Instance-aware Visual Prompt Learning
by: Wu, Ruiqi, et al.
Published: (2024)
Cross-modal Proxy Evolving for OOD Detection with Vision-Language Models
by: Tang, Hao, et al.
Published: (2026)
DRFormer: A Dual-Regularized Bidirectional Transformer for Person Re-identification
by: Shu, Ying, et al.
Published: (2026)
ProFD: Prompt-Guided Feature Disentangling for Occluded Person Re-Identification
by: Cui, Can, et al.
Published: (2024)
Divide-and-Conquer: Confluent Triple-Flow Network for RGB-T Salient Object Detection
by: Tang, Hao, et al.
Published: (2024)
Multi-scale Activation, Refinement, and Aggregation: Exploring Diverse Cues for Fine-Grained Bird Recognition
by: Zhang, Zhicheng, et al.
Published: (2025)
UMETTS: A Unified Framework for Emotional Text-to-Speech Synthesis with Multimodal Prompts
by: Cheng, Zhi-Qi, et al.
Published: (2024)
MambaPro: Multi-Modal Object Re-Identification with Mamba Aggregation and Synergistic Prompt
by: Wang, Yuhao, et al.
Published: (2024)
Learning Shared Sentiment Prototypes for Adaptive Multimodal Sentiment Analysis
by: Su, Chen, et al.
Published: (2026)
Towards Alleviating Text-to-Image Retrieval Hallucination for CLIP in Zero-shot Learning
by: Wang, Hanyao, et al.
Published: (2024)
DVF: Advancing Robust and Accurate Fine-Grained Image Retrieval with Retrieval Guidelines
by: Jiang, Xin, et al.
Published: (2024)
Efficient Vision Language Model Fine-tuning for Text-based Person Anomaly Search
by: He, Jiayi, et al.
Published: (2025)
TextRefiner: Internal Visual Feature as Efficient Refiner for Vision-Language Models Prompt Tuning
by: Xie, Jingjing, et al.
Published: (2024)
Controllable Text-to-Speech Synthesis with Masked-Autoencoded Style-Rich Representation
by: Wang, Yongqi, et al.
Published: (2025)
Prompt-aware of Frame Sampling for Efficient Text-Video Retrieval
by: Zhang, Deyu, et al.
Published: (2025)
TIP and Polish: Text-Image-Prototype Guided Multi-Modal Generation via Commonality-Discrepancy Modeling and Refinement
by: Ma, Zhiyong, et al.
Published: (2025)
Towards Multimodal Sentiment Analysis via Contrastive Cross-modal Retrieval Augmentation and Hierachical Prompts
by: Zhao, Xianbing, et al.
Published: (2025)
Personalized Image Generation with Large Multimodal Models
by: Xu, Yiyan, et al.
Published: (2024)
HybridVC: Efficient Voice Style Conversion with Text and Audio Prompts
by: Niu, Xinlei, et al.
Published: (2024)
Bridging the Pose-Semantic Gap: A Cascade Framework for Text-Based Person Anomaly Search
by: Xie, Zequn, et al.
Published: (2026)
Robust Duality Learning for Unsupervised Visible-Infrared Person Re-Identification
by: Li, Yongxiang, et al.
Published: (2025)
A User-Friendly Framework for Generating Model-Preferred Prompts in Text-to-Image Synthesis
by: Hei, Nailei, et al.
Published: (2024)
DeepStream: Prototyping Deep Joint Source-Channel Coding for Real-Time Multimedia Transmissions
by: Chi, Kaiyi, et al.
Published: (2025)
IDEA: Inverted Text with Cooperative Deformable Aggregation for Multi-modal Object Re-Identification
by: Wang, Yuhao, et al.
Published: (2025)
TCAN: Text-oriented Cross Attention Network for Multimodal Sentiment Analysis
by: Quan, Weize, et al.
Published: (2024)
Generating Digital Models Using Text-to-3D and Image-to-3D Prompts: Critical Case Study
by: Ziatdinov, Rushan, et al.
Published: (2025)
COPA: Efficient Vision-Language Pre-training Through Collaborative Object- and Patch-Text Alignment
by: Jiang, Chaoya, et al.
Published: (2023)
A Collaborative Extended Reality Prototype for 3D Surgical Planning and Visualization
by: Qiu, Shi, et al.
Published: (2026)
Identity-Preserving Text-to-Video Generation via Training-Free Prompt, Image, and Guidance Enhancement
by: Gao, Jiayi, et al.
Published: (2025)
ProMSC-MIS: Prompt-based Multimodal Semantic Communication for Multi-Spectral Image Segmentation
by: Zhang, Haoshuo, et al.
Published: (2025)
Beyond Walking: A Large-Scale Image-Text Benchmark for Text-based Person Anomaly Search
by: Yang, Shuyu, et al.
Published: (2024)
CustomDancer: Customized Dance Recommendation by Text-Dance Retrieval
by: Qin, Yawen, et al.
Published: (2026)
Video Streaming with Kairos: An MPC-Based ABR with Streaming-Aware Throughput Prediction
by: Zhong, Ziyu, et al.
Published: (2025)
Efficient Prompt Tuning for Hierarchical Ingredient Recognition
by: Gui, Yinxuan, et al.
Published: (2025)
Learning Switchable Priors for Neural Image Compression
by: Zhang, Haotian, et al.
Published: (2025)
FinCall-Surprise: A Large Scale Multi-modal Benchmark for Earning Surprise Prediction
by: Shu, Dong, et al.
Published: (2025)