:: Library Catalog

Bibliographic Details
Main Authors:	Deng, Yuchuan, Hu, Zhanpeng, Xin, Zijie, Deng, Chuang, Zhao, Qijun
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2405.07459

Similar Items

LoSh: Long-Short Text Joint Prediction Network for Referring Video Object Segmentation
by: Yuan, Linfeng, et al.
Published: (2023)

Dynamic Patch-aware Enrichment Transformer for Occluded Person Re-Identification
by: Zhang, Xin, et al.
Published: (2024)

Boosting Weak Positives for Text Based Person Search
by: Modi, Akshay, et al.
Published: (2025)

Empowering Small VLMs to Think with Dynamic Memorization and Exploration
by: Liu, Jiazhen, et al.
Published: (2025)

Fundus-R1: Training a Fundus-Reading MLLM with Knowledge-Aware Reasoning on Public Data
by: Deng, Yuchuan, et al.
Published: (2026)

Hierarchical Generative Network for Face Morphing Attacks
by: He, Zuyuan, et al.
Published: (2024)

Optimal-Landmark-Guided Image Blending for Face Morphing Attacks
by: He, Qiaoyun, et al.
Published: (2024)

MINDiff: Mask-Integrated Negative Attention for Controlling Overfitting in Text-to-Image Personalization
by: Jeong, Seulgi, et al.
Published: (2025)

XHand: Real-time Expressive Hand Avatar
by: Gan, Qijun, et al.
Published: (2024)

TextPSG: Panoptic Scene Graph Generation from Textual Descriptions
by: Zhao, Chengyang, et al.
Published: (2023)

Learning Partially-Decorrelated Common Spaces for Ad-hoc Video Search
by: Hu, Fan, et al.
Published: (2025)

AnomalyLMM: Bridging Generative Knowledge and Discriminative Retrieval for Text-Based Person Anomaly Search
by: Ju, Hao, et al.
Published: (2025)

Cross-modal Fuzzy Alignment Network for Text-Aerial Person Retrieval and A Large-scale Benchmark
by: Deng, Yifei, et al.
Published: (2026)

Event Voxel Set Transformer for Spatiotemporal Representation Learning on Event Streams
by: Xie, Bochen, et al.
Published: (2023)

Bootstrapping MLLM for Weakly-Supervised Class-Agnostic Object Counting
by: Zhang, Xiaowen, et al.
Published: (2026)

Tailored Visions: Enhancing Text-to-Image Generation with Personalized Prompt Rewriting
by: Chen, Zijie, et al.
Published: (2023)

CamoSAM2: Motion-Appearance Induced Auto-Refining Prompts for Video Camouflaged Object Detection
by: Zhang, Xin, et al.
Published: (2025)

Mamba-based Spatio-Frequency Motion Perception for Video Camouflaged Object Detection
by: Li, Xin, et al.
Published: (2025)

IDAdapter: Learning Mixed Features for Tuning-Free Personalization of Text-to-Image Models
by: Cui, Siying, et al.
Published: (2024)

SCMM: Calibrating Cross-modal Representations for Text-Based Person Search
by: Liu, Jing, et al.
Published: (2023)

Dynamic Uncertainty Learning with Noisy Correspondence for Text-Based Person Search
by: Xie, Zequn, et al.
Published: (2025)

Fast One-Stage Unsupervised Domain Adaptive Person Search
by: Cui, Tianxiang, et al.
Published: (2024)

Semi-supervised Text-based Person Search
by: Gao, Daming, et al.
Published: (2024)

Unsupervised Integrated-Circuit Defect Segmentation via Image-Intrinsic Normality
by: Zhao, Botong, et al.
Published: (2025)

SAVE: Speech-Aware Video Representation Learning for Video-Text Retrieval
by: Zhao, Ruixiang, et al.
Published: (2026)

CONQUER: Context-Aware Representation with Query Enhancement for Text-Based Person Search
by: Xie, Zequn
Published: (2026)

Enhancing Micro Gesture Recognition for Emotion Understanding via Context-aware Visual-Text Contrastive Learning
by: Li, Deng, et al.
Published: (2024)

Bridging the Pose-Semantic Gap: A Cascade Framework for Text-Based Person Anomaly Search
by: Xie, Zequn, et al.
Published: (2026)

Enhancing Visual Representation for Text-based Person Searching
by: Shen, Wei, et al.
Published: (2024)

Bootstrapping Vision-language Models for Self-supervised Remote Physiological Measurement
by: Yue, Zijie, et al.
Published: (2024)

Uncertainty-Aware Prototype Semantic Decoupling for Text-Based Person Search in Full Images
by: Luo, Zengli, et al.
Published: (2025)

Prompting Continual Person Search
by: Zhang, Pengcheng, et al.
Published: (2024)

Decoupled Cross-Modal Alignment Network for Text-RGBT Person Retrieval and A High-Quality Benchmark
by: Deng, Yifei, et al.
Published: (2025)

Playing to Vision Foundation Model's Strengths in Stereo Matching
by: Liu, Chuang-Wei, et al.
Published: (2024)

Harnessing Weak Pair Uncertainty for Text-based Person Search
by: Sun, Jintao, et al.
Published: (2026)

These Maps Are Made by Propagation: Adapting Deep Stereo Networks to Road Scenarios with Decisive Disparity Diffusion
by: Liu, Chuang-Wei, et al.
Published: (2024)

Fully Exploiting Vision Foundation Model's Profound Prior Knowledge for Generalizable RGB-Depth Driving Scene Parsing
by: Guo, Sicen, et al.
Published: (2025)

QEMesh: Employing A Quadric Error Metrics-Based Representation for Mesh Generation
by: Li, Jiaqi, et al.
Published: (2025)

Selectively Informative Description can Reduce Undesired Embedding Entanglements in Text-to-Image Personalization
by: Kim, Jimyeong, et al.
Published: (2024)

TriLoRA: Integrating SVD for Advanced Style Personalization in Text-to-Image Generation
by: Feng, Chengcheng, et al.
Published: (2024)