| Main Authors: | Deng, Yuchuan, Hu, Zhanpeng, Xin, Zijie, Deng, Chuang, Zhao, Qijun |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2405.07459 |
Similar Items
LoSh: Long-Short Text Joint Prediction Network for Referring Video Object Segmentation
by: Yuan, Linfeng, et al.
Published: (2023)
Dynamic Patch-aware Enrichment Transformer for Occluded Person Re-Identification
by: Zhang, Xin, et al.
Published: (2024)
Boosting Weak Positives for Text Based Person Search
by: Modi, Akshay, et al.
Published: (2025)
Empowering Small VLMs to Think with Dynamic Memorization and Exploration
by: Liu, Jiazhen, et al.
Published: (2025)
Fundus-R1: Training a Fundus-Reading MLLM with Knowledge-Aware Reasoning on Public Data
by: Deng, Yuchuan, et al.
Published: (2026)
Hierarchical Generative Network for Face Morphing Attacks
by: He, Zuyuan, et al.
Published: (2024)
Optimal-Landmark-Guided Image Blending for Face Morphing Attacks
by: He, Qiaoyun, et al.
Published: (2024)
MINDiff: Mask-Integrated Negative Attention for Controlling Overfitting in Text-to-Image Personalization
by: Jeong, Seulgi, et al.
Published: (2025)
XHand: Real-time Expressive Hand Avatar
by: Gan, Qijun, et al.
Published: (2024)
TextPSG: Panoptic Scene Graph Generation from Textual Descriptions
by: Zhao, Chengyang, et al.
Published: (2023)
Learning Partially-Decorrelated Common Spaces for Ad-hoc Video Search
by: Hu, Fan, et al.
Published: (2025)
AnomalyLMM: Bridging Generative Knowledge and Discriminative Retrieval for Text-Based Person Anomaly Search
by: Ju, Hao, et al.
Published: (2025)
Cross-modal Fuzzy Alignment Network for Text-Aerial Person Retrieval and A Large-scale Benchmark
by: Deng, Yifei, et al.
Published: (2026)
Event Voxel Set Transformer for Spatiotemporal Representation Learning on Event Streams
by: Xie, Bochen, et al.
Published: (2023)
Bootstrapping MLLM for Weakly-Supervised Class-Agnostic Object Counting
by: Zhang, Xiaowen, et al.
Published: (2026)
Tailored Visions: Enhancing Text-to-Image Generation with Personalized Prompt Rewriting
by: Chen, Zijie, et al.
Published: (2023)
CamoSAM2: Motion-Appearance Induced Auto-Refining Prompts for Video Camouflaged Object Detection
by: Zhang, Xin, et al.
Published: (2025)
Mamba-based Spatio-Frequency Motion Perception for Video Camouflaged Object Detection
by: Li, Xin, et al.
Published: (2025)
IDAdapter: Learning Mixed Features for Tuning-Free Personalization of Text-to-Image Models
by: Cui, Siying, et al.
Published: (2024)
SCMM: Calibrating Cross-modal Representations for Text-Based Person Search
by: Liu, Jing, et al.
Published: (2023)
Dynamic Uncertainty Learning with Noisy Correspondence for Text-Based Person Search
by: Xie, Zequn, et al.
Published: (2025)
Fast One-Stage Unsupervised Domain Adaptive Person Search
by: Cui, Tianxiang, et al.
Published: (2024)
Semi-supervised Text-based Person Search
by: Gao, Daming, et al.
Published: (2024)
Unsupervised Integrated-Circuit Defect Segmentation via Image-Intrinsic Normality
by: Zhao, Botong, et al.
Published: (2025)
SAVE: Speech-Aware Video Representation Learning for Video-Text Retrieval
by: Zhao, Ruixiang, et al.
Published: (2026)
CONQUER: Context-Aware Representation with Query Enhancement for Text-Based Person Search
by: Xie, Zequn
Published: (2026)
Enhancing Micro Gesture Recognition for Emotion Understanding via Context-aware Visual-Text Contrastive Learning
by: Li, Deng, et al.
Published: (2024)
Bridging the Pose-Semantic Gap: A Cascade Framework for Text-Based Person Anomaly Search
by: Xie, Zequn, et al.
Published: (2026)
Enhancing Visual Representation for Text-based Person Searching
by: Shen, Wei, et al.
Published: (2024)
Bootstrapping Vision-language Models for Self-supervised Remote Physiological Measurement
by: Yue, Zijie, et al.
Published: (2024)
Uncertainty-Aware Prototype Semantic Decoupling for Text-Based Person Search in Full Images
by: Luo, Zengli, et al.
Published: (2025)
Prompting Continual Person Search
by: Zhang, Pengcheng, et al.
Published: (2024)
Decoupled Cross-Modal Alignment Network for Text-RGBT Person Retrieval and A High-Quality Benchmark
by: Deng, Yifei, et al.
Published: (2025)
Playing to Vision Foundation Model's Strengths in Stereo Matching
by: Liu, Chuang-Wei, et al.
Published: (2024)
Harnessing Weak Pair Uncertainty for Text-based Person Search
by: Sun, Jintao, et al.
Published: (2026)
These Maps Are Made by Propagation: Adapting Deep Stereo Networks to Road Scenarios with Decisive Disparity Diffusion
by: Liu, Chuang-Wei, et al.
Published: (2024)
Fully Exploiting Vision Foundation Model's Profound Prior Knowledge for Generalizable RGB-Depth Driving Scene Parsing
by: Guo, Sicen, et al.
Published: (2025)
QEMesh: Employing A Quadric Error Metrics-Based Representation for Mesh Generation
by: Li, Jiaqi, et al.
Published: (2025)
Selectively Informative Description can Reduce Undesired Embedding Entanglements in Text-to-Image Personalization
by: Kim, Jimyeong, et al.
Published: (2024)
TriLoRA: Integrating SVD for Advanced Style Personalization in Text-to-Image Generation
by: Feng, Chengcheng, et al.
Published: (2024)