:: Library Catalog

Bibliographic Details
Main Authors:	Yang, Zhenkui; Huang, Zeyi; Wang, Ge; Ding, Han; Han, Tony Xiao; Wang, Fei
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2504.14621

Similar Items

MobiDiary: Autoregressive Action Captioning with Wearable Devices and Wireless Signals
by: Deng, Fei, et al.
Published: (2026)

A Survey on Wi-Fi Sensing Generalizability: Taxonomy, Techniques, Datasets, and Future Research Prospects
by: Wang, Fei, et al.
Published: (2025)

DynaIP: Dynamic Image Prompt Adapter for Scalable Zero-shot Personalized Text-to-Image Generation
by: Wang, Zhizhong, et al.
Published: (2025)

Text-based Talking Video Editing with Cascaded Conditional Diffusion
by: Han, Bo, et al.
Published: (2024)

TalkCLIP: Talking Head Generation with Text-Guided Expressive Speaking Styles
by: Ma, Yifeng, et al.
Published: (2023)

GrOCE: Graph-Guided Online Concept Erasure for Text-to-Image Diffusion Models
by: Han, Ning, et al.
Published: (2025)

PoseTalk: Text-and-Audio-based Pose Control and Motion Refinement for One-Shot Talking Head Generation
by: Ling, Jun, et al.
Published: (2024)

Self-Rewarding Large Vision-Language Models for Optimizing Prompts in Text-to-Image Generation
by: Yang, Hongji, et al.
Published: (2025)

Dynamic Prompting of Frozen Text-to-Image Diffusion Models for Panoptic Narrative Grounding
by: Li, Hongyu, et al.
Published: (2024)

XRF V2: A Dataset for Action Summarization with Wi-Fi Signals, and IMUs in Phones, Watches, Earbuds, and Glasses
by: Lan, Bo, et al.
Published: (2025)

BideDPO: Conditional Image Generation with Simultaneous Text and Condition Alignment
by: Zhou, Dewei, et al.
Published: (2025)

Diversity Has Always Been There in Your Visual Autoregressive Models
by: Wang, Tong, et al.
Published: (2025)

Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation
by: Guo, Lanqing, et al.
Published: (2024)

WS-IMUBench: Can Weakly Supervised Methods from Audio, Image, and Video Be Adapted for IMU-based Temporal Action Localization?
by: Li, Pei, et al.
Published: (2026)

One Snapshot is All You Need: A Generalized Method for mmWave Signal Generation
by: Huang, Teng, et al.
Published: (2025)

EMMA: Your Text-to-Image Diffusion Model Can Secretly Accept Multi-Modal Prompts
by: Han, Yucheng, et al.
Published: (2024)

ConText: Driving In-context Learning for Text Removal and Segmentation
by: Zhang, Fei, et al.
Published: (2025)

Fast Prompt Alignment for Text-to-Image Generation
by: Mrini, Khalil, et al.
Published: (2024)

Text-Driven Emotionally Continuous Talking Face Generation
by: Yang, Hao, et al.
Published: (2026)

GloTSFormer: Global Video Text Spotting Transformer
by: Wang, Han, et al.
Published: (2024)

Cross-Modal Urban Sensing: Evaluating Sound-Vision Alignment Across Street-Level and Aerial Imagery
by: Chen, Pengyu, et al.
Published: (2025)

Modality Prompts for Arbitrary Modality Salient Object Detection
by: Huang, Nianchang, et al.
Published: (2024)

Text2Lip: Progressive Lip-Synced Talking Face Generation from Text via Viseme-Guided Rendering
by: Wang, Xu, et al.
Published: (2025)

Edge Approximation Text Detector
by: Yang, Chuang, et al.
Published: (2025)

Prompt-Softbox-Prompt: A Free-Text Embedding Control for Image Editing
by: Yang, Yitong, et al.
Published: (2024)

GuardT2I: Defending Text-to-Image Models from Adversarial Prompts
by: Yang, Yijun, et al.
Published: (2024)

Cross-domain EEG-based Emotion Recognition with Contrastive Learning
by: Yan, Rui, et al.
Published: (2025)

Advancing Pose-Guided Image Synthesis with Progressive Conditional Diffusion Models
by: Shen, Fei, et al.
Published: (2023)

HiLoTs: High-Low Temporal Sensitive Representation Learning for Semi-Supervised LiDAR Segmentation in Autonomous Driving
by: Lin, R. D., et al.
Published: (2025)

Beyond Open Vocabulary: Multimodal Prompting for Object Detection in Remote Sensing Images
by: Yang, Shuai, et al.
Published: (2026)

Text-Pass Filter: An Efficient Scene Text Detector
by: Yang, Chuang, et al.
Published: (2026)

StyleTalk++: A Unified Framework for Controlling the Speaking Styles of Talking Heads
by: Wang, Suzhen, et al.
Published: (2024)

mmEgoHand: Egocentric Hand Pose Estimation and Gesture Recognition with Head-mounted Millimeter-wave Radar and IMU
by: Lv, Yizhe, et al.
Published: (2025)

DreamTalk: When Emotional Talking Head Generation Meets Diffusion Probabilistic Models
by: Ma, Yifeng, et al.
Published: (2023)

PromptSafe: Gated Prompt Tuning for Safe Text-to-Image Generation
by: Jing, Zonglei, et al.
Published: (2025)

VMix: Improving Text-to-Image Diffusion Model with Cross-Attention Mixing Control
by: Wu, Shaojin, et al.
Published: (2024)

Chat-Edit-3D: Interactive 3D Scene Editing via Text Prompts
by: Fang, Shuangkang, et al.
Published: (2024)

PiTe: Pixel-Temporal Alignment for Large Video-Language Model
by: Liu, Yang, et al.
Published: (2024)

What's on Your Plate? Inferring Chinese Cuisine Intake from Wearable IMUs
by: Yin, Jiaxi, et al.
Published: (2025)

Learning Dynamic Tetrahedra for High-Quality Talking Head Synthesis
by: Zhang, Zicheng, et al.
Published: (2024)