| Main Authors: | Yang, Zhenkui, Huang, Zeyi, Wang, Ge, Ding, Han, Han, Tony Xiao, Wang, Fei |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2504.14621 |
Similar Items
MobiDiary: Autoregressive Action Captioning with Wearable Devices and Wireless Signals
by: Deng, Fei, et al.
Published: (2026)
A Survey on Wi-Fi Sensing Generalizability: Taxonomy, Techniques, Datasets, and Future Research Prospects
by: Wang, Fei, et al.
Published: (2025)
DynaIP: Dynamic Image Prompt Adapter for Scalable Zero-shot Personalized Text-to-Image Generation
by: Wang, Zhizhong, et al.
Published: (2025)
Text-based Talking Video Editing with Cascaded Conditional Diffusion
by: Han, Bo, et al.
Published: (2024)
TalkCLIP: Talking Head Generation with Text-Guided Expressive Speaking Styles
by: Ma, Yifeng, et al.
Published: (2023)
GrOCE: Graph-Guided Online Concept Erasure for Text-to-Image Diffusion Models
by: Han, Ning, et al.
Published: (2025)
PoseTalk: Text-and-Audio-based Pose Control and Motion Refinement for One-Shot Talking Head Generation
by: Ling, Jun, et al.
Published: (2024)
Self-Rewarding Large Vision-Language Models for Optimizing Prompts in Text-to-Image Generation
by: Yang, Hongji, et al.
Published: (2025)
Dynamic Prompting of Frozen Text-to-Image Diffusion Models for Panoptic Narrative Grounding
by: Li, Hongyu, et al.
Published: (2024)
XRF V2: A Dataset for Action Summarization with Wi-Fi Signals, and IMUs in Phones, Watches, Earbuds, and Glasses
by: Lan, Bo, et al.
Published: (2025)
BideDPO: Conditional Image Generation with Simultaneous Text and Condition Alignment
by: Zhou, Dewei, et al.
Published: (2025)
Diversity Has Always Been There in Your Visual Autoregressive Models
by: Wang, Tong, et al.
Published: (2025)
Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation
by: Guo, Lanqing, et al.
Published: (2024)
WS-IMUBench: Can Weakly Supervised Methods from Audio, Image, and Video Be Adapted for IMU-based Temporal Action Localization?
by: Li, Pei, et al.
Published: (2026)
One Snapshot is All You Need: A Generalized Method for mmWave Signal Generation
by: Huang, Teng, et al.
Published: (2025)
EMMA: Your Text-to-Image Diffusion Model Can Secretly Accept Multi-Modal Prompts
by: Han, Yucheng, et al.
Published: (2024)
ConText: Driving In-context Learning for Text Removal and Segmentation
by: Zhang, Fei, et al.
Published: (2025)
Fast Prompt Alignment for Text-to-Image Generation
by: Mrini, Khalil, et al.
Published: (2024)
Text-Driven Emotionally Continuous Talking Face Generation
by: Yang, Hao, et al.
Published: (2026)
GloTSFormer: Global Video Text Spotting Transformer
by: Wang, Han, et al.
Published: (2024)
Cross-Modal Urban Sensing: Evaluating Sound-Vision Alignment Across Street-Level and Aerial Imagery
by: Chen, Pengyu, et al.
Published: (2025)
Modality Prompts for Arbitrary Modality Salient Object Detection
by: Huang, Nianchang, et al.
Published: (2024)
Text2Lip: Progressive Lip-Synced Talking Face Generation from Text via Viseme-Guided Rendering
by: Wang, Xu, et al.
Published: (2025)
Edge Approximation Text Detector
by: Yang, Chuang, et al.
Published: (2025)
Prompt-Softbox-Prompt: A Free-Text Embedding Control for Image Editing
by: Yang, Yitong, et al.
Published: (2024)
GuardT2I: Defending Text-to-Image Models from Adversarial Prompts
by: Yang, Yijun, et al.
Published: (2024)
Cross-domain EEG-based Emotion Recognition with Contrastive Learning
by: Yan, Rui, et al.
Published: (2025)
Advancing Pose-Guided Image Synthesis with Progressive Conditional Diffusion Models
by: Shen, Fei, et al.
Published: (2023)
HiLoTs: High-Low Temporal Sensitive Representation Learning for Semi-Supervised LiDAR Segmentation in Autonomous Driving
by: Lin, R. D., et al.
Published: (2025)
Beyond Open Vocabulary: Multimodal Prompting for Object Detection in Remote Sensing Images
by: Yang, Shuai, et al.
Published: (2026)
Text-Pass Filter: An Efficient Scene Text Detector
by: Yang, Chuang, et al.
Published: (2026)
StyleTalk++: A Unified Framework for Controlling the Speaking Styles of Talking Heads
by: Wang, Suzhen, et al.
Published: (2024)
mmEgoHand: Egocentric Hand Pose Estimation and Gesture Recognition with Head-mounted Millimeter-wave Radar and IMU
by: Lv, Yizhe, et al.
Published: (2025)
DreamTalk: When Emotional Talking Head Generation Meets Diffusion Probabilistic Models
by: Ma, Yifeng, et al.
Published: (2023)
PromptSafe: Gated Prompt Tuning for Safe Text-to-Image Generation
by: Jing, Zonglei, et al.
Published: (2025)
VMix: Improving Text-to-Image Diffusion Model with Cross-Attention Mixing Control
by: Wu, Shaojin, et al.
Published: (2024)
Chat-Edit-3D: Interactive 3D Scene Editing via Text Prompts
by: Fang, Shuangkang, et al.
Published: (2024)
PiTe: Pixel-Temporal Alignment for Large Video-Language Model
by: Liu, Yang, et al.
Published: (2024)
What's on Your Plate? Inferring Chinese Cuisine Intake from Wearable IMUs
by: Yin, Jiaxi, et al.
Published: (2025)
Learning Dynamic Tetrahedra for High-Quality Talking Head Synthesis
by: Zhang, Zicheng, et al.
Published: (2024)