:: Library Catalog

Bibliographic Details
Main Authors:	Kang, Taewon; Kothandaraman, Divya; Lin, Ming C.
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2503.06310

Similar Items

3D-free meets 3D priors: Novel View Synthesis from a Single Image with Pretrained Diffusion Guidance
by: Kang, Taewon, et al.
Published: (2024)

Differentiable Frequency-based Disentanglement for Aerial Video Action Recognition
by: Kothandaraman, Divya, et al.
Published: (2022)

Character-Centered Dialogue Generation from Scene-Level Prompts
by: Kang, Taewon, et al.
Published: (2025)

NEGATE: Constrained Semantic Guidance for Linguistic Negation in Text-to-Video Diffusion
by: Kang, Taewon, et al.
Published: (2026)

Financial Models in Generative Art: Black-Scholes-Inspired Concept Blending in Text-to-Image Diffusion
by: Kothandaraman, Divya, et al.
Published: (2024)

Text Prompting for Multi-Concept Video Customization by Autoregressive Generation
by: Kothandaraman, Divya, et al.
Published: (2024)

HawkI: Homography & Mutual Information Guidance for 3D-free Single Image to Aerial View
by: Kothandaraman, Divya, et al.
Published: (2023)

ImPoster: Text and Frequency Guidance for Subject Driven Action Personalization using Diffusion Models
by: Kothandaraman, Divya, et al.
Published: (2024)

BoMuDANet: Unsupervised Adaptation for Visual Scene Understanding in Unstructured Driving Environments
by: Kothandaraman, Divya, et al.
Published: (2020)

Beyond Memorization: Selective Learning for Copyright-Safe Diffusion Model Training
by: Kothandaraman, Divya, et al.
Published: (2025)

Placing Human Animations into 3D Scenes by Learning Interaction- and Geometry-Driven Keyframes
by: Mullen Jr, James F., et al.
Published: (2022)

SS-SFDA : Self-Supervised Source-Free Domain Adaptation for Road Segmentation in Hazardous Environments
by: Kothandaraman, Divya, et al.
Published: (2020)

Zero-Shot Personalized Camera Motion Control for Image-to-Video Synthesis
by: Guhan, Pooja, et al.
Published: (2025)

Trajectory-Guided Diffusion for Foreground-Preserving Background Generation in Multi-Layer Documents
by: Kang, Taewon
Published: (2026)

RegionRoute: Regional Style Transfer with Diffusion Model
by: Chen, Bowen, et al.
Published: (2026)

Low-Bitrate Video Compression through Semantic-Conditioned Diffusion
by: Wang, Lingdong, et al.
Published: (2025)

Text-Conditioned Background Generation for Editable Multi-Layer Documents
by: Kang, Taewon, et al.
Published: (2025)

DCR: Counterfactual Attractor Guidance for Rare Compositional Generation
by: Kang, Taewon, et al.
Published: (2026)

HART: Human Aligned Reconstruction Transformer
by: Chen, Xiyi, et al.
Published: (2025)

SALAD: Source-free Active Label-Agnostic Domain Adaptation for Classification, Segmentation and Detection
by: Kothandaraman, Divya, et al.
Published: (2022)

Region Prompt Tuning: Fine-grained Scene Text Detection Utilizing Region Text Prompt
by: Lin, Xingtao, et al.
Published: (2024)

ActionVOS: Actions as Prompts for Video Object Segmentation
by: Ouyang, Liangyang, et al.
Published: (2024)

StoryMem: Multi-shot Long Video Storytelling with Memory
by: Zhang, Kaiwen, et al.
Published: (2025)

The Lost Melody: Empirical Observations on Text-to-Video Generation From A Storytelling Perspective
by: Shin, Andrew, et al.
Published: (2024)

3D Scene Prompting for Scene-Consistent Camera-Controllable Video Generation
by: Lee, JoungBin, et al.
Published: (2025)

Chat-Edit-3D: Interactive 3D Scene Editing via Text Prompts
by: Fang, Shuangkang, et al.
Published: (2024)

DEVIAS: Learning Disentangled Video Representations of Action and Scene
by: Bae, Kyungho, et al.
Published: (2023)

SceneEval: Evaluating Semantic Coherence in Text-Conditioned 3D Indoor Scene Synthesis
by: Tam, Hou In Ivan, et al.
Published: (2025)

DynamicScaler: Seamless and Scalable Video Generation for Panoramic Scenes
by: Liu, Jinxiu, et al.
Published: (2024)

Enhancing Motion in Text-to-Video Generation with Decomposed Encoding and Conditioning
by: Ruan, Penghui, et al.
Published: (2024)

Tri-Prompting: Video Diffusion with Unified Control over Scene, Subject, and Motion
by: Zhou, Zhenghong, et al.
Published: (2026)

Precise Action-to-Video Generation Through Visual Action Prompts
by: Wang, Yuang, et al.
Published: (2025)

Event-Driven Storytelling with Multiple Lifelike Humans in a 3D Scene
by: Lim, Donggeun, et al.
Published: (2025)

Coherent 3D Portrait Video Reconstruction via Triplane Fusion
by: Wang, Shengze, et al.
Published: (2024)

CMFN: Cross-Modal Fusion Network for Irregular Scene Text Recognition
by: Zheng, Jinzhi, et al.
Published: (2024)

TIP-Editor: An Accurate 3D Editor Following Both Text-Prompts And Image-Prompts
by: Zhuang, Jingyu, et al.
Published: (2024)

Action Reimagined: Text-to-Pose Video Editing for Dynamic Human Actions
by: Wang, Lan, et al.
Published: (2024)

CI-VID: A Coherent Interleaved Text-Video Dataset
by: Ju, Yiming, et al.
Published: (2025)

Scene-Text Grounding for Text-Based Video Question Answering
by: Zhou, Sheng, et al.
Published: (2024)

Action-Guided Attention for Video Action Anticipation
by: Tai, Tsung-Ming, et al.
Published: (2026)