:: Library Catalog

Bibliographic Details
Main Authors:	Wan, Zhang; Tang, Sheng; Wei, Jiawei; Zhang, Ruize; Cao, Juan
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2410.10751

Similar Items

DragAnything: Motion Control for Anything using Entity Representation
by: Wu, Weijia, et al.
Published: (2024)

3DTrajMaster: Mastering 3D Trajectory for Multi-Entity Motion in Video Generation
by: Fu, Xiao, et al.
Published: (2024)

Leveraging Entity Information for Cross-Modality Correlation Learning: The Entity-Guided Multimodal Summarization
by: Zhang, Yanghai, et al.
Published: (2024)

EntityBench: Towards Entity-Consistent Long-Range Multi-Shot Video Generation
by: He, Ruozhen, et al.
Published: (2026)

DragVideo: Interactive Drag-style Video Editing
by: Deng, Yufan, et al.
Published: (2023)

Knowledge Guided Entity-aware Video Captioning and A Basketball Benchmark
by: Xi, Zeyu, et al.
Published: (2024)

OmniDrag: Enabling Motion Control for Omnidirectional Image-to-Video Generation
by: Li, Weiqi, et al.
Published: (2024)

Tora2: Motion and Appearance Customized Diffusion Transformer for Multi-Entity Video Generation
by: Zhang, Zhenghao, et al.
Published: (2025)

EAMA: Entity-Aware Multimodal Alignment Based Approach for News Image Captioning
by: Zhang, Junzhe, et al.
Published: (2024)

VP-MEL: Visual Prompts Guided Multimodal Entity Linking
by: Mi, Hongze, et al.
Published: (2024)

EliGen: Entity-Level Controlled Image Generation with Regional Attention
by: Zhang, Hong, et al.
Published: (2025)

DragMesh: Interactive 3D Generation Made Easy
by: Zhang, Tianshan, et al.
Published: (2025)

A Proposal-Free Query-Guided Network for Grounded Multimodal Named Entity Recognition
by: Li, Hongbing, et al.
Published: (2026)

Entity-NeRF: Detecting and Removing Moving Entities in Urban Scenes
by: Otonari, Takashi, et al.
Published: (2024)

Video Summarization: Towards Entity-Aware Captions
by: Ayyubi, Hammad A., et al.
Published: (2023)

Streaming Drag-Oriented Interactive Video Manipulation: Drag Anything, Anytime!
by: Zhou, Junbao, et al.
Published: (2025)

ECIS-VQG: Generation of Entity-centric Information-seeking Questions from Videos
by: Phukan, Arpan, et al.
Published: (2024)

Causal-Entity Reflected Egocentric Traffic Accident Video Synthesis
by: Li, Lei-lei, et al.
Published: (2025)

Causal Prompt Calibration Guided Segment Anything Model for Open-Vocabulary Multi-Entity Segmentation
by: Wang, Jingyao, et al.
Published: (2025)

StableDrag: Stable Dragging for Point-based Image Editing
by: Cui, Yutao, et al.
Published: (2024)

Self-supervised Video Instance Segmentation Can Boost Geographic Entity Alignment in Historical Maps
by: Xia, Xue, et al.
Published: (2024)

Entity-Guided Multi-Task Learning for Infrared and Visible Image Fusion
by: Shao, Wenyu, et al.
Published: (2026)

VFM$^{4}$SDG: Unveiling the Power of VFMs for Single-Domain Generalized Object Detection
by: Zhang, Yupeng, et al.
Published: (2026)

HENASY: Learning to Assemble Scene-Entities for Egocentric Video-Language Model
by: Vo, Khoa, et al.
Published: (2024)

EntityCLIP: Entity-Centric Image-Text Matching via Multimodal Attentive Contrastive Learning
by: Wang, Yaxiong, et al.
Published: (2024)

Grounding Language Models for Visual Entity Recognition
by: Xiao, Zilin, et al.
Published: (2024)

Generating Fine Details of Entity Interactions
by: Gu, Xinyi, et al.
Published: (2025)

KITTEN: A Knowledge-Intensive Evaluation of Image Generation on Visual Entities
by: Huang, Hsin-Ping, et al.
Published: (2024)

C-Drag: Chain-of-Thought Driven Motion Controller for Video Generation
by: Li, Yuhao, et al.
Published: (2025)

Entity6K: A Large Open-Domain Evaluation Dataset for Real-World Entity Recognition
by: Qiu, Jielin, et al.
Published: (2024)

ODOV: Benchmark the Open-Domain Open-Vocabulary Object Detection
by: Zhang, Yupeng, et al.
Published: (2025)

NoOVD: Novel Category Discovery and Embedding for Open-Vocabulary Object Detection
by: Zhang, Yupeng, et al.
Published: (2026)

LV-OSD: Language-Vision-Complementary Open-Set Object Detection
by: Zhang, Yupeng, et al.
Published: (2026)

E-SAM: Training-Free Segment Every Entity Model
by: Zhang, Weiming, et al.
Published: (2025)

A Generative Approach for Wikipedia-Scale Visual Entity Recognition
by: Caron, Mathilde, et al.
Published: (2024)

When and What: Diffusion-Grounded VideoLLM with Entity Aware Segmentation for Long Video Understanding
by: Fang, Pengcheng, et al.
Published: (2025)

E2E-GMNER: End-to-End Generative Grounded Multimodal Named Entity Recognition
by: Zhang, Meng, et al.
Published: (2026)

Incantation: Natural Language as the Action Interface for Multi-Entity Video World Models
by: Zhu, Shangwen, et al.
Published: (2026)

Entity-Centric World Models: Interaction-Aware Masking for Causal Video Prediction
by: Paidi, Santosh Kumar
Published: (2026)

AdaptiveDrag: Semantic-Driven Dragging on Diffusion-Based Image Editing
by: Chen, DuoSheng, et al.
Published: (2024)