| Main Authors: | Wan, Zhang, Tang, Sheng, Wei, Jiawei, Zhang, Ruize, Cao, Juan |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2410.10751 |
Similar Items
DragAnything: Motion Control for Anything using Entity Representation
by: Wu, Weijia, et al.
Published: (2024)
3DTrajMaster: Mastering 3D Trajectory for Multi-Entity Motion in Video Generation
by: Fu, Xiao, et al.
Published: (2024)
Leveraging Entity Information for Cross-Modality Correlation Learning: The Entity-Guided Multimodal Summarization
by: Zhang, Yanghai, et al.
Published: (2024)
EntityBench: Towards Entity-Consistent Long-Range Multi-Shot Video Generation
by: He, Ruozhen, et al.
Published: (2026)
DragVideo: Interactive Drag-style Video Editing
by: Deng, Yufan, et al.
Published: (2023)
Knowledge Guided Entity-aware Video Captioning and A Basketball Benchmark
by: Xi, Zeyu, et al.
Published: (2024)
OmniDrag: Enabling Motion Control for Omnidirectional Image-to-Video Generation
by: Li, Weiqi, et al.
Published: (2024)
Tora2: Motion and Appearance Customized Diffusion Transformer for Multi-Entity Video Generation
by: Zhang, Zhenghao, et al.
Published: (2025)
EAMA : Entity-Aware Multimodal Alignment Based Approach for News Image Captioning
by: Zhang, Junzhe, et al.
Published: (2024)
VP-MEL: Visual Prompts Guided Multimodal Entity Linking
by: Mi, Hongze, et al.
Published: (2024)
EliGen: Entity-Level Controlled Image Generation with Regional Attention
by: Zhang, Hong, et al.
Published: (2025)
DragMesh: Interactive 3D Generation Made Easy
by: Zhang, Tianshan, et al.
Published: (2025)
A Proposal-Free Query-Guided Network for Grounded Multimodal Named Entity Recognition
by: Li, Hongbing, et al.
Published: (2026)
Entity-NeRF: Detecting and Removing Moving Entities in Urban Scenes
by: Otonari, Takashi, et al.
Published: (2024)
Video Summarization: Towards Entity-Aware Captions
by: Ayyubi, Hammad A., et al.
Published: (2023)
Streaming Drag-Oriented Interactive Video Manipulation: Drag Anything, Anytime!
by: Zhou, Junbao, et al.
Published: (2025)
ECIS-VQG: Generation of Entity-centric Information-seeking Questions from Videos
by: Phukan, Arpan, et al.
Published: (2024)
Causal-Entity Reflected Egocentric Traffic Accident Video Synthesis
by: Li, Lei-lei, et al.
Published: (2025)
Causal Prompt Calibration Guided Segment Anything Model for Open-Vocabulary Multi-Entity Segmentation
by: Wang, Jingyao, et al.
Published: (2025)
StableDrag: Stable Dragging for Point-based Image Editing
by: Cui, Yutao, et al.
Published: (2024)
Self-supervised Video Instance Segmentation Can Boost Geographic Entity Alignment in Historical Maps
by: Xia, Xue, et al.
Published: (2024)
Entity-Guided Multi-Task Learning for Infrared and Visible Image Fusion
by: Shao, Wenyu, et al.
Published: (2026)
VFM$^{4}$SDG: Unveiling the Power of VFMs for Single-Domain Generalized Object Detection
by: Zhang, Yupeng, et al.
Published: (2026)
HENASY: Learning to Assemble Scene-Entities for Egocentric Video-Language Model
by: Vo, Khoa, et al.
Published: (2024)
EntityCLIP: Entity-Centric Image-Text Matching via Multimodal Attentive Contrastive Learning
by: Wang, Yaxiong, et al.
Published: (2024)
Grounding Language Models for Visual Entity Recognition
by: Xiao, Zilin, et al.
Published: (2024)
Generating Fine Details of Entity Interactions
by: Gu, Xinyi, et al.
Published: (2025)
KITTEN: A Knowledge-Intensive Evaluation of Image Generation on Visual Entities
by: Huang, Hsin-Ping, et al.
Published: (2024)
C-Drag: Chain-of-Thought Driven Motion Controller for Video Generation
by: Li, Yuhao, et al.
Published: (2025)
Entity6K: A Large Open-Domain Evaluation Dataset for Real-World Entity Recognition
by: Qiu, Jielin, et al.
Published: (2024)
ODOV: Benchmark the Open-Domain Open-Vocabulary Object Detection
by: Zhang, Yupeng, et al.
Published: (2025)
NoOVD: Novel Category Discovery and Embedding for Open-Vocabulary Object Detection
by: Zhang, Yupeng, et al.
Published: (2026)
LV-OSD: Language-Vision-Complementary Open-Set Object Detection
by: Zhang, Yupeng, et al.
Published: (2026)
E-SAM: Training-Free Segment Every Entity Model
by: Zhang, Weiming, et al.
Published: (2025)
A Generative Approach for Wikipedia-Scale Visual Entity Recognition
by: Caron, Mathilde, et al.
Published: (2024)
When and What: Diffusion-Grounded VideoLLM with Entity Aware Segmentation for Long Video Understanding
by: Fang, Pengcheng, et al.
Published: (2025)
E2E-GMNER: End-to-End Generative Grounded Multimodal Named Entity Recognition
by: Zhang, Meng, et al.
Published: (2026)
Incantation: Natural Language as the Action Interface for Multi-Entity Video World Models
by: Zhu, Shangwen, et al.
Published: (2026)
Entity-Centric World Models: Interaction-Aware Masking for Causal Video Prediction
by: Paidi, Santosh Kumar
Published: (2026)
AdaptiveDrag: Semantic-Driven Dragging on Diffusion-Based Image Editing
by: Chen, DuoSheng, et al.
Published: (2024)