:: Library Catalog

Bibliographic Details
Main Authors:	Gu, Xin; Li, Ming; Zhang, Libo; Chen, Fan; Wen, Longyin; Luo, Tiejian; Zhu, Sijie
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2411.04713

Similar Items

Edit3K: Universal Representation Learning for Video Editing Components
by: Gu, Xin, et al.
Published: (2024)

SuperEdit: Rectifying and Facilitating Supervision for Instruction-Based Image Editing
by: Li, Ming, et al.
Published: (2025)

Structured Context Learning for Generic Event Boundary Detection
by: Gu, Xin, et al.
Published: (2025)

Context-Guided Spatio-Temporal Video Grounding
by: Gu, Xin, et al.
Published: (2024)

Thinking With Bounding Boxes: Enhancing Spatio-Temporal Video Grounding via Reinforcement Fine-Tuning
by: Gu, Xin, et al.
Published: (2025)

Robust Domain Adaptive Object Detection with Unified Multi-Granularity Alignment
by: Zhang, Libo, et al.
Published: (2023)

D-Attn: Decomposed Attention for Large Vision-and-Language Models
by: Kuo, Chia-Wen, et al.
Published: (2025)

VEBench: Benchmarking Large Multimodal Models for Real-World Video Editing
by: Deng, Andong, et al.
Published: (2026)

Accurate and Fast Compressed Video Captioning
by: Shen, Yaojie, et al.
Published: (2023)

Beyond Raw Videos: Understanding Edited Videos with Large Multimodal Model
by: Xu, Lu, et al.
Published: (2024)

Knowing Your Target: Target-Aware Transformer Makes Better Spatio-Temporal Video Grounding
by: Gu, Xin, et al.
Published: (2025)

Where do Large Vision-Language Models Look at when Answering Questions?
by: Xing, Xiaoying, et al.
Published: (2025)

SEED-Data-Edit Technical Report: A Hybrid Dataset for Instructional Image Editing
by: Ge, Yuying, et al.
Published: (2024)

Vidi: Large Multimodal Models for Video Understanding and Editing
by: Vidi Team, et al.
Published: (2025)

CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts
by: Li, Jiachen, et al.
Published: (2024)

MultiEdit: Advancing Instruction-based Image Editing on Diverse and Challenging Tasks
by: Li, Mingsong, et al.
Published: (2025)

Edit-Compass & EditReward-Compass: A Unified Benchmark for Image Editing and Reward Modeling
by: Bai, Xuehai, et al.
Published: (2026)

HumanEdit: A High-Quality Human-Rewarded Dataset for Instruction-based Image Editing
by: Bai, Jinbin, et al.
Published: (2024)

InstructBrush: Learning Attention-based Instruction Optimization for Image Editing
by: Zhao, Ruoyu, et al.
Published: (2024)

Disentangling Instruction Influence in Diffusion Transformers for Parallel Multi-Instruction-Guided Image Editing
by: Liu, Hui, et al.
Published: (2025)

EditReward: A Human-Aligned Reward Model for Instruction-Guided Image Editing
by: Wu, Keming, et al.
Published: (2025)

Multi-Modal LLM based Image Captioning in ICT: Bridging the Gap Between General and Industry Domain
by: Chao, Lianying, et al.
Published: (2026)

LIVE: Leveraging Image Manipulation Priors for Instruction-based Video Editing
by: Wang, Weicheng, et al.
Published: (2026)

Rethinking Scribble-Guided Image Editing: Generalization, Instruction Adherence, and Multi-Tasking
by: Xu, Mingyi, et al.
Published: (2026)

UltraEdit: Instruction-based Fine-Grained Image Editing at Scale
by: Zhao, Haozhe, et al.
Published: (2024)

CAMEO: A Conditional and Quality-Aware Multi-Agent Image Editing Orchestrator
by: Pu, Yuhan, et al.
Published: (2026)

VIVA: VLM-Guided Instruction-Based Video Editing with Reward Optimization
by: Cong, Xiaoyan, et al.
Published: (2025)

MCIE: Multimodal LLM-Driven Complex Instruction Image Editing with Spatial Guidance
by: Bai, Xuehai, et al.
Published: (2026)

Instruction Guided Multi Object Image Editing with Quantity and Layout Consistency
by: Tan, Jiaqi, et al.
Published: (2025)

EditScore: Unlocking Online RL for Image Editing via High-Fidelity Reward Modeling
by: Luo, Xin, et al.
Published: (2025)

SpatialReward: Bridging the Perception Gap in Online RL for Image Editing via Explicit Spatial Reasoning
by: Long, Yancheng, et al.
Published: (2026)

DreamVE: Unified Instruction-based Image and Video Editing
by: Xia, Bin, et al.
Published: (2025)

PhotoFramer: Multi-modal Image Composition Instruction
by: You, Zhiyuan, et al.
Published: (2025)

CCA: Collaborative Competitive Agents for Image Editing
by: Hang, Tiankai, et al.
Published: (2024)

3D-aware Image Generation and Editing with Multi-modal Conditions
by: Li, Bo, et al.
Published: (2024)

SmartFreeEdit: Mask-Free Spatial-Aware Image Editing with Complex Instruction Understanding
by: Sun, Qianqian, et al.
Published: (2025)

CLIPDrag: Combining Text-based and Drag-based Instructions for Image Editing
by: Jiang, Ziqi, et al.
Published: (2024)

CompBench: Benchmarking Complex Instruction-guided Image Editing
by: Jia, Bohan, et al.
Published: (2025)

Reasoning to Edit: Hypothetical Instruction-Based Image Editing with Visual Reasoning
by: He, Qingdong, et al.
Published: (2025)

UniRef-Image-Edit: Towards Scalable and Consistent Multi-Reference Image Editing
by: Wei, Hongyang, et al.
Published: (2026)