| Main Authors: | Gu, Xin, Li, Ming, Zhang, Libo, Chen, Fan, Wen, Longyin, Luo, Tiejian, Zhu, Sijie |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2411.04713 |
Similar Items
Edit3K: Universal Representation Learning for Video Editing Components
by: Gu, Xin, et al.
Published: (2024)
SuperEdit: Rectifying and Facilitating Supervision for Instruction-Based Image Editing
by: Li, Ming, et al.
Published: (2025)
Structured Context Learning for Generic Event Boundary Detection
by: Gu, Xin, et al.
Published: (2025)
Context-Guided Spatio-Temporal Video Grounding
by: Gu, Xin, et al.
Published: (2024)
Thinking With Bounding Boxes: Enhancing Spatio-Temporal Video Grounding via Reinforcement Fine-Tuning
by: Gu, Xin, et al.
Published: (2025)
Robust Domain Adaptive Object Detection with Unified Multi-Granularity Alignment
by: Zhang, Libo, et al.
Published: (2023)
D-Attn: Decomposed Attention for Large Vision-and-Language Models
by: Kuo, Chia-Wen, et al.
Published: (2025)
VEBench: Benchmarking Large Multimodal Models for Real-World Video Editing
by: Deng, Andong, et al.
Published: (2026)
Accurate and Fast Compressed Video Captioning
by: Shen, Yaojie, et al.
Published: (2023)
Beyond Raw Videos: Understanding Edited Videos with Large Multimodal Model
by: Xu, Lu, et al.
Published: (2024)
Knowing Your Target: Target-Aware Transformer Makes Better Spatio-Temporal Video Grounding
by: Gu, Xin, et al.
Published: (2025)
Where do Large Vision-Language Models Look at when Answering Questions?
by: Xing, Xiaoying, et al.
Published: (2025)
SEED-Data-Edit Technical Report: A Hybrid Dataset for Instructional Image Editing
by: Ge, Yuying, et al.
Published: (2024)
Vidi: Large Multimodal Models for Video Understanding and Editing
by: Vidi Team, et al.
Published: (2025)
CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts
by: Li, Jiachen, et al.
Published: (2024)
MultiEdit: Advancing Instruction-based Image Editing on Diverse and Challenging Tasks
by: Li, Mingsong, et al.
Published: (2025)
Edit-Compass & EditReward-Compass: A Unified Benchmark for Image Editing and Reward Modeling
by: Bai, Xuehai, et al.
Published: (2026)
HumanEdit: A High-Quality Human-Rewarded Dataset for Instruction-based Image Editing
by: Bai, Jinbin, et al.
Published: (2024)
InstructBrush: Learning Attention-based Instruction Optimization for Image Editing
by: Zhao, Ruoyu, et al.
Published: (2024)
Disentangling Instruction Influence in Diffusion Transformers for Parallel Multi-Instruction-Guided Image Editing
by: Liu, Hui, et al.
Published: (2025)
EditReward: A Human-Aligned Reward Model for Instruction-Guided Image Editing
by: Wu, Keming, et al.
Published: (2025)
Multi-Modal LLM based Image Captioning in ICT: Bridging the Gap Between General and Industry Domain
by: Chao, Lianying, et al.
Published: (2026)
LIVE: Leveraging Image Manipulation Priors for Instruction-based Video Editing
by: Wang, Weicheng, et al.
Published: (2026)
Rethinking Scribble-Guided Image Editing: Generalization, Instruction Adherence, and Multi-Tasking
by: Xu, Mingyi, et al.
Published: (2026)
UltraEdit: Instruction-based Fine-Grained Image Editing at Scale
by: Zhao, Haozhe, et al.
Published: (2024)
CAMEO: A Conditional and Quality-Aware Multi-Agent Image Editing Orchestrator
by: Pu, Yuhan, et al.
Published: (2026)
VIVA: VLM-Guided Instruction-Based Video Editing with Reward Optimization
by: Cong, Xiaoyan, et al.
Published: (2025)
MCIE: Multimodal LLM-Driven Complex Instruction Image Editing with Spatial Guidance
by: Bai, Xuehai, et al.
Published: (2026)
Instruction Guided Multi Object Image Editing with Quantity and Layout Consistency
by: Tan, Jiaqi, et al.
Published: (2025)
EditScore: Unlocking Online RL for Image Editing via High-Fidelity Reward Modeling
by: Luo, Xin, et al.
Published: (2025)
SpatialReward: Bridging the Perception Gap in Online RL for Image Editing via Explicit Spatial Reasoning
by: Long, Yancheng, et al.
Published: (2026)
DreamVE: Unified Instruction-based Image and Video Editing
by: Xia, Bin, et al.
Published: (2025)
PhotoFramer: Multi-modal Image Composition Instruction
by: You, Zhiyuan, et al.
Published: (2025)
CCA: Collaborative Competitive Agents for Image Editing
by: Hang, Tiankai, et al.
Published: (2024)
3D-aware Image Generation and Editing with Multi-modal Conditions
by: Li, Bo, et al.
Published: (2024)
SmartFreeEdit: Mask-Free Spatial-Aware Image Editing with Complex Instruction Understanding
by: Sun, Qianqian, et al.
Published: (2025)
CLIPDrag: Combining Text-based and Drag-based Instructions for Image Editing
by: Jiang, Ziqi, et al.
Published: (2024)
CompBench: Benchmarking Complex Instruction-guided Image Editing
by: Jia, Bohan, et al.
Published: (2025)
Reasoning to Edit: Hypothetical Instruction-Based Image Editing with Visual Reasoning
by: He, Qingdong, et al.
Published: (2025)
UniRef-Image-Edit: Towards Scalable and Consistent Multi-Reference Image Editing
by: Wei, Hongyang, et al.
Published: (2026)