| Main Authors: | Gupta, Advait; Velaga, NandaKiran; Nguyen, Dang; Zhou, Tianyi |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2503.10613 |
Similar Items
FaSTA$^*$: Fast-Slow Toolpath Agent with Subroutine Mining for Efficient Multi-turn Image Editing
by: Gupta, Advait, et al.
Published: (2025)
Multi-turn Consistent Image Editing
by: Zhou, Zijun, et al.
Published: (2025)
CAMEO: A Conditional and Quality-Aware Multi-Agent Image Editing Orchestrator
by: Pu, Yuhan, et al.
Published: (2026)
Predicting the Reliability of an Image Classifier under Image Distortion
by: Nguyen, Dang, et al.
Published: (2024)
ChartAB: A Benchmark for Chart Grounding & Dense Alignment
by: Bansal, Aniruddh, et al.
Published: (2025)
Few-shot Algorithm Assurance
by: Nguyen, Dang, et al.
Published: (2024)
Talk2Image: A Multi-Agent System for Multi-Turn Image Generation and Editing
by: Ma, Shichao, et al.
Published: (2025)
AdapEdit: Spatio-Temporal Guided Adaptive Editing Algorithm for Text-Based Continuity-Sensitive Image Editing
by: Ma, Zhiyuan, et al.
Published: (2023)
ViSTA: Visual Storytelling using Multi-modal Adapters for Text-to-Image Diffusion Models
by: Dong, Sibo, et al.
Published: (2025)
FreeFlux: Understanding and Exploiting Layer-Specific Roles in RoPE-Based MMDiT for Versatile Image Editing
by: Wei, Tianyi, et al.
Published: (2025)
Deep But Reliable: Advancing Multi-turn Reasoning for Thinking with Images
by: Yang, Wenhao, et al.
Published: (2025)
IDSelect: A RL-Based Cost-Aware Selection Agent for Video-based Multi-Modal Person Recognition
by: Ji, Yuyang, et al.
Published: (2026)
CREA: A Collaborative Multi-Agent Framework for Creative Image Editing and Generation
by: Venkatesh, Kavana, et al.
Published: (2025)
CCA: Collaborative Competitive Agents for Image Editing
by: Hang, Tiankai, et al.
Published: (2024)
MedSAM-Agent: Empowering Interactive Medical Image Segmentation with Multi-turn Agentic Reinforcement Learning
by: Liu, Shengyuan, et al.
Published: (2026)
MMAP: A Multi-Magnification and Prototype-Aware Architecture for Predicting Spatial Gene Expression
by: Nguyen, Hai Dang, et al.
Published: (2025)
Image-level Regression for Uncertainty-aware Retinal Image Segmentation
by: Dang, Trung, et al.
Published: (2024)
AutoStudio: Crafting Consistent Subjects in Multi-turn Interactive Image Generation
by: Cheng, Junhao, et al.
Published: (2024)
TheaterGen: Character Management with LLM for Consistent Multi-turn Image Generation
by: Cheng, Junhao, et al.
Published: (2024)
DialogGen: Multi-modal Interactive Dialogue System for Multi-turn Text-to-Image Generation
by: Huang, Minbin, et al.
Published: (2024)
Beyond Textual CoT: Interleaved Text-Image Chains with Deep Confidence Reasoning for Image Editing
by: Zou, Zhentao, et al.
Published: (2025)
MSRAMIE: Multimodal Structured Reasoning Agent for Multi-instruction Image Editing
by: Qiu, Zhaoyuan, et al.
Published: (2026)
Geometric Image Editing via Effects-Sensitive In-Context Inpainting with Diffusion Transformers
by: Zhang, Shuo, et al.
Published: (2026)
Diverse Image Priors for Black-box Data-free Knowledge Distillation
by: Vo, Tri-Nhan, et al.
Published: (2026)
MIRA: Multimodal Iterative Reasoning Agent for Image Editing
by: Zeng, Ziyun, et al.
Published: (2025)
SwiftEdit: Lightning Fast Text-Guided Image Editing via One-Step Diffusion
by: Nguyen, Trong-Tung, et al.
Published: (2024)
FlexEdit: Flexible and Controllable Diffusion-based Object-centric Image Editing
by: Nguyen, Trong-Tung, et al.
Published: (2024)
Training-Free Multi-Concept Image Editing
by: Foteinopoulou, Niki, et al.
Published: (2026)
MuLan: Multimodal-LLM Agent for Progressive and Interactive Multi-Object Diffusion
by: Li, Sen, et al.
Published: (2024)
CRAG-MM: Multi-modal Multi-turn Comprehensive RAG Benchmark
by: Wang, Jiaqi, et al.
Published: (2025)
Rethinking Scribble-Guided Image Editing: Generalization, Instruction Adherence, and Multi-Tasking
by: Xu, Mingyi, et al.
Published: (2026)
TexTAR : Textual Attribute Recognition in Multi-domain and Multi-lingual Document Images
by: Kumar, Rohan, et al.
Published: (2025)
ImageEdit-R1: Boosting Multi-Agent Image Editing via Reinforcement Learning
by: Zhao, Yiran, et al.
Published: (2026)
Edit One for All: Interactive Batch Image Editing
by: Nguyen, Thao, et al.
Published: (2024)
CoCoEdit: Content-Consistent Image Editing via Region Regularized Reinforcement Learning
by: Wu, Yuhui, et al.
Published: (2026)
MedXplain-VQA: Multi-Component Explainable Medical Visual Question Answering
by: Nguyen, Hai-Dang, et al.
Published: (2025)
The Forensic Cost of Watermark Removal: From Dedicated Attacks to Image Editing
by: Evennou, Gautier, et al.
Published: (2026)
STA-Unet: Rethink the semantic redundant for Medical Imaging Segmentation
by: Vasa, Vamsi Krishna, et al.
Published: (2024)
Multi-Reward as Condition for Instruction-based Image Editing
by: Gu, Xin, et al.
Published: (2024)
ParallelEdits: Efficient Multi-object Image Editing
by: Huang, Mingzhen, et al.
Published: (2024)