Library Catalog

Bibliographic Details
Main Author:	Furuta, Ryosuke
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2603.27790

Similar Items

Learning Gaussian Data Augmentation in Feature Space for One-shot Object Detection in Manga
by: Taniguchi, Takara, et al.
Published: (2024)

Seeking Flat Minima with Mean Teacher on Semi- and Weakly-Supervised Domain Generalization for Object Detection
by: Furuta, Ryosuke, et al.
Published: (2023)

MangaFlow: An End-to-End Agentic Framework for Controllable Story to Manga Generation
by: Wang, Muyao, et al.
Published: (2026)

Manga109-v2026: Revisiting Manga109 Annotations for Modern Manga Understanding
by: Baek, Jeonghun, et al.
Published: (2026)

MangaVQA and MangaLMM: A Benchmark and Specialized Model for Multimodal Manga Understanding
by: Baek, Jeonghun, et al.
Published: (2025)

EgoInstruct: An Egocentric Video Dataset of Face-to-face Instructional Interactions with Multi-modal LLM Benchmarking
by: Sakai, Yuki, et al.
Published: (2025)

Learning Multiple Object States from Actions via Large Language Models
by: Tateno, Masatoshi, et al.
Published: (2024)

MangaUB: A Manga Understanding Benchmark for Large Multimodal Models
by: Ikuta, Hikaru, et al.
Published: (2024)

Pre-Training for 3D Hand Pose Estimation with Contrastive Learning on Large-Scale Hand Images in the Wild
by: Lin, Nie, et al.
Published: (2024)

Region-Wise Correspondence Prediction between Manga Line Art Images
by: Li, Yingxuan, et al.
Published: (2025)

Diffusion-Based Conditional Image Editing through Optimized Inference with Guidance
by: Lee, Hyunsoo, et al.
Published: (2024)

ActionVOS: Actions as Prompts for Video Object Segmentation
by: Ouyang, Liangyang, et al.
Published: (2024)

AssemblyHands-X: Modeling 3D Hand-Body Coordination for Understanding Bimanual Human Activities
by: Banno, Tatsuro, et al.
Published: (2025)

The Manga Whisperer: Automatically Generating Transcriptions for Comics
by: Sachdeva, Ragav, et al.
Published: (2024)

Manga Generation via Layout-controllable Diffusion
by: Chen, Siyu, et al.
Published: (2024)

Number it: Temporal Grounding Videos like Flipping Manga
by: Wu, Yongliang, et al.
Published: (2024)

Leadership Assessment in Pediatric Intensive Care Unit Team Training
by: Ouyang, Liangyang, et al.
Published: (2025)

Affordance-Guided Diffusion Prior for 3D Hand Reconstruction
by: Suzuki, Naru, et al.
Published: (2025)

Multi-speaker Attention Alignment for Multimodal Social Interaction
by: Ouyang, Liangyang, et al.
Published: (2025)

Inference-Time Text-to-Video Alignment with Diffusion Latent Beam Search
by: Oshima, Yuta, et al.
Published: (2025)

MangaNinja: Line Art Colorization with Precise Reference Following
by: Liu, Zhiheng, et al.
Published: (2025)

Doubly Abductive Counterfactual Inference for Text-based Image Editing
by: Song, Xue, et al.
Published: (2024)

FlowBypass: Rectified Flow Trajectory Bypass for Training-Free Image Editing
by: Han, Menglin, et al.
Published: (2026)

Tails Tell Tales: Chapter-Wide Manga Transcriptions with Character Names
by: Sachdeva, Ragav, et al.
Published: (2024)

FineBio: A Fine-Grained Video Dataset of Biological Experiments with Hierarchical Annotation
by: Yagi, Takuma, et al.
Published: (2024)

Re:Verse -- Can Your VLM Read a Manga?
by: Baranwal, Aaditya, et al.
Published: (2025)

InstaFace: Identity-Preserving Facial Editing with Single Image Inference
by: Khan, MD Wahiduzzaman, et al.
Published: (2025)

How Panel Layouts Define Manga: Insights from Visual Ablation Experiments
by: Feng, Siyuan, et al.
Published: (2024)

Exo2EgoDVC: Dense Video Captioning of Egocentric Procedural Activities Using Web Instructional Videos
by: Ohkawa, Takehiko, et al.
Published: (2023)

Accelerating Text-to-Image Editing via Cache-Enabled Sparse Diffusion Inference
by: Yu, Zihao, et al.
Published: (2023)

Follow-Your-Shape: Shape-Aware Image Editing via Trajectory-Guided Region Control
by: Long, Zeqian, et al.
Published: (2025)

Manga109Dialog: A Large-scale Dialogue Dataset for Comics Speaker Detection
by: Li, Yingxuan, et al.
Published: (2023)

MangaDiT: Reference-Guided Line Art Colorization with Hierarchical Attention in Diffusion Transformers
by: Qiu, Qianru, et al.
Published: (2025)

DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation
by: Wu, Jianzong, et al.
Published: (2024)

SiMHand: Mining Similar Hands for Large-Scale 3D Hand Pose Pre-training
by: Lin, Nie, et al.
Published: (2025)

Energy-Guided Optimization for Personalized Image Editing with Pretrained Text-to-Image Diffusion Models
by: Jiang, Rui, et al.
Published: (2025)

Training-Free Reward-Guided Image Editing via Trajectory Optimal Control
by: Chang, Jinho, et al.
Published: (2025)

Perturb-and-Revise: Flexible 3D Editing with Generative Trajectories
by: Hong, Susung, et al.
Published: (2024)

Region-Constrained Group Relative Policy Optimization for Flow-Based Image Editing
by: Ouyang, Zhuohan, et al.
Published: (2026)

Edit-GRPO: A Locality-Preserving Policy Optimization Framework for Image Editing
by: Xu, Shaodong, et al.
Published: (2026)