:: Library Catalog

Bibliographic Details
Main Authors:	Qu, Yadong; Fang, Shancheng; Wang, Yuxin; Wang, Xiaorui; Chen, Zhineng; Xie, Hongtao; Zhang, Yongdong
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2507.09910

Similar Items

Seeing is Improving: Visual Feedback for Iterative Text Layout Refinement
by: Guo, Junrong, et al.
Published: (2026)

Boosting Semi-Supervised Scene Text Recognition via Viewing and Summarizing
by: Qu, Yadong, et al.
Published: (2024)

DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations
by: Qi, Tianhao, et al.
Published: (2024)

Mask$^2$DiT: Dual Mask-based Diffusion Transformer for Multi-Scene Long Video Generation
by: Qi, Tianhao, et al.
Published: (2025)

CreatiParser: Generative Image Parsing of Raster Graphic Designs into Editable Layers
by: Chen, Weidong, et al.
Published: (2026)

How Control Information Influences Multilingual Text Image Generation and Editing?
by: Zhang, Boqiang, et al.
Published: (2024)

Self-Supervised Pre-training with Symmetric Superimposition Modeling for Scene Text Recognition
by: Gao, Zuan, et al.
Published: (2024)

SpaceVLLM: Endowing Multimodal Large Language Model with Spatio-Temporal Video Grounding Capability
by: Wang, Jiankang, et al.
Published: (2025)

Focus on the Whole Character: Discriminative Character Modeling for Scene Text Recognition
by: Zhou, Bangbang, et al.
Published: (2024)

Rethinking Layered Graphic Design Generation with a Top-Down Approach
by: Chen, Jingye, et al.
Published: (2025)

SVTRv2: CTC Beats Encoder-Decoder Models in Scene Text Recognition
by: Du, Yongkun, et al.
Published: (2024)

Leveraging Text Localization for Scene Text Removal via Text-aware Masked Image Modeling
by: Wang, Zixiao, et al.
Published: (2024)

COLE: A Hierarchical Generation Framework for Multi-Layered and Editable Graphic Design
by: Jia, Peidong, et al.
Published: (2023)

LRANet++: Low-Rank Approximation Network for Accurate and Efficient Text Spotting
by: Su, Yuchen, et al.
Published: (2025)

Choose What You Need: Disentangled Representation Learning for Scene Text Recognition, Removal and Editing
by: Zhang, Boqiang, et al.
Published: (2024)

AlignZeg: Mitigating Objective Misalignment for Zero-shot Semantic Segmentation
by: Ge, Jiannan, et al.
Published: (2024)

DiffAM: Diffusion-based Adversarial Makeup Transfer for Facial Privacy Protection
by: Sun, Yuhao, et al.
Published: (2024)

Instruction-Guided Scene Text Recognition
by: Du, Yongkun, et al.
Published: (2024)

CreatiPoster: Towards Editable and Controllable Multi-Layer Graphic Design Generation
by: Zhang, Zhao, et al.
Published: (2025)

Sentiment-oriented Transformer-based Variational Autoencoder Network for Live Video Commenting
by: Fu, Fengyi, et al.
Published: (2024)

LayerD: Decomposing Raster Graphic Designs into Layers
by: Suzuki, Tomoyuki, et al.
Published: (2025)

Stream-R1: Reliability-Perplexity Aware Reward Distillation for Streaming Video Generation
by: Wu, Bin, et al.
Published: (2026)

Graphic Design with Large Multimodal Model
by: Cheng, Yutao, et al.
Published: (2024)

DSE-GAN: Dynamic Semantic Evolution Generative Adversarial Network for Text-to-Image Generation
by: Huang, Mengqi, et al.
Published: (2022)

GRIP: A Graph-Based Reasoning Instruction Producer
by: Wang, Jiankang, et al.
Published: (2024)

Deeply-Conditioned Image Compression via Self-Generated Priors
by: Zhao, Zhineng, et al.
Published: (2025)

From Elements to Design: A Layered Approach for Automatic Graphic Design Composition
by: Lin, Jiawei, et al.
Published: (2024)

Neural Assembler: Learning to Generate Fine-Grained Robotic Assembly Instructions from Multi-View Images
by: Yan, Hongyu, et al.
Published: (2024)

DesignProbe: A Graphic Design Benchmark for Multimodal Large Language Models
by: Lin, Jieru, et al.
Published: (2024)

Multimodal Instruction Tuning with Hybrid State Space Models
by: Zhou, Jianing, et al.
Published: (2024)

InstructScene: Instruction-Driven 3D Indoor Scene Synthesis with Semantic Graph Prior
by: Lin, Chenguo, et al.
Published: (2024)

LaDe: Unified Multi-Layered Graphic Media Generation and Decomposition
by: Lungu-Stan, Vlad-Constantin, et al.
Published: (2026)

MetaMorph: Multimodal Understanding and Generation via Instruction Tuning
by: Tong, Shengbang, et al.
Published: (2024)

T-SVG: Text-Driven Stereoscopic Video Generation
by: Jin, Qiao, et al.
Published: (2024)

Instruction Tuning-free Visual Token Complement for Multimodal LLMs
by: Wang, Dongsheng, et al.
Published: (2024)

LICA: Layered Image Composition Annotations for Graphic Design Research
by: Hirsch, Elad, et al.
Published: (2026)

DreamOmni2: Multimodal Instruction-based Editing and Generation
by: Xia, Bin, et al.
Published: (2025)

TextSSR: Diffusion-based Data Synthesis for Scene Text Recognition
by: Ye, Xingsong, et al.
Published: (2024)

AdvQDet: Detecting Query-Based Adversarial Attacks with Adversarial Contrastive Prompt Tuning
by: Wang, Xin, et al.
Published: (2024)

Biomechanics-Guided Residual Approach to Generalizable Human Motion Generation and Estimation
by: Kang, Zixi, et al.
Published: (2025)