| Main Authors: | Qu, Yadong, Fang, Shancheng, Wang, Yuxin, Wang, Xiaorui, Chen, Zhineng, Xie, Hongtao, Zhang, Yongdong |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2507.09910 |
Similar Items
Seeing is Improving: Visual Feedback for Iterative Text Layout Refinement
by: Guo, Junrong, et al.
Published: (2026)
Boosting Semi-Supervised Scene Text Recognition via Viewing and Summarizing
by: Qu, Yadong, et al.
Published: (2024)
DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations
by: Qi, Tianhao, et al.
Published: (2024)
Mask$^2$DiT: Dual Mask-based Diffusion Transformer for Multi-Scene Long Video Generation
by: Qi, Tianhao, et al.
Published: (2025)
CreatiParser: Generative Image Parsing of Raster Graphic Designs into Editable Layers
by: Chen, Weidong, et al.
Published: (2026)
How Control Information Influences Multilingual Text Image Generation and Editing?
by: Zhang, Boqiang, et al.
Published: (2024)
Self-Supervised Pre-training with Symmetric Superimposition Modeling for Scene Text Recognition
by: Gao, Zuan, et al.
Published: (2024)
SpaceVLLM: Endowing Multimodal Large Language Model with Spatio-Temporal Video Grounding Capability
by: Wang, Jiankang, et al.
Published: (2025)
Focus on the Whole Character: Discriminative Character Modeling for Scene Text Recognition
by: Zhou, Bangbang, et al.
Published: (2024)
Rethinking Layered Graphic Design Generation with a Top-Down Approach
by: Chen, Jingye, et al.
Published: (2025)
SVTRv2: CTC Beats Encoder-Decoder Models in Scene Text Recognition
by: Du, Yongkun, et al.
Published: (2024)
Leveraging Text Localization for Scene Text Removal via Text-aware Masked Image Modeling
by: Wang, Zixiao, et al.
Published: (2024)
COLE: A Hierarchical Generation Framework for Multi-Layered and Editable Graphic Design
by: Jia, Peidong, et al.
Published: (2023)
LRANet++: Low-Rank Approximation Network for Accurate and Efficient Text Spotting
by: Su, Yuchen, et al.
Published: (2025)
Choose What You Need: Disentangled Representation Learning for Scene Text Recognition, Removal and Editing
by: Zhang, Boqiang, et al.
Published: (2024)
AlignZeg: Mitigating Objective Misalignment for Zero-shot Semantic Segmentation
by: Ge, Jiannan, et al.
Published: (2024)
DiffAM: Diffusion-based Adversarial Makeup Transfer for Facial Privacy Protection
by: Sun, Yuhao, et al.
Published: (2024)
Instruction-Guided Scene Text Recognition
by: Du, Yongkun, et al.
Published: (2024)
CreatiPoster: Towards Editable and Controllable Multi-Layer Graphic Design Generation
by: Zhang, Zhao, et al.
Published: (2025)
Sentiment-oriented Transformer-based Variational Autoencoder Network for Live Video Commenting
by: Fu, Fengyi, et al.
Published: (2024)
LayerD: Decomposing Raster Graphic Designs into Layers
by: Suzuki, Tomoyuki, et al.
Published: (2025)
Stream-R1: Reliability-Perplexity Aware Reward Distillation for Streaming Video Generation
by: Wu, Bin, et al.
Published: (2026)
Graphic Design with Large Multimodal Model
by: Cheng, Yutao, et al.
Published: (2024)
DSE-GAN: Dynamic Semantic Evolution Generative Adversarial Network for Text-to-Image Generation
by: Huang, Mengqi, et al.
Published: (2022)
GRIP: A Graph-Based Reasoning Instruction Producer
by: Wang, Jiankang, et al.
Published: (2024)
Deeply-Conditioned Image Compression via Self-Generated Priors
by: Zhao, Zhineng, et al.
Published: (2025)
From Elements to Design: A Layered Approach for Automatic Graphic Design Composition
by: Lin, Jiawei, et al.
Published: (2024)
Neural Assembler: Learning to Generate Fine-Grained Robotic Assembly Instructions from Multi-View Images
by: Yan, Hongyu, et al.
Published: (2024)
DesignProbe: A Graphic Design Benchmark for Multimodal Large Language Models
by: Lin, Jieru, et al.
Published: (2024)
Multimodal Instruction Tuning with Hybrid State Space Models
by: Zhou, Jianing, et al.
Published: (2024)
InstructScene: Instruction-Driven 3D Indoor Scene Synthesis with Semantic Graph Prior
by: Lin, Chenguo, et al.
Published: (2024)
LaDe: Unified Multi-Layered Graphic Media Generation and Decomposition
by: Lungu-Stan, Vlad-Constantin, et al.
Published: (2026)
MetaMorph: Multimodal Understanding and Generation via Instruction Tuning
by: Tong, Shengbang, et al.
Published: (2024)
T-SVG: Text-Driven Stereoscopic Video Generation
by: Jin, Qiao, et al.
Published: (2024)
Instruction Tuning-free Visual Token Complement for Multimodal LLMs
by: Wang, Dongsheng, et al.
Published: (2024)
LICA: Layered Image Composition Annotations for Graphic Design Research
by: Hirsch, Elad, et al.
Published: (2026)
DreamOmni2: Multimodal Instruction-based Editing and Generation
by: Xia, Bin, et al.
Published: (2025)
TextSSR: Diffusion-based Data Synthesis for Scene Text Recognition
by: Ye, Xingsong, et al.
Published: (2024)
AdvQDet: Detecting Query-Based Adversarial Attacks with Adversarial Contrastive Prompt Tuning
by: Wang, Xin, et al.
Published: (2024)
Biomechanics-Guided Residual Approach to Generalizable Human Motion Generation and Estimation
by: Kang, Zixi, et al.
Published: (2025)