| Main Authors: | Liu, Hongen, Sun, Di, Wang, Jiahao, Liu, Yi, Pan, Gang |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2405.19194 |
Similar Items
GlyphMastero: A Glyph Encoder for High-Fidelity Scene Text Editing
by: Wang, Tong, et al.
Published: (2025)
PP-FormulaNet: Bridging Accuracy and Efficiency in Advanced Formula Recognition
by: Liu, Hongen, et al.
Published: (2025)
WeEdit: A Dataset, Benchmark and Glyph-Guided Framework for Text-centric Image Editing
by: Zhang, Hui, et al.
Published: (2026)
CBDiff: Conditional Bernoulli Diffusion Models for Image Forgery Localization
by: Lei, Zhou, et al.
Published: (2025)
FashionLOGO: Prompting Multimodal Large Language Models for Fashion Logo Embeddings
by: Wang, Zhen, et al.
Published: (2023)
UniGlyph: Unified Segmentation-Conditioned Diffusion for Precise Visual Text Synthesis
by: Wang, Yuanrui, et al.
Published: (2025)
LOGO: A Long-Form Video Dataset for Group Action Quality Assessment
by: Zhang, Shiyi, et al.
Published: (2024)
GlyphDraw2: Automatic Generation of Complex Glyph Posters with Diffusion Models and Large Language Models
by: Ma, Jian, et al.
Published: (2024)
GlyphPrinter: Region-Grouped Direct Preference Optimization for Glyph-Accurate Visual Text Rendering
by: Shuai, Xincheng, et al.
Published: (2026)
Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering
by: Liu, Zeyu, et al.
Published: (2024)
FreeText: Training-Free Text Rendering in Diffusion Transformers via Attention Localization and Spectral Glyph Injection
by: Zhang, Ruiqiang, et al.
Published: (2026)
TextBlockV2: Towards Precise-Detection-Free Scene Text Spotting with Pre-trained Language Model
by: Lyu, Jiahao, et al.
Published: (2024)
Glyph: Scaling Context Windows via Visual-Text Compression
by: Cheng, Jiale, et al.
Published: (2025)
FF-LOGO: Cross-Modality Point Cloud Registration with Feature Filtering and Local to Global Optimization
by: Ma, Nan, et al.
Published: (2023)
DiffInk: Glyph- and Style-Aware Latent Diffusion Transformer for Text to Online Handwriting Generation
by: Pan, Wei, et al.
Published: (2025)
AnyArtisticGlyph: Multilingual Controllable Artistic Glyph Generation
by: Lu, Xiongbo, et al.
Published: (2025)
TextMaster: A Unified Framework for Realistic Text Editing via Glyph-Style Dual-Control
by: Yan, Zhenyu, et al.
Published: (2024)
Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering
by: Liu, Zeyu, et al.
Published: (2024)
Empowering Backbone Models for Visual Text Generation with Input Granularity Control and Glyph-Aware Training
by: Li, Wenbo, et al.
Published: (2024)
GLDesigner: Leveraging Multi-Modal LLMs as Designer for Enhanced Aesthetic Text Glyph Layouts
by: He, Junwen, et al.
Published: (2024)
Training-Free Occluded Text Rendering via Glyph Priors and Attention-Guided Semantic Blending
by: Hou, Jingqi, et al.
Published: (2026)
GlyphPattern: An Abstract Pattern Recognition Benchmark for Vision-Language Models
by: Wu, Zixuan, et al.
Published: (2024)
Decoupling Layout from Glyph in Online Chinese Handwriting Generation
by: Ren, Min-Si, et al.
Published: (2024)
GlyphBanana: Advancing Precise Text Rendering Through Agentic Workflows
by: Yan, Zexuan, et al.
Published: (2026)
HDGlyph: A Hierarchical Disentangled Glyph-Based Framework for Long-Tail Text Rendering in Diffusion Models
by: Zhuang, Shuhan, et al.
Published: (2025)
GloTSFormer: Global Video Text Spotting Transformer
by: Wang, Han, et al.
Published: (2024)
GoMatching++: Parameter- and Data-Efficient Arbitrary-Shaped Video Text Spotting and Benchmarking
by: He, Haibin, et al.
Published: (2025)
CoST: Efficient Collaborative Perception From Unified Spatiotemporal Perspective
by: Tang, Zongheng, et al.
Published: (2025)
Restore Text First, Enhance Image Later: Two-Stage Scene Text Image Super-Resolution with Glyph Structure Guidance
by: Luo, Minxing, et al.
Published: (2025)
Text Is MASS: Modeling as Stochastic Embedding for Text-Video Retrieval
by: Wang, Jiamian, et al.
Published: (2024)
Navigation Instruction Generation with BEV Perception and Large Language Models
by: Fan, Sheng, et al.
Published: (2024)
PP-OCRv5: A Specialized 5M-Parameter Model Rivaling Billion-Parameter Vision-Language Models on OCR Tasks
by: Cui, Cheng, et al.
Published: (2026)
GenMAC: Compositional Text-to-Video Generation with Multi-Agent Collaboration
by: Huang, Kaiyi, et al.
Published: (2024)
GoMatching: A Simple Baseline for Video Text Spotting via Long and Short Term Matching
by: He, Haibin, et al.
Published: (2024)
LLMs as Bridges: Reformulating Grounded Multimodal Named Entity Recognition
by: Li, Jinyuan, et al.
Published: (2024)
LRANet++: Low-Rank Approximation Network for Accurate and Efficient Text Spotting
by: Su, Yuchen, et al.
Published: (2025)
Hyper-Local Deformable Transformers for Text Spotting on Historical Maps
by: Lin, Yijun, et al.
Published: (2025)
Video-Text Dataset Construction from Multi-AI Feedback: Promoting Weak-to-Strong Preference Learning for Video Large Language Models
by: Yi, Hao, et al.
Published: (2024)
Stroke Modeling Enables Vectorized Character Generation with Large Vectorized Glyph Model
by: Zhang, Xinyue, et al.
Published: (2025)
SwinTextSpotter v2: Towards Better Synergy for Scene Text Spotting
by: Huang, Mingxin, et al.
Published: (2024)