| Main Authors: | Li, Chao, Jiang, Chen, Liu, Xiaolong, Zhao, Jun, Wang, Guoxin |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2409.17524 |
Similar Items
JoyHallo: Digital human model for Mandarin
by: Shi, Sheng, et al.
Published: (2024)
JoyVASA: Portrait and Animal Image Animation with Diffusion-Based Audio-Driven Facial Dynamics and Head Motion Generation
by: Cao, Xuyang, et al.
Published: (2024)
AnyText: Multilingual Visual Text Generation And Editing
by: Tuo, Yuxiang, et al.
Published: (2023)
MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering
by: Tang, Jingqun, et al.
Published: (2024)
Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering
by: Liu, Zeyu, et al.
Published: (2024)
SceneVTG++: Controllable Multilingual Visual Text Generation in the Wild
by: Liu, Jiawei, et al.
Published: (2025)
JoyStreamer: Unlocking Highly Expressive Avatars via Harmonized Text-Audio Conditioning
by: Wang, Ruikui, et al.
Published: (2026)
StyleTextGen: Style-Conditioned Multilingual Scene Text Generation
by: Chen, Zeyu, et al.
Published: (2026)
Semantic Draw Engineering for Text-to-Image Creation
by: Li, Yang, et al.
Published: (2023)
Qwen-Image-Bench: From Generation to Creation in Text-to-Image Evaluation
by: Li, Niantong, et al.
Published: (2026)
UM-Text: A Unified Multimodal Model for Image Understanding and Visual Text Editing
by: Ma, Lichen, et al.
Published: (2026)
Constructing Multilingual Visual-Text Datasets Revealing Visual Multilingual Ability of Vision Language Models
by: Atuhurra, Jesse, et al.
Published: (2024)
DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning
by: Zhao, Ruowen, et al.
Published: (2025)
Text-to-Edit: Controllable End-to-End Video Ad Creation via Multimodal LLMs
by: Cheng, Dabing, et al.
Published: (2025)
EasyText: Controllable Diffusion Transformer for Multilingual Text Rendering
by: Lu, Runnan, et al.
Published: (2025)
JoyGen: Audio-Driven 3D Depth-Aware Talking-Face Video Editing
by: Wang, Qili, et al.
Published: (2025)
DiffChat: Learning to Chat with Text-to-Image Synthesis Models for Interactive Image Creation
by: Wang, Jiapeng, et al.
Published: (2024)
VisualTrans: A Benchmark for Real-World Visual Transformation Reasoning
by: Ji, Yuheng, et al.
Published: (2025)
From Visuals to Vocabulary: Establishing Equivalence Between Image and Text Token Through Autoregressive Pre-training in MLLMs
by: Li, Mingxiao, et al.
Published: (2025)
StyleBlend: Enhancing Style-Specific Content Creation in Text-to-Image Diffusion Models
by: Chen, Zichong, et al.
Published: (2025)
Robustness-Aware 3D Object Detection in Autonomous Driving: A Review and Outlook
by: Song, Ziying, et al.
Published: (2024)
Diffusion-Based Visual Art Creation: A Survey and New Perspectives
by: Wang, Bingyuan, et al.
Published: (2024)
DanceText: A Training-Free Layered Framework for Controllable Multilingual Text Transformation in Images
by: Yu, Zhenyu, et al.
Published: (2025)
TextFlux: An OCR-Free DiT Model for High-Fidelity Multilingual Scene Text Synthesis
by: Xie, Yu, et al.
Published: (2025)
Exploring Robust Face-Voice Matching in Multilingual Environments
by: Tang, Jiehui, et al.
Published: (2024)
PTTA: A Pure Text-to-Animation Framework for High-Quality Creation
by: Chen, Ruiqi, et al.
Published: (2025)
RepText: Rendering Visual Text via Replicating
by: Wang, Haofan, et al.
Published: (2025)
ShotVerse: Advancing Cinematic Camera Control for Text-Driven Multi-Shot Video Creation
by: Yang, Songlin, et al.
Published: (2026)
MHAD: Multimodal Home Activity Dataset with Multi-Angle Videos and Synchronized Physiological Signals
by: Yu, Lei, et al.
Published: (2024)
VisionCreator: A Native Visual-Generation Agentic Model with Understanding, Thinking, Planning and Creation
by: Lai, Jinxiang, et al.
Published: (2026)
3SHNet: Boosting Image-Sentence Retrieval via Visual Semantic-Spatial Self-Highlighting
by: Ge, Xuri, et al.
Published: (2024)
EfficientDreamer: High-Fidelity and Robust 3D Creation via Orthogonal-view Diffusion Prior
by: Hu, Zhipeng, et al.
Published: (2023)
Research on Multilingual Natural Scene Text Detection Algorithm
by: Wang, Tao
Published: (2023)
Video Creation by Demonstration
by: Sun, Yihong, et al.
Published: (2024)
Text-Guided Coarse-to-Fine Fusion Network for Robust Remote Sensing Visual Question Answering
by: Zhao, Zhicheng, et al.
Published: (2024)
DeepSolo++: Let Transformer Decoder with Explicit Points Solo for Multilingual Text Spotting
by: Ye, Maoyuan, et al.
Published: (2023)
Investigating Text Insulation and Attention Mechanisms for Complex Visual Text Generation
by: Tai, Ying, et al.
Published: (2025)
Towards Robust Text-to-Image Person Retrieval: Multi-View Reformulation for Semantic Compensation
by: Yuan, Chao, et al.
Published: (2026)
TextPecker: Rewarding Structural Anomaly Quantification for Enhancing Visual Text Rendering
by: Zhu, Hanshen, et al.
Published: (2026)
Parrot: Multilingual Visual Instruction Tuning
by: Sun, Hai-Long, et al.
Published: (2024)