Library Catalog

Bibliographic Details
Main Authors:	Chen, Jingye; Zhao, Yuzhong; Huang, Yupan; Cui, Lei; Dong, Li; Lv, Tengchao; Chen, Qifeng; Wei, Furu
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2503.21172

Similar Items

KOSMOS-2.5: A Multimodal Literate Model
by: Lv, Tengchao, et al.
Published: (2023)

DocReward: A Document Reward Model for Structuring and Stylizing
by: Liu, Junpeng, et al.
Published: (2025)

TALE: Training-free Cross-domain Image Composition via Adaptive Latent Manipulation and Energy-guided Optimization
by: Pham, Kien T., et al.
Published: (2024)

Rethinking Layered Graphic Design Generation with a Top-Down Approach
by: Chen, Jingye, et al.
Published: (2025)

PEACE: Empowering Geologic Map Holistic Understanding with MLLMs
by: Huang, Yangyu, et al.
Published: (2025)

Does Synthetic Layered Design Data Benefit Layered Design Decomposition?
by: Wu, Kam Man, et al.
Published: (2026)

Kosmos-G: Generating Images in Context with Multimodal Large Language Models
by: Pan, Xichen, et al.
Published: (2023)

Large Motion Video Autoencoding with Cross-modal Video VAE
by: Xing, Yazhou, et al.
Published: (2024)

Towards Generalist Game Players: An Investigation of Foundation Models in the Game Multiverse
by: Zhang, Kuan, et al.
Published: (2026)

Hunyuan-Game: Industrial-grade Intelligent Game Creation Model
by: Li, Ruihuang, et al.
Published: (2025)

BEV-VAE: Multi-view Image Generation with Spatial Consistency for Autonomous Driving
by: Chen, Zeming, et al.
Published: (2025)

From Virtual Games to Real-World Play
by: Sun, Wenqiang, et al.
Published: (2025)

Multimodal Large Language Model is a Human-Aligned Annotator for Text-to-Image Generation
by: Wu, Xun, et al.
Published: (2024)

Optimizing Prompts for Text-to-Image Generation
by: Hao, Yaru, et al.
Published: (2022)

Play to Generalize: Learning to Reason Through Game Play
by: Xie, Yunfei, et al.
Published: (2025)

Hunyuan-GameCraft-2: Instruction-following Interactive Game World Model
by: Tang, Junshu, et al.
Published: (2025)

GameGen-X: Interactive Open-world Game Video Generation
by: Che, Haoxuan, et al.
Published: (2024)

AvatarArtist: Open-Domain 4D Avatarization
by: Liu, Hongyu, et al.
Published: (2025)

GameFactory: Creating New Games with Generative Interactive Videos
by: Yu, Jiwen, et al.
Published: (2025)

CharaConsist: Fine-Grained Consistent Character Generation
by: Wang, Mengyu, et al.
Published: (2025)

Free4D: Tuning-free 4D Scene Generation with Spatial-Temporal Consistency
by: Liu, Tianqi, et al.
Published: (2025)

TIGaussian: Disentangle Gaussians for Spatial-Awared Text-Image-3D Alignment
by: Liu, Jiarun, et al.
Published: (2026)

Domain Game: Disentangle Anatomical Feature for Single Domain Generalized Segmentation
by: Chen, Hao, et al.
Published: (2024)

Diagnosing the Compositional Knowledge of Vision Language Models from a Game-Theoretic View
by: Wang, Jin, et al.
Published: (2024)

FontStudio: Shape-Adaptive Diffusion Model for Coherent and Consistent Font Effect Generation
by: Mu, Xinzhi, et al.
Published: (2024)

LibraGen: Playing a Balance Game in Subject-Driven Video Generation
by: Zhu, Jiahao, et al.
Published: (2026)

UniPlane: Unified Plane Detection and Reconstruction from Posed Monocular Videos
by: Huang, Yuzhong, et al.
Published: (2024)

Hunyuan-GameCraft: High-dynamic Interactive Game Video Generation with Hybrid History Condition
by: Li, Jiaqi, et al.
Published: (2025)

4DGen: Grounded 4D Content Generation with Spatial-temporal Consistency
by: Yin, Yuyang, et al.
Published: (2023)

EEdit: Rethinking the Spatial and Temporal Redundancy for Efficient Image Editing
by: Yan, Zexuan, et al.
Published: (2025)

Synergizing Understanding and Generation with Interleaved Analyzing-Drafting Thinking
by: Wu, Shengqiong, et al.
Published: (2026)

OrientDream: Streamlining Text-to-3D Generation with Explicit Orientation Control
by: Huang, Yuzhong, et al.
Published: (2024)

Spatial Chain-of-Thought: Bridging Understanding and Generation Models for Spatial Reasoning Generation
by: Chen, Wei, et al.
Published: (2026)

GTAutoAct: An Automatic Datasets Generation Framework Based on Game Engine Redevelopment for Action Recognition
by: Song, Xingyu, et al.
Published: (2024)

Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models
by: Huang, Yupan, et al.
Published: (2023)

Hierarchical Fine-grained Preference Optimization for Physically Plausible Video Generation
by: Chen, Harold Haodong, et al.
Published: (2025)

GameTileNet: A Semantic Dataset for Low-Resolution Game Art in Procedural Content Generation
by: Chen, Yi-Chun, et al.
Published: (2025)

PrivImage: Differentially Private Synthetic Image Generation using Diffusion Models with Semantic-Aware Pretraining
by: Li, Kecen, et al.
Published: (2023)

GUI Agents for Continual Game Generation
by: Huang, Yixu, et al.
Published: (2026)

GameIR: A Large-Scale Synthesized Ground-Truth Dataset for Image Restoration over Gaming Content
by: Zhou, Lebin, et al.
Published: (2024)