:: Library Catalog

Bibliographic Details
Main Authors:	Jin, Jiongchao; Zhao, Shengchu; Chen, Dajun; Jiang, Wei; Li, Yong
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2505.19554

Similar Items

MP-GUI: Modality Perception with MLLMs for GUI Understanding
by: Wang, Ziwei, et al.
Published: (2025)

Generating Animated Layouts as Structured Text Representations
by: Shin, Yeonsang, et al.
Published: (2025)

HumanOmni: A Large Vision-Speech Language Model for Human-Centric Video Understanding
by: Zhao, Jiaxing, et al.
Published: (2025)

LayoutCoT: Unleashing the Deep Reasoning Potential of Large Language Models for Layout Generation
by: Shi, Hengyu, et al.
Published: (2025)

FICGen: Frequency-Inspired Contextual Disentanglement for Layout-driven Degraded Image Generation
by: Wang, Wenzhuang, et al.
Published: (2025)

OmniHuman: A Large-scale Dataset and Benchmark for Human-Centric Video Generation
by: Zhu, Lei, et al.
Published: (2026)

StructLayoutFormer: Conditional Structured Layout Generation via Structure Serialization and Disentanglement
by: Hu, Xin, et al.
Published: (2025)

Vision-Centric Activation and Coordination for Multimodal Large Language Models
by: Wang, Yunnan, et al.
Published: (2025)

LLplace: The 3D Indoor Scene Layout Generation and Editing via Large Language Model
by: Yang, Yixuan, et al.
Published: (2024)

LayoutDiffusion: Controllable Diffusion Model for Layout-to-image Generation
by: Zheng, Guangcong, et al.
Published: (2023)

LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding
by: Luo, Chuwei, et al.
Published: (2024)

Uni-Layout: Integrating Human Feedback in Unified Layout Generation and Evaluation
by: Lu, Shuo, et al.
Published: (2025)

Human-MME: A Holistic Evaluation Benchmark for Human-Centric Multimodal Large Language Models
by: Liu, Yuansen, et al.
Published: (2025)

OpenHumanVid: A Large-Scale High-Quality Dataset for Enhancing Human-Centric Video Generation
by: Li, Hui, et al.
Published: (2024)

MobileFlow: A Multimodal LLM For Mobile GUI Agent
by: Nong, Songqin, et al.
Published: (2024)

Auteur: Language-Driven Cinematographic Framing for Human-Centric Video Generation
by: Kizil, Muhammed Burak, et al.
Published: (2026)

ReLayout: Versatile and Structure-Preserving Design Layout Editing via Relation-Aware Design Reconstruction
by: Lin, Jiawei, et al.
Published: (2026)

ReLayout: Integrating Relation Reasoning for Content-aware Layout Generation with Multi-modal Large Language Models
by: Tian, Jiaxu, et al.
Published: (2025)

Manga Generation via Layout-controllable Diffusion
by: Chen, Siyu, et al.
Published: (2024)

LayoutRAG: Retrieval-Augmented Model for Content-agnostic Conditional Layout Generation
by: Wu, Yuxuan, et al.
Published: (2025)

SVRepair: Structured Visual Reasoning for Automated Program Repair
by: Tang, Xiaoxuan, et al.
Published: (2026)

Spatial Diffusion for Cell Layout Generation
by: Li, Chen, et al.
Published: (2024)

Hitem3D 2.0: Multi-View Guided Native 3D Texture Generation
by: He, Huiang, et al.
Published: (2026)

Controllable Generation of Large-Scale 3D Urban Layouts with Semantic and Structural Guidance
by: Niu, Mengyuan, et al.
Published: (2025)

Relation-Aware Diffusion Model for Controllable Poster Layout Generation
by: Li, Fengheng, et al.
Published: (2023)

Exploiting Aggregation and Segregation of Representations for Domain Adaptive Human Pose Estimation
by: Peng, Qucheng, et al.
Published: (2024)

A Two-Stage System for Layout-Controlled Image Generation using Large Language Models and Diffusion Models
by: Koch, Jan-Hendrik, et al.
Published: (2025)

ActionArt: Advancing Multimodal Large Models for Fine-Grained Human-Centric Video Understanding
by: Peng, Yi-Xing, et al.
Published: (2025)

LDGen: Enhancing Text-to-Image Synthesis via Large Language Model-Driven Language Representation
by: Li, Pengzhi, et al.
Published: (2025)

Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation
by: Liao, Kang, et al.
Published: (2025)

CreatiLayout: Siamese Multimodal Diffusion Transformer for Creative Layout-to-Image Generation
by: Zhang, Hui, et al.
Published: (2024)

Physics-based Scene Layout Generation from Human Motion
by: Li, Jianan, et al.
Published: (2024)

No More Ambiguity in 360° Room Layout via Bi-Layout Estimation
by: Tsai, Yu-Ju, et al.
Published: (2024)

ComposeAnyone: Controllable Layout-to-Human Generation with Decoupled Multimodal Conditions
by: Zhang, Shiyue, et al.
Published: (2025)

LayoutDiT: Exploring Content-Graphic Balance in Layout Generation with Diffusion Transformer
by: Li, Yu, et al.
Published: (2024)

StreamingEffect: Real-Time Human-Centric Video Effect Generation
by: Song, Yiren, et al.
Published: (2026)

Lumen: Unleashing Versatile Vision-Centric Capabilities of Large Multimodal Models
by: Jiao, Yang, et al.
Published: (2024)

HOG-Layout: Hierarchical 3D Scene Generation, Optimization and Editing via Vision-Language Models
by: Jiang, Haiyan, et al.
Published: (2026)

TableSeq: Unified Generation of Structure, Content, and Layout
by: Hamdi, Laziz, et al.
Published: (2026)

LayoutFlow: Flow Matching for Layout Generation
by: Guerreiro, Julian Jorge Andrade, et al.
Published: (2024)