| Main Authors: | Jin, Jiongchao; Zhao, Shengchu; Chen, Dajun; Jiang, Wei; Li, Yong |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2505.19554 |
Similar Items
MP-GUI: Modality Perception with MLLMs for GUI Understanding
by: Wang, Ziwei, et al.
Published: (2025)
Generating Animated Layouts as Structured Text Representations
by: Shin, Yeonsang, et al.
Published: (2025)
HumanOmni: A Large Vision-Speech Language Model for Human-Centric Video Understanding
by: Zhao, Jiaxing, et al.
Published: (2025)
LayoutCoT: Unleashing the Deep Reasoning Potential of Large Language Models for Layout Generation
by: Shi, Hengyu, et al.
Published: (2025)
FICGen: Frequency-Inspired Contextual Disentanglement for Layout-driven Degraded Image Generation
by: Wang, Wenzhuang, et al.
Published: (2025)
OmniHuman: A Large-scale Dataset and Benchmark for Human-Centric Video Generation
by: Zhu, Lei, et al.
Published: (2026)
StructLayoutFormer: Conditional Structured Layout Generation via Structure Serialization and Disentanglement
by: Hu, Xin, et al.
Published: (2025)
Vision-Centric Activation and Coordination for Multimodal Large Language Models
by: Wang, Yunnan, et al.
Published: (2025)
LLplace: The 3D Indoor Scene Layout Generation and Editing via Large Language Model
by: Yang, Yixuan, et al.
Published: (2024)
LayoutDiffusion: Controllable Diffusion Model for Layout-to-image Generation
by: Zheng, Guangcong, et al.
Published: (2023)
LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding
by: Luo, Chuwei, et al.
Published: (2024)
Uni-Layout: Integrating Human Feedback in Unified Layout Generation and Evaluation
by: Lu, Shuo, et al.
Published: (2025)
Human-MME: A Holistic Evaluation Benchmark for Human-Centric Multimodal Large Language Models
by: Liu, Yuansen, et al.
Published: (2025)
OpenHumanVid: A Large-Scale High-Quality Dataset for Enhancing Human-Centric Video Generation
by: Li, Hui, et al.
Published: (2024)
MobileFlow: A Multimodal LLM For Mobile GUI Agent
by: Nong, Songqin, et al.
Published: (2024)
Auteur: Language-Driven Cinematographic Framing for Human-Centric Video Generation
by: Kizil, Muhammed Burak, et al.
Published: (2026)
ReLayout: Versatile and Structure-Preserving Design Layout Editing via Relation-Aware Design Reconstruction
by: Lin, Jiawei, et al.
Published: (2026)
ReLayout: Integrating Relation Reasoning for Content-aware Layout Generation with Multi-modal Large Language Models
by: Tian, Jiaxu, et al.
Published: (2025)
Manga Generation via Layout-controllable Diffusion
by: Chen, Siyu, et al.
Published: (2024)
LayoutRAG: Retrieval-Augmented Model for Content-agnostic Conditional Layout Generation
by: Wu, Yuxuan, et al.
Published: (2025)
SVRepair: Structured Visual Reasoning for Automated Program Repair
by: Tang, Xiaoxuan, et al.
Published: (2026)
Spatial Diffusion for Cell Layout Generation
by: Li, Chen, et al.
Published: (2024)
Hitem3D 2.0: Multi-View Guided Native 3D Texture Generation
by: He, Huiang, et al.
Published: (2026)
Controllable Generation of Large-Scale 3D Urban Layouts with Semantic and Structural Guidance
by: Niu, Mengyuan, et al.
Published: (2025)
Relation-Aware Diffusion Model for Controllable Poster Layout Generation
by: Li, Fengheng, et al.
Published: (2023)
Exploiting Aggregation and Segregation of Representations for Domain Adaptive Human Pose Estimation
by: Peng, Qucheng, et al.
Published: (2024)
A Two-Stage System for Layout-Controlled Image Generation using Large Language Models and Diffusion Models
by: Koch, Jan-Hendrik, et al.
Published: (2025)
ActionArt: Advancing Multimodal Large Models for Fine-Grained Human-Centric Video Understanding
by: Peng, Yi-Xing, et al.
Published: (2025)
LDGen: Enhancing Text-to-Image Synthesis via Large Language Model-Driven Language Representation
by: Li, Pengzhi, et al.
Published: (2025)
Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation
by: Liao, Kang, et al.
Published: (2025)
CreatiLayout: Siamese Multimodal Diffusion Transformer for Creative Layout-to-Image Generation
by: Zhang, Hui, et al.
Published: (2024)
Physics-based Scene Layout Generation from Human Motion
by: Li, Jianan, et al.
Published: (2024)
No More Ambiguity in 360° Room Layout via Bi-Layout Estimation
by: Tsai, Yu-Ju, et al.
Published: (2024)
ComposeAnyone: Controllable Layout-to-Human Generation with Decoupled Multimodal Conditions
by: Zhang, Shiyue, et al.
Published: (2025)
LayoutDiT: Exploring Content-Graphic Balance in Layout Generation with Diffusion Transformer
by: Li, Yu, et al.
Published: (2024)
StreamingEffect: Real-Time Human-Centric Video Effect Generation
by: Song, Yiren, et al.
Published: (2026)
Lumen: Unleashing Versatile Vision-Centric Capabilities of Large Multimodal Models
by: Jiao, Yang, et al.
Published: (2024)
HOG-Layout: Hierarchical 3D Scene Generation, Optimization and Editing via Vision-Language Models
by: Jiang, Haiyan, et al.
Published: (2026)
TableSeq: Unified Generation of Structure, Content, and Layout
by: Hamdi, Laziz, et al.
Published: (2026)
LayoutFlow: Flow Matching for Layout Generation
by: Guerreiro, Julian Jorge Andrade, et al.
Published: (2024)