:: Library Catalog

Cover Image

Saved in:

Bibliographic Details
Main Authors:	Yang, Tao, Luo, Yingmin, Qi, Zhongang, Wu, Yang, Shan, Ying, Chen, Chang Wen
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2406.02884
Tags:	Add Tag No Tags, Be the first to tag this record!

Similar Items

LayoutDiffusion: Controllable Diffusion Model for Layout-to-image Generation
by: Zheng, Guangcong, et al.
Published: (2023)

UniPixel: Unified Object Referring and Segmentation for Pixel-Level Visual Reasoning
by: Liu, Ye, et al.
Published: (2025)

E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding
by: Liu, Ye, et al.
Published: (2024)

StyleAdapter: A Unified Stylized Image Generation Model
by: Wang, Zhouxia, et al.
Published: (2023)

ConsistCompose: Unified Multimodal Layout Control for Image Composition
by: Shi, Xuanke, et al.
Published: (2025)

EA-VTR: Event-Aware Video-Text Retrieval
by: Ma, Zongyang, et al.
Published: (2024)

CustomCrafter: Customized Video Generation with Preserving Motion and Concept Composition Abilities
by: Wu, Tao, et al.
Published: (2024)

SciPostLayout: A Dataset for Layout Analysis and Layout Generation of Scientific Posters
by: Tanaka, Shohei, et al.
Published: (2024)

DreamPoster: A Unified Framework for Image-Conditioned Generative Poster Design
by: Hu, Xiwei, et al.
Published: (2025)

SphereDiffusion: Spherical Geometry-Aware Distortion Resilient Diffusion Model
by: Wu, Tao, et al.
Published: (2024)

ReLayout: Integrating Relation Reasoning for Content-aware Layout Generation with Multi-modal Large Language Models
by: Tian, Jiaxu, et al.
Published: (2025)

Large Motion Model for Unified Multi-Modal Motion Generation
by: Zhang, Mingyuan, et al.
Published: (2024)

VideoMaker: Zero-shot Customized Video Generation with the Inherent Force of Video Diffusion Models
by: Wu, Tao, et al.
Published: (2024)

SGAT4PASS: Spherical Geometry-Aware Transformer for PAnoramic Semantic Segmentation
by: Li, Xuewei, et al.
Published: (2023)

Relation-Aware Diffusion Model for Controllable Poster Layout Generation
by: Li, Fengheng, et al.
Published: (2023)

PosterCopilot: Toward Layout Reasoning and Controllable Editing for Professional Graphic Design
by: Wei, Jiazhe, et al.
Published: (2025)

PosterOmni: Generalized Artistic Poster Creation via Task Distillation and Unified Reward Feedback
by: Chen, Sixiang, et al.
Published: (2026)

PosterCraft: Rethinking High-Quality Aesthetic Poster Generation in a Unified Framework
by: Chen, SiXiang, et al.
Published: (2025)

Adaptive Perception for Unified Visual Multi-modal Object Tracking
by: Hu, Xiantao, et al.
Published: (2025)

JavisGPT: A Unified Multi-modal LLM for Sounding-Video Comprehension and Generation
by: Liu, Kai, et al.
Published: (2025)

PosterLlama: Bridging Design Ability of Langauge Model to Contents-Aware Layout Generation
by: Seol, Jaejung, et al.
Published: (2024)

MC-LLaVA: Multi-Concept Personalized Vision-Language Model
by: An, Ruichuan, et al.
Published: (2025)

DOGR: Towards Versatile Visual Document Grounding and Referring
by: Zhou, Yinan, et al.
Published: (2024)

LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding
by: Luo, Chuwei, et al.
Published: (2024)

HYDRA: Unifying Multi-modal Generation and Understanding via Representation-Harmonized Tokenization
by: Qiu, Xuerui, et al.
Published: (2026)

u-LLaVA: Unifying Multi-Modal Tasks via Large Language Model
by: Xu, Jinjin, et al.
Published: (2023)

Liquid: Language Models are Scalable and Unified Multi-modal Generators
by: Wu, Junfeng, et al.
Published: (2024)

LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model
by: AI, Inclusion, et al.
Published: (2026)

LLaVAction: evaluating and training multi-modal large language models for action understanding
by: Qi, Haozhe, et al.
Published: (2025)

MC-LLaVA: Multi-Concept Personalized Vision-Language Model
by: An, Ruichuan, et al.
Published: (2024)

PosterIQ: A Design Perspective Benchmark for Poster Understanding and Generation
by: Feng, Yuheng, et al.
Published: (2026)

OmniDocLayout: Towards Diverse Document Layout Generation via Coarse-to-Fine LLM Learning
by: Kang, Hengrui, et al.
Published: (2025)

Marten: Visual Question Answering with Mask Generation for Multi-modal Document Understanding
by: Wang, Zining, et al.
Published: (2025)

uLayout: Unified Room Layout Estimation for Perspective and Panoramic Images
by: Lee, Jonathan, et al.
Published: (2025)

LLM-enhanced Action-aware Multi-modal Prompt Tuning for Image-Text Matching
by: Tian, Mengxiao, et al.
Published: (2025)

AttriHuman-3D: Editable 3D Human Avatar Generation with Attribute Decomposition and Indexing
by: Yang, Fan, et al.
Published: (2023)

Weakly-Supervised Temporal Action Localization by Progressive Complementary Learning
by: Du, Jia-Run, et al.
Published: (2022)

Scan-and-Print: Patch-level Data Summarization and Augmentation for Content-aware Layout Generation in Poster Design
by: Hsu, HsiaoYuan, et al.
Published: (2025)

UniM$^2$AE: Multi-modal Masked Autoencoders with Unified 3D Representation for 3D Perception in Autonomous Driving
by: Zou, Jian, et al.
Published: (2023)

Mono2Stereo: A Benchmark and Empirical Study for Stereo Conversion
by: Yu, Songsong, et al.
Published: (2025)