| Main Authors: | Yang, Tao; Luo, Yingmin; Qi, Zhongang; Wu, Yang; Shan, Ying; Chen, Chang Wen |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2406.02884 |
Similar Items
LayoutDiffusion: Controllable Diffusion Model for Layout-to-image Generation
by: Zheng, Guangcong, et al.
Published: (2023)
UniPixel: Unified Object Referring and Segmentation for Pixel-Level Visual Reasoning
by: Liu, Ye, et al.
Published: (2025)
E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding
by: Liu, Ye, et al.
Published: (2024)
StyleAdapter: A Unified Stylized Image Generation Model
by: Wang, Zhouxia, et al.
Published: (2023)
ConsistCompose: Unified Multimodal Layout Control for Image Composition
by: Shi, Xuanke, et al.
Published: (2025)
EA-VTR: Event-Aware Video-Text Retrieval
by: Ma, Zongyang, et al.
Published: (2024)
CustomCrafter: Customized Video Generation with Preserving Motion and Concept Composition Abilities
by: Wu, Tao, et al.
Published: (2024)
SciPostLayout: A Dataset for Layout Analysis and Layout Generation of Scientific Posters
by: Tanaka, Shohei, et al.
Published: (2024)
DreamPoster: A Unified Framework for Image-Conditioned Generative Poster Design
by: Hu, Xiwei, et al.
Published: (2025)
SphereDiffusion: Spherical Geometry-Aware Distortion Resilient Diffusion Model
by: Wu, Tao, et al.
Published: (2024)
ReLayout: Integrating Relation Reasoning for Content-aware Layout Generation with Multi-modal Large Language Models
by: Tian, Jiaxu, et al.
Published: (2025)
Large Motion Model for Unified Multi-Modal Motion Generation
by: Zhang, Mingyuan, et al.
Published: (2024)
VideoMaker: Zero-shot Customized Video Generation with the Inherent Force of Video Diffusion Models
by: Wu, Tao, et al.
Published: (2024)
SGAT4PASS: Spherical Geometry-Aware Transformer for PAnoramic Semantic Segmentation
by: Li, Xuewei, et al.
Published: (2023)
Relation-Aware Diffusion Model for Controllable Poster Layout Generation
by: Li, Fengheng, et al.
Published: (2023)
PosterCopilot: Toward Layout Reasoning and Controllable Editing for Professional Graphic Design
by: Wei, Jiazhe, et al.
Published: (2025)
PosterOmni: Generalized Artistic Poster Creation via Task Distillation and Unified Reward Feedback
by: Chen, Sixiang, et al.
Published: (2026)
PosterCraft: Rethinking High-Quality Aesthetic Poster Generation in a Unified Framework
by: Chen, SiXiang, et al.
Published: (2025)
Adaptive Perception for Unified Visual Multi-modal Object Tracking
by: Hu, Xiantao, et al.
Published: (2025)
JavisGPT: A Unified Multi-modal LLM for Sounding-Video Comprehension and Generation
by: Liu, Kai, et al.
Published: (2025)
PosterLlama: Bridging Design Ability of Langauge Model to Contents-Aware Layout Generation
by: Seol, Jaejung, et al.
Published: (2024)
MC-LLaVA: Multi-Concept Personalized Vision-Language Model
by: An, Ruichuan, et al.
Published: (2025)
DOGR: Towards Versatile Visual Document Grounding and Referring
by: Zhou, Yinan, et al.
Published: (2024)
LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding
by: Luo, Chuwei, et al.
Published: (2024)
HYDRA: Unifying Multi-modal Generation and Understanding via Representation-Harmonized Tokenization
by: Qiu, Xuerui, et al.
Published: (2026)
u-LLaVA: Unifying Multi-Modal Tasks via Large Language Model
by: Xu, Jinjin, et al.
Published: (2023)
Liquid: Language Models are Scalable and Unified Multi-modal Generators
by: Wu, Junfeng, et al.
Published: (2024)
LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model
by: AI, Inclusion, et al.
Published: (2026)
LLaVAction: evaluating and training multi-modal large language models for action understanding
by: Qi, Haozhe, et al.
Published: (2025)
MC-LLaVA: Multi-Concept Personalized Vision-Language Model
by: An, Ruichuan, et al.
Published: (2024)
PosterIQ: A Design Perspective Benchmark for Poster Understanding and Generation
by: Feng, Yuheng, et al.
Published: (2026)
OmniDocLayout: Towards Diverse Document Layout Generation via Coarse-to-Fine LLM Learning
by: Kang, Hengrui, et al.
Published: (2025)
Marten: Visual Question Answering with Mask Generation for Multi-modal Document Understanding
by: Wang, Zining, et al.
Published: (2025)
uLayout: Unified Room Layout Estimation for Perspective and Panoramic Images
by: Lee, Jonathan, et al.
Published: (2025)
LLM-enhanced Action-aware Multi-modal Prompt Tuning for Image-Text Matching
by: Tian, Mengxiao, et al.
Published: (2025)
AttriHuman-3D: Editable 3D Human Avatar Generation with Attribute Decomposition and Indexing
by: Yang, Fan, et al.
Published: (2023)
Weakly-Supervised Temporal Action Localization by Progressive Complementary Learning
by: Du, Jia-Run, et al.
Published: (2022)
Scan-and-Print: Patch-level Data Summarization and Augmentation for Content-aware Layout Generation in Poster Design
by: Hsu, HsiaoYuan, et al.
Published: (2025)
UniM$^2$AE: Multi-modal Masked Autoencoders with Unified 3D Representation for 3D Perception in Autonomous Driving
by: Zou, Jian, et al.
Published: (2023)
Mono2Stereo: A Benchmark and Empirical Study for Stereo Conversion
by: Yu, Songsong, et al.
Published: (2025)