| Main Authors: | Chen, Jian, Zhang, Ruiyi, Zhou, Yufan, Jain, Rajiv, Xu, Zhiqiang, Rossi, Ryan, Chen, Changyou |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2402.04754 |
Similar Items
MMR: Evaluating Reading Ability of Large Multimodal Models
by: Chen, Jian, et al.
Published: (2024)
A High-Quality Text-Rich Image Instruction Tuning Dataset via Hybrid Instruction Generation
by: Zhou, Shijie, et al.
Published: (2024)
TRINS: Towards Multimodal Language Models that Can Read
by: Zhang, Ruiyi, et al.
Published: (2024)
LLaVA-Read: Enhancing Reading Ability of Multimodal Language Models
by: Zhang, Ruiyi, et al.
Published: (2024)
SV-RAG: LoRA-Contextualizing Adaptation of MLLMs for Long Document Understanding
by: Chen, Jian, et al.
Published: (2024)
Multimodal LLMs as Customized Reward Models for Text-to-Image Generation
by: Zhou, Shijie, et al.
Published: (2025)
VisR-Bench: An Empirical Study on Visual Retrieval-Augmented Generation for Multilingual Long Document Understanding
by: Chen, Jian, et al.
Published: (2025)
Enhancing Diffusion Posterior Sampling for Inverse Problems by Integrating Crafted Measurements
by: Zhou, Shijie, et al.
Published: (2024)
TextLap: Customizing Language Models for Text-to-Layout Planning
by: Chen, Jian, et al.
Published: (2024)
GUI-AIMA: Aligning Intrinsic Multimodal Attention with a Context Anchor for GUI Grounding
by: Zhou, Shijie, et al.
Published: (2025)
Towards Visual Text Grounding of Multimodal Large Language Model
by: Li, Ming, et al.
Published: (2025)
ARTIST: Improving the Generation of Text-rich Images with Disentangled Diffusion Models and Large Language Models
by: Zhang, Jianyi, et al.
Published: (2024)
DiffPattern-Flex: Efficient Layout Pattern Generation via Discrete Diffusion
by: Wang, Zixiao, et al.
Published: (2025)
Customization Assistant for Text-to-image Generation
by: Zhou, Yufan, et al.
Published: (2023)
Manga Generation via Layout-controllable Diffusion
by: Chen, Siyu, et al.
Published: (2024)
LayoutDiffusion: Controllable Diffusion Model for Layout-to-image Generation
by: Zheng, Guangcong, et al.
Published: (2023)
Spatial Diffusion for Cell Layout Generation
by: Li, Chen, et al.
Published: (2024)
AesthetiQ: Enhancing Graphic Layout Design via Aesthetic-Aware Preference Alignment of Multi-modal Large Language Models
by: Patnaik, Sohan, et al.
Published: (2025)
Uni-Layout: Integrating Human Feedback in Unified Layout Generation and Evaluation
by: Lu, Shuo, et al.
Published: (2025)
Zigzag Diffusion Sampling: Diffusion Models Can Self-Improve via Self-Reflection
by: Bai, Lichen, et al.
Published: (2024)
Towards Large-scale Chemical Reaction Image Parsing via a Multimodal Large Language Model
by: Chen, Yufan, et al.
Published: (2025)
PatternPaint: Practical Layout Pattern Generation Using Diffusion-Based Inpainting
by: Zhou, Guanglei, et al.
Published: (2024)
MusiXQA: Advancing Visual Music Understanding in Multimodal Large Language Models
by: Chen, Jian, et al.
Published: (2025)
Layout2Scene: 3D Semantic Layout Guided Scene Generation via Geometry and Appearance Diffusion Priors
by: Chen, Minglin, et al.
Published: (2025)
Diffusion Models For Multi-Modal Generative Modeling
by: Chen, Changyou, et al.
Published: (2024)
HybriDLA: Hybrid Generation for Document Layout Analysis
by: Chen, Yufan, et al.
Published: (2025)
Craft: Cross-modal Aligned Features Improve Robustness of Prompt Tuning
by: Sun, Jingchen, et al.
Published: (2024)
Lay-Your-Scene: Natural Scene Layout Generation with Diffusion Transformers
by: Srivastava, Divyansh, et al.
Published: (2025)
Layout-Corrector: Alleviating Layout Sticking Phenomenon in Discrete Diffusion Model
by: Iwai, Shoma, et al.
Published: (2024)
PEEKABOO: Interactive Video Generation via Masked-Diffusion
by: Jain, Yash, et al.
Published: (2023)
A Geometric Perspective on Diffusion Models
by: Chen, Defang, et al.
Published: (2023)
SFDLA: Source-Free Document Layout Analysis
by: Tewes, Sebastian, et al.
Published: (2025)
STAY Diffusion: Styled Layout Diffusion Model for Diverse Layout-to-Image Generation
by: Wang, Ruyu, et al.
Published: (2025)
Golden Noise for Diffusion Models: A Learning Framework
by: Zhou, Zikai, et al.
Published: (2024)
Enhancing Image Layout Control with Loss-Guided Diffusion Models
by: Patel, Zakaria, et al.
Published: (2024)
Training-Free Layout-to-Image Generation with Marginal Attention Constraints
by: Chen, Huancheng, et al.
Published: (2024)
MajutsuCity: Language-driven Aesthetic-adaptive City Generation with Controllable 3D Assets and Layouts
by: Huang, Zilong, et al.
Published: (2025)
ReLayout: Integrating Relation Reasoning for Content-aware Layout Generation with Multi-modal Large Language Models
by: Tian, Jiaxu, et al.
Published: (2025)
RoDLA: Benchmarking the Robustness of Document Layout Analysis Models
by: Chen, Yufan, et al.
Published: (2024)
TaxaDiffusion: Progressively Trained Diffusion Model for Fine-Grained Species Generation
by: Monsefi, Amin Karimi, et al.
Published: (2025)