Library Catalog

Bibliographic Details
Main Authors:	Yang, Ling; Yu, Zhaochen; Meng, Chenlin; Xu, Minkai; Ermon, Stefano; Cui, Bin
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence; Machine Learning
Online Access:	https://arxiv.org/abs/2401.11708

Similar Items

Contextualized Diffusion Models for Text-Guided Image and Video Generation
by: Yang, Ling, et al.
Published: (2024)

Consistency Flow Matching: Defining Straight Flows with Velocity Consistency
by: Yang, Ling, et al.
Published: (2024)

RealCompo: Balancing Realism and Compositionality Improves Text-to-Image Diffusion Models
by: Zhang, Xinchen, et al.
Published: (2024)

DistillKac: Few-Step Image Generation via Damped Wave Equations
by: Han, Weiqiao, et al.
Published: (2025)

DreamPropeller: Supercharge Text-to-3D Generation with Parallel Sampling
by: Zhou, Linqi, et al.
Published: (2023)

DiffusionSat: A Generative Foundation Model for Satellite Imagery
by: Khanna, Samar, et al.
Published: (2023)

On the Scalability of Diffusion-based Text-to-Image Generation
by: Li, Hao, et al.
Published: (2024)

Geometric Trajectory Diffusion Models
by: Han, Jiaqi, et al.
Published: (2024)

Divergence Minimization Preference Optimization for Diffusion Model Alignment
by: Li, Binxu, et al.
Published: (2025)

RICO: Improving Accuracy and Completeness in Image Recaptioning via Visual Reconstruction
by: Wang, Yuchi, et al.
Published: (2025)

A Multimodal Recaptioning Framework to Account for Perceptual Diversity Across Languages in Vision-Language Modeling
by: Buettner, Kyle, et al.
Published: (2025)

Enhance Multimodal Consistency and Coherence for Text-Image Plan Generation
by: Lu, Xiaoxin, et al.
Published: (2025)

Trans4D: Realistic Geometry-Aware Transition for Compositional Text-to-4D Synthesis
by: Zeng, Bohan, et al.
Published: (2024)

Multimodal LLMs as Customized Reward Models for Text-to-Image Generation
by: Zhou, Shijie, et al.
Published: (2025)

VQGraph: Rethinking Graph Representation Space for Bridging GNNs and MLPs
by: Yang, Ling, et al.
Published: (2023)

VideoTetris: Towards Compositional Text-to-Video Generation
by: Tian, Ye, et al.
Published: (2024)

LLMControl: Grounded Control of Text-to-Image Diffusion-based Synthesis with Multimodal LLMs
by: Wang, Jiaze, et al.
Published: (2025)

Improving Diffusion Inverse Problem Solving with Decoupled Noise Annealing
by: Zhang, Bingliang, et al.
Published: (2024)

Efficient Scaling of Diffusion Transformers for Text-to-Image Generation
by: Li, Hao, et al.
Published: (2024)

HarvestNet: A Dataset for Detecting Smallholder Farming Activity Using Harvest Piles and Remote Sensing
by: Xu, Jonathan, et al.
Published: (2023)

StyleAR: Customizing Multimodal Autoregressive Model for Style-Aligned Text-to-Image Generation
by: Wu, Yi, et al.
Published: (2025)

MM-ISTS: Cooperating Irregularly Sampled Time Series Forecasting with Multimodal Vision-Text LLMs
by: Lei, Zhi, et al.
Published: (2026)

Improved Object-Centric Diffusion Learning with Registers and Contrastive Alignment
by: Nguyen, Bac, et al.
Published: (2026)

FineViT: Progressively Unlocking Fine-Grained Perception with Dense Recaptions
by: Zhao, Peisen, et al.
Published: (2026)

Multimodal Prompt Decoupling Attack on the Safety Filters in Text-to-Image Models
by: Peng, Xingkai, et al.
Published: (2025)

What If We Recaption Billions of Web Images with LLaMA-3?
by: Li, Xianhang, et al.
Published: (2024)

Control Color: Multimodal Diffusion-based Interactive Image Colorization
by: Liang, Zhexin, et al.
Published: (2024)

ExPLoRA: Parameter-Efficient Extended Pre-Training to Adapt Vision Transformers under Domain Shifts
by: Khanna, Samar, et al.
Published: (2024)

Uncovering the Text Embedding in Text-to-Image Diffusion Models
by: Yu, Hu, et al.
Published: (2024)

Improving Diffusion-Based Image Synthesis with Context Prediction
by: Yang, Ling, et al.
Published: (2024)

MUMU: Bootstrapping Multimodal Image Generation from Text-to-Image Data
by: Berman, William, et al.
Published: (2024)

Structure-Guided Adversarial Training of Diffusion Models
by: Yang, Ling, et al.
Published: (2024)

Text Speaks Louder than Vision: ASCII Art Reveals Textual Biases in Vision-Language Models
by: Wang, Zhaochen, et al.
Published: (2025)

MULTI: Multimodal Understanding Leaderboard with Text and Images
by: Zhu, Zichen, et al.
Published: (2024)

SVGDreamer: Text Guided SVG Generation with Diffusion Model
by: Xing, Ximing, et al.
Published: (2023)

DiffusionNFT: Online Diffusion Reinforcement with Forward Process
by: Zheng, Kaiwen, et al.
Published: (2025)

Multi-path Exploration and Feedback Adjustment for Text-to-Image Person Retrieval
by: Kang, Bin, et al.
Published: (2024)

Paired Image Generation with Diffusion-Guided Diffusion Models
by: Zhang, Haoxuan, et al.
Published: (2025)

HanDiffuser: Text-to-Image Generation With Realistic Hand Appearances
by: Narasimhaswamy, Supreeth, et al.
Published: (2024)

15M Multimodal Facial Image-Text Dataset
by: Dai, Dawei, et al.
Published: (2024)