| Main Authors: | Yang, Ling, Yu, Zhaochen, Meng, Chenlin, Xu, Minkai, Ermon, Stefano, Cui, Bin |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2401.11708 |
Similar Items
Contextualized Diffusion Models for Text-Guided Image and Video Generation
by: Yang, Ling, et al.
Published: (2024)
Consistency Flow Matching: Defining Straight Flows with Velocity Consistency
by: Yang, Ling, et al.
Published: (2024)
RealCompo: Balancing Realism and Compositionality Improves Text-to-Image Diffusion Models
by: Zhang, Xinchen, et al.
Published: (2024)
DistillKac: Few-Step Image Generation via Damped Wave Equations
by: Han, Weiqiao, et al.
Published: (2025)
DreamPropeller: Supercharge Text-to-3D Generation with Parallel Sampling
by: Zhou, Linqi, et al.
Published: (2023)
DiffusionSat: A Generative Foundation Model for Satellite Imagery
by: Khanna, Samar, et al.
Published: (2023)
On the Scalability of Diffusion-based Text-to-Image Generation
by: Li, Hao, et al.
Published: (2024)
Geometric Trajectory Diffusion Models
by: Han, Jiaqi, et al.
Published: (2024)
Divergence Minimization Preference Optimization for Diffusion Model Alignment
by: Li, Binxu, et al.
Published: (2025)
RICO: Improving Accuracy and Completeness in Image Recaptioning via Visual Reconstruction
by: Wang, Yuchi, et al.
Published: (2025)
A Multimodal Recaptioning Framework to Account for Perceptual Diversity Across Languages in Vision-Language Modeling
by: Buettner, Kyle, et al.
Published: (2025)
Enhance Multimodal Consistency and Coherence for Text-Image Plan Generation
by: Lu, Xiaoxin, et al.
Published: (2025)
Trans4D: Realistic Geometry-Aware Transition for Compositional Text-to-4D Synthesis
by: Zeng, Bohan, et al.
Published: (2024)
Multimodal LLMs as Customized Reward Models for Text-to-Image Generation
by: Zhou, Shijie, et al.
Published: (2025)
VQGraph: Rethinking Graph Representation Space for Bridging GNNs and MLPs
by: Yang, Ling, et al.
Published: (2023)
VideoTetris: Towards Compositional Text-to-Video Generation
by: Tian, Ye, et al.
Published: (2024)
LLMControl: Grounded Control of Text-to-Image Diffusion-based Synthesis with Multimodal LLMs
by: Wang, Jiaze, et al.
Published: (2025)
Improving Diffusion Inverse Problem Solving with Decoupled Noise Annealing
by: Zhang, Bingliang, et al.
Published: (2024)
Efficient Scaling of Diffusion Transformers for Text-to-Image Generation
by: Li, Hao, et al.
Published: (2024)
HarvestNet: A Dataset for Detecting Smallholder Farming Activity Using Harvest Piles and Remote Sensing
by: Xu, Jonathan, et al.
Published: (2023)
StyleAR: Customizing Multimodal Autoregressive Model for Style-Aligned Text-to-Image Generation
by: Wu, Yi, et al.
Published: (2025)
MM-ISTS: Cooperating Irregularly Sampled Time Series Forecasting with Multimodal Vision-Text LLMs
by: Lei, Zhi, et al.
Published: (2026)
Improved Object-Centric Diffusion Learning with Registers and Contrastive Alignment
by: Nguyen, Bac, et al.
Published: (2026)
FineViT: Progressively Unlocking Fine-Grained Perception with Dense Recaptions
by: Zhao, Peisen, et al.
Published: (2026)
Multimodal Prompt Decoupling Attack on the Safety Filters in Text-to-Image Models
by: Peng, Xingkai, et al.
Published: (2025)
What If We Recaption Billions of Web Images with LLaMA-3?
by: Li, Xianhang, et al.
Published: (2024)
Control Color: Multimodal Diffusion-based Interactive Image Colorization
by: Liang, Zhexin, et al.
Published: (2024)
ExPLoRA: Parameter-Efficient Extended Pre-Training to Adapt Vision Transformers under Domain Shifts
by: Khanna, Samar, et al.
Published: (2024)
Uncovering the Text Embedding in Text-to-Image Diffusion Models
by: Yu, Hu, et al.
Published: (2024)
Improving Diffusion-Based Image Synthesis with Context Prediction
by: Yang, Ling, et al.
Published: (2024)
MUMU: Bootstrapping Multimodal Image Generation from Text-to-Image Data
by: Berman, William, et al.
Published: (2024)
Structure-Guided Adversarial Training of Diffusion Models
by: Yang, Ling, et al.
Published: (2024)
Text Speaks Louder than Vision: ASCII Art Reveals Textual Biases in Vision-Language Models
by: Wang, Zhaochen, et al.
Published: (2025)
MULTI: Multimodal Understanding Leaderboard with Text and Images
by: Zhu, Zichen, et al.
Published: (2024)
SVGDreamer: Text Guided SVG Generation with Diffusion Model
by: Xing, Ximing, et al.
Published: (2023)
DiffusionNFT: Online Diffusion Reinforcement with Forward Process
by: Zheng, Kaiwen, et al.
Published: (2025)
Multi-path Exploration and Feedback Adjustment for Text-to-Image Person Retrieval
by: Kang, Bin, et al.
Published: (2024)
Paired Image Generation with Diffusion-Guided Diffusion Models
by: Zhang, Haoxuan, et al.
Published: (2025)
HanDiffuser: Text-to-Image Generation With Realistic Hand Appearances
by: Narasimhaswamy, Supreeth, et al.
Published: (2024)
15M Multimodal Facial Image-Text Dataset
by: Dai, Dawei, et al.
Published: (2024)