| Main Authors: | Chen, Kai, Xie, Enze, Chen, Zhe, Wang, Yibo, Hong, Lanqing, Li, Zhenguo, Yeung, Dit-Yan |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Online Access: | https://arxiv.org/abs/2306.04607 |
Similar Items
MagicDrive: Street View Generation with Diverse 3D Geometry Control
by: Gao, Ruiyuan, et al.
Published: (2023)
DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception
by: Wang, Yibo, et al.
Published: (2024)
Mixed Autoencoder for Self-supervised Visual Representation Learning
by: Chen, Kai, et al.
Published: (2023)
Implicit Concept Removal of Diffusion Models
by: Liu, Zhili, et al.
Published: (2023)
Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation
by: Gou, Yunhao, et al.
Published: (2024)
Make-A-Protagonist: Generic Video Editing with An Ensemble of Experts
by: Zhao, Yuyang, et al.
Published: (2023)
TrackDiffusion: Tracklet-Conditioned Video Generation via Diffusion Models
by: Li, Pengxiang, et al.
Published: (2023)
Animate124: Animating One Image to 4D Dynamic Scene
by: Zhao, Yuyang, et al.
Published: (2023)
GeoDiffusion: A Training-Free Framework for Accurate 3D Geometric Conditioning in Image Generation
by: Mueller, Phillip, et al.
Published: (2025)
GeoDiffuser: Geometry-Based Image Editing with Diffusion Models
by: Sajnani, Rahul, et al.
Published: (2024)
MagicDrive-V2: High-Resolution Long Video Generation for Autonomous Driving with Adaptive Control
by: Gao, Ruiyuan, et al.
Published: (2024)
CoherenDream: Boosting Holistic Text Coherence in 3D Generation via Multimodal Large Language Models Feedback
by: Jiang, Chenhan, et al.
Published: (2025)
MagicDrive3D: Controllable 3D Generation for Any-View Rendering in Street Scenes
by: Gao, Ruiyuan, et al.
Published: (2024)
Mixture of Cluster-conditional LoRA Experts for Vision-language Instruction Tuning
by: Gou, Yunhao, et al.
Published: (2023)
TransformMix: Learning Transformation and Mixing Strategies from Data
by: Cheung, Tsz-Him, et al.
Published: (2024)
Automated Evaluation of Large Vision-Language Models on Self-driving Corner Cases
by: Chen, Kai, et al.
Published: (2024)
Learning High-resolution Vector Representation from Multi-Camera Images for 3D Object Detection
by: Chen, Zhili, et al.
Published: (2024)
Editing Massive Concepts in Text-to-Image Diffusion Models
by: Xiong, Tianwei, et al.
Published: (2024)
T2I-CompBench++: An Enhanced and Comprehensive Benchmark for Compositional Text-to-image Generation
by: Huang, Kaiyi, et al.
Published: (2023)
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
by: Chen, Junsong, et al.
Published: (2024)
Divide and Conquer: Language Models can Plan and Self-Correct for Compositional Text-to-Image Generation
by: Wang, Zhenyu, et al.
Published: (2024)
Unified Triplet-Level Hallucination Evaluation for Large Vision-Language Models
by: Wu, Junjie, et al.
Published: (2024)
JointDreamer: Ensuring Geometry Consistency and Text Congruence in Text-to-3D Generation via Joint Score Distillation
by: Jiang, Chenhan, et al.
Published: (2024)
PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models
by: Chen, Junsong, et al.
Published: (2024)
CVT-xRF: Contrastive In-Voxel Transformer for 3D Consistent Radiance Fields from Sparse Inputs
by: Zhong, Yingji, et al.
Published: (2024)
SplatMesh: Interactive 3D Segmentation and Editing Using Mesh-Based Gaussian Splatting
by: Zhou, Kaichen, et al.
Published: (2023)
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
by: Chen, Junsong, et al.
Published: (2023)
Task-customized Masked AutoEncoder via Mixture of Cluster-conditional Experts
by: Liu, Zhili, et al.
Published: (2024)
Accelerating Diffusion Sampling with Optimized Time Steps
by: Xue, Shuchen, et al.
Published: (2024)
Reasoning-Aligned Perception Decoupling for Scalable Multi-modal Reasoning
by: Gou, Yunhao, et al.
Published: (2025)
FreeScale: Scaling 3D Scenes via Certainty-Aware Free-View Generation
by: Jiang, Chenhan, et al.
Published: (2026)
G-NAS: Generalizable Neural Architecture Search for Single Domain Generalization Object Detection
by: Wu, Fan, et al.
Published: (2024)
Open-Vocabulary Object Detection with Meta Prompt Representation and Instance Contrastive Optimization
by: Wang, Zhao, et al.
Published: (2024)
RTGen: Generating Region-Text Pairs for Open-Vocabulary Object Detection
by: Chen, Fangyi, et al.
Published: (2024)
DriveCoT: Integrating Chain-of-Thought Reasoning with End-to-End Driving
by: Wang, Tianqi, et al.
Published: (2024)
Corrupted but Not Broken: Understanding and Mitigating the Negative Impacts of Corrupted Data in Visual Instruction Tuning
by: Gou, Yunhao, et al.
Published: (2025)
Char-SAM: Turning Segment Anything Model into Scene Text Segmentation Annotator with Character-level Visual Prompts
by: Xie, Enze, et al.
Published: (2024)
Empowering Sparse-Input Neural Radiance Fields with Dual-Level Semantic Guidance from Dense Novel Views
by: Zhong, Yingji, et al.
Published: (2025)
VGGT-Det: Mining VGGT Internal Priors for Sensor-Geometry-Free Multi-View Indoor 3D Object Detection
by: Cao, Yang, et al.
Published: (2026)
SCOUT: Semi-supervised Camouflaged Object Detection by Utilizing Text and Adaptive Data Selection
by: Yan, Weiqi, et al.
Published: (2025)