Library Catalog

Bibliographic Details
Main Authors:	Chen, Kai; Xie, Enze; Chen, Zhe; Wang, Yibo; Hong, Lanqing; Li, Zhenguo; Yeung, Dit-Yan
Format:	Preprint
Published:	2023
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2306.04607

Similar Items

MagicDrive: Street View Generation with Diverse 3D Geometry Control
by: Gao, Ruiyuan, et al.
Published: (2023)

DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception
by: Wang, Yibo, et al.
Published: (2024)

Mixed Autoencoder for Self-supervised Visual Representation Learning
by: Chen, Kai, et al.
Published: (2023)

Implicit Concept Removal of Diffusion Models
by: Liu, Zhili, et al.
Published: (2023)

Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation
by: Gou, Yunhao, et al.
Published: (2024)

Make-A-Protagonist: Generic Video Editing with An Ensemble of Experts
by: Zhao, Yuyang, et al.
Published: (2023)

TrackDiffusion: Tracklet-Conditioned Video Generation via Diffusion Models
by: Li, Pengxiang, et al.
Published: (2023)

Animate124: Animating One Image to 4D Dynamic Scene
by: Zhao, Yuyang, et al.
Published: (2023)

GeoDiffusion: A Training-Free Framework for Accurate 3D Geometric Conditioning in Image Generation
by: Mueller, Phillip, et al.
Published: (2025)

GeoDiffuser: Geometry-Based Image Editing with Diffusion Models
by: Sajnani, Rahul, et al.
Published: (2024)

MagicDrive-V2: High-Resolution Long Video Generation for Autonomous Driving with Adaptive Control
by: Gao, Ruiyuan, et al.
Published: (2024)

CoherenDream: Boosting Holistic Text Coherence in 3D Generation via Multimodal Large Language Models Feedback
by: Jiang, Chenhan, et al.
Published: (2025)

MagicDrive3D: Controllable 3D Generation for Any-View Rendering in Street Scenes
by: Gao, Ruiyuan, et al.
Published: (2024)

Mixture of Cluster-conditional LoRA Experts for Vision-language Instruction Tuning
by: Gou, Yunhao, et al.
Published: (2023)

TransformMix: Learning Transformation and Mixing Strategies from Data
by: Cheung, Tsz-Him, et al.
Published: (2024)

Automated Evaluation of Large Vision-Language Models on Self-driving Corner Cases
by: Chen, Kai, et al.
Published: (2024)

Learning High-resolution Vector Representation from Multi-Camera Images for 3D Object Detection
by: Chen, Zhili, et al.
Published: (2024)

Editing Massive Concepts in Text-to-Image Diffusion Models
by: Xiong, Tianwei, et al.
Published: (2024)

T2I-CompBench++: An Enhanced and Comprehensive Benchmark for Compositional Text-to-image Generation
by: Huang, Kaiyi, et al.
Published: (2023)

PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
by: Chen, Junsong, et al.
Published: (2024)

Divide and Conquer: Language Models can Plan and Self-Correct for Compositional Text-to-Image Generation
by: Wang, Zhenyu, et al.
Published: (2024)

Unified Triplet-Level Hallucination Evaluation for Large Vision-Language Models
by: Wu, Junjie, et al.
Published: (2024)

JointDreamer: Ensuring Geometry Consistency and Text Congruence in Text-to-3D Generation via Joint Score Distillation
by: Jiang, Chenhan, et al.
Published: (2024)

PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models
by: Chen, Junsong, et al.
Published: (2024)

CVT-xRF: Contrastive In-Voxel Transformer for 3D Consistent Radiance Fields from Sparse Inputs
by: Zhong, Yingji, et al.
Published: (2024)

SplatMesh: Interactive 3D Segmentation and Editing Using Mesh-Based Gaussian Splatting
by: Zhou, Kaichen, et al.
Published: (2023)

PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
by: Chen, Junsong, et al.
Published: (2023)

Task-customized Masked AutoEncoder via Mixture of Cluster-conditional Experts
by: Liu, Zhili, et al.
Published: (2024)

Accelerating Diffusion Sampling with Optimized Time Steps
by: Xue, Shuchen, et al.
Published: (2024)

Reasoning-Aligned Perception Decoupling for Scalable Multi-modal Reasoning
by: Gou, Yunhao, et al.
Published: (2025)

FreeScale: Scaling 3D Scenes via Certainty-Aware Free-View Generation
by: Jiang, Chenhan, et al.
Published: (2026)

G-NAS: Generalizable Neural Architecture Search for Single Domain Generalization Object Detection
by: Wu, Fan, et al.
Published: (2024)

Open-Vocabulary Object Detection with Meta Prompt Representation and Instance Contrastive Optimization
by: Wang, Zhao, et al.
Published: (2024)

RTGen: Generating Region-Text Pairs for Open-Vocabulary Object Detection
by: Chen, Fangyi, et al.
Published: (2024)

DriveCoT: Integrating Chain-of-Thought Reasoning with End-to-End Driving
by: Wang, Tianqi, et al.
Published: (2024)

Corrupted but Not Broken: Understanding and Mitigating the Negative Impacts of Corrupted Data in Visual Instruction Tuning
by: Gou, Yunhao, et al.
Published: (2025)

Char-SAM: Turning Segment Anything Model into Scene Text Segmentation Annotator with Character-level Visual Prompts
by: Xie, Enze, et al.
Published: (2024)

Empowering Sparse-Input Neural Radiance Fields with Dual-Level Semantic Guidance from Dense Novel Views
by: Zhong, Yingji, et al.
Published: (2025)

VGGT-Det: Mining VGGT Internal Priors for Sensor-Geometry-Free Multi-View Indoor 3D Object Detection
by: Cao, Yang, et al.
Published: (2026)

SCOUT: Semi-supervised Camouflaged Object Detection by Utilizing Text and Adaptive Data Selection
by: Yan, Weiqi, et al.
Published: (2025)