| Main Authors: | Li, Leheng, Qiu, Weichao, Yan, Xu, He, Jing, Zhou, Kaiqiang, Cai, Yingjie, Lian, Qing, Liu, Bingbing, Chen, Ying-Cong |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2410.04932 |
Similar Items
SyntheOcc: Synthesize Geometric-Controlled Street View Images through 3D Semantic MPIs
by: Li, Leheng, et al.
Published: (2024)
Adv3D: Generating 3D Adversarial Examples for 3D Object Detection in Driving Scenarios with NeRF
by: Li, Leheng, et al.
Published: (2023)
DisEnvisioner: Disentangled and Enriched Visual Prompt for Customized Image Generation
by: He, Jing, et al.
Published: (2024)
Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction
by: He, Jing, et al.
Published: (2024)
StyleBooth: Image Style Editing with Multimodal Instruction
by: Han, Zhen, et al.
Published: (2024)
GroundingBooth: Grounding Text-to-Image Customization
by: Xiong, Zhexiao, et al.
Published: (2024)
InstructBooth: Instruction-following Personalized Text-to-Image Generation
by: Chae, Daewon, et al.
Published: (2023)
PhotoFramer: Multi-modal Image Composition Instruction
by: You, Zhiyuan, et al.
Published: (2025)
SPOT-Occ: Sparse Prototype-guided Transformer for Camera-based 3D Occupancy Prediction
by: Chen, Suzeyu, et al.
Published: (2026)
Forging Vision Foundation Models for Autonomous Driving: Challenges, Methodologies, and Opportunities
by: Yan, Xu, et al.
Published: (2024)
Statistical inference for high-dimensional convoluted rank regression
by: Cai, Leheng, et al.
Published: (2024)
OmniColor: A Unified Framework for Multi-modal Lineart Colorization
by: Zhang, Xulu, et al.
Published: (2026)
RoboOmni: Proactive Robot Manipulation in Omni-modal Context
by: Wang, Siyin, et al.
Published: (2025)
MultiBooth: Towards Generating All Your Concepts in an Image from Text
by: Zhu, Chenyang, et al.
Published: (2024)
An Efficient Occupancy World Model via Decoupled Dynamic Flow and Image-assisted Training
by: Zhang, Haiming, et al.
Published: (2024)
DreamVideo-Omni: Omni-Motion Controlled Multi-Subject Video Customization with Latent Identity Reinforcement Learning
by: Wei, Yujie, et al.
Published: (2026)
Efficient Depth-Guided Urban View Synthesis
by: Miao, Sheng, et al.
Published: (2024)
Omni-DuplexEval: Evaluating Real-time Duplex Omni-modal Interaction
by: He, Chaoqun, et al.
Published: (2026)
DLWM: Dual Latent World Models enable Holistic Gaussian-centric Pre-training in Autonomous Driving
by: Zhu, Yiyao, et al.
Published: (2026)
Multi-level Cross-modal Alignment for Image Clustering
by: Qiu, Liping, et al.
Published: (2024)
From Sparse to Dense Functional Data: Phase Transitions from a Simultaneous Inference Perspective
by: Cai, Leheng, et al.
Published: (2024)
From sparse to dense functional time series: phase transitions of detecting structural breaks and beyond
by: Cai, Leheng, et al.
Published: (2024)
OmniEval: A Benchmark for Evaluating Omni-modal Models with Visual, Auditory, and Textual Inputs
by: Zhang, Yiman, et al.
Published: (2025)
Explore the Limits of Omni-modal Pretraining at Scale
by: Zhang, Yiyuan, et al.
Published: (2024)
OmniOCR: Generalist OCR for Ethnic Minority Languages
by: Liu, Bonan, et al.
Published: (2026)
InteractiveOmni: A Unified Omni-modal Model for Audio-Visual Multi-turn Dialogue
by: Tong, Wenwen, et al.
Published: (2025)
Omni-Fusion of Spatial and Spectral for Hyperspectral Image Segmentation
by: Zhang, Qing, et al.
Published: (2025)
AttnDreamBooth: Towards Text-Aligned Personalized Text-to-Image Generation
by: Pang, Lianyu, et al.
Published: (2024)
UNO-Bench: A Unified Benchmark for Exploring the Compositional Law Between Uni-modal and Omni-modal in Omni Models
by: Chen, Chen, et al.
Published: (2025)
T-Rex-Omni: Integrating Negative Visual Prompt in Generic Object Detection
by: Zhou, Jiazhou, et al.
Published: (2025)
High-Frequency Anti-DreamBooth: Robust Defense against Personalized Image Synthesis
by: Onikubo, Takuto, et al.
Published: (2024)
ViT-Lens: Towards Omni-modal Representations
by: Lei, Weixian, et al.
Published: (2023)
Multi-modal MRI-Based Alzheimer's Disease Diagnosis with Transformer-based Image Synthesis and Transfer Learning
by: Qiu, Jason
Published: (2026)
GuardReasoner-Omni: A Reasoning-based Multi-modal Guardrail for Text, Image, Video, and Audio
by: Zhu, Zhenhao, et al.
Published: (2026)
Coherent and Multi-modality Image Inpainting via Latent Space Optimization
by: Pan, Lingzhi, et al.
Published: (2024)
GeoMM: On Geodesic Perspective for Multi-modal Learning
by: Mei, Shibin, et al.
Published: (2025)
OmniScience: A Large-scale Multi-modal Dataset for Scientific Image Understanding
by: Tao, Haoyi, et al.
Published: (2026)
Calvia flaveola Booth
by: POORANI, J.
Published: (2023)
Audio-Omni: Extending Multi-modal Understanding to Versatile Audio Generation and Editing
by: Tian, Zeyue, et al.
Published: (2026)
Instruct-Imagen: Image Generation with Multi-modal Instruction
by: Hu, Hexiang, et al.
Published: (2024)