:: Library Catalog

Bibliographic Details
Main Authors:	Li, Leheng, Qiu, Weichao, Yan, Xu, He, Jing, Zhou, Kaiqiang, Cai, Yingjie, Lian, Qing, Liu, Bingbing, Chen, Ying-Cong
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2410.04932

Similar Items

SyntheOcc: Synthesize Geometric-Controlled Street View Images through 3D Semantic MPIs
by: Li, Leheng, et al.
Published: (2024)

Adv3D: Generating 3D Adversarial Examples for 3D Object Detection in Driving Scenarios with NeRF
by: Li, Leheng, et al.
Published: (2023)

DisEnvisioner: Disentangled and Enriched Visual Prompt for Customized Image Generation
by: He, Jing, et al.
Published: (2024)

Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction
by: He, Jing, et al.
Published: (2024)

StyleBooth: Image Style Editing with Multimodal Instruction
by: Han, Zhen, et al.
Published: (2024)

GroundingBooth: Grounding Text-to-Image Customization
by: Xiong, Zhexiao, et al.
Published: (2024)

InstructBooth: Instruction-following Personalized Text-to-Image Generation
by: Chae, Daewon, et al.
Published: (2023)

PhotoFramer: Multi-modal Image Composition Instruction
by: You, Zhiyuan, et al.
Published: (2025)

SPOT-Occ: Sparse Prototype-guided Transformer for Camera-based 3D Occupancy Prediction
by: Chen, Suzeyu, et al.
Published: (2026)

Forging Vision Foundation Models for Autonomous Driving: Challenges, Methodologies, and Opportunities
by: Yan, Xu, et al.
Published: (2024)

Statistical inference for high-dimensional convoluted rank regression
by: Cai, Leheng, et al.
Published: (2024)

OmniColor: A Unified Framework for Multi-modal Lineart Colorization
by: Zhang, Xulu, et al.
Published: (2026)

RoboOmni: Proactive Robot Manipulation in Omni-modal Context
by: Wang, Siyin, et al.
Published: (2025)

MultiBooth: Towards Generating All Your Concepts in an Image from Text
by: Zhu, Chenyang, et al.
Published: (2024)

An Efficient Occupancy World Model via Decoupled Dynamic Flow and Image-assisted Training
by: Zhang, Haiming, et al.
Published: (2024)

DreamVideo-Omni: Omni-Motion Controlled Multi-Subject Video Customization with Latent Identity Reinforcement Learning
by: Wei, Yujie, et al.
Published: (2026)

Efficient Depth-Guided Urban View Synthesis
by: Miao, Sheng, et al.
Published: (2024)

Omni-DuplexEval: Evaluating Real-time Duplex Omni-modal Interaction
by: He, Chaoqun, et al.
Published: (2026)

DLWM: Dual Latent World Models enable Holistic Gaussian-centric Pre-training in Autonomous Driving
by: Zhu, Yiyao, et al.
Published: (2026)

Multi-level Cross-modal Alignment for Image Clustering
by: Qiu, Liping, et al.
Published: (2024)

From Sparse to Dense Functional Data: Phase Transitions from a Simultaneous Inference Perspective
by: Cai, Leheng, et al.
Published: (2024)

From sparse to dense functional time series: phase transitions of detecting structural breaks and beyond
by: Cai, Leheng, et al.
Published: (2024)

OmniEval: A Benchmark for Evaluating Omni-modal Models with Visual, Auditory, and Textual Inputs
by: Zhang, Yiman, et al.
Published: (2025)

Explore the Limits of Omni-modal Pretraining at Scale
by: Zhang, Yiyuan, et al.
Published: (2024)

OmniOCR: Generalist OCR for Ethnic Minority Languages
by: Liu, Bonan, et al.
Published: (2026)

InteractiveOmni: A Unified Omni-modal Model for Audio-Visual Multi-turn Dialogue
by: Tong, Wenwen, et al.
Published: (2025)

Omni-Fusion of Spatial and Spectral for Hyperspectral Image Segmentation
by: Zhang, Qing, et al.
Published: (2025)

AttnDreamBooth: Towards Text-Aligned Personalized Text-to-Image Generation
by: Pang, Lianyu, et al.
Published: (2024)

UNO-Bench: A Unified Benchmark for Exploring the Compositional Law Between Uni-modal and Omni-modal in Omni Models
by: Chen, Chen, et al.
Published: (2025)

T-Rex-Omni: Integrating Negative Visual Prompt in Generic Object Detection
by: Zhou, Jiazhou, et al.
Published: (2025)

High-Frequency Anti-DreamBooth: Robust Defense against Personalized Image Synthesis
by: Onikubo, Takuto, et al.
Published: (2024)

ViT-Lens: Towards Omni-modal Representations
by: Lei, Weixian, et al.
Published: (2023)

Multi-modal MRI-Based Alzheimer's Disease Diagnosis with Transformer-based Image Synthesis and Transfer Learning
by: Qiu, Jason
Published: (2026)

GuardReasoner-Omni: A Reasoning-based Multi-modal Guardrail for Text, Image, Video, and Audio
by: Zhu, Zhenhao, et al.
Published: (2026)

Coherent and Multi-modality Image Inpainting via Latent Space Optimization
by: Pan, Lingzhi, et al.
Published: (2024)

GeoMM: On Geodesic Perspective for Multi-modal Learning
by: Mei, Shibin, et al.
Published: (2025)

OmniScience: A Large-scale Multi-modal Dataset for Scientific Image Understanding
by: Tao, Haoyi, et al.
Published: (2026)

Calvia flaveola Booth
by: POORANI, J.
Published: (2023)

Audio-Omni: Extending Multi-modal Understanding to Versatile Audio Generation and Editing
by: Tian, Zeyue, et al.
Published: (2026)

Instruct-Imagen: Image Generation with Multi-modal Instruction
by: Hu, Hexiang, et al.
Published: (2024)