| Main Authors: | Lei, Chenyang, Chen, Liyi, Cen, Jun, Chen, Xiao, Lei, Zhen, Heide, Felix, Liu, Ziwei, Chen, Qifeng, Zhang, Zhaoxiang |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2409.08083 |
Similar Items
SimCMF: A Simple Cross-modal Fine-tuning Strategy from Vision Foundation Models to Any Imaging Modality
by: Lei, Chenyang, et al.
Published: (2024)
Robust Depth Enhancement via Polarization Prompt Fusion Tuning
by: Ikemura, Kei, et al.
Published: (2024)
General Geometry-aware Weakly Supervised 3D Object Detection
by: Zhang, Guowen, et al.
Published: (2024)
FIRM: Flexible Interactive Reflection reMoval
by: Chen, Xiao, et al.
Published: (2024)
Automatic Controllable Colorization via Imagination
by: Cong, Xiaoyan, et al.
Published: (2024)
Adaptive Domain Learning for Cross-domain Image Denoising
by: Qian, Zian, et al.
Published: (2024)
BEVDilation: LiDAR-Centric Multi-Modal Fusion for 3D Object Detection
by: Zhang, Guowen, et al.
Published: (2025)
Instruction-based Image Editing with Planning, Reasoning, and Generation
by: Ji, Liya, et al.
Published: (2026)
FreeTuner: Any Subject in Any Style with Training-free Diffusion
by: Xu, Youcan, et al.
Published: (2024)
Expanding Scene Graph Boundaries: Fully Open-vocabulary Scene Graph Generation via Visual-Concept Alignment and Retention
by: Chen, Zuyao, et al.
Published: (2023)
GPT4SGG: Synthesizing Scene Graphs from Holistic and Region-specific Narratives
by: Chen, Zuyao, et al.
Published: (2023)
Polarization Wavefront Lidar: Learning Large Scene Reconstruction from Polarized Wavefronts
by: Scheuble, Dominik, et al.
Published: (2024)
Generalized and Efficient 2D Gaussian Splatting for Arbitrary-scale Super-Resolution
by: Chen, Du, et al.
Published: (2025)
Using Left and Right Brains Together: Towards Vision and Language Planning
by: Cen, Jun, et al.
Published: (2024)
MDReID: Modality-Decoupled Learning for Any-to-Any Multi-Modal Object Re-Identification
by: Feng, Yingying, et al.
Published: (2025)
PAM: A Propagation-Based Model for Segmenting Any 3D Objects across Multi-Modal Medical Images
by: Chen, Zifan, et al.
Published: (2024)
Improving Large Vision-Language Models' Understanding for Flow Field Data
by: Zhang, Xiaomei, et al.
Published: (2025)
Adapting Vision Foundation Models for Real-time Ultrasound Image Segmentation
by: Zhang, Xiaoran, et al.
Published: (2025)
All-in-One: Transferring Vision Foundation Models into Stereo Matching
by: Zhou, Jingyi, et al.
Published: (2024)
DiT4Edit: Diffusion Transformer for Image Editing
by: Feng, Kunyu, et al.
Published: (2024)
CoCoEdit: Content-Consistent Image Editing via Region Regularized Reinforcement Learning
by: Wu, Yuhui, et al.
Published: (2026)
Transferability Bound Theory: Exploring Relationship between Adversarial Transferability and Flatness
by: Fan, Mingyuan, et al.
Published: (2023)
4D Panoptic Scene Graph Generation
by: Yang, Jingkang, et al.
Published: (2024)
One2Scene: Geometric Consistent Explorable 3D Scene Generation from a Single Image
by: Wang, Pengfei, et al.
Published: (2026)
4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities
by: Bachmann, Roman, et al.
Published: (2024)
AnyMo: Scaling Any-Modality Conditional Motion Generation with Masked Modeling
by: Li, Yiheng, et al.
Published: (2026)
Generative Active Learning for Image Synthesis Personalization
by: Zhang, Xulu, et al.
Published: (2024)
Voxel Mamba: Group-Free State Space Models for Point Cloud based 3D Object Detection
by: Zhang, Guowen, et al.
Published: (2024)
RSRefSeg 2: Decoupling Referring Remote Sensing Image Segmentation with Foundation Models
by: Chen, Keyan, et al.
Published: (2025)
Cross-Modal Obfuscation for Jailbreak Attacks on Large Vision-Language Models
by: Jiang, Lei, et al.
Published: (2025)
RSRefSeg: Referring Remote Sensing Image Segmentation with Foundation Models
by: Chen, Keyan, et al.
Published: (2025)
Top-Down Guidance for Learning Object-Centric Representations
by: Zou, Junhong, et al.
Published: (2024)
Scaffolding Coordinates to Promote Vision-Language Coordination in Large Multi-Modal Models
by: Lei, Xuanyu, et al.
Published: (2024)
DiffSpeaker: Speech-Driven 3D Facial Animation with Diffusion Transformer
by: Ma, Zhiyuan, et al.
Published: (2024)
MoKus: Leveraging Cross-Modal Knowledge Transfer for Knowledge-Aware Concept Customization
by: Zhu, Chenyang, et al.
Published: (2026)
Fast Multi-view Consistent 3D Editing with Video Priors
by: Chen, Liyi, et al.
Published: (2025)
Omni-3DEdit: Generalized Versatile 3D Editing in One-Pass
by: Liyi, Chen, et al.
Published: (2026)
Segment Any 3D Gaussians
by: Cen, Jiazhong, et al.
Published: (2023)
Seek for Incantations: Towards Accurate Text-to-Image Diffusion Synthesis through Prompt Engineering
by: Yu, Chang, et al.
Published: (2024)
SimMLM: A Simple Framework for Multi-modal Learning with Missing Modality
by: Li, Sijie, et al.
Published: (2025)