:: Library Catalog

Bibliographic Details
Main Authors:	Lei, Chenyang; Chen, Liyi; Cen, Jun; Chen, Xiao; Lei, Zhen; Heide, Felix; Chen, Qifeng; Zhang, Zhaoxiang
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2411.18669

Similar Items

SimMAT: Exploring Transferability from Vision Foundation Models to Any Image Modality
by: Lei, Chenyang, et al.
Published: (2024)

Robust Depth Enhancement via Polarization Prompt Fusion Tuning
by: Ikemura, Kei, et al.
Published: (2024)

General Geometry-aware Weakly Supervised 3D Object Detection
by: Zhang, Guowen, et al.
Published: (2024)

SimMLM: A Simple Framework for Multi-modal Learning with Missing Modality
by: Li, Sijie, et al.
Published: (2025)

FIRM: Flexible Interactive Reflection reMoval
by: Chen, Xiao, et al.
Published: (2024)

Adaptive Domain Learning for Cross-domain Image Denoising
by: Qian, Zian, et al.
Published: (2024)

Automatic Controllable Colorization via Imagination
by: Cong, Xiaoyan, et al.
Published: (2024)

BEVDilation: LiDAR-Centric Multi-Modal Fusion for 3D Object Detection
by: Zhang, Guowen, et al.
Published: (2025)

FreeTuner: Any Subject in Any Style with Training-free Diffusion
by: Xu, Youcan, et al.
Published: (2024)

Expanding Scene Graph Boundaries: Fully Open-vocabulary Scene Graph Generation via Visual-Concept Alignment and Retention
by: Chen, Zuyao, et al.
Published: (2023)

GPT4SGG: Synthesizing Scene Graphs from Holistic and Region-specific Narratives
by: Chen, Zuyao, et al.
Published: (2023)

Polarization Wavefront Lidar: Learning Large Scene Reconstruction from Polarized Wavefronts
by: Scheuble, Dominik, et al.
Published: (2024)

TaskGalaxy: Scaling Multi-modal Instruction Fine-tuning with Tens of Thousands Vision Task Types
by: Chen, Jiankang, et al.
Published: (2025)

Generalized and Efficient 2D Gaussian Splatting for Arbitrary-scale Super-Resolution
by: Chen, Du, et al.
Published: (2025)

Simple bots breed social punishment in humans
by: Shen, Chen, et al.
Published: (2022)

A Few-Shot Metric Learning Method with Dual-Channel Attention for Cross-Modal Same-Neuron Identification
by: Li, Wenwei, et al.
Published: (2025)

Large Motion Video Autoencoding with Cross-modal Video VAE
by: Xing, Yazhou, et al.
Published: (2024)

Cross-Modal Obfuscation for Jailbreak Attacks on Large Vision-Language Models
by: Jiang, Lei, et al.
Published: (2025)

Instruction-based Image Editing with Planning, Reasoning, and Generation
by: Ji, Liya, et al.
Published: (2026)

Unleashing the Power of Imbalanced Modality Information for Multi-modal Knowledge Graph Completion
by: Zhang, Yichi, et al.
Published: (2024)

Cross-Organ and Cross-Scanner Adenocarcinoma Segmentation using Rein to Fine-tune Vision Foundation Models
by: Cai, Pengzhou, et al.
Published: (2024)

Using Left and Right Brains Together: Towards Vision and Language Planning
by: Cen, Jun, et al.
Published: (2024)

Data Selection for Fine-tuning Vision Language Models via Cross Modal Alignment Trajectories
by: Naharas, Nilay, et al.
Published: (2025)

Deep Class-guided Hashing for Multi-label Cross-modal Retrieval
by: Chen, Hao, et al.
Published: (2024)

Why LLM Safety Guardrails Collapse After Fine-tuning: A Similarity Analysis Between Alignment and Fine-tuning Datasets
by: Hsiung, Lei, et al.
Published: (2025)

AnyECG-Lab: An Exploration Study of Fine-tuning an ECG Foundation Model to Estimate Laboratory Values from Single-Lead ECG Signals
by: Xiao, Yujie, et al.
Published: (2025)

Multi-modal Reasoning with LLMs for Visual Semantic Arithmetic
by: Xu, Chuou, et al.
Published: (2026)

MDReID: Modality-Decoupled Learning for Any-to-Any Multi-Modal Object Re-Identification
by: Feng, Yingying, et al.
Published: (2025)

RA-CMF: Region-Adaptive Conditional MeanFlow for CT Image Reconstruction
by: Apurba, Md Shifatul Ahsan, et al.
Published: (2026)

DrivingGPT: Unifying Driving World Modeling and Planning with Multi-modal Autoregressive Transformers
by: Chen, Yuntao, et al.
Published: (2024)

Search to Fine-tune Pre-trained Graph Neural Networks for Graph-level Tasks
by: Wang, Zhili, et al.
Published: (2023)

CMF-IoU: Multi-Stage Cross-Modal Fusion 3D Object Detection with IoU Joint Prediction
by: Ning, Zhiwei, et al.
Published: (2025)

Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model
by: Yang, Kai, et al.
Published: (2023)

Fine-tuning an ECG Foundation Model to Predict Coronary CT Angiography Outcomes
by: Xiao, Yujie, et al.
Published: (2025)

SkySense V2: A Unified Foundation Model for Multi-modal Remote Sensing
by: Zhang, Yingying, et al.
Published: (2025)

As Simple as Fine-tuning: LLM Alignment via Bidirectional Negative Feedback Loss
by: Mao, Xin, et al.
Published: (2024)

Cross-modality Attention Adapter: A Glioma Segmentation Fine-tuning Method for SAM Using Multimodal Brain MR Images
by: Shi, Xiaoyu, et al.
Published: (2023)

MCRPL: A Pretrain, Prompt & Fine-tune Paradigm for Non-overlapping Many-to-one Cross-domain Recommendation
by: Liu, Hao, et al.
Published: (2024)

VisionTS++: Cross-Modal Time Series Foundation Model with Continual Pre-trained Vision Backbones
by: Shen, Lefei, et al.
Published: (2025)

DiffSpeaker: Speech-Driven 3D Facial Animation with Diffusion Transformer
by: Ma, Zhiyuan, et al.
Published: (2024)