:: Library Catalog

Bibliographic Details
Main Authors:	Peng, Chong; He, Liqiang; Su, Dan
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2404.09509

Similar Items

Dual Associated Encoder for Face Restoration
by: Tsai, Yu-Ju, et al.
Published: (2023)

Improving OOD Generalization of Pre-trained Encoders via Aligned Embedding-Space Ensembles
by: Peng, Shuman, et al.
Published: (2024)

XM-ALIGN: Unified Cross-Modal Embedding Alignment for Face-Voice Association
by: Fang, Zhihua, et al.
Published: (2025)

RFOP: Rethinking Fusion and Orthogonal Projection for Face-Voice Association
by: Hannan, Abdul, et al.
Published: (2025)

Face-Voice Association with Inductive Bias for Maximum Class Separation
by: Moscati, Marta, et al.
Published: (2026)

AlignTok: Aligning Visual Foundation Encoders to Tokenizers for Diffusion Models
by: Chen, Bowei, et al.
Published: (2025)

Social-MAE: A Transformer-Based Multimodal Autoencoder for Face and Voice
by: Bohy, Hugo, et al.
Published: (2025)

Shared Multi-modal Embedding Space for Face-Voice Association
by: Simic, Christopher, et al.
Published: (2025)

Aligning First, Then Fusing: A Novel Weakly Supervised Multimodal Violence Detection Method
by: Jin, Wenping, et al.
Published: (2025)

OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence
by: Tang, Feilong, et al.
Published: (2026)

CogVLA: Cognition-Aligned Vision-Language-Action Model via Instruction-Driven Routing & Sparsification
by: Li, Wei, et al.
Published: (2025)

Categorical Knowledge Fused Recognition: Fusing Hierarchical Knowledge with Image Classification through Aligning and Deep Metric Learning
by: Zhao, Yunfeng, et al.
Published: (2024)

AliFuse: Aligning and Fusing Multi-modal Medical Data for Computer-Aided Diagnosis
by: Chen, Qiuhui, et al.
Published: (2024)

FGAIF: Aligning Large Vision-Language Models with Fine-grained AI Feedback
by: Jing, Liqiang, et al.
Published: (2024)

Towards Improved Proxy-based Deep Metric Learning via Data-Augmented Domain Adaptation
by: Ren, Li, et al.
Published: (2024)

MED-VT++: Unifying Multimodal Learning with a Multiscale Encoder-Decoder Video Transformer
by: Karim, Rezaul, et al.
Published: (2023)

A Shared Encoder Approach to Multimodal Representation Learning
by: Roy, Shuvendu, et al.
Published: (2025)

Imperceptible Face Forgery Attack via Adversarial Semantic Mask
by: Liu, Decheng, et al.
Published: (2024)

PAEFF: Precise Alignment and Enhanced Gated Feature Fusion for Face-Voice Association
by: Hannan, Abdul, et al.
Published: (2025)

MagiC: Evaluating Multimodal Cognition Toward Grounded Visual Reasoning
by: Wu, Chengfei, et al.
Published: (2025)

Improving Multimodal Learning via Imbalanced Learning
by: Wei, Shicai, et al.
Published: (2025)

Exploring Robust Face-Voice Matching in Multilingual Environments
by: Tang, Jiehui, et al.
Published: (2024)

Heeding the Inner Voice: Aligning ControlNet Training via Intermediate Features Feedback
by: Konovalova, Nina, et al.
Published: (2025)

Align-cDAE: Alzheimer's Disease Progression Modeling with Attention-Aligned Conditional Diffusion Auto-Encoder
by: Das, Ayantika, et al.
Published: (2026)

VLForgery Face Triad: Detection, Localization and Attribution via Multimodal Large Language Models
by: He, Xinan, et al.
Published: (2025)

Improve Contrastive Clustering Performance by Multiple Fusing-Augmenting ViT Blocks
by: Wang, Cheng, et al.
Published: (2025)

DecoFuse: Decomposing and Fusing the "What", "Where", and "How" for Brain-Inspired fMRI-to-Video Decoding
by: Li, Chong, et al.
Published: (2025)

Unified Multimodal Models as Auto-Encoders
by: Yan, Zhiyuan, et al.
Published: (2025)

MaeFuse: Transferring Omni Features with Pretrained Masked Autoencoders for Infrared and Visible Image Fusion via Guided Training
by: Li, Jiayang, et al.
Published: (2024)

Robust Multimodal Survival Prediction with the Latent Differentiation Conditional Variational AutoEncoder
by: Zhou, Junjie, et al.
Published: (2025)

SafeText: Safe Text-to-image Models via Aligning the Text Encoder
by: Hu, Yuepeng, et al.
Published: (2025)

FaceXBench: Evaluating Multimodal LLMs on Face Understanding
by: Narayan, Kartik, et al.
Published: (2025)

Learning to Fuse: Modality-Aware Adaptive Scheduling for Robust Multimodal Foundation Models
by: Bennett, Liam, et al.
Published: (2025)

Leveraging CLIP Encoder for Multimodal Emotion Recognition
by: Song, Yehun, et al.
Published: (2025)

Counterfactual Explanations for Face Forgery Detection via Adversarial Removal of Artifacts
by: Li, Yang, et al.
Published: (2024)

AlignMiF: Geometry-Aligned Multimodal Implicit Field for LiDAR-Camera Joint Synthesis
by: Tang, Tao, et al.
Published: (2024)

Face2VoiceSync: Lightweight Face-Voice Consistency for Text-Driven Talking Face Generation
by: Kang, Fang, et al.
Published: (2025)

Beyond the Encoder: Joint Encoder-Decoder Contrastive Pre-Training Improves Dense Prediction
by: Quetin, Sébastien, et al.
Published: (2025)

MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models
by: Shen, Leyang, et al.
Published: (2024)

StructAlign: Structured Cross-Modal Alignment for Continual Text-to-Video Retrieval
by: Wang, Shaokun, et al.
Published: (2026)