| Main Authors: | Peng, Chong, He, Liqiang, Su, Dan |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2404.09509 |
Similar Items
Dual Associated Encoder for Face Restoration
by: Tsai, Yu-Ju, et al.
Published: (2023)
Improving OOD Generalization of Pre-trained Encoders via Aligned Embedding-Space Ensembles
by: Peng, Shuman, et al.
Published: (2024)
XM-ALIGN: Unified Cross-Modal Embedding Alignment for Face-Voice Association
by: Fang, Zhihua, et al.
Published: (2025)
RFOP: Rethinking Fusion and Orthogonal Projection for Face-Voice Association
by: Hannan, Abdul, et al.
Published: (2025)
Face-Voice Association with Inductive Bias for Maximum Class Separation
by: Moscati, Marta, et al.
Published: (2026)
AlignTok: Aligning Visual Foundation Encoders to Tokenizers for Diffusion Models
by: Chen, Bowei, et al.
Published: (2025)
Social-MAE: A Transformer-Based Multimodal Autoencoder for Face and Voice
by: Bohy, Hugo, et al.
Published: (2025)
Shared Multi-modal Embedding Space for Face-Voice Association
by: Simic, Christopher, et al.
Published: (2025)
Aligning First, Then Fusing: A Novel Weakly Supervised Multimodal Violence Detection Method
by: Jin, Wenping, et al.
Published: (2025)
OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence
by: Tang, Feilong, et al.
Published: (2026)
CogVLA: Cognition-Aligned Vision-Language-Action Model via Instruction-Driven Routing & Sparsification
by: Li, Wei, et al.
Published: (2025)
Categorical Knowledge Fused Recognition: Fusing Hierarchical Knowledge with Image Classification through Aligning and Deep Metric Learning
by: Zhao, Yunfeng, et al.
Published: (2024)
AliFuse: Aligning and Fusing Multi-modal Medical Data for Computer-Aided Diagnosis
by: Chen, Qiuhui, et al.
Published: (2024)
FGAIF: Aligning Large Vision-Language Models with Fine-grained AI Feedback
by: Jing, Liqiang, et al.
Published: (2024)
Towards Improved Proxy-based Deep Metric Learning via Data-Augmented Domain Adaptation
by: Ren, Li, et al.
Published: (2024)
MED-VT++: Unifying Multimodal Learning with a Multiscale Encoder-Decoder Video Transformer
by: Karim, Rezaul, et al.
Published: (2023)
A Shared Encoder Approach to Multimodal Representation Learning
by: Roy, Shuvendu, et al.
Published: (2025)
Imperceptible Face Forgery Attack via Adversarial Semantic Mask
by: Liu, Decheng, et al.
Published: (2024)
PAEFF: Precise Alignment and Enhanced Gated Feature Fusion for Face-Voice Association
by: Hannan, Abdul, et al.
Published: (2025)
MagiC: Evaluating Multimodal Cognition Toward Grounded Visual Reasoning
by: Wu, Chengfei, et al.
Published: (2025)
Improving Multimodal Learning via Imbalanced Learning
by: Wei, Shicai, et al.
Published: (2025)
Exploring Robust Face-Voice Matching in Multilingual Environments
by: Tang, Jiehui, et al.
Published: (2024)
Heeding the Inner Voice: Aligning ControlNet Training via Intermediate Features Feedback
by: Konovalova, Nina, et al.
Published: (2025)
Align-cDAE: Alzheimer's Disease Progression Modeling with Attention-Aligned Conditional Diffusion Auto-Encoder
by: Das, Ayantika, et al.
Published: (2026)
VLForgery Face Triad: Detection, Localization and Attribution via Multimodal Large Language Models
by: He, Xinan, et al.
Published: (2025)
Improve Contrastive Clustering Performance by Multiple Fusing-Augmenting ViT Blocks
by: Wang, Cheng, et al.
Published: (2025)
DecoFuse: Decomposing and Fusing the "What", "Where", and "How" for Brain-Inspired fMRI-to-Video Decoding
by: Li, Chong, et al.
Published: (2025)
Unified Multimodal Models as Auto-Encoders
by: Yan, Zhiyuan, et al.
Published: (2025)
MaeFuse: Transferring Omni Features with Pretrained Masked Autoencoders for Infrared and Visible Image Fusion via Guided Training
by: Li, Jiayang, et al.
Published: (2024)
Robust Multimodal Survival Prediction with the Latent Differentiation Conditional Variational AutoEncoder
by: Zhou, Junjie, et al.
Published: (2025)
SafeText: Safe Text-to-image Models via Aligning the Text Encoder
by: Hu, Yuepeng, et al.
Published: (2025)
FaceXBench: Evaluating Multimodal LLMs on Face Understanding
by: Narayan, Kartik, et al.
Published: (2025)
Learning to Fuse: Modality-Aware Adaptive Scheduling for Robust Multimodal Foundation Models
by: Bennett, Liam, et al.
Published: (2025)
Leveraging CLIP Encoder for Multimodal Emotion Recognition
by: Song, Yehun, et al.
Published: (2025)
Counterfactual Explanations for Face Forgery Detection via Adversarial Removal of Artifacts
by: Li, Yang, et al.
Published: (2024)
AlignMiF: Geometry-Aligned Multimodal Implicit Field for LiDAR-Camera Joint Synthesis
by: Tang, Tao, et al.
Published: (2024)
Face2VoiceSync: Lightweight Face-Voice Consistency for Text-Driven Talking Face Generation
by: Kang, Fang, et al.
Published: (2025)
Beyond the Encoder: Joint Encoder-Decoder Contrastive Pre-Training Improves Dense Prediction
by: Quetin, Sébastien, et al.
Published: (2025)
MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models
by: Shen, Leyang, et al.
Published: (2024)
StructAlign: Structured Cross-Modal Alignment for Continual Text-to-Video Retrieval
by: Wang, Shaokun, et al.
Published: (2026)