:: Library Catalog

Bibliographic Details
Main Authors:	Li, Jerry; Oh, Timothy; Hoang, Joseph; Veeramachaneni, Vardhit
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence; Multimedia
Online Access:	https://arxiv.org/abs/2507.15875

Similar Items

AMB-DSGDN: Adaptive Modality-Balanced Dynamic Semantic Graph Differential Network for Multimodal Emotion Recognition
by: Wang, Yunsheng, et al.
Published: (2026)

AlignDiT: Multimodal Aligned Diffusion Transformer for Synchronized Speech Generation
by: Choi, Jeongsoo, et al.
Published: (2025)

Solving Copyright Infringement on Short Video Platforms: Novel Datasets and an Audio Restoration Deep Learning Pipeline
by: Oh, Minwoo, et al.
Published: (2025)

Synthesizing Sentiment-Controlled Feedback For Multimodal Text and Image Data
by: Kumar, Puneet, et al.
Published: (2024)

Enhancing Multimodal Retrieval via Complementary Information Extraction and Alignment
by: Zeng, Delong, et al.
Published: (2026)

WorldGPT: Empowering LLM as Multimodal World Model
by: Ge, Zhiqi, et al.
Published: (2024)

A Survey on Multimodal Benchmarks: In the Era of Large AI Models
by: Li, Lin, et al.
Published: (2024)

RA-BLIP: Multimodal Adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training
by: Ding, Muhe, et al.
Published: (2024)

PSA-MF: Personality-Sentiment Aligned Multi-Level Fusion for Multimodal Sentiment Analysis
by: Xie, Heng, et al.
Published: (2025)

QMAVIS: Long Video-Audio Understanding using Fusion of Large Multimodal Models
by: Lin, Zixing, et al.
Published: (2026)

Modeling Human Responses to Multimodal AI Content
by: Shen, Zhiqi, et al.
Published: (2025)

Tri-Subspaces Disentanglement for Multimodal Sentiment Analysis
by: Meng, Chunlei, et al.
Published: (2026)

LiveK12Bench: Have Large Multimodal Models Truly Conquered High School-level Examinations?
by: Wang, Xiaohan, et al.
Published: (2026)

Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning
by: Cheng, Zebang, et al.
Published: (2024)

RW-Post: Auditable Evidence-Grounded Multimodal Fact-Checking in the Wild
by: Xu, Danni, et al.
Published: (2026)

Multimodal Emotion Recognition by Fusing Video Semantic in MOOC Learning Scenarios
by: Zhang, Yuan, et al.
Published: (2024)

Contestable Multi-Agent Debate with Arena-based Argumentative Computation for Multimedia Verification
by: Nguyen, Truong Thanh Hung, et al.
Published: (2026)

SEER: Semantic Enhancement and Emotional Reasoning Network for Multimodal Fake News Detection
by: Zhu, Peican, et al.
Published: (2025)

KEN: Knowledge Augmentation and Emotion Guidance Network for Multimodal Fake News Detection
by: Zhu, Peican, et al.
Published: (2025)

HiQuE: Hierarchical Question Embedding Network for Multimodal Depression Detection
by: Jung, Juho, et al.
Published: (2024)

Has Multimodal Learning Delivered Universal Intelligence in Healthcare? A Comprehensive Survey
by: Lin, Qika, et al.
Published: (2024)

Why We Feel: Breaking Boundaries in Emotional Reasoning with Multimodal Large Language Models
by: Lin, Yuxiang, et al.
Published: (2025)

SynthGuard: An Open Platform for Detecting AI-Generated Multimedia with Multimodal LLMs
by: Desai, Shail, et al.
Published: (2025)

Interpretable Multimodal Misinformation Detection with Logic Reasoning
by: Liu, Hui, et al.
Published: (2023)

MaLoRA: Gated Modality LoRA for Key-Space Alignment in Multimodal LLM Fine-Tuning
by: Zheng, Xinhan, et al.
Published: (2025)

Unsupervised Multimodal Clustering for Semantics Discovery in Multimodal Utterances
by: Zhang, Hanlei, et al.
Published: (2024)

Federated Prompt-Tuning with Heterogeneous and Incomplete Multimodal Client Data
by: Phung, Thu Hang, et al.
Published: (2026)

Shapley Value-based Contrastive Alignment for Multimodal Information Extraction
by: Luo, Wen, et al.
Published: (2024)

PETLP: A Privacy-by-Design Pipeline for Social Media Data in AI Research
by: Oh, Nick, et al.
Published: (2025)

Pianist Transformer: Towards Expressive Piano Performance Rendering via Scalable Self-Supervised Pre-Training
by: You, Hong-Jie, et al.
Published: (2025)

PRISM-XR: Empowering Privacy-Aware XR Collaboration with Multimodal Large Language Models
by: Chen, Jiangong, et al.
Published: (2026)

GeoGuess: Multimodal Reasoning based on Hierarchy of Visual Information in Street View
by: Cheng, Fenghua, et al.
Published: (2025)

Knowledge-Guided Dynamic Modality Attention Fusion Framework for Multimodal Sentiment Analysis
by: Feng, Xinyu, et al.
Published: (2024)

PTA: Enhancing Multimodal Sentiment Analysis through Pipelined Prediction and Translation-based Alignment
by: Song, Shezheng, et al.
Published: (2024)

MuPHI: Learning Implicit Multimodal Harm Reasoning via Semantically Grounded Reward Optimization
by: Saha, Anisha, et al.
Published: (2026)

FedNano: Toward Lightweight Federated Tuning for Pretrained Multimodal Large Language Models
by: Zhang, Yao, et al.
Published: (2025)

Semantic Item Graph Enhancement for Multimodal Recommendation
by: Zhang, Xiaoxiong, et al.
Published: (2025)

Investigating Vulnerabilities and Defenses Against Audio-Visual Attacks: A Comprehensive Survey Emphasizing Multimodal Models
by: Wen, Jinming, et al.
Published: (2025)

Personalized Image Generation with Large Multimodal Models
by: Xu, Yiyan, et al.
Published: (2024)

A Survey on Image-text Multimodal Models
by: Guo, Ruifeng, et al.
Published: (2023)