| Main Authors: | Li, Jerry, Oh, Timothy, Hoang, Joseph, Veeramachaneni, Vardhit |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2507.15875 |
Similar Items
AMB-DSGDN: Adaptive Modality-Balanced Dynamic Semantic Graph Differential Network for Multimodal Emotion Recognition
by: Wang, Yunsheng, et al.
Published: (2026)
AlignDiT: Multimodal Aligned Diffusion Transformer for Synchronized Speech Generation
by: Choi, Jeongsoo, et al.
Published: (2025)
Solving Copyright Infringement on Short Video Platforms: Novel Datasets and an Audio Restoration Deep Learning Pipeline
by: Oh, Minwoo, et al.
Published: (2025)
Synthesizing Sentiment-Controlled Feedback For Multimodal Text and Image Data
by: Kumar, Puneet, et al.
Published: (2024)
Enhancing Multimodal Retrieval via Complementary Information Extraction and Alignment
by: Zeng, Delong, et al.
Published: (2026)
WorldGPT: Empowering LLM as Multimodal World Model
by: Ge, Zhiqi, et al.
Published: (2024)
A Survey on Multimodal Benchmarks: In the Era of Large AI Models
by: Li, Lin, et al.
Published: (2024)
RA-BLIP: Multimodal Adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training
by: Ding, Muhe, et al.
Published: (2024)
PSA-MF: Personality-Sentiment Aligned Multi-Level Fusion for Multimodal Sentiment Analysis
by: Xie, Heng, et al.
Published: (2025)
QMAVIS: Long Video-Audio Understanding using Fusion of Large Multimodal Models
by: Lin, Zixing, et al.
Published: (2026)
Modeling Human Responses to Multimodal AI Content
by: Shen, Zhiqi, et al.
Published: (2025)
Tri-Subspaces Disentanglement for Multimodal Sentiment Analysis
by: Meng, Chunlei, et al.
Published: (2026)
LiveK12Bench: Have Large Multimodal Models Truly Conquered High School-level Examinations?
by: Wang, Xiaohan, et al.
Published: (2026)
Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning
by: Cheng, Zebang, et al.
Published: (2024)
RW-Post: Auditable Evidence-Grounded Multimodal Fact-Checking in the Wild
by: Xu, Danni, et al.
Published: (2026)
Multimodal Emotion Recognition by Fusing Video Semantic in MOOC Learning Scenarios
by: Zhang, Yuan, et al.
Published: (2024)
Contestable Multi-Agent Debate with Arena-based Argumentative Computation for Multimedia Verification
by: Nguyen, Truong Thanh Hung, et al.
Published: (2026)
SEER: Semantic Enhancement and Emotional Reasoning Network for Multimodal Fake News Detection
by: Zhu, Peican, et al.
Published: (2025)
KEN: Knowledge Augmentation and Emotion Guidance Network for Multimodal Fake News Detection
by: Zhu, Peican, et al.
Published: (2025)
HiQuE: Hierarchical Question Embedding Network for Multimodal Depression Detection
by: Jung, Juho, et al.
Published: (2024)
Has Multimodal Learning Delivered Universal Intelligence in Healthcare? A Comprehensive Survey
by: Lin, Qika, et al.
Published: (2024)
Why We Feel: Breaking Boundaries in Emotional Reasoning with Multimodal Large Language Models
by: Lin, Yuxiang, et al.
Published: (2025)
SynthGuard: An Open Platform for Detecting AI-Generated Multimedia with Multimodal LLMs
by: Desai, Shail, et al.
Published: (2025)
Interpretable Multimodal Misinformation Detection with Logic Reasoning
by: Liu, Hui, et al.
Published: (2023)
MaLoRA: Gated Modality LoRA for Key-Space Alignment in Multimodal LLM Fine-Tuning
by: Zheng, Xinhan, et al.
Published: (2025)
Unsupervised Multimodal Clustering for Semantics Discovery in Multimodal Utterances
by: Zhang, Hanlei, et al.
Published: (2024)
Federated Prompt-Tuning with Heterogeneous and Incomplete Multimodal Client Data
by: Phung, Thu Hang, et al.
Published: (2026)
Shapley Value-based Contrastive Alignment for Multimodal Information Extraction
by: Luo, Wen, et al.
Published: (2024)
PETLP: A Privacy-by-Design Pipeline for Social Media Data in AI Research
by: Oh, Nick, et al.
Published: (2025)
Pianist Transformer: Towards Expressive Piano Performance Rendering via Scalable Self-Supervised Pre-Training
by: You, Hong-Jie, et al.
Published: (2025)
PRISM-XR: Empowering Privacy-Aware XR Collaboration with Multimodal Large Language Models
by: Chen, Jiangong, et al.
Published: (2026)
GeoGuess: Multimodal Reasoning based on Hierarchy of Visual Information in Street View
by: Cheng, Fenghua, et al.
Published: (2025)
Knowledge-Guided Dynamic Modality Attention Fusion Framework for Multimodal Sentiment Analysis
by: Feng, Xinyu, et al.
Published: (2024)
PTA: Enhancing Multimodal Sentiment Analysis through Pipelined Prediction and Translation-based Alignment
by: Song, Shezheng, et al.
Published: (2024)
MuPHI: Learning Implicit Multimodal Harm Reasoning via Semantically Grounded Reward Optimization
by: Saha, Anisha, et al.
Published: (2026)
FedNano: Toward Lightweight Federated Tuning for Pretrained Multimodal Large Language Models
by: Zhang, Yao, et al.
Published: (2025)
Semantic Item Graph Enhancement for Multimodal Recommendation
by: Zhang, Xiaoxiong, et al.
Published: (2025)
Investigating Vulnerabilities and Defenses Against Audio-Visual Attacks: A Comprehensive Survey Emphasizing Multimodal Models
by: Wen, Jinming, et al.
Published: (2025)
Personalized Image Generation with Large Multimodal Models
by: Xu, Yiyan, et al.
Published: (2024)
A Survey on Image-text Multimodal Models
by: Guo, Ruifeng, et al.
Published: (2023)