:: Library Catalog

Bibliographic Details
Main Authors:	Zhang, Haoshuo, Bo, Yufei, Tao, Meixia
Format:	Preprint
Published:	2025
Subjects:	Multimedia
Online Access:	https://arxiv.org/abs/2508.20057

Similar Items

Prompt-based Multimodal Semantic Communication for Multi-spectral Image Segmentation
by: Zhang, Haoshuo, et al.
Published: (2025)

BitSemCom: A Bit-Level Semantic Communication Framework with Learnable Probabilistic Mapping
by: Zhang, Haoshuo, et al.
Published: (2025)

Wireless Multi-User Interactive Virtual Reality in Metaverse with Edge-Device Collaborative Computing
by: Xu, Caolu, et al.
Published: (2024)

Multimodal Semantic Communication for Generative Audio-Driven Video Conferencing
by: Tong, Haonan, et al.
Published: (2024)

Multimodal Emotion Recognition by Fusing Video Semantic in MOOC Learning Scenarios
by: Zhang, Yuan, et al.
Published: (2024)

MambaPro: Multi-Modal Object Re-Identification with Mamba Aggregation and Synergistic Prompt
by: Wang, Yuhao, et al.
Published: (2024)

Mixture-of-Prompt-Experts for Multi-modal Semantic Understanding
by: Wu, Zichen, et al.
Published: (2024)

Semi-supervised Semantic Segmentation with Multi-Constraint Consistency Learning
by: Yin, Jianjian, et al.
Published: (2025)

Target Speech Diarization with Multimodal Prompts
by: Jiang, Yidi, et al.
Published: (2024)

Latent Feature-Guided Conditional Diffusion for Generative Image Semantic Communication
by: Chen, Zehao, et al.
Published: (2025)

MSC: A Marine Wildlife Video Dataset with Grounded Segmentation and Clip-Level Captioning
by: Truong, Quang-Trung, et al.
Published: (2025)

Multi-hop Parallel Image Semantic Communication for Distortion Accumulation Mitigation
by: Xie, Bingyan, et al.
Published: (2025)

Language-oriented Semantic Communication for Image Transmission with Fine-Tuned Diffusion Model
by: Wei, Xinfeng, et al.
Published: (2024)

MotionPro: A Precise Motion Controller for Image-to-Video Generation
by: Zhang, Zhongwei, et al.
Published: (2025)

Towards Open-Vocabulary Remote Sensing Image Semantic Segmentation
by: Ye, Chengyang, et al.
Published: (2024)

Universal Organizer of SAM for Unsupervised Semantic Segmentation
by: Li, Tingting, et al.
Published: (2024)

Look, Listen and Segment: Towards Weakly Supervised Audio-visual Semantic Segmentation
by: Li, Chengzhi, et al.
Published: (2026)

Wireless Video Semantic Communication with Decoupled Diffusion Multi-frame Compensation
by: Xie, Bingyan, et al.
Published: (2025)

ISDrama: Immersive Spatial Drama Generation through Multimodal Prompting
by: Zhang, Yu, et al.
Published: (2025)

Evolutionary Multimodal Reasoning via Hierarchical Semantic Representation for Intent Recognition
by: Zhou, Qianrui, et al.
Published: (2026)

ASAP: Advancing Semantic Alignment Promotes Multi-Modal Manipulation Detecting and Grounding
by: Zhang, Zhenxing, et al.
Published: (2024)

Contextual Wireless Video Semantic Communication in MIMO-OFDM Systems
by: Xie, Bingyan, et al.
Published: (2026)

ProFD: Prompt-Guided Feature Disentangling for Occluded Person Re-Identification
by: Cui, Can, et al.
Published: (2024)

Multi Agents Semantic Emotion Aligned Music to Image Generation with Music Derived Captions
by: Shi, Junchang, et al.
Published: (2025)

Towards Open-Vocabulary Video Semantic Segmentation
by: Li, Xinhao, et al.
Published: (2024)

Joint Optimization of Buffer Delay and HARQ for Video Communications
by: Cheng, Baoping, et al.
Published: (2024)

WVSC: Wireless Video Semantic Communication with Multi-frame Compensation
by: Xie, Bingyan, et al.
Published: (2025)

ESIQA: Perceptual Quality Assessment of Vision-Pro-based Egocentric Spatial Images
by: Zhu, Xilei, et al.
Published: (2024)

Token-Level Contrastive Learning with Modality-Aware Prompting for Multimodal Intent Recognition
by: Zhou, Qianrui, et al.
Published: (2023)

Unsupervised Multimodal Clustering for Semantics Discovery in Multimodal Utterances
by: Zhang, Hanlei, et al.
Published: (2024)

Towards Multimodal Sentiment Analysis via Contrastive Cross-modal Retrieval Augmentation and Hierachical Prompts
by: Zhao, Xianbing, et al.
Published: (2025)

Open-Vocabulary Audio-Visual Semantic Segmentation
by: Guo, Ruohao, et al.
Published: (2024)

MAR3: Multi-Agent Recognition, Reasoning, and Reflection for Reference Audio-Visual Segmentation
by: Zhao, Yuan, et al.
Published: (2026)

DiffCL: A Diffusion-Based Contrastive Learning Framework with Semantic Alignment for Multimodal Recommendations
by: Song, Qiya, et al.
Published: (2025)

Think before You Leap: Content-Aware Low-Cost Edge-Assisted Video Semantic Segmentation
by: Yan, Mingxuan, et al.
Published: (2024)

MMC: Iterative Refinement of VLM Reasoning via MCTS-based Multimodal Critique
by: Liu, Shuhang, et al.
Published: (2025)

Semantic Item Graph Enhancement for Multimodal Recommendation
by: Zhang, Xiaoxiong, et al.
Published: (2025)

AMB-DSGDN: Adaptive Modality-Balanced Dynamic Semantic Graph Differential Network for Multimodal Emotion Recognition
by: Wang, Yunsheng, et al.
Published: (2026)

ARGUS: Defending Against Multimodal Indirect Prompt Injection via Steering Instruction-Following Behavior
by: Lu, Weikai, et al.
Published: (2025)

Exploring the Role of Audio in Multimodal Misinformation Detection
by: Liu, Moyang, et al.
Published: (2024)