| Main Authors: | Zhang, Haoshuo, Bo, Yufei, Tao, Meixia |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2508.20057 |
Similar Items
Prompt-based Multimodal Semantic Communication for Multi-spectral Image Segmentation
by: Zhang, Haoshuo, et al.
Published: (2025)
BitSemCom: A Bit-Level Semantic Communication Framework with Learnable Probabilistic Mapping
by: Zhang, Haoshuo, et al.
Published: (2025)
Wireless Multi-User Interactive Virtual Reality in Metaverse with Edge-Device Collaborative Computing
by: Xu, Caolu, et al.
Published: (2024)
Multimodal Semantic Communication for Generative Audio-Driven Video Conferencing
by: Tong, Haonan, et al.
Published: (2024)
Multimodal Emotion Recognition by Fusing Video Semantic in MOOC Learning Scenarios
by: Zhang, Yuan, et al.
Published: (2024)
MambaPro: Multi-Modal Object Re-Identification with Mamba Aggregation and Synergistic Prompt
by: Wang, Yuhao, et al.
Published: (2024)
Mixture-of-Prompt-Experts for Multi-modal Semantic Understanding
by: Wu, Zichen, et al.
Published: (2024)
Semi-supervised Semantic Segmentation with Multi-Constraint Consistency Learning
by: Yin, Jianjian, et al.
Published: (2025)
Target Speech Diarization with Multimodal Prompts
by: Jiang, Yidi, et al.
Published: (2024)
Latent Feature-Guided Conditional Diffusion for Generative Image Semantic Communication
by: Chen, Zehao, et al.
Published: (2025)
MSC: A Marine Wildlife Video Dataset with Grounded Segmentation and Clip-Level Captioning
by: Truong, Quang-Trung, et al.
Published: (2025)
Multi-hop Parallel Image Semantic Communication for Distortion Accumulation Mitigation
by: Xie, Bingyan, et al.
Published: (2025)
Language-oriented Semantic Communication for Image Transmission with Fine-Tuned Diffusion Model
by: Wei, Xinfeng, et al.
Published: (2024)
MotionPro: A Precise Motion Controller for Image-to-Video Generation
by: Zhang, Zhongwei, et al.
Published: (2025)
Towards Open-Vocabulary Remote Sensing Image Semantic Segmentation
by: Ye, Chengyang, et al.
Published: (2024)
Universal Organizer of SAM for Unsupervised Semantic Segmentation
by: Li, Tingting, et al.
Published: (2024)
Look, Listen and Segment: Towards Weakly Supervised Audio-visual Semantic Segmentation
by: Li, Chengzhi, et al.
Published: (2026)
Wireless Video Semantic Communication with Decoupled Diffusion Multi-frame Compensation
by: Xie, Bingyan, et al.
Published: (2025)
ISDrama: Immersive Spatial Drama Generation through Multimodal Prompting
by: Zhang, Yu, et al.
Published: (2025)
Evolutionary Multimodal Reasoning via Hierarchical Semantic Representation for Intent Recognition
by: Zhou, Qianrui, et al.
Published: (2026)
ASAP: Advancing Semantic Alignment Promotes Multi-Modal Manipulation Detecting and Grounding
by: Zhang, Zhenxing, et al.
Published: (2024)
Contextual Wireless Video Semantic Communication in MIMO-OFDM Systems
by: Xie, Bingyan, et al.
Published: (2026)
ProFD: Prompt-Guided Feature Disentangling for Occluded Person Re-Identification
by: Cui, Can, et al.
Published: (2024)
Multi Agents Semantic Emotion Aligned Music to Image Generation with Music Derived Captions
by: Shi, Junchang, et al.
Published: (2025)
Towards Open-Vocabulary Video Semantic Segmentation
by: Li, Xinhao, et al.
Published: (2024)
Joint Optimization of Buffer Delay and HARQ for Video Communications
by: Cheng, Baoping, et al.
Published: (2024)
WVSC: Wireless Video Semantic Communication with Multi-frame Compensation
by: Xie, Bingyan, et al.
Published: (2025)
ESIQA: Perceptual Quality Assessment of Vision-Pro-based Egocentric Spatial Images
by: Zhu, Xilei, et al.
Published: (2024)
Token-Level Contrastive Learning with Modality-Aware Prompting for Multimodal Intent Recognition
by: Zhou, Qianrui, et al.
Published: (2023)
Unsupervised Multimodal Clustering for Semantics Discovery in Multimodal Utterances
by: Zhang, Hanlei, et al.
Published: (2024)
Towards Multimodal Sentiment Analysis via Contrastive Cross-modal Retrieval Augmentation and Hierachical Prompts
by: Zhao, Xianbing, et al.
Published: (2025)
Open-Vocabulary Audio-Visual Semantic Segmentation
by: Guo, Ruohao, et al.
Published: (2024)
MAR3: Multi-Agent Recognition, Reasoning, and Reflection for Reference Audio-Visual Segmentation
by: Zhao, Yuan, et al.
Published: (2026)
DiffCL: A Diffusion-Based Contrastive Learning Framework with Semantic Alignment for Multimodal Recommendations
by: Song, Qiya, et al.
Published: (2025)
Think before You Leap: Content-Aware Low-Cost Edge-Assisted Video Semantic Segmentation
by: Yan, Mingxuan, et al.
Published: (2024)
MMC: Iterative Refinement of VLM Reasoning via MCTS-based Multimodal Critique
by: Liu, Shuhang, et al.
Published: (2025)
Semantic Item Graph Enhancement for Multimodal Recommendation
by: Zhang, Xiaoxiong, et al.
Published: (2025)
AMB-DSGDN: Adaptive Modality-Balanced Dynamic Semantic Graph Differential Network for Multimodal Emotion Recognition
by: Wang, Yunsheng, et al.
Published: (2026)
ARGUS: Defending Against Multimodal Indirect Prompt Injection via Steering Instruction-Following Behavior
by: Lu, Weikai, et al.
Published: (2025)
Exploring the Role of Audio in Multimodal Misinformation Detection
by: Liu, Moyang, et al.
Published: (2024)