| Main Authors: | Zheng, Linfeng, Chen, Peilin, Wang, Shiqi |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2407.14936 |
Similar Items
Compact Visual Data Representation for Green Multimedia -- A Human Visual System Perspective
by: Chen, Peilin, et al.
Published: (2024)
MindTuner: Cross-Subject Visual Decoding with Visual Fingerprint and Semantic Correction
by: Gong, Zixuan, et al.
Published: (2024)
Perception-Aware Video Semantic Communication
by: Huang, Yinhuan, et al.
Published: (2026)
Deep Shape-Texture Statistics for Completely Blind Image Quality Evaluation
by: Li, Yixuan, et al.
Published: (2024)
A Multi-modal Fusion Network for Terrain Perception Based on Illumination Aware
by: Wang, Rui, et al.
Published: (2025)
Wills Aligner: Multi-Subject Collaborative Brain Visual Decoding
by: Bao, Guangyin, et al.
Published: (2024)
Token Communications: A Large Model-Driven Framework for Cross-modal Context-aware Semantic Communications
by: Qiao, Li, et al.
Published: (2025)
An Emotion Recognition Framework via Cross-modal Alignment of EEG and Eye Movement Data
by: Wang, Jianlu, et al.
Published: (2025)
Learning Brain Representation with Hierarchical Visual Embeddings
by: Zheng, Jiawen, et al.
Published: (2026)
When Video Coding Meets Multimodal Large Language Models: A Unified Paradigm for Video Coding
by: Zhang, Pingping, et al.
Published: (2024)
CrossPT-EEG: A Benchmark for Cross-Participant and Cross-Time Generalization of EEG-based Visual Decoding
by: Zhu, Shuqi, et al.
Published: (2024)
MViR: Multi-View Visual-Semantic Representation for Fake News Detection
by: Liang, Haochen, et al.
Published: (2026)
SciCom Wiki: A Digital Library to Support the Science Communication Knowledge Infrastructure for Videos and Podcasts
by: Wittenborg, Tim, et al.
Published: (2025)
Deep Reversible Consistency Learning for Cross-modal Retrieval
by: Pu, Ruitao, et al.
Published: (2025)
CLASP: Cross-modal Salient Anchor-based Semantic Propagation for Weakly-supervised Dense Audio-Visual Event Localization
by: Zhou, Jinxing, et al.
Published: (2025)
Mixture-of-Prompt-Experts for Multi-modal Semantic Understanding
by: Wu, Zichen, et al.
Published: (2024)
Cross-Layer Encrypted Semantic Communication Framework for Panoramic Video Transmission
by: Gao, Haixiao, et al.
Published: (2024)
Latent Feature-Guided Conditional Diffusion for Generative Image Semantic Communication
by: Chen, Zehao, et al.
Published: (2025)
Towards Multimodal Sentiment Analysis via Contrastive Cross-modal Retrieval Augmentation and Hierachical Prompts
by: Zhao, Xianbing, et al.
Published: (2025)
Audio-Guided Visual Perception for Audio-Visual Navigation
by: Wang, Yi, et al.
Published: (2025)
Cross-Modality and Within-Modality Regularization for Audio-Visual DeepFake Detection
by: Zou, Heqing, et al.
Published: (2024)
CAMeL: Cross-modality Adaptive Meta-Learning for Text-based Person Retrieval
by: Yu, Hang, et al.
Published: (2025)
Voxel-GS: Quantized Scaffold Gaussian Splatting Compression with Run-Length Coding
by: Fu, Chunyang, et al.
Published: (2025)
Unified Coding for Both Human Perception and Generalized Machine Analytics with CLIP Supervision
by: Yin, Kangsheng, et al.
Published: (2025)
Brain-Grasp: Graph-based Saliency Priors for Improved fMRI-based Visual Brain Decoding
by: Moradi, Mohammad, et al.
Published: (2026)
Seeing Sarcasm Through Different Eyes: Analyzing Multimodal Sarcasm Perception in Large Vision-Language Models
by: Chen, Junjie, et al.
Published: (2025)
Learning Long-Range Action Representation by Two-Stream Mamba Pyramid Network for Figure Skating Assessment
by: Wang, Fengshun, et al.
Published: (2025)
Multimodal Semantic Communication for Generative Audio-Driven Video Conferencing
by: Tong, Haonan, et al.
Published: (2024)
Contextual Wireless Video Semantic Communication in MIMO-OFDM Systems
by: Xie, Bingyan, et al.
Published: (2026)
Visual Grounding with Multi-modal Conditional Adaptation
by: Yao, Ruilin, et al.
Published: (2024)
Mitigating Cross-modal Representation Bias for Multicultural Image-to-Recipe Retrieval
by: Wang, Qing, et al.
Published: (2025)
Iterative Residual Cross-Attention Mechanism: An Integrated Approach for Audio-Visual Navigation Tasks
by: Zhang, Hailong, et al.
Published: (2025)
Language-oriented Semantic Communication for Image Transmission with Fine-Tuned Diffusion Model
by: Wei, Xinfeng, et al.
Published: (2024)
GSSF: Generalized Structural Sparse Function for Deep Cross-modal Metric Learning
by: Diao, Haiwen, et al.
Published: (2024)
Robust Symbolic Reasoning for Visual Narratives via Hierarchical and Semantically Normalized Knowledge Graphs
by: Chen, Yi-Chun
Published: (2025)
Deep Mamba Multi-modal Learning
by: Zhu, Jian, et al.
Published: (2024)
MMoFusion: Multi-modal Co-Speech Motion Generation with Diffusion Model
by: Wang, Sen, et al.
Published: (2024)
CueNet: Robust Audio-Visual Speaker Extraction through Cross-Modal Cue Mining and Interaction
by: Wang, Jiadong, et al.
Published: (2026)
Tile Classification Based Viewport Prediction with Multi-modal Fusion Transformer
by: Zhang, Zhihao, et al.
Published: (2023)
Open-Vocabulary Audio-Visual Semantic Segmentation
by: Guo, Ruohao, et al.
Published: (2024)