Library Catalog

Bibliographic Details
Main Authors:	Zheng, Linfeng; Chen, Peilin; Wang, Shiqi
Format:	Preprint
Published:	2024
Subjects:	Multimedia
Online Access:	https://arxiv.org/abs/2407.14936

Similar Items

Compact Visual Data Representation for Green Multimedia -- A Human Visual System Perspective
by: Chen, Peilin, et al.
Published: (2024)

MindTuner: Cross-Subject Visual Decoding with Visual Fingerprint and Semantic Correction
by: Gong, Zixuan, et al.
Published: (2024)

Perception-Aware Video Semantic Communication
by: Huang, Yinhuan, et al.
Published: (2026)

Deep Shape-Texture Statistics for Completely Blind Image Quality Evaluation
by: Li, Yixuan, et al.
Published: (2024)

A Multi-modal Fusion Network for Terrain Perception Based on Illumination Aware
by: Wang, Rui, et al.
Published: (2025)

Wills Aligner: Multi-Subject Collaborative Brain Visual Decoding
by: Bao, Guangyin, et al.
Published: (2024)

Token Communications: A Large Model-Driven Framework for Cross-modal Context-aware Semantic Communications
by: Qiao, Li, et al.
Published: (2025)

An Emotion Recognition Framework via Cross-modal Alignment of EEG and Eye Movement Data
by: Wang, Jianlu, et al.
Published: (2025)

Learning Brain Representation with Hierarchical Visual Embeddings
by: Zheng, Jiawen, et al.
Published: (2026)

When Video Coding Meets Multimodal Large Language Models: A Unified Paradigm for Video Coding
by: Zhang, Pingping, et al.
Published: (2024)

CrossPT-EEG: A Benchmark for Cross-Participant and Cross-Time Generalization of EEG-based Visual Decoding
by: Zhu, Shuqi, et al.
Published: (2024)

MViR: Multi-View Visual-Semantic Representation for Fake News Detection
by: Liang, Haochen, et al.
Published: (2026)

SciCom Wiki: A Digital Library to Support the Science Communication Knowledge Infrastructure for Videos and Podcasts
by: Wittenborg, Tim, et al.
Published: (2025)

Deep Reversible Consistency Learning for Cross-modal Retrieval
by: Pu, Ruitao, et al.
Published: (2025)

CLASP: Cross-modal Salient Anchor-based Semantic Propagation for Weakly-supervised Dense Audio-Visual Event Localization
by: Zhou, Jinxing, et al.
Published: (2025)

Mixture-of-Prompt-Experts for Multi-modal Semantic Understanding
by: Wu, Zichen, et al.
Published: (2024)

Cross-Layer Encrypted Semantic Communication Framework for Panoramic Video Transmission
by: Gao, Haixiao, et al.
Published: (2024)

Latent Feature-Guided Conditional Diffusion for Generative Image Semantic Communication
by: Chen, Zehao, et al.
Published: (2025)

Towards Multimodal Sentiment Analysis via Contrastive Cross-modal Retrieval Augmentation and Hierachical Prompts
by: Zhao, Xianbing, et al.
Published: (2025)

Audio-Guided Visual Perception for Audio-Visual Navigation
by: Wang, Yi, et al.
Published: (2025)

Cross-Modality and Within-Modality Regularization for Audio-Visual DeepFake Detection
by: Zou, Heqing, et al.
Published: (2024)

CAMeL: Cross-modality Adaptive Meta-Learning for Text-based Person Retrieval
by: Yu, Hang, et al.
Published: (2025)

Voxel-GS: Quantized Scaffold Gaussian Splatting Compression with Run-Length Coding
by: Fu, Chunyang, et al.
Published: (2025)

Unified Coding for Both Human Perception and Generalized Machine Analytics with CLIP Supervision
by: Yin, Kangsheng, et al.
Published: (2025)

Brain-Grasp: Graph-based Saliency Priors for Improved fMRI-based Visual Brain Decoding
by: Moradi, Mohammad, et al.
Published: (2026)

Seeing Sarcasm Through Different Eyes: Analyzing Multimodal Sarcasm Perception in Large Vision-Language Models
by: Chen, Junjie, et al.
Published: (2025)

Learning Long-Range Action Representation by Two-Stream Mamba Pyramid Network for Figure Skating Assessment
by: Wang, Fengshun, et al.
Published: (2025)

Multimodal Semantic Communication for Generative Audio-Driven Video Conferencing
by: Tong, Haonan, et al.
Published: (2024)

Contextual Wireless Video Semantic Communication in MIMO-OFDM Systems
by: Xie, Bingyan, et al.
Published: (2026)

Visual Grounding with Multi-modal Conditional Adaptation
by: Yao, Ruilin, et al.
Published: (2024)

Mitigating Cross-modal Representation Bias for Multicultural Image-to-Recipe Retrieval
by: Wang, Qing, et al.
Published: (2025)

Iterative Residual Cross-Attention Mechanism: An Integrated Approach for Audio-Visual Navigation Tasks
by: Zhang, Hailong, et al.
Published: (2025)

Language-oriented Semantic Communication for Image Transmission with Fine-Tuned Diffusion Model
by: Wei, Xinfeng, et al.
Published: (2024)

GSSF: Generalized Structural Sparse Function for Deep Cross-modal Metric Learning
by: Diao, Haiwen, et al.
Published: (2024)

Robust Symbolic Reasoning for Visual Narratives via Hierarchical and Semantically Normalized Knowledge Graphs
by: Chen, Yi-Chun
Published: (2025)

Deep Mamba Multi-modal Learning
by: Zhu, Jian, et al.
Published: (2024)

MMoFusion: Multi-modal Co-Speech Motion Generation with Diffusion Model
by: Wang, Sen, et al.
Published: (2024)

CueNet: Robust Audio-Visual Speaker Extraction through Cross-Modal Cue Mining and Interaction
by: Wang, Jiadong, et al.
Published: (2026)

Tile Classification Based Viewport Prediction with Multi-modal Fusion Transformer
by: Zhang, Zhihao, et al.
Published: (2023)

Open-Vocabulary Audio-Visual Semantic Segmentation
by: Guo, Ruohao, et al.
Published: (2024)