| Main Authors: | Wang, Haofeng, Guo, Yilin, Li, Zehao, Yue, Tong, Wang, Yizong, Zhang, Enci, Lin, Rongqun, Gao, Feng, Wang, Shiqi, Ma, Siwei |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2506.21865 |
Similar Items
End-to-End RGB-IR Joint Image Compression With Channel-wise Cross-modality Entropy Model
by: Wang, Haofeng, et al.
Published: (2025)
Compact Visual Data Representation for Green Multimedia -- A Human Visual System Perspective
by: Chen, Peilin, et al.
Published: (2024)
SPC-NeRF: Spatial Predictive Compression for Voxel Based Radiance Field
by: Song, Zetian, et al.
Published: (2024)
VARFVV: View-Adaptive Real-Time Interactive Free-View Video Streaming with Edge Computing
by: Hu, Qiang, et al.
Published: (2025)
Video Echoed in Music: Semantic, Temporal, and Rhythmic Alignment for Video-to-Music Generation
by: Tong, Xinyi, et al.
Published: (2025)
EidetiCom: A Cross-modal Brain-Computer Semantic Communication Paradigm for Decoding Visual Perception
by: Zheng, Linfeng, et al.
Published: (2024)
Latent Feature-Guided Conditional Diffusion for Generative Image Semantic Communication
by: Chen, Zehao, et al.
Published: (2025)
Voxel-GS: Quantized Scaffold Gaussian Splatting Compression with Run-Length Coding
by: Fu, Chunyang, et al.
Published: (2025)
Period-conscious Time-series Reconstruction under Local Differential Privacy
by: Wang, Yaxuan, et al.
Published: (2026)
The Rhythm of Tai Chi: Revitalizing Cultural Heritage in Virtual Reality through Interactive Visuals
by: Wang, Xianghan
Published: (2025)
An Emotion Recognition Framework via Cross-modal Alignment of EEG and Eye Movement Data
by: Wang, Jianlu, et al.
Published: (2025)
Rethinking Bjøntegaard Delta for Compression Efficiency Evaluation: Are We Calculating It Precisely and Reliably?
by: Hang, Xinyu, et al.
Published: (2024)
Predicting Satisfied User and Machine Ratio for Compressed Images: A Unified Approach
by: Zhang, Qi, et al.
Published: (2024)
ASAP: Advancing Semantic Alignment Promotes Multi-Modal Manipulation Detecting and Grounding
by: Zhang, Zhenxing, et al.
Published: (2024)
Opinion-Unaware Blind Image Quality Assessment using Multi-Scale Deep Feature Statistics
by: Ni, Zhangkai, et al.
Published: (2024)
EchoSR: Efficient Context Harnessing for Lightweight Image Super-Resolution
by: Zhao, Hanli, et al.
Published: (2026)
LiveK12Bench: Have Large Multimodal Models Truly Conquered High School-level Examinations?
by: Wang, Xiaohan, et al.
Published: (2026)
Perceive-Sample-Compress: Towards Real-Time 3D Gaussian Splatting
by: Wang, Zijian, et al.
Published: (2025)
EmotionTalk: An Interactive Chinese Multimodal Emotion Dataset With Rich Annotations
by: Sun, Haoqin, et al.
Published: (2025)
Towards Real-World Stickers Use: A New Dataset for Multi-Tag Sticker Recognition
by: Wang, Bingbing, et al.
Published: (2024)
Enkidu: Universal Frequential Perturbation for Real-Time Audio Privacy Protection against Voice Deepfakes
by: Feng, Zhou, et al.
Published: (2025)
Real-Time Interactive Hybrid Ocean: Spectrum-Consistent Wave Particle-FFT Coupling
by: Xue, Shengze, et al.
Published: (2025)
Virbo: Multimodal Multilingual Avatar Video Generation in Digital Marketing
by: Zhang, Juan, et al.
Published: (2024)
Advanced Learning-Based Inter Prediction for Future Video Coding
by: Zhao, Yanchen, et al.
Published: (2024)
Transforming Video Subjective Testing with Training, Engagement, and Real-Time Feedback
by: Rahul, Kumar, et al.
Published: (2026)
Mitigating Multimodal Inconsistency via Cognitive Dual-Pathway Reasoning for Intent Recognition
by: Wang, Yifan, et al.
Published: (2026)
DeepStream: Prototyping Deep Joint Source-Channel Coding for Real-Time Multimedia Transmissions
by: Chi, Kaiyi, et al.
Published: (2025)
Detached and Interactive Multimodal Learning
by: Fan, Yunfeng, et al.
Published: (2024)
TCC-Bench: Benchmarking the Traditional Chinese Culture Understanding Capabilities of MLLMs
by: Xu, Pengju, et al.
Published: (2025)
Hue4U: Real-Time Personalized Color Correction in Augmented Reality
by: Qin, Jingwen, et al.
Published: (2025)
RealBench: A Chinese Multi-image Understanding Benchmark Close to Real-world Scenarios
by: Zhao, Fei, et al.
Published: (2025)
SSNVC: Single Stream Neural Video Compression with Implicit Temporal Information
by: Wang, Feng, et al.
Published: (2024)
Muse: A Multimodal Conversational Recommendation Dataset with Scenario-Grounded User Profiles
by: Wang, Zihan, et al.
Published: (2024)
Volume Tracking Based Reference Mesh Extraction for Time-Varying Mesh Compression
by: Chen, Guodong, et al.
Published: (2024)
Physics-Aware Novel-View Acoustic Synthesis with Vision-Language Priors and 3D Acoustic Environment Modeling
by: Fan, Congyi, et al.
Published: (2026)
TCAN: Text-oriented Cross Attention Network for Multimodal Sentiment Analysis
by: Quan, Weize, et al.
Published: (2024)
Semi-supervised Chinese Poem-to-Painting Generation via Cycle-consistent Adversarial Networks
by: Lu, Zhengyang, et al.
Published: (2024)
LMM4Edit: Benchmarking and Evaluating Multimodal Image Editing with LMMs
by: Xu, Zitong, et al.
Published: (2025)
CueNet: Robust Audio-Visual Speaker Extraction through Cross-Modal Cue Mining and Interaction
by: Wang, Jiadong, et al.
Published: (2026)
THE WASTIVE: An Interactive Ebb and Flow of Digital Fabrication Waste
by: Shan, Yifan, et al.
Published: (2025)