| Main Authors: | Chen, Weilong, Xu, Wenxuan, Chen, Haoran, Zhang, Xinran, Qin, Zhijin, Zhang, Yanru, Han, Zhu |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2408.12616 |
Similar Items
Large AI Model-Enabled Generative Semantic Communications for Image Transmission
by: Ma, Qiyu, et al.
Published: (2025)
VQ-DeepISC: Vector Quantized-Enabled Digital Semantic Communication with Channel Adaptive Image Transmission
by: Chen, Jianqiao, et al.
Published: (2025)
POLYCHARTQA: Benchmarking Large Vision-Language Models with Multilingual Chart Question Answering
by: Xu, Yichen, et al.
Published: (2025)
Knowledge-Base based Semantic Image Transmission Using CLIP
by: Li, Chongyang, et al.
Published: (2025)
Mitigating Hallucinations in Video Large Language Models via Spatiotemporal-Semantic Contrastive Decoding
by: Gao, Yuansheng, et al.
Published: (2026)
Gram-Anchored Prompt Learning for Vision-Language Models via Second-Order Statistics
by: Chen, Minglei, et al.
Published: (2026)
Semantic-Clipping: Efficient Vision-Language Modeling with Semantic-Guided Visual Selection
by: Li, Bangzheng, et al.
Published: (2025)
Semantic Feature Decomposition based Semantic Communication System of Images with Large-scale Visual Generation Models
by: Fan, Senran, et al.
Published: (2024)
Evolving Prompt Adaptation for Vision-Language Models
by: Zhang, Enming, et al.
Published: (2026)
How to Evaluate Semantic Communications for Images with ViTScore Metric?
by: Zhu, Tingting, et al.
Published: (2023)
Security Risk of Misalignment between Text and Image in Multi-modal Model
by: Wang, Xiaosen, et al.
Published: (2025)
Large Language Model-Driven Distributed Integrated Multimodal Sensing and Semantic Communications
by: Peng, Yubo, et al.
Published: (2025)
An Empirical Study on the Robustness of YOLO Models for Underwater Object Detection
by: Nabahirwa, Edwine, et al.
Published: (2025)
ESA: Energy-Based Shot Assembly Optimization for Automatic Video Editing
by: Chen, Yaosen, et al.
Published: (2025)
StyleGuard: Preventing Text-to-Image-Model-based Style Mimicry Attacks by Style Perturbations
by: Li, Yanjie, et al.
Published: (2025)
Swarm Intelligence in Geo-Localization: A Multi-Agent Large Vision-Language Model Collaborative Framework
by: Han, Xiao, et al.
Published: (2024)
Beyond Visual Safety: Jailbreaking Multimodal Large Language Models for Harmful Image Generation via Semantic-Agnostic Inputs
by: Yu, Mingyu, et al.
Published: (2026)
LLMTrack: Semantic Multi-Object Tracking with Multi-modal Large Language Models
by: Liao, Pan, et al.
Published: (2026)
SegMix: Shuffle-based Feedback Learning for Semantic Segmentation of Pathology Images
by: Yan, Zhiling, et al.
Published: (2026)
mDPO: Conditional Preference Optimization for Multimodal Large Language Models
by: Wang, Fei, et al.
Published: (2024)
TSVC: Tripartite Learning with Semantic Variation Consistency for Robust Image-Text Retrieval
by: Lyu, Shuai, et al.
Published: (2025)
A Structured Review of Underwater Object Detection Challenges and Solutions: From Traditional to Large Vision Language Models
by: Nabahirwa, Edwine, et al.
Published: (2025)
GameVerse: Can Vision-Language Models Learn from Video-based Reflection?
by: Zhang, Kuan, et al.
Published: (2026)
ICDM: Interference Cancellation Diffusion Models for Wireless Semantic Communications
by: Wu, Tong, et al.
Published: (2025)
TMT: Cross-domain Semantic Segmentation with Region-adaptive Transferability Estimation
by: Zhang, Enming, et al.
Published: (2025)
A Cognitive Evaluation Benchmark of Image Reasoning and Description for Large Vision-Language Models
by: Song, Xiujie, et al.
Published: (2024)
Can Multimodal Large Language Models Truly Understand Small Objects?
by: Han, Fujun, et al.
Published: (2026)
JDPNet: A Network Based on Joint Degradation Processing for Underwater Image Enhancement
by: Ye, Tao, et al.
Published: (2025)
Physical Prompt Injection Attacks on Large Vision-Language Models
by: Ling, Chen, et al.
Published: (2026)
ChartM$^3$: Benchmarking Chart Editing with Multimodal Instructions
by: Yang, Donglu, et al.
Published: (2025)
Global Semantic-Guided Sub-image Feature Weight Allocation in High-Resolution Large Vision-Language Models
by: Liang, Yuxuan, et al.
Published: (2025)
Empowering Semantic-Sensitive Underwater Image Enhancement with VLM
by: Fan, Guodong, et al.
Published: (2026)
ERGeoBench: A Comprehensive Benchmark for Embodied Reasoning and Geo-localization in Multimodal Large Language Models
by: Xue, Kaiwen, et al.
Published: (2026)
XiHeFusion: Harnessing Large Language Models for Science Communication in Nuclear Fusion
by: Wang, Xiao, et al.
Published: (2025)
Deep Hashing with Semantic Hash Centers for Image Retrieval
by: Chen, Li, et al.
Published: (2025)
Dual-Granularity Semantic Prompting for Language Guidance Infrared Small Target Detection
by: Wang, Zixuan, et al.
Published: (2025)
Single Image Unlearning: Efficient Machine Unlearning in Multimodal Large Language Models
by: Li, Jiaqi, et al.
Published: (2024)
Semantically Aware UAV Landing Site Assessment from Remote Sensing Imagery via Multimodal Large Language Models
by: Hua, Chunliang, et al.
Published: (2026)
Conjugated Semantic Pool Improves OOD Detection with Pre-trained Vision-Language Models
by: Chen, Mengyuan, et al.
Published: (2024)
Breaking Modality Heterogeneity in Low-Bit Quantization for Large Vision-Language Models
by: Zhong, Yi, et al.
Published: (2026)