| Main Authors: | Li, Binzhe, Wang, Shurun, Wang, Shiqi, Ye, Yan |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2407.17060 |
Similar Items
EVTP-IVS: Effective Visual Token Pruning For Unifying Instruction Visual Segmentation In Multi-Modal Large Language Models
by: Zhu, Wenhui, et al.
Published: (2025)
Generative Visual Compression: A Review
by: Chen, Bolin, et al.
Published: (2024)
MindVL: Towards Efficient and Effective Training of Multimodal Large Language Models on Ascend NPUs
by: Chen, Feilong, et al.
Published: (2025)
Logic-in-Frames: Dynamic Keyframe Search via Visual Semantic-Logical Verification for Long Video Understanding
by: Guo, Weiyu, et al.
Published: (2025)
VisionUnite: A Vision-Language Foundation Model for Ophthalmology Enhanced with Clinical Knowledge
by: Li, Zihan, et al.
Published: (2024)
Multi-Agent Visual-Language Reasoning for Comprehensive Highway Scene Understanding
by: Yang, Yunxiang, et al.
Published: (2025)
An Early Investigation into the Utility of Multimodal Large Language Models in Medical Imaging
by: Khan, Sulaiman, et al.
Published: (2024)
MISC: Ultra-low Bitrate Image Semantic Compression Driven by Large Multimodal Model
by: Li, Chunyi, et al.
Published: (2024)
Large-vocabulary forensic pathological analyses via prototypical cross-modal contrastive learning
by: Shen, Chen, et al.
Published: (2024)
Large Language Models for Lossless Image Compression: Next-Pixel Prediction in Language Space is All You Need
by: Chen, Kecheng, et al.
Published: (2024)
CAMSIC: Content-aware Masked Image Modeling Transformer for Stereo Image Compression
by: Zhang, Xinjie, et al.
Published: (2024)
Generative Human Video Compression with Multi-granularity Temporal Trajectory Factorization
by: Yin, Shanzhi, et al.
Published: (2024)
Comprehensive Evaluation of Multimodal AI Models in Medical Imaging Diagnosis: From Data Augmentation to Preference-Based Comparison
by: Ruan, Cailian, et al.
Published: (2024)
Large Language Model for Lossless Image Compression with Visual Prompts
by: Du, Junhao, et al.
Published: (2025)
Bidirectional Stereo Image Compression with Cross-Dimensional Entropy Model
by: Liu, Zhening, et al.
Published: (2024)
InternVQA: Advancing Compressed Video Quality Assessment with Distilling Large Foundation Model
by: Guan, Fengbin, et al.
Published: (2025)
Sensitivity Decouple Learning for Image Compression Artifacts Reduction
by: Ma, Li, et al.
Published: (2024)
FetalCLIP: A Visual-Language Foundation Model for Fetal Ultrasound Image Analysis
by: Maani, Fadillah, et al.
Published: (2025)
Multiple Latent Space Mapping for Compressed Dark Image Enhancement
by: Zeng, Yi, et al.
Published: (2024)
Can Large Language Models Challenge CNNs in Medical Image Analysis?
by: Ahmed, Shibbir, et al.
Published: (2025)
High-Frequency Enhanced Hybrid Neural Representation for Video Compression
by: Yu, Li, et al.
Published: (2024)
Lost in Translation? Translation Errors and Challenges for Fair Assessment of Text-to-Image Models on Multilingual Concepts
by: Saxon, Michael, et al.
Published: (2024)
Inserting Faces inside Captions: Image Captioning with Attention Guided Merging
by: Tevissen, Yannis, et al.
Published: (2024)
Semi-Supervised Medical Image Segmentation via Knowledge Mining from Large Models
by: Mao, Yuchen, et al.
Published: (2025)
GaussianImage: 1000 FPS Image Representation and Compression by 2D Gaussian Splatting
by: Zhang, Xinjie, et al.
Published: (2024)
Generative Artificial Intelligence in Medical Imaging: Foundations, Progress, and Clinical Translation
by: Zhou, Xuanru, et al.
Published: (2025)
Evaluating the Diagnostic Classification Ability of Multimodal Large Language Models: Insights from the Osteoarthritis Initiative
by: Wang, Li, et al.
Published: (2026)
Transformations in Learned Image Compression from a Modulation Perspective
by: Bao, Youneng, et al.
Published: (2022)
3D Point Cloud Compression with Recurrent Neural Network and Image Compression Methods
by: Beemelmanns, Till, et al.
Published: (2024)
Trustworthy Medical Imaging with Large Language Models: A Study of Hallucinations Across Modalities
by: Das, Anindya Bijoy, et al.
Published: (2025)
Frequent Pattern Mining approach to Image Compression
by: Kadimisetty, Avinash, et al.
Published: (2026)
A Transformer-Based Multi-Stream Approach for Isolated Iranian Sign Language Recognition
by: Ghadami, Ali, et al.
Published: (2024)
Image Super-Resolution with Taylor Expansion Approximation and Large Field Reception
by: Feng, Jiancong, et al.
Published: (2024)
Vision Language Models in Medicine
by: Kalpelbe, Beria Chingnabe, et al.
Published: (2025)
Surgical-LVLM: Learning to Adapt Large Vision-Language Model for Grounded Visual Question Answering in Robotic Surgery
by: Wang, Guankun, et al.
Published: (2024)
Surgical Foundation Model Leveraging Compression and Entropy Maximization for Image-Guided Surgical Assistance
by: Yin, Lianhao, et al.
Published: (2025)
An Empirical Study on the Fairness of Foundation Models for Multi-Organ Image Segmentation
by: Li, Qin, et al.
Published: (2024)
Beyond GFVC: A Progressive Face Video Compression Framework with Adaptive Visual Tokens
by: Chen, Bolin, et al.
Published: (2024)
Potential of Multimodal Large Language Models for Data Mining of Medical Images and Free-text Reports
by: Zhang, Yutong, et al.
Published: (2024)
Task-Aware Encoder Control for Deep Video Compression
by: Ge, Xingtong, et al.
Published: (2024)