| Main Authors: | Guo, Ziyan; Xu, Li; Liu, Jun |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2311.09680 |
Similar Items
Vision Generalist Model: A Survey
by: Wang, Ziyi, et al.
Published: (2025)
Diffusion Models in Low-Level Vision: A Survey
by: He, Chunming, et al.
Published: (2024)
On the Trustworthiness Landscape of State-of-the-art Generative Models: A Survey and Outlook
by: Fan, Mingyuan, et al.
Published: (2023)
Leveraging Vision-Language Models to Select Trustworthy Super-Resolution Samples Generated by Diffusion Models
by: Korkmaz, Cansu, et al.
Published: (2025)
A Survey on Vision Autoregressive Model
by: Jiang, Kai, et al.
Published: (2024)
The Paradigm Shift: A Comprehensive Survey on Large Vision Language Models for Multimodal Fake News Detection
by: Ai, Wei, et al.
Published: (2026)
Generative Physical AI in Vision: A Survey
by: Liu, Daochang, et al.
Published: (2025)
Compound Expression Recognition via Large Vision-Language Models
by: Yu, Jun, et al.
Published: (2025)
Assessing Color Vision Test in Large Vision-language Models
by: Ye, Hongfei, et al.
Published: (2025)
3D-CT-GPT: Generating 3D Radiology Reports through Integration of Large Vision-Language Models
by: Chen, Hao, et al.
Published: (2024)
Vision Language Models in Autonomous Driving: A Survey and Outlook
by: Zhou, Xingcheng, et al.
Published: (2023)
Efficient Diffusion Models for Vision: A Survey
by: Ulhaq, Anwaar, et al.
Published: (2022)
FoPru: Focal Pruning for Efficient Large Vision-Language Models
by: Jiang, Lei, et al.
Published: (2024)
A Survey on Mamba Architecture for Vision Applications
by: Ibrahim, Fady, et al.
Published: (2025)
AdPO: Enhancing the Adversarial Robustness of Large Vision-Language Models with Preference Optimization
by: Liu, Chaohu, et al.
Published: (2025)
Efficient Multimodal Large Language Models: A Survey
by: Jin, Yizhang, et al.
Published: (2024)
TSTMotion: Training-free Scene-aware Text-to-motion Generation
by: Guo, Ziyan, et al.
Published: (2025)
Challenges and Trends in Egocentric Vision: A Survey
by: Li, Xiang, et al.
Published: (2025)
V-DPO: Mitigating Hallucination in Large Vision Language Models via Vision-Guided Direct Preference Optimization
by: Xie, Yuxi, et al.
Published: (2024)
Benchmarking and Mitigating Sycophancy in Medical Vision Language Models
by: Xu, Juangui, et al.
Published: (2025)
Probabilistic Conceptual Explainers: Trustworthy Conceptual Explanations for Vision Foundation Models
by: Wang, Hengyi, et al.
Published: (2024)
MultiTrust: A Comprehensive Benchmark Towards Trustworthy Multimodal Large Language Models
by: Zhang, Yichi, et al.
Published: (2024)
Benchmarking Large Vision-Language Models on CFMME: A Comprehensive Chinese Financial Multimodal Evaluation Dataset
by: Chen, Qian, et al.
Published: (2026)
TSHA: A Benchmark for Visual Language Models in Trustworthy Safety Hazard Assessment Scenarios
by: Yu, Qiucheng, et al.
Published: (2026)
A Brief Survey on Leveraging Large Scale Vision Models for Enhanced Robot Grasping
by: Kamboj, Abhi, et al.
Published: (2024)
DLaVA: Document Language and Vision Assistant for Answer Localization with Enhanced Interpretability and Trustworthiness
by: Mohammadshirazi, Ahmad, et al.
Published: (2024)
ArchiLense: A Framework for Quantitative Analysis of Architectural Styles Based on Vision Large Language Models
by: Zhong, Jing, et al.
Published: (2025)
UrbanSense: A Framework for Quantitative Analysis of Urban Streetscapes leveraging Vision Large Language Models
by: Yin, Jun, et al.
Published: (2025)
From Captions to Rewards (CAREVL): Leveraging Large Language Model Experts for Enhanced Reward Modeling in Large Vision-Language Models
by: Dai, Muzhi, et al.
Published: (2025)
FILA: Fine-Grained Vision Language Models
by: Zhu, Shiding, et al.
Published: (2024)
A Survey on Trustworthiness in Foundation Models for Medical Image Analysis
by: Shi, Congzhen, et al.
Published: (2024)
VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks
by: Jiang, Ziyan, et al.
Published: (2024)
Probing Perceptual Constancy in Large Vision-Language Models
by: Sun, Haoran, et al.
Published: (2025)
A Visual Semantic Adaptive Watermark grounded by Prefix-Tuning for Large Vision-Language Model
by: Zheng, Qi, et al.
Published: (2026)
OmniDrive-R1: Reinforcement-driven Interleaved Multi-modal Chain-of-Thought for Trustworthy Vision-Language Autonomous Driving
by: Zhang, Zhenguo, et al.
Published: (2025)
Harnessing Large Vision and Language Models in Agriculture: A Review
by: Zhu, Hongyan, et al.
Published: (2024)
Local Feature Matching Using Deep Learning: A Survey
by: Xu, Shibiao, et al.
Published: (2024)
A Survey on Agentic Multimodal Large Language Models
by: Yao, Huanjin, et al.
Published: (2025)
MBQ: Modality-Balanced Quantization for Large Vision-Language Models
by: Li, Shiyao, et al.
Published: (2024)
KKA: Improving Vision Anomaly Detection through Anomaly-related Knowledge from Large Language Models
by: Chen, Dong, et al.
Published: (2025)