| Main Authors: | Liu, Zhendong, Nie, Yuanbi, Tan, Yingshui, Yue, Xiangyu, Cui, Qiushi, Wang, Chongjun, Zhu, Xiaoyong, Zheng, Bo |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2405.13581 |
Similar Items
PSA-VLM: Enhancing Vision-Language Model Safety through Progressive Concept-Bottleneck-Driven Alignment
by: Liu, Zhendong, et al.
Published: (2024)
MSR-Align: Policy-Grounded Multimodal Alignment for Safety-Aware Reasoning in Vision-Language Models
by: Xia, Yinan, et al.
Published: (2025)
ConceptGuard: Proactive Safety in Text-and-Image-to-Video Generation through Multimodal Risk Detection
by: Ma, Ruize, et al.
Published: (2025)
INTER: Mitigating Hallucination in Large Vision-Language Models by Interaction Guidance Sampling
by: Dong, Xin, et al.
Published: (2025)
SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Model
by: Zhang, Yongting, et al.
Published: (2024)
Think Hierarchically, Act Dynamically: Hierarchical Multi-modal Fusion and Reasoning for Vision-and-Language Navigation
by: Yue, Junrong, et al.
Published: (2025)
Vision-R1: Evolving Human-Free Alignment in Large Vision-Language Models via Vision-Guided Reinforcement Learning
by: Zhan, Yufei, et al.
Published: (2025)
FILA: Fine-Grained Vision Language Models
by: Zhu, Shiding, et al.
Published: (2024)
Swarm Intelligence in Geo-Localization: A Multi-Agent Large Vision-Language Model Collaborative Framework
by: Han, Xiao, et al.
Published: (2024)
VLM-Guard: Safeguarding Vision-Language Models via Fulfilling Safety Alignment Gap
by: Liu, Qin, et al.
Published: (2025)
When Language Overwrites Vision: Over-Alignment and Geometric Debiasing in Vision-Language Models
by: Saini, Harshvardhan, et al.
Published: (2026)
Gradual Residuals Alignment: A Dual-Stream Framework for GAN Inversion and Image Attribute Editing
by: Li, Hao, et al.
Published: (2024)
Qwen3-VL-Seg: Unlocking Open-World Referring Segmentation with Vision-Language Grounding
by: Yao, Yuan, et al.
Published: (2026)
Tuning Vision-Language Models with Candidate Labels by Prompt Alignment
by: Zhang, Zhifang, et al.
Published: (2024)
MVL-Loc: Leveraging Vision-Language Model for Generalizable Multi-Scene Camera Relocalization
by: Xiao, Zhendong, et al.
Published: (2025)
RE-VLM: Event-Augmented Vision-Language Model for Scene Understanding
by: Liu, Hanqing, et al.
Published: (2026)
Equilibrate RLHF: Towards Balancing Helpfulness-Safety Trade-off in Large Language Models
by: Tan, Yingshui, et al.
Published: (2025)
TagAlign: Improving Vision-Language Alignment with Multi-Tag Classification
by: Liu, Qinying, et al.
Published: (2023)
Beyond Cross-Modal Alignment: Measuring and Leveraging Modality Gap in Vision-Language Models
by: Yan, Hanqi, et al.
Published: (2025)
Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines
by: Zhang, Zhixin, et al.
Published: (2024)
Nemesis: Normalizing the Soft-prompt Vectors of Vision-Language Models
by: Fu, Shuai, et al.
Published: (2024)
Decomposed Vision-Language Alignment for Fine-Grained Open-Vocabulary Segmentation
by: Wang, Chenhao, et al.
Published: (2026)
Analyzing Fine-Grained Alignment and Enhancing Vision Understanding in Multimodal Language Models
by: Jiang, Jiachen, et al.
Published: (2025)
Subspace Alignment for Vision-Language Model Test-time Adaptation
by: Zeng, Zhichen, et al.
Published: (2026)
Learning to Look: Cognitive Attention Alignment with Vision-Language Models
by: Yang, Ryan L., et al.
Published: (2025)
Token-Level Inference-Time Alignment for Vision-Language Models
by: Chen, Kejia, et al.
Published: (2025)
Enhance Vision-Language Alignment with Noise
by: Huang, Sida, et al.
Published: (2024)
Focus on Focus: Focus-oriented Representation Learning and Multi-view Cross-modal Alignment for Glioma Grading
by: Pan, Li, et al.
Published: (2024)
HoliSafe: Holistic Safety Benchmarking and Modeling for Vision-Language Model
by: Lee, Youngwan, et al.
Published: (2025)
Enhancing the Safety of Medical Vision-Language Models by Synthetic Demonstrations
by: Xue, Zhiyu, et al.
Published: (2025)
MM-Skin: Enhancing Dermatology Vision-Language Model with an Image-Text Dataset Derived from Textbooks
by: Zeng, Wenqi, et al.
Published: (2025)
UniFashion: A Unified Vision-Language Model for Multimodal Fashion Retrieval and Generation
by: Zhao, Xiangyu, et al.
Published: (2024)
HiLa: Hierarchical Vision-Language Collaboration for Cancer Survival Prediction
by: Cui, Jiaqi, et al.
Published: (2025)
Enhancing Medical Large Vision-Language Models via Alignment Distillation
by: Chang, Aofei, et al.
Published: (2025)
Cross-Modal Safety Mechanism Transfer in Large Vision-Language Models
by: Xu, Shicheng, et al.
Published: (2024)
Evaluation of Safety Cognition Capability in Vision-Language Models for Autonomous Driving
by: Zhang, Enming, et al.
Published: (2025)
SIA: Enhancing Safety via Intent Awareness for Vision-Language Models
by: Na, Youngjin, et al.
Published: (2025)
Enhancing Adversarial Robustness of Vision-Language Models through Low-Rank Adaptation
by: Ji, Yuheng, et al.
Published: (2024)
Continuous-Time Distribution Matching for Few-Step Diffusion Distillation
by: Liu, Tao, et al.
Published: (2026)
Self-Supervised Visual Preference Alignment
by: Zhu, Ke, et al.
Published: (2024)