| Main Authors: | Zhang, Yunqi, Li, Songda, Deng, Chunyuan, Wang, Luyi, Zhao, Hui |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2405.16860 |
Similar Items
Perception Before Reasoning: Two-Stage Reinforcement Learning for Visual Reasoning in Vision-Language Models
by: Chen, Yan, et al.
Published: (2025)
Unveiling the "Fairness Seesaw": Discovering and Mitigating Gender and Race Bias in Vision-Language Models
by: Lan, Jian, et al.
Published: (2025)
GenderBias-VL: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing
by: Xiao, Yisong, et al.
Published: (2024)
Think Hierarchically, Act Dynamically: Hierarchical Multi-modal Fusion and Reasoning for Vision-and-Language Navigation
by: Yue, Junrong, et al.
Published: (2025)
Think, Act, Build: An Agentic Framework with Vision Language Models for Zero-Shot 3D Visual Grounding
by: Wang, Haibo, et al.
Published: (2026)
Grounded Knowledge-Enhanced Medical Vision-Language Pre-training for Chest X-Ray
by: Deng, Qiao, et al.
Published: (2024)
Think Before You Drive: World Model-Inspired Multimodal Grounding for Autonomous Vehicles
by: Liao, Haicheng, et al.
Published: (2025)
Egocentric Bias in Vision-Language Models
by: Wang, Maijunxian, et al.
Published: (2026)
Thinking Before Looking: Improving Multimodal LLM Reasoning via Mitigating Visual Hallucination
by: Zheng, Haojie, et al.
Published: (2024)
VidLBEval: Benchmarking and Mitigating Language Bias in Video-Involved LVLMs
by: Yang, Yiming, et al.
Published: (2025)
GABInsight: Exploring Gender-Activity Binding Bias in Vision-Language Models
by: Abdollahi, Ali, et al.
Published: (2024)
ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning
by: Huang, Chi-Pin, et al.
Published: (2025)
Auditing and Mitigating Bias in Gender Classification Algorithms: A Data-Centric Approach
by: Bahiru, Tadesse K, et al.
Published: (2025)
Bias Detection and Rotation-Robustness Mitigation in Vision-Language Models and Generative Image Models
by: Mithila, Tarannum
Published: (2026)
Two-Stage Random Alternation Framework for One-Shot Pansharpening
by: Chen, Haorui, et al.
Published: (2025)
Mitigating Gender Bias in Face Recognition Using the von Mises-Fisher Mixture Model
by: Conti, Jean-Rémy, et al.
Published: (2022)
Fast-ThinkAct: Efficient Vision-Language-Action Reasoning via Verbalizable Latent Planning
by: Huang, Chi-Pin, et al.
Published: (2026)
D2SP: Dynamic Dual-Stage Purification Framework for Dual Noise Mitigation in Vision-based Affective Recognition
by: Wang, Haoran, et al.
Published: (2024)
TINA: Think, Interaction, and Action Framework for Zero-Shot Vision Language Navigation
by: Li, Dingbang, et al.
Published: (2024)
Perceptual Inductive Bias Is What You Need Before Contrastive Learning
by: Li, Tianqin, et al.
Published: (2025)
Explanatory Interactive Machine Learning for Bias Mitigation in Visual Gender Classification
by: Satriani, Nathanya, et al.
Published: (2026)
Look Before You Decide: Prompting Active Deduction of MLLMs for Assumptive Reasoning
by: Li, Yian, et al.
Published: (2024)
BRAIN: Bias-Mitigation Continual Learning Approach to Vision-Brain Understanding
by: Nguyen, Xuan-Bac, et al.
Published: (2025)
CounterCount: A Diagnostic Framework for Counting Bias in Vision Language Models
by: Alzahrani, Reem, et al.
Published: (2026)
IndicFairFace: Balanced Indian Face Dataset for Auditing and Mitigating Geographical Bias in Vision-Language Models
by: Mohsin, Aarish Shah, et al.
Published: (2026)
A Two-Stage Globally-Diverse Adversarial Attack for Vision-Language Pre-training Models
by: Chen, Wutao, et al.
Published: (2026)
Think Before You Diffuse: Infusing Physical Rules into Video Diffusion
by: Zhang, Ke, et al.
Published: (2025)
Explain Before You Answer: A Survey on Compositional Visual Reasoning
by: Ke, Fucai, et al.
Published: (2025)
Content-Aware Ad Banner Layout Generation with Two-Stage Chain-of-Thought in Vision Language Models
by: Yoshitake, Kei, et al.
Published: (2025)
Towards Mitigating Modality Bias in Vision-Language Models for Temporal Action Localization
by: Li, Jiaqi, et al.
Published: (2026)
LLaVA-OneVision: Easy Visual Task Transfer
by: Li, Bo, et al.
Published: (2024)
Towards the Vision-Sound-Language-Action Paradigm: The HEAR Framework for Sound-Centric Manipulation
by: Nie, Chang, et al.
Published: (2026)
Two-Stream Interactive Joint Learning of Scene Parsing and Geometric Vision Tasks
by: Tang, Guanfeng, et al.
Published: (2026)
Object-Centric Vision Token Pruning for Vision Language Models
by: Li, Guangyuan, et al.
Published: (2025)
Mitigating Object Hallucination in Large Vision-Language Models via Image-Grounded Guidance
by: Zhao, Linxi, et al.
Published: (2024)
Uncovering Bias in Large Vision-Language Models with Counterfactuals
by: Howard, Phillip, et al.
Published: (2024)
Think Before You Move: Latent Motion Reasoning for Text-to-Motion Generation
by: Qian, Yijie, et al.
Published: (2025)
Towards Self-Refinement of Vision-Language Models with Triangular Consistency
by: Deng, Yunlong, et al.
Published: (2025)
Text is All You Need for Vision-Language Model Jailbreaking
by: Chen, Yihang, et al.
Published: (2026)
LayoutAgent: A Vision-Language Agent Guided Compositional Diffusion for Spatial Layout Planning
by: Fan, Zezhong, et al.
Published: (2025)