:: Library Catalog

Bibliographic Details
Main Authors:	Ji, Zhengyang; Gao, Shang; Liu, Li; Jia, Yifan; Yue, Yutao
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2503.02476

Similar Items

Adaptive H&E-IHC information fusion staining framework based on feature extra
by: Jia, Yifan, et al.
Published: (2025)

DRIVE: Dual-Robustness via Information Variability and Entropic Consistency in Source-Free Unsupervised Domain Adaptation
by: Xiao, Ruiqiang, et al.
Published: (2024)

Dual Consistent Constraint via Disentangled Consistency and Complementarity for Multi-view Clustering
by: Li, Bo, et al.
Published: (2025)

Improving VQA Reliability: A Dual-Assessment Approach with Self-Reflection and Cross-Model Verification
by: Wu, Xixian, et al.
Published: (2025)

CHRIS: Clothed Human Reconstruction with Side View Consistency
by: Liu, Dong, et al.
Published: (2025)

Harnessing Group-Oriented Consistency Constraints for Semi-Supervised Semantic Segmentation in CdZnTe Semiconductors
by: Li, Peihao, et al.
Published: (2025)

Interpreting Social Bias in LVLMs via Information Flow Analysis and Multi-Round Dialogue Evaluation
by: Ji, Zhengyang, et al.
Published: (2025)

CoEmoGen: Towards Semantically-Coherent and Scalable Emotional Image Content Generation
by: Yuan, Kaishen, et al.
Published: (2025)

ARIAL: An Agentic Framework for Document VQA with Precise Answer Localization
by: Mohammadshirazi, Ahmad, et al.
Published: (2025)

MISS: A Generative Pretraining and Finetuning Approach for Med-VQA
by: Chen, Jiawei, et al.
Published: (2024)

VQA$^2$: Visual Question Answering for Video Quality Assessment
by: Jia, Ziheng, et al.
Published: (2024)

Dual Causal Inference: Integrating Backdoor Adjustment and Instrumental Variable Learning for Medical VQA
by: Xu, Zibo, et al.
Published: (2026)

When to Trust the Answer: Question-Aligned Semantic Nearest Neighbor Entropy for Safer Surgical VQA
by: Carlini, Luca, et al.
Published: (2025)

TSVC: Tripartite Learning with Semantic Variation Consistency for Robust Image-Text Retrieval
by: Lyu, Shuai, et al.
Published: (2025)

Biomed-DPT: Dual Modality Prompt Tuning for Biomedical Vision-Language Models
by: Peng, Wei, et al.
Published: (2025)

An Evaluation of GPT-4V and Gemini in Online VQA
by: Liu, Mengchen, et al.
Published: (2023)

Semantic and Visual Evidence for Efficient Long-Video Reasoning: A Solution for the HD-EPIC VQA Challenge
by: Xu, Yinsong, et al.
Published: (2026)

LMM-VQA: Advancing Video Quality Assessment with Large Multimodal Models
by: Ge, Qihang, et al.
Published: (2024)

Diffusion-Guided Semantic Consistency for Multimodal Heterogeneity
by: Liu, Jing, et al.
Published: (2026)

BioVFM-21M: Benchmarking and Scaling Self-Supervised Vision Foundation Models for Biomedical Image Analysis
by: Liu, Jiarun, et al.
Published: (2025)

VQA-Levels: A Hierarchical Approach for Classifying Questions in VQA
by: Madaka, Madhuri Latha, et al.
Published: (2025)

Look-Closer-Then-Diagnose: Confidence-Aware Ultrasound VQA via Active Zooming
by: Zhou, Yue, et al.
Published: (2026)

Spectral Discrepancy and Cross-modal Semantic Consistency Learning for Object Detection in Hyperspectral Image
by: He, Xiao, et al.
Published: (2025)

MaS-VQA: A Mask-and-Select Framework for Knowledge-Based Visual Question Answering
by: Mao, Xianwei, et al.
Published: (2026)

PathNavigate: A Training-Free Pathology Agent with Surprise-Guided Scan and Shared Slide Memory for Whole-Slide Image VQA
by: Yang, Chunze, et al.
Published: (2026)

Leveraging the Video-level Semantic Consistency of Event for Audio-visual Event Localization
by: Jiang, Yuanyuan, et al.
Published: (2022)

Improving the Classification Effect of Clinical Images of Diseases for Multi-Source Privacy Protection
by: Bowen, Tian, et al.
Published: (2024)

SCRA-VQA: Summarized Caption-Rerank for Augmented Large Language Models in Visual Question Answering
by: Zhang, Yan, et al.
Published: (2025)

UniBEVFusion: Unified Radar-Vision BEVFusion for 3D Object Detection
by: Zhao, Haocheng, et al.
Published: (2024)

Fence Theorem: Towards Dual-Objective Semantic-Structure Isolation in Preprocessing Phase for 3D Anomaly Detection
by: Liang, Hanzhe, et al.
Published: (2025)

Cycle Inverse-Consistent TransMorph: A Balanced Deep Learning Framework for Brain MRI Registration
by: Shang, Jiaqi, et al.
Published: (2026)

Is ChatGPT-5 Ready for Mammogram VQA?
by: Li, Qiang, et al.
Published: (2025)

Advancing Surgical VQA with Scene Graph Knowledge
by: Yuan, Kun, et al.
Published: (2023)

Towards Clinically Interpretable Ophthalmic VQA via Spatially-Grounded Lesion Evidence
by: Wang, Xingyue, et al.
Published: (2026)

EarthCrafter: Scalable 3D Earth Generation via Dual-Sparse Latent Diffusion
by: Liu, Shang, et al.
Published: (2025)

PointGS: Semantic-Consistent Unsupervised 3D Point Cloud Segmentation with 3D Gaussian Splatting
by: Song, Yixiao, et al.
Published: (2026)

Frankenstein: Generating Semantic-Compositional 3D Scenes in One Tri-Plane
by: Yan, Han, et al.
Published: (2024)

R-LLaVA: Improving Med-VQA Understanding through Visual Region of Interest
by: Chen, Xupeng, et al.
Published: (2024)

Multi-task Visual Grounding with Coarse-to-Fine Consistency Constraints
by: Dai, Ming, et al.
Published: (2025)

KNVQA: A Benchmark for evaluation knowledge-based VQA
by: Cheng, Sirui, et al.
Published: (2023)