Library Catalog

Bibliographic Details
Main Authors:	Zhang, Ruohong; Zhang, Bowen; Li, Yanghao; Zhang, Haotian; Sun, Zhiqing; Gan, Zhe; Yang, Yinfei; Pang, Ruoming; Yang, Yiming
Format:	Preprint
Published:	2024
Subjects:	Artificial Intelligence; Computer Vision and Pattern Recognition; 68T07
Online Access:	https://arxiv.org/abs/2410.16198

Similar Items

How Well Do Vision-Language Models Understand Sequential Driving Scenes? A Sensitivity Study
by: Brusnicki, Roberto, et al.
Published: (2026)

MedAD-R1: Eliciting Consistent Reasoning in Interpretible Medical Anomaly Detection via Consistency-Reinforced Policy Optimization
by: Zhang, Haitao, et al.
Published: (2026)

POA: Pre-training Once for Models of All Sizes
by: Zhang, Yingying, et al.
Published: (2024)

Beyond Color and Lines: Zero-Shot Style-Specific Image Variations with Coordinated Semantics
by: Hu, Jinghao, et al.
Published: (2024)

JVLGS: Joint Vision-Language Gas Leak Segmentation
by: Zhao, Xinlong, et al.
Published: (2025)

Saliency-Aware Multi-Route Thinking: Revisiting Vision-Language Reasoning
by: Shi, Mingjia, et al.
Published: (2026)

Masked Attention as a Mechanism for Improving Interpretability of Vision Transformers
by: Grisi, Clément, et al.
Published: (2024)

Attention Maps in 3D Shape Classification for Dental Stage Estimation with Class Node Graph Attention Networks
by: Buyukcakir, Barkin, et al.
Published: (2025)

MIRNet: Integrating Constrained Graph-Based Reasoning with Pre-training for Diagnostic Medical Imaging
by: Kong, Shufeng, et al.
Published: (2025)

Perception-Consistency Multimodal Large Language Models Reasoning via Caption-Regularized Policy Optimization
by: Tu, Songjun, et al.
Published: (2025)

Isolated Sign Language Recognition with Segmentation and Pose Estimation
by: Perkins, Daniel, et al.
Published: (2025)

Unlocking UML Class Diagram Understanding in Vision Language Models
by: Naboichenko, Artem, et al.
Published: (2026)

TWIG: Two-Step Image Generation using Segmentation Masks in Diffusion Models
by: Rakib, Mazharul Islam, et al.
Published: (2025)

Interactive Image Selection and Training for Brain Tumor Segmentation Network
by: Cerqueira, Matheus A., et al.
Published: (2024)

Performance Decay in Deepfake Detection: The Limitations of Training on Outdated Data
by: Richings, Jack, et al.
Published: (2025)

TG-LMM: Enhancing Medical Image Segmentation Accuracy through Text-Guided Large Multi-Modal Model
by: Zhao, Yihao, et al.
Published: (2024)

HelloMeme: Integrating Spatial Knitting Attentions to Embed High-Level and Fidelity-Rich Conditions in Diffusion Models
by: Zhang, Shengkai, et al.
Published: (2024)

Combiner and HyperCombiner Networks: Rules to Combine Multimodality MR Images for Prostate Cancer Localisation
by: Yan, Wen, et al.
Published: (2023)

Few-shot crack image classification using clip based on bayesian optimization
by: Zhang, Yingchao, et al.
Published: (2025)

Faster Vision Mamba is Rebuilt in Minutes via Merged Token Re-training
by: Shi, Mingjia, et al.
Published: (2024)

Computational Imaging Priors for Wireless Capsule Endoscopy: Monte Carlo-Guided Hemoglobin Mapping for Rare-Anomaly Detection
by: Yang, Chengshuai, et al.
Published: (2026)

Uncertainty and Prediction Quality Estimation for Semantic Segmentation via Graph Neural Networks
by: Heinert, Edgar, et al.
Published: (2024)

Steerable Pyramid Weighted Loss: Multi-Scale Adaptive Weighting for Semantic Segmentation
by: Lu, Renhao
Published: (2025)

Motion Consistency Loss for Monocular Visual Odometry with Attention-Based Deep Learning
by: Françani, André O., et al.
Published: (2024)

MaizeEar-SAM: Zero-Shot Maize Ear Phenotyping
by: Zaremehrjerdi, Hossein, et al.
Published: (2025)

Point, Detect, Count: Multi-Task Medical Image Understanding with Instruction-Tuned Vision-Language Models
by: Gautam, Sushant, et al.
Published: (2025)

Z-Order Transformer for Feed-Forward Gaussian Splatting
by: Wang, Can, et al.
Published: (2026)

EatGAN: An Edge-Attention Guided Generative Adversarial Network for Single Image Super-Resolution
by: Rao, Penghao, et al.
Published: (2025)

Poisson Flow Consistency Training
by: Zhang, Anthony, et al.
Published: (2025)

ViTNF: Leveraging Neural Fields to Boost Vision Transformers in Generalized Category Discovery
by: Su, Jiayi, et al.
Published: (2025)

An Autoencoder and Vision Transformer-based Interpretability Analysis of the Differences in Automated Staging of Second and Third Molars
by: Buyukcakir, Barkin, et al.
Published: (2025)

Transformer-Based Model for Monocular Visual Odometry: A Video Understanding Approach
by: Françani, André O., et al.
Published: (2023)

3D Convolutional Neural Networks for Improved Detection of Intracranial bleeding in CT Imaging
by: Subramanian, Bargava, et al.
Published: (2025)

Enhancing Small Object Detection with YOLO: A Novel Framework for Improved Accuracy and Efficiency
by: Moghadami, Mahila, et al.
Published: (2025)

How Easy is It to Fool Your Multimodal LLMs? An Empirical Analysis on Deceptive Prompts
by: Qian, Yusu, et al.
Published: (2024)

Deep Learning From Routine Histology Improves Risk Stratification for Biochemical Recurrence in Prostate Cancer
by: Grisi, Clément, et al.
Published: (2026)

Ada-adapter:Fast Few-shot Style Personlization of Diffusion Model with Pre-trained Image Encoder
by: Liu, Jia, et al.
Published: (2024)

Classification of Diabetic Retinopathy using Pre-Trained Deep Learning Models
by: Al-Kamachy, Inas, et al.
Published: (2024)

Surrealistic-like Image Generation with Vision-Language Models
by: Ayten, Elif, et al.
Published: (2024)

A Physics-Inspired Deep Learning Framework with Polar Coordinate Attention for Ptychographic Imaging
by: Yue, Han, et al.
Published: (2024)