| Main Authors: | Zhang, Ruohong, Zhang, Bowen, Li, Yanghao, Zhang, Haotian, Sun, Zhiqing, Gan, Zhe, Yang, Yinfei, Pang, Ruoming, Yang, Yiming |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2410.16198 |
Similar Items
How Well Do Vision-Language Models Understand Sequential Driving Scenes? A Sensitivity Study
by: Brusnicki, Roberto, et al.
Published: (2026)
MedAD-R1: Eliciting Consistent Reasoning in Interpretible Medical Anomaly Detection via Consistency-Reinforced Policy Optimization
by: Zhang, Haitao, et al.
Published: (2026)
POA: Pre-training Once for Models of All Sizes
by: Zhang, Yingying, et al.
Published: (2024)
Beyond Color and Lines: Zero-Shot Style-Specific Image Variations with Coordinated Semantics
by: Hu, Jinghao, et al.
Published: (2024)
JVLGS: Joint Vision-Language Gas Leak Segmentation
by: Zhao, Xinlong, et al.
Published: (2025)
Saliency-Aware Multi-Route Thinking: Revisiting Vision-Language Reasoning
by: Shi, Mingjia, et al.
Published: (2026)
Masked Attention as a Mechanism for Improving Interpretability of Vision Transformers
by: Grisi, Clément, et al.
Published: (2024)
Attention Maps in 3D Shape Classification for Dental Stage Estimation with Class Node Graph Attention Networks
by: Buyukcakir, Barkin, et al.
Published: (2025)
MIRNet: Integrating Constrained Graph-Based Reasoning with Pre-training for Diagnostic Medical Imaging
by: Kong, Shufeng, et al.
Published: (2025)
Perception-Consistency Multimodal Large Language Models Reasoning via Caption-Regularized Policy Optimization
by: Tu, Songjun, et al.
Published: (2025)
Isolated Sign Language Recognition with Segmentation and Pose Estimation
by: Perkins, Daniel, et al.
Published: (2025)
Unlocking UML Class Diagram Understanding in Vision Language Models
by: Naboichenko, Artem, et al.
Published: (2026)
TWIG: Two-Step Image Generation using Segmentation Masks in Diffusion Models
by: Rakib, Mazharul Islam, et al.
Published: (2025)
Interactive Image Selection and Training for Brain Tumor Segmentation Network
by: Cerqueira, Matheus A., et al.
Published: (2024)
Performance Decay in Deepfake Detection: The Limitations of Training on Outdated Data
by: Richings, Jack, et al.
Published: (2025)
TG-LMM: Enhancing Medical Image Segmentation Accuracy through Text-Guided Large Multi-Modal Model
by: Zhao, Yihao, et al.
Published: (2024)
HelloMeme: Integrating Spatial Knitting Attentions to Embed High-Level and Fidelity-Rich Conditions in Diffusion Models
by: Zhang, Shengkai, et al.
Published: (2024)
Combiner and HyperCombiner Networks: Rules to Combine Multimodality MR Images for Prostate Cancer Localisation
by: Yan, Wen, et al.
Published: (2023)
Few-shot crack image classification using clip based on bayesian optimization
by: Zhang, Yingchao, et al.
Published: (2025)
Faster Vision Mamba is Rebuilt in Minutes via Merged Token Re-training
by: Shi, Mingjia, et al.
Published: (2024)
Computational Imaging Priors for Wireless Capsule Endoscopy: Monte Carlo-Guided Hemoglobin Mapping for Rare-Anomaly Detection
by: Yang, Chengshuai, et al.
Published: (2026)
Uncertainty and Prediction Quality Estimation for Semantic Segmentation via Graph Neural Networks
by: Heinert, Edgar, et al.
Published: (2024)
Steerable Pyramid Weighted Loss: Multi-Scale Adaptive Weighting for Semantic Segmentation
by: Lu, Renhao
Published: (2025)
Motion Consistency Loss for Monocular Visual Odometry with Attention-Based Deep Learning
by: Françani, André O., et al.
Published: (2024)
MaizeEar-SAM: Zero-Shot Maize Ear Phenotyping
by: Zaremehrjerdi, Hossein, et al.
Published: (2025)
Point, Detect, Count: Multi-Task Medical Image Understanding with Instruction-Tuned Vision-Language Models
by: Gautam, Sushant, et al.
Published: (2025)
Z-Order Transformer for Feed-Forward Gaussian Splatting
by: Wang, Can, et al.
Published: (2026)
EatGAN: An Edge-Attention Guided Generative Adversarial Network for Single Image Super-Resolution
by: Rao, Penghao, et al.
Published: (2025)
Poisson Flow Consistency Training
by: Zhang, Anthony, et al.
Published: (2025)
ViTNF: Leveraging Neural Fields to Boost Vision Transformers in Generalized Category Discovery
by: Su, Jiayi, et al.
Published: (2025)
An Autoencoder and Vision Transformer-based Interpretability Analysis of the Differences in Automated Staging of Second and Third Molars
by: Buyukcakir, Barkin, et al.
Published: (2025)
Transformer-Based Model for Monocular Visual Odometry: A Video Understanding Approach
by: Françani, André O., et al.
Published: (2023)
3D Convolutional Neural Networks for Improved Detection of Intracranial bleeding in CT Imaging
by: Subramanian, Bargava, et al.
Published: (2025)
Enhancing Small Object Detection with YOLO: A Novel Framework for Improved Accuracy and Efficiency
by: Moghadami, Mahila, et al.
Published: (2025)
How Easy is It to Fool Your Multimodal LLMs? An Empirical Analysis on Deceptive Prompts
by: Qian, Yusu, et al.
Published: (2024)
Deep Learning From Routine Histology Improves Risk Stratification for Biochemical Recurrence in Prostate Cancer
by: Grisi, Clément, et al.
Published: (2026)
Ada-adapter:Fast Few-shot Style Personlization of Diffusion Model with Pre-trained Image Encoder
by: Liu, Jia, et al.
Published: (2024)
Classification of Diabetic Retinopathy using Pre-Trained Deep Learning Models
by: Al-Kamachy, Inas, et al.
Published: (2024)
Surrealistic-like Image Generation with Vision-Language Models
by: Ayten, Elif, et al.
Published: (2024)
A Physics-Inspired Deep Learning Framework with Polar Coordinate Attention for Ptychographic Imaging
by: Yue, Han, et al.
Published: (2024)