:: Library Catalog

Bibliographic Details
Main Authors:	Khan, Sulaiman; Biswas, Md. Rafiul; Shah, Zubair
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition; Machine Learning
Online Access:	https://arxiv.org/abs/2601.12981

Similar Items

An Early Investigation into the Utility of Multimodal Large Language Models in Medical Imaging
by: Khan, Sulaiman, et al.
Published: (2024)

Pushing Boundaries: Exploring Zero Shot Object Classification with Large Multimodal Models
by: Islam, Ashhadul, et al.
Published: (2023)

GeoViSTA: Geospatial Vision-Tabular Transformer for Multimodal Environment Representation
by: Liu, Yuhao, et al.
Published: (2026)

Context-Aware Zero-Shot Anomaly Detection in Surveillance Using Contrastive and Predictive Spatiotemporal Modeling
by: Khan, Md. Rashid Shahriar, et al.
Published: (2025)

ZAYAN: Disentangled Contrastive Transformer for Tabular Remote Sensing Data
by: Habib, Al Zadid Sultan Bin, et al.
Published: (2026)

MMSFormer: Multimodal Transformer for Material and Semantic Segmentation
by: Reza, Md Kaykobad, et al.
Published: (2023)

Benchmarking Early Agitation Prediction in Community-Dwelling People with Dementia Using Multimodal Sensors and Machine Learning
by: Abedi, Ali, et al.
Published: (2025)

Resilient Vision-Tabular Multimodal Learning under Modality Missingness
by: Caruso, Camillo Maria, et al.
Published: (2026)

BanglaMM-Disaster: A Multimodal Transformer-Based Deep Learning Framework for Multiclass Disaster Classification in Bangla
by: Islam, Ariful, et al.
Published: (2025)

Machine Learning Prediction of Cardiovascular Risk in Type 1 Diabetes Mellitus Using Radiomics Features from Multimodal Retinal Images
by: Tohà-Dalmau, Ariadna, et al.
Published: (2025)

Lecture Video Visual Objects (LVVO) Dataset: A Benchmark for Visual Object Detection in Educational Videos
by: Biswas, Dipayan, et al.
Published: (2025)

TIME: TabPFN-Integrated Multimodal Engine for Robust Tabular-Image Learning
by: Luo, Jiaqi, et al.
Published: (2025)

KNN and ANN-based Recognition of Handwritten Pashto Letters using Zoning Features
by: Khan, Sulaiman, et al.
Published: (2019)

Balancing Accuracy and Efficiency: CNN Fusion Models for Diabetic Retinopathy Screening
by: Islam, Md Rafid, et al.
Published: (2025)

Multimodal Deep Learning for Diabetic Foot Ulcer Staging Using Integrated RGB and Thermal Imaging
by: Mermer, Gulengul, et al.
Published: (2026)

Integrating Non-Linear Radon Transformation for Diabetic Retinopathy Grading
by: Mohsen, Farida, et al.
Published: (2025)

MulTaBench: Benchmarking Multimodal Tabular Learning with Text and Image
by: Arazi, Alan, et al.
Published: (2026)

Multimodal Tabular Reasoning with Privileged Structured Information
by: Jiang, Jun-Peng, et al.
Published: (2025)

TREAT-Net: Tabular-Referenced Echocardiography Analysis for Acute Coronary Syndrome Treatment Prediction
by: Kim, Diane, et al.
Published: (2025)

Improving Multimodal Large Language Models Using Continual Learning
by: Srivastava, Shikhar, et al.
Published: (2024)

Tab2Visual: Overcoming Limited Data in Tabular Data Classification Using Deep Learning with Visual Representations
by: Mamdouh, Ahmed, et al.
Published: (2025)

Model Predictive Simulation Using Structured Graphical Models and Transformers
by: Lou, Xinghua, et al.
Published: (2024)

Tabular GANs for uneven distribution
by: Ashrapov, Insaf
Published: (2020)

Text Role Classification in Scientific Charts Using Multimodal Transformers
by: Kim, Hye Jin, et al.
Published: (2024)

DualSwinFusionSeg: Multimodal Martian Landslide Segmentation via Dual Swin Transformer with Multi-Scale Fusion and UNet++
by: Kabir, Shahriar, et al.
Published: (2026)

VisTabNet: Adapting Vision Transformers for Tabular Data
by: Wydmański, Witold, et al.
Published: (2024)

HyperFusion: A Hypernetwork Approach to Multimodal Integration of Tabular and Medical Imaging Data for Predictive Modeling
by: Duenias, Daniel, et al.
Published: (2024)

AttentionDrop: A Novel Regularization Method for Transformer Models
by: Baig, Mirza Samad Ahmed, et al.
Published: (2025)

Yoga Pose Classification Using Transfer Learning
by: Akash, M. M., et al.
Published: (2024)

ELMF4EggQ: Ensemble Learning with Multimodal Feature Fusion for Non-Destructive Egg Quality Assessment
by: Hassan, Md Zahim, et al.
Published: (2025)

Progressive Semantic-Guided Vision Transformer for Zero-Shot Learning
by: Chen, Shiming, et al.
Published: (2024)

A Survey on Self-supervised Contrastive Learning for Multimodal Text-Image Analysis
by: Khan, Asifullah, et al.
Published: (2025)

Deep Learning-Based Noninvasive Screening of Type 2 Diabetes with Chest X-ray Images and Electronic Health Records
by: Gundapaneni, Sanjana, et al.
Published: (2024)

A Bidirectional Siamese Recurrent Neural Network for Accurate Gait Recognition Using Body Landmarks
by: Progga, Proma Hossain, et al.
Published: (2024)

Leveraging Pre-trained CNNs for Efficient Feature Extraction in Rice Leaf Disease Classification
by: Sobuj, Md. Shohanur Islam, et al.
Published: (2024)

Rethinking Timesteps Samplers and Prediction Types
by: Xie, Bin, et al.
Published: (2025)

Video-Based MPAA Rating Prediction: An Attention-Driven Hybrid Architecture Using Contrastive Learning
by: Neogi, Dipta, et al.
Published: (2025)

A Transformer-based Multimodal Fusion Model for Efficient Crowd Counting Using Visual and Wireless Signals
by: Cui, Zhe, et al.
Published: (2025)

Transformer-Based Classification Outcome Prediction for Multimodal Stroke Treatment
by: Ma, Danqing, et al.
Published: (2024)

MultiFair: Multimodal Balanced Fairness-Aware Medical Classification with Dual-Level Gradient Modulation
by: Zubair, Md, et al.
Published: (2025)