:: Library Catalog

Bibliographic Details
Main Authors:	Mahala, Nitish Kumar; Khan, Muzammil; Kumar, Pushpendra
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2508.14597

Similar Items

Facial Emotion Learning with Text-Guided Multiview Fusion via Vision-Language Model for 3D/4D Facial Expression Recognition
by: Behzad, Muzammil
Published: (2025)

SwinTF3D: A Lightweight Multimodal Fusion Approach for Text-Guided 3D Medical Image Segmentation
by: Khan, Hasan Faraz, et al.
Published: (2025)

SynthGenNet: a self-supervised approach for test-time generalization using synthetic multi-source domain mixing of street view images
by: Dhakara, Pushpendra, et al.
Published: (2025)

Uncertainty-Guided Inference-Time Depth Adaptation for Transformer-Based Visual Tracking
by: Poggi, Patrick, et al.
Published: (2026)

Beam-Guided Knowledge Replay for Knowledge-Rich Image Captioning using Vision-Language Model
by: AlJunaid, Reem, et al.
Published: (2025)

Skin Cancer Classification: Hybrid CNN-Transformer Models with KAN-Based Fusion
by: Agarwal, Shubhi, et al.
Published: (2025)

Handcrafted Feature Fusion for Reliable Detection of AI-Generated Images
by: Nirob, Syed Mehedi Hasan, et al.
Published: (2026)

Reliable Object Tracking by Multimodal Hybrid Feature Extraction and Transformer-Based Fusion
by: Sun, Hongze, et al.
Published: (2024)

Attack-Aware Deepfake Detection under Counter-Forensic Manipulations
by: Fatima, Noor, et al.
Published: (2025)

Self-Supervised Multi-View Representation Learning using Vision-Language Model for 3D/4D Facial Expression Recognition
by: Behzad, Muzammil
Published: (2025)

Generative Adversarial Synthesis and Deep Feature Discrimination of Brain Tumor MRI Images
by: Ali, Md Sumon, et al.
Published: (2025)

Unsupervised Multiview Contrastive Language-Image Joint Learning with Pseudo-Labeled Prompts Via Vision-Language Model for 3D/4D Facial Expression Recognition
by: Behzad, Muzammil
Published: (2025)

DetRefiner: Model-Agnostic Detection Refinement with Feature Fusion Transformer
by: Okazaki, Soichiro, et al.
Published: (2026)

Multispectral Detection Transformer with Infrared-Centric Feature Fusion
by: Hwang, Seongmin, et al.
Published: (2025)

CurriFlow: Curriculum-Guided Depth Fusion with Optical Flow-Based Temporal Alignment for 3D Semantic Scene Completion
by: Lin, Jinzhou, et al.
Published: (2025)

AquaDiff: Diffusion-Based Underwater Image Enhancement for Addressing Color Distortion
by: Shaahid, Afrah, et al.
Published: (2025)

Graph-Based Uncertainty Modeling and Multimodal Fusion for Salient Object Detection
by: Xiong, Yuqi, et al.
Published: (2025)

SketchFusion: Learning Universal Sketch Features through Fusing Foundation Models
by: Koley, Subhadeep, et al.
Published: (2025)

SmokeBench: Evaluating Multimodal Large Language Models for Wildfire Smoke Detection
by: Qi, Tianye, et al.
Published: (2025)

Rethinking Dense Optical Flow without Test-Time Scaling
by: Chanda, Praroop, et al.
Published: (2026)

Deformable Attentive Visual Enhancement for Referring Segmentation Using Vision-Language Model
by: Dalaq, Alaa, et al.
Published: (2025)

DyFFPAD: Dynamic Fusion of Convolutional and Handcrafted Features for Fingerprint Presentation Attack Detection
by: Rai, Anuj, et al.
Published: (2023)

Contrastive Language-Image Learning with Augmented Textual Prompts for 3D/4D FER Using Vision-Language Model
by: Behzad, Muzammil, et al.
Published: (2025)

Depth Estimation Algorithm Based on Transformer-Encoder and Feature Fusion
by: Xia, Linhan, et al.
Published: (2024)

Prior-guided Fusion of Multimodal Features for Change Detection from Optical-SAR Images
by: Liu, Xuanguang, et al.
Published: (2026)

Scaling Down to Scale Up: Towards Operationally-Efficient and Deployable Clinical Models via Cross-Modal Low-Rank Adaptation for Medical Vision-Language Models
by: Alzubaidi, Thuraya, et al.
Published: (2025)

Attention Based Feature Fusion Network for Monkeypox Skin Lesion Detection
by: Kundu, Niloy Kumar, et al.
Published: (2024)

Prompt-Guided Patch UNet-VAE with Adversarial Supervision for Adrenal Gland Segmentation in Computed Tomography Medical Images
by: Ghouse, Hania, et al.
Published: (2025)

FlowIt: Global Matching via Hierarchical Transformers and Optimal Transport for Optical Flow
by: Safadoust, Sadra, et al.
Published: (2026)

From Graphs to Gates: DNS-HyXNet, A Lightweight and Deployable Sequential Model for Real-Time DNS Tunnel Detection
by: Ali, Faraz, et al.
Published: (2025)

Exploiting Optical Flow Guidance for Transformer-Based Video Inpainting
by: Zhang, Kaidong, et al.
Published: (2023)

U$^{2}$Flow: Uncertainty-Aware Unsupervised Optical Flow Estimation
by: Sun, Xunpei, et al.
Published: (2026)

A Deformable Attention-Based Detection Transformer with Cross-Scale Feature Fusion for Industrial Coil Spring Inspection
by: Rossi, Matteo, et al.
Published: (2026)

SAR-Based Marine Oil Spill Detection Using the DeepSegFusion Architecture
by: Yata, Pavan Kumar, et al.
Published: (2026)

Improving Optical Flow and Stereo Depth Estimation by Leveraging Uncertainty-Based Learning Difficulties
by: Jeong, Jisoo, et al.
Published: (2025)

Uncertainty Quantification in Detection Transformers: Object-Level Calibration and Image-Level Reliability
by: Park, Young-Jin, et al.
Published: (2024)

Facial Demorphing via Identity Preserving Image Decomposition
by: Shukla, Nitish, et al.
Published: (2024)

Underwater Diffusion Attention Network with Contrastive Language-Image Joint Learning for Underwater Image Enhancement
by: Shaahid, Afrah, et al.
Published: (2025)

Cross Resolution Encoding-Decoding For Detection Transformers
by: Kumar, Ashish, et al.
Published: (2024)

Learning Optical Flow and Scene Flow with Bidirectional Camera-LiDAR Fusion
by: Liu, Haisong, et al.
Published: (2023)