| Main Authors: | Mahala, Nitish Kumar, Khan, Muzammil, Kumar, Pushpendra |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2508.14597 |
Similar Items
Facial Emotion Learning with Text-Guided Multiview Fusion via Vision-Language Model for 3D/4D Facial Expression Recognition
by: Behzad, Muzammil
Published: (2025)
SwinTF3D: A Lightweight Multimodal Fusion Approach for Text-Guided 3D Medical Image Segmentation
by: Khan, Hasan Faraz, et al.
Published: (2025)
SynthGenNet: a self-supervised approach for test-time generalization using synthetic multi-source domain mixing of street view images
by: Dhakara, Pushpendra, et al.
Published: (2025)
Uncertainty-Guided Inference-Time Depth Adaptation for Transformer-Based Visual Tracking
by: Poggi, Patrick, et al.
Published: (2026)
Beam-Guided Knowledge Replay for Knowledge-Rich Image Captioning using Vision-Language Model
by: AlJunaid, Reem, et al.
Published: (2025)
Skin Cancer Classification: Hybrid CNN-Transformer Models with KAN-Based Fusion
by: Agarwal, Shubhi, et al.
Published: (2025)
Handcrafted Feature Fusion for Reliable Detection of AI-Generated Images
by: Nirob, Syed Mehedi Hasan, et al.
Published: (2026)
Reliable Object Tracking by Multimodal Hybrid Feature Extraction and Transformer-Based Fusion
by: Sun, Hongze, et al.
Published: (2024)
Attack-Aware Deepfake Detection under Counter-Forensic Manipulations
by: Fatima, Noor, et al.
Published: (2025)
Self-Supervised Multi-View Representation Learning using Vision-Language Model for 3D/4D Facial Expression Recognition
by: Behzad, Muzammil
Published: (2025)
Generative Adversarial Synthesis and Deep Feature Discrimination of Brain Tumor MRI Images
by: Ali, Md Sumon, et al.
Published: (2025)
Unsupervised Multiview Contrastive Language-Image Joint Learning with Pseudo-Labeled Prompts Via Vision-Language Model for 3D/4D Facial Expression Recognition
by: Behzad, Muzammil
Published: (2025)
DetRefiner: Model-Agnostic Detection Refinement with Feature Fusion Transformer
by: Okazaki, Soichiro, et al.
Published: (2026)
Multispectral Detection Transformer with Infrared-Centric Feature Fusion
by: Hwang, Seongmin, et al.
Published: (2025)
CurriFlow: Curriculum-Guided Depth Fusion with Optical Flow-Based Temporal Alignment for 3D Semantic Scene Completion
by: Lin, Jinzhou, et al.
Published: (2025)
AquaDiff: Diffusion-Based Underwater Image Enhancement for Addressing Color Distortion
by: Shaahid, Afrah, et al.
Published: (2025)
Graph-Based Uncertainty Modeling and Multimodal Fusion for Salient Object Detection
by: Xiong, Yuqi, et al.
Published: (2025)
SketchFusion: Learning Universal Sketch Features through Fusing Foundation Models
by: Koley, Subhadeep, et al.
Published: (2025)
SmokeBench: Evaluating Multimodal Large Language Models for Wildfire Smoke Detection
by: Qi, Tianye, et al.
Published: (2025)
Rethinking Dense Optical Flow without Test-Time Scaling
by: Chanda, Praroop, et al.
Published: (2026)
Deformable Attentive Visual Enhancement for Referring Segmentation Using Vision-Language Model
by: Dalaq, Alaa, et al.
Published: (2025)
DyFFPAD: Dynamic Fusion of Convolutional and Handcrafted Features for Fingerprint Presentation Attack Detection
by: Rai, Anuj, et al.
Published: (2023)
Contrastive Language-Image Learning with Augmented Textual Prompts for 3D/4D FER Using Vision-Language Model
by: Behzad, Muzammil, et al.
Published: (2025)
Depth Estimation Algorithm Based on Transformer-Encoder and Feature Fusion
by: Xia, Linhan, et al.
Published: (2024)
Prior-guided Fusion of Multimodal Features for Change Detection from Optical-SAR Images
by: Liu, Xuanguang, et al.
Published: (2026)
Scaling Down to Scale Up: Towards Operationally-Efficient and Deployable Clinical Models via Cross-Modal Low-Rank Adaptation for Medical Vision-Language Models
by: Alzubaidi, Thuraya, et al.
Published: (2025)
Attention Based Feature Fusion Network for Monkeypox Skin Lesion Detection
by: Kundu, Niloy Kumar, et al.
Published: (2024)
Prompt-Guided Patch UNet-VAE with Adversarial Supervision for Adrenal Gland Segmentation in Computed Tomography Medical Images
by: Ghouse, Hania, et al.
Published: (2025)
FlowIt: Global Matching via Hierarchical Transformers and Optimal Transport for Optical Flow
by: Safadoust, Sadra, et al.
Published: (2026)
From Graphs to Gates: DNS-HyXNet, A Lightweight and Deployable Sequential Model for Real-Time DNS Tunnel Detection
by: Ali, Faraz, et al.
Published: (2025)
Exploiting Optical Flow Guidance for Transformer-Based Video Inpainting
by: Zhang, Kaidong, et al.
Published: (2023)
U$^{2}$Flow: Uncertainty-Aware Unsupervised Optical Flow Estimation
by: Sun, Xunpei, et al.
Published: (2026)
A Deformable Attention-Based Detection Transformer with Cross-Scale Feature Fusion for Industrial Coil Spring Inspection
by: Rossi, Matteo, et al.
Published: (2026)
SAR-Based Marine Oil Spill Detection Using the DeepSegFusion Architecture
by: Yata, Pavan Kumar, et al.
Published: (2026)
Improving Optical Flow and Stereo Depth Estimation by Leveraging Uncertainty-Based Learning Difficulties
by: Jeong, Jisoo, et al.
Published: (2025)
Uncertainty Quantification in Detection Transformers: Object-Level Calibration and Image-Level Reliability
by: Park, Young-Jin, et al.
Published: (2024)
Facial Demorphing via Identity Preserving Image Decomposition
by: Shukla, Nitish, et al.
Published: (2024)
Underwater Diffusion Attention Network with Contrastive Language-Image Joint Learning for Underwater Image Enhancement
by: Shaahid, Afrah, et al.
Published: (2025)
Cross Resolution Encoding-Decoding For Detection Transformers
by: Kumar, Ashish, et al.
Published: (2024)
Learning Optical Flow and Scene Flow with Bidirectional Camera-LiDAR Fusion
by: Liu, Haisong, et al.
Published: (2023)