:: Library Catalog

Bibliographic Details
Main Authors:	Kumar, Deepak; Singh, Abhishek Pratap; Kumar, Puneet; Li, Xiaobai; Raman, Balasubramanian
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2604.16214

Similar Items

VISTANet: VIsual Spoken Textual Additive Net for Interpretable Multimodal Emotion Recognition
by: Kumar, Puneet, et al.
Published: (2022)

Interpretable Image Emotion Recognition: A Domain Adaptation Approach Using Facial Expressions
by: Kumar, Puneet, et al.
Published: (2020)

Vision Large Language Models Are Good Noise Handlers in Engagement Analysis
by: Vedernikov, Alexander, et al.
Published: (2025)

VisioPhysioENet: Visual Physiological Engagement Detection Network
by: Singh, Alakhsimar, et al.
Published: (2024)

TCCT-Net: Two-Stream Network Architecture for Fast and Efficient Engagement Estimation via Behavioral Feature Signals
by: Vedernikov, Alexander, et al.
Published: (2024)

Biasing & Debiasing based Approach Towards Fair Knowledge Transfer for Equitable Skin Analysis
by: Pundhir, Anshul, et al.
Published: (2024)

Synthesizing Sentiment-Controlled Feedback For Multimodal Text and Image Data
by: Kumar, Puneet, et al.
Published: (2024)

FashionSD-X: Multimodal Fashion Garment Synthesis using Latent Diffusion
by: Singh, Abhishek Kumar, et al.
Published: (2024)

Contrast-Phys: Unsupervised Video-based Remote Physiological Measurement via Spatiotemporal Contrast
by: Sun, Zhaodong, et al.
Published: (2022)

Contrast-Phys+: Unsupervised and Weakly-supervised Video-based Remote Physiological Measurement via Spatiotemporal Contrast
by: Sun, Zhaodong, et al.
Published: (2023)

Active Multimodal Distillation for Few-shot Action Recognition
by: Feng, Weijia, et al.
Published: (2025)

Advanced Gesture Recognition for Autism Spectrum Disorder Detection: Integrating YOLOv7, Video Augmentation, and VideoMAE for Naturalistic Video Analysis
by: Singh, Amit Kumar, et al.
Published: (2024)

A Large-Scale Analysis on Contextual Self-Supervised Video Representation Learning
by: Kumar, Akash, et al.
Published: (2025)

On Evaluation of Vision Datasets and Models using Human Competency Frameworks
by: Ramachandran, Rahul, et al.
Published: (2024)

Efficient Human Pose Estimation: Leveraging Advanced Techniques with MediaPipe
by: Sengar, Sandeep Singh, et al.
Published: (2024)

VideoNet: A Large-Scale Dataset for Domain-Specific Action Recognition
by: Yadav, Tanush, et al.
Published: (2026)

FROQ: Observing Face Recognition Models for Efficient Quality Assessment
by: Babnik, Žiga, et al.
Published: (2025)

RL-AD-Net: Reinforcement Learning Guided Adaptive Displacement in Latent Space for Refined Point Cloud Completion
by: Paregi, Bhanu Pratap, et al.
Published: (2025)

A Benchmark for Incremental Micro-expression Recognition
by: Lai, Zhengqin, et al.
Published: (2025)

RecruitView: A Multimodal Dataset for Predicting Personality and Interview Performance for Human Resources Applications
by: Gupta, Amit Kumar, et al.
Published: (2025)

TabSniper: Towards Accurate Table Detection & Structure Recognition for Bank Statements
by: Trivedi, Abhishek, et al.
Published: (2024)

Data Leakage Detection and De-duplication in Large Scale Geospatial Image Datasets
by: Adimoolam, Yeshwanth Kumar, et al.
Published: (2023)

Capsule Vision 2024 Challenge: Multi-Class Abnormality Classification for Video Capsule Endoscopy
by: Handa, Palak, et al.
Published: (2024)

Multimodal Fusion Learning with Dual Attention for Medical Imaging
by: Dhar, Joy, et al.
Published: (2024)

DALight-3D: A Lightweight 3D U-Net for Brain Tumor Segmentation from Multi-Modal MRI
by: Mishra, Nand Kumar, et al.
Published: (2026)

AffectSRNet : Facial Emotion-Aware Super-Resolution Network
by: Rizvi, Syed Sameen Ahmad, et al.
Published: (2025)

Exploring Remote Photoplethysmography for Neonatal Pain Detection from Facial Videos
by: Dhamaniya, Ashutosh, et al.
Published: (2026)

A Multimodal Dataset for Enhancing Industrial Task Monitoring and Engagement Prediction
by: Mehta, Naval Kishore, et al.
Published: (2025)

CaMML: Context-Aware Multimodal Learner for Large Models
by: Chen, Yixin, et al.
Published: (2024)

Robust Context-Aware Object Recognition
by: Janouskova, Klara, et al.
Published: (2025)

Investigating the Quality of DermaMNIST and Fitzpatrick17k Dermatological Image Datasets
by: Abhishek, Kumar, et al.
Published: (2024)

Weed Detection using Convolutional Neural Network
by: Tripathi, Santosh Kumar, et al.
Published: (2025)

Multimodal Emotion Recognition via Causal-Diffusion Bridge (Affect-Diff)
by: Sanjyal, Ankit
Published: (2026)

A Generative Approach to High Fidelity 3D Reconstruction from Text Data
by: R, Venkat Kumar, et al.
Published: (2025)

Advancing Multimodal LLMs by Large-Scale 3D Visual Instruction Dataset Generation
by: He, Liu, et al.
Published: (2025)

Survey of Large Multimodal Model Datasets, Application Categories and Taxonomy
by: Pattnayak, Priyaranjan, et al.
Published: (2024)

BelHouse3D: A Benchmark Dataset for Assessing Occlusion Robustness in 3D Point Cloud Semantic Segmentation
by: Kumar, Umamaheswaran Raman, et al.
Published: (2024)

A Large-Scale Study on Video Action Dataset Condensation
by: Chen, Yang, et al.
Published: (2024)

Embody 3D: A Large-scale Multimodal Motion and Behavior Dataset
by: McLean, Claire, et al.
Published: (2025)

Motion-X++: A Large-Scale Multimodal 3D Whole-body Human Motion Dataset
by: Zhang, Yuhong, et al.
Published: (2025)