| Main Authors: | Kumar, Deepak, Singh, Abhishek Pratap, Kumar, Puneet, Li, Xiaobai, Raman, Balasubramanian |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.16214 |
Similar Items
VISTANet: VIsual Spoken Textual Additive Net for Interpretable Multimodal Emotion Recognition
by: Kumar, Puneet, et al.
Published: (2022)
Interpretable Image Emotion Recognition: A Domain Adaptation Approach Using Facial Expressions
by: Kumar, Puneet, et al.
Published: (2020)
Vision Large Language Models Are Good Noise Handlers in Engagement Analysis
by: Vedernikov, Alexander, et al.
Published: (2025)
VisioPhysioENet: Visual Physiological Engagement Detection Network
by: Singh, Alakhsimar, et al.
Published: (2024)
TCCT-Net: Two-Stream Network Architecture for Fast and Efficient Engagement Estimation via Behavioral Feature Signals
by: Vedernikov, Alexander, et al.
Published: (2024)
Biasing & Debiasing based Approach Towards Fair Knowledge Transfer for Equitable Skin Analysis
by: Pundhir, Anshul, et al.
Published: (2024)
Synthesizing Sentiment-Controlled Feedback For Multimodal Text and Image Data
by: Kumar, Puneet, et al.
Published: (2024)
FashionSD-X: Multimodal Fashion Garment Synthesis using Latent Diffusion
by: Singh, Abhishek Kumar, et al.
Published: (2024)
Contrast-Phys: Unsupervised Video-based Remote Physiological Measurement via Spatiotemporal Contrast
by: Sun, Zhaodong, et al.
Published: (2022)
Contrast-Phys+: Unsupervised and Weakly-supervised Video-based Remote Physiological Measurement via Spatiotemporal Contrast
by: Sun, Zhaodong, et al.
Published: (2023)
Active Multimodal Distillation for Few-shot Action Recognition
by: Feng, Weijia, et al.
Published: (2025)
Advanced Gesture Recognition for Autism Spectrum Disorder Detection: Integrating YOLOv7, Video Augmentation, and VideoMAE for Naturalistic Video Analysis
by: Singh, Amit Kumar, et al.
Published: (2024)
A Large-Scale Analysis on Contextual Self-Supervised Video Representation Learning
by: Kumar, Akash, et al.
Published: (2025)
On Evaluation of Vision Datasets and Models using Human Competency Frameworks
by: Ramachandran, Rahul, et al.
Published: (2024)
Efficient Human Pose Estimation: Leveraging Advanced Techniques with MediaPipe
by: Sengar, Sandeep Singh, et al.
Published: (2024)
VideoNet: A Large-Scale Dataset for Domain-Specific Action Recognition
by: Yadav, Tanush, et al.
Published: (2026)
FROQ: Observing Face Recognition Models for Efficient Quality Assessment
by: Babnik, Žiga, et al.
Published: (2025)
RL-AD-Net: Reinforcement Learning Guided Adaptive Displacement in Latent Space for Refined Point Cloud Completion
by: Paregi, Bhanu Pratap, et al.
Published: (2025)
A Benchmark for Incremental Micro-expression Recognition
by: Lai, Zhengqin, et al.
Published: (2025)
RecruitView: A Multimodal Dataset for Predicting Personality and Interview Performance for Human Resources Applications
by: Gupta, Amit Kumar, et al.
Published: (2025)
TabSniper: Towards Accurate Table Detection & Structure Recognition for Bank Statements
by: Trivedi, Abhishek, et al.
Published: (2024)
Data Leakage Detection and De-duplication in Large Scale Geospatial Image Datasets
by: Adimoolam, Yeshwanth Kumar, et al.
Published: (2023)
Capsule Vision 2024 Challenge: Multi-Class Abnormality Classification for Video Capsule Endoscopy
by: Handa, Palak, et al.
Published: (2024)
Multimodal Fusion Learning with Dual Attention for Medical Imaging
by: Dhar, Joy, et al.
Published: (2024)
DALight-3D: A Lightweight 3D U-Net for Brain Tumor Segmentation from Multi-Modal MRI
by: Mishra, Nand Kumar, et al.
Published: (2026)
AffectSRNet : Facial Emotion-Aware Super-Resolution Network
by: Rizvi, Syed Sameen Ahmad, et al.
Published: (2025)
Exploring Remote Photoplethysmography for Neonatal Pain Detection from Facial Videos
by: Dhamaniya, Ashutosh, et al.
Published: (2026)
A Multimodal Dataset for Enhancing Industrial Task Monitoring and Engagement Prediction
by: Mehta, Naval Kishore, et al.
Published: (2025)
CaMML: Context-Aware Multimodal Learner for Large Models
by: Chen, Yixin, et al.
Published: (2024)
Robust Context-Aware Object Recognition
by: Janouskova, Klara, et al.
Published: (2025)
Investigating the Quality of DermaMNIST and Fitzpatrick17k Dermatological Image Datasets
by: Abhishek, Kumar, et al.
Published: (2024)
Weed Detection using Convolutional Neural Network
by: Tripathi, Santosh Kumar, et al.
Published: (2025)
Multimodal Emotion Recognition via Causal-Diffusion Bridge (Affect-Diff)
by: Sanjyal, Ankit
Published: (2026)
A Generative Approach to High Fidelity 3D Reconstruction from Text Data
by: R, Venkat Kumar, et al.
Published: (2025)
Advancing Multimodal LLMs by Large-Scale 3D Visual Instruction Dataset Generation
by: He, Liu, et al.
Published: (2025)
Survey of Large Multimodal Model Datasets, Application Categories and Taxonomy
by: Pattnayak, Priyaranjan, et al.
Published: (2024)
BelHouse3D: A Benchmark Dataset for Assessing Occlusion Robustness in 3D Point Cloud Semantic Segmentation
by: Kumar, Umamaheswaran Raman, et al.
Published: (2024)
A Large-Scale Study on Video Action Dataset Condensation
by: Chen, Yang, et al.
Published: (2024)
Embody 3D: A Large-scale Multimodal Motion and Behavior Dataset
by: McLean, Claire, et al.
Published: (2025)
Motion-X++: A Large-Scale Multimodal 3D Whole-body Human Motion Dataset
by: Zhang, Yuhong, et al.
Published: (2025)