:: Library Catalog

Bibliographic Details
Main Authors:	Mehta, Naval Kishore; Arvind; Kumar, Himanshu; Banerjee, Abeer; Saurav, Sumeet; Singh, Sanjay
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2501.05936

Similar Items

Optimizing Multitask Industrial Processes with Predictive Action Guidance
by: Mehta, Naval Kishore, et al.
Published: (2025)

Gaze-Vector Estimation in the Dark with Temporally Encoded Event-driven Neural Networks
by: Banerjee, Abeer, et al.
Published: (2024)

Towards Lensless Image Deblurring with Prior-Embedded Implicit Neural Representations in the Low-Data Regime
by: Banerjee, Abeer, et al.
Published: (2024)

Towards Physics-informed Cyclic Adversarial Multi-PSF Lensless Imaging
by: Banerjee, Abeer, et al.
Published: (2024)

GLOFNet -- A Multimodal Dataset for GLOF Monitoring and Prediction
by: Fatima, Zuha, et al.
Published: (2025)

HQ-JEPA: Hybrid Quantum Joint-Embedding Predictive Architecture for Cross-Modal Remote Sensing Representation Learning
by: Hossain, Md Aminur, et al.
Published: (2026)

SKoPe3D: A Synthetic Dataset for Vehicle Keypoint Perception in 3D from Traffic Monitoring Cameras
by: Pahadia, Himanshu, et al.
Published: (2023)

Engagement Prediction of Short Videos with Large Multimodal Models
by: Sun, Wei, et al.
Published: (2025)

SmartWilds: Multimodal Wildlife Monitoring Dataset
by: Kline, Jenna, et al.
Published: (2025)

Enhancing Saliency Prediction in Monitoring Tasks: The Role of Visual Highlights
by: Wu, Zekun, et al.
Published: (2024)

Multimodal Fusion of Glucose Monitoring and Food Imagery for Caloric Content Prediction
by: Kumar, Adarsh
Published: (2025)

iOSPointMapper: RealTime Pedestrian and Accessibility Mapping with Mobile AI
by: Naidu, Himanshu, et al.
Published: (2025)

GAViD: A Large-Scale Multimodal Dataset for Context-Aware Group Affect Recognition from Videos
by: Kumar, Deepak, et al.
Published: (2026)

Tracking by Predicting 3-D Gaussians Over Time
by: Baranwal, Tanish, et al.
Published: (2025)

CrossMed: A Multimodal Cross-Task Benchmark for Compositional Generalization in Medical Imaging
by: Singh, Pooja, et al.
Published: (2025)

GradAttn: Replacing Fixed Residual Connections with Task-Modulated Attention Pathways
by: Ghoshal, Soudeep, et al.
Published: (2026)

Enhancing Adverse Drug Event Detection with Multimodal Dataset: Corpus Creation and Model Development
by: Sahoo, Pranab, et al.
Published: (2024)

VisioPhysioENet: Visual Physiological Engagement Detection Network
by: Singh, Alakhsimar, et al.
Published: (2024)

D-HUMOR: Dark Humor Understanding via Multimodal Open-ended Reasoning -- A Benchmark Dataset and Method
by: Kasu, Sai Kartheek Reddy, et al.
Published: (2025)

HOH: Markerless Multimodal Human-Object-Human Handover Dataset with Large Object Count
by: Wiederhold, Noah, et al.
Published: (2023)

MBE-ARI: A Multimodal Dataset Mapping Bi-directional Engagement in Animal-Robot Interaction
by: Noronha, Ian, et al.
Published: (2025)

Towards Open-Vocabulary Industrial Defect Understanding with a Large-Scale Multimodal Dataset
by: Ni, TsaiChing, et al.
Published: (2025)

Curriculum Guided Massive Multi Agent System Solving For Robust Long Horizon Tasks
by: Kar, Indrajit, et al.
Published: (2025)

OpenMarcie: Dataset for Multimodal Action Recognition in Industrial Environments
by: Bello, Hymalai, et al.
Published: (2026)

DAOS: A Multimodal In-cabin Behavior Monitoring with Driver Action-Object Synergy Dataset
by: Li, Yiming, et al.
Published: (2026)

Survey of Large Multimodal Model Datasets, Application Categories and Taxonomy
by: Pattnayak, Priyaranjan, et al.
Published: (2024)

fine-CLIP: Enhancing Zero-Shot Fine-Grained Surgical Action Recognition with Vision-Language Models
by: Sharma, Saurav, et al.
Published: (2025)

Tuned Reverse Distillation: Enhancing Multimodal Industrial Anomaly Detection with Crossmodal Tuners
by: Liu, Xinyue, et al.
Published: (2024)

A Computational Model of Message Sensation Value in Short Video Multimodal Features that Predicts Sensory and Behavioral Engagement
by: Xue, Haoning, et al.
Published: (2026)

A Novel Multimodal System to Predict Agitation in People with Dementia Within Clinical Settings: A Proof of Concept
by: Badawi, Abeer, et al.
Published: (2024)

Leveraging Perceptual Scores for Dataset Pruning in Computer Vision Tasks
by: Singh, Raghavendra
Published: (2024)

RecruitView: A Multimodal Dataset for Predicting Personality and Interview Performance for Human Resources Applications
by: Gupta, Amit Kumar, et al.
Published: (2025)

IJmond Industrial Smoke Segmentation Dataset
by: Hsu, Yen-Chia, et al.
Published: (2026)

Learning to Weigh Waste: A Physics-Informed Multimodal Fusion Framework and Large-Scale Dataset for Commercial and Industrial Applications
by: Islam, Md. Adnanul, et al.
Published: (2026)

Historical Printed Ornaments: Dataset and Tasks
by: Chaki, Sayan Kumar, et al.
Published: (2024)

Harnessing Shared Relations via Multimodal Mixup Contrastive Learning for Multimodal Classification
by: Kumar, Raja, et al.
Published: (2024)

Optimizing Skin Lesion Classification via Multimodal Data and Auxiliary Task Integration
by: Khurshid, Mahapara, et al.
Published: (2024)

Multimodal Event Detection: Current Approaches and Defining the New Playground through LLMs and VLMs
by: Dey, Abhishek, et al.
Published: (2025)

State-Change Learning for Prediction of Future Events in Endoscopic Videos
by: Sharma, Saurav, et al.
Published: (2025)

ADAS-TO: A Large-Scale Multimodal Naturalistic Dataset and Empirical Characterization of Human Takeovers during ADAS Engagement
by: Wang, Yuhang, et al.
Published: (2026)