Library Catalog

Bibliographic Details
Main Authors:	Mehta, Naval Kishore; Arvind; Prasad, Shyam Sunder; Saurav, Sumeet; Singh, Sanjay
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2501.05108

Similar Items

A Multimodal Dataset for Enhancing Industrial Task Monitoring and Engagement Prediction
by: Mehta, Naval Kishore, et al.
Published: (2025)

Gaze-Vector Estimation in the Dark with Temporally Encoded Event-driven Neural Networks
by: Banerjee, Abeer, et al.
Published: (2024)

Dual Guidance Semi-Supervised Action Detection
by: Singh, Ankit, et al.
Published: (2025)

fine-CLIP: Enhancing Zero-Shot Fine-Grained Surgical Action Recognition with Vision-Language Models
by: Sharma, Saurav, et al.
Published: (2025)

Uncertainty-Guided Appearance-Motion Association Network for Out-of-Distribution Action Detection
by: Fang, Xiang, et al.
Published: (2024)

Multi-Level LVLM Guidance for Untrimmed Video Action Recognition
by: Peng, Liyang, et al.
Published: (2025)

Zero-Shot Temporal Action Localization Through Textual Guidance
by: Liberatori, Benedetta, et al.
Published: (2026)

Open-Vocabulary Temporal Action Localization using Multimodal Guidance
by: Gupta, Akshita, et al.
Published: (2024)

State-Change Learning for Prediction of Future Events in Endoscopic Videos
by: Sharma, Saurav, et al.
Published: (2025)

Action Recognition based Industrial Safety Violation Detection
by: Reddy, Surya N, et al.
Published: (2024)

Cross-Task Affinity Learning for Multitask Dense Scene Predictions
by: Sinodinos, Dimitrios, et al.
Published: (2024)

Multitasking Embedding for Embryo Blastocyst Grading Prediction (MEmEBG)
by: Angabini, Nahid Khoshk, et al.
Published: (2026)

BeLLA: End-to-End Birds Eye View Large Language Assistant for Autonomous Driving
by: Mohan, Karthik, et al.
Published: (2025)

Factored Classifier-Free Guidance
by: Xia, Tian, et al.
Published: (2025)

HQ-JEPA: Hybrid Quantum Joint-Embedding Predictive Architecture for Cross-Modal Remote Sensing Representation Learning
by: Hossain, Md Aminur, et al.
Published: (2026)

Automatic Discovery and Assessment of Interpretable Systematic Errors in Semantic Segmentation
by: Singh, Jaisidh, et al.
Published: (2024)

IAP: Invisible Adversarial Patch Attack through Perceptibility-Aware Localization and Perturbation Optimization
by: Dutta, Subrat Kishore, et al.
Published: (2025)

CrunchLLM: Multitask LLMs for Structured Business Reasoning and Outcome Prediction
by: Sadia, Rabeya Tus, et al.
Published: (2025)

World Guidance: World Modeling in Condition Space for Action Generation
by: Su, Yue, et al.
Published: (2026)

Egocentric Action-aware Inertial Localization in Point Clouds with Vision-Language Guidance
by: Zhang, Mingfang, et al.
Published: (2025)

NutritionVerse-Direct: Exploring Deep Neural Networks for Multitask Nutrition Prediction from Food Images
by: Keller, Matthew, et al.
Published: (2024)

Frequency Guidance Matters: Skeletal Action Recognition by Frequency-Aware Mixed Transformer
by: Wu, Wenhan, et al.
Published: (2024)

Efficient Multitask Dense Predictor via Binarization
by: Shang, Yuzhang, et al.
Published: (2024)

Group Diffusion Transformers are Unsupervised Multitask Learners
by: Huang, Lianghua, et al.
Published: (2024)

IPAD: Industrial Process Anomaly Detection Dataset
by: Liu, Jinfan, et al.
Published: (2024)

Cross-Domain Identity Representation for Skull to Face Matching with Benchmark DataSet
by: Prasad, Ravi Shankar, et al.
Published: (2025)

FCR: Investigating Generative AI models for Forensic Craniofacial Reconstruction
by: Prasad, Ravi Shankar, et al.
Published: (2025)

SPOT-Face: Forensic Face Identification using Attention Guided Optimal Transport
by: Prasad, Ravi Shankar, et al.
Published: (2026)

Can Multi-Modal LLMs Provide Live Step-by-Step Task Guidance?
by: Bhattacharyya, Apratim, et al.
Published: (2025)

Automatic Report Generation for Histopathology images using pre-trained Vision Transformers and BERT
by: Sengupta, Saurav, et al.
Published: (2023)

Learning Streaming Video Representation via Multitask Training
by: Yan, Yibin, et al.
Published: (2025)

Efficient Inter-Task Attention for Multitask Transformer Models
by: Bohn, Christian, et al.
Published: (2025)

Noise-Free Explanation for Driving Action Prediction
by: Zhu, Hongbo, et al.
Published: (2024)

Text-Driven Weakly Supervised OCT Lesion Segmentation with Structural Guidance
by: Yang, Jiaqi, et al.
Published: (2024)

FedSCAl: Leveraging Server and Client Alignment for Unsupervised Federated Source-Free Domain Adaptation
by: Yashwanth, M, et al.
Published: (2025)

USAM-Net: A U-Net-based Network for Improved Stereo Correspondence and Scene Depth Estimation using Features from a Pre-trained Image Segmentation network
by: Dayo, Joseph Emmanuel DL, et al.
Published: (2025)

HandDreamer: Zero-Shot Text to 3D Hand Model Generation using Corrective Hand Shape Guidance
by: Rosh, Green, et al.
Published: (2026)

EAGLE: Expert-Augmented Attention Guidance for Tuning-Free Industrial Anomaly Detection in Multimodal Large Language Models
by: Peng, Xiaomeng, et al.
Published: (2026)

ALMRR: Anomaly Localization Mamba on Industrial Textured Surface with Feature Reconstruction and Refinement
by: Qu, Shichen, et al.
Published: (2024)

A Method of Moments Embedding Constraint and its Application to Semi-Supervised Learning
by: Majurski, Michael, et al.
Published: (2024)