| Main Authors: | Mehta, Naval Kishore; Arvind; Prasad, Shyam Sunder; Saurav, Sumeet; Singh, Sanjay |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2501.05108 |
Similar Items
A Multimodal Dataset for Enhancing Industrial Task Monitoring and Engagement Prediction
by: Mehta, Naval Kishore, et al.
Published: (2025)
Gaze-Vector Estimation in the Dark with Temporally Encoded Event-driven Neural Networks
by: Banerjee, Abeer, et al.
Published: (2024)
Dual Guidance Semi-Supervised Action Detection
by: Singh, Ankit, et al.
Published: (2025)
fine-CLIP: Enhancing Zero-Shot Fine-Grained Surgical Action Recognition with Vision-Language Models
by: Sharma, Saurav, et al.
Published: (2025)
Uncertainty-Guided Appearance-Motion Association Network for Out-of-Distribution Action Detection
by: Fang, Xiang, et al.
Published: (2024)
Multi-Level LVLM Guidance for Untrimmed Video Action Recognition
by: Peng, Liyang, et al.
Published: (2025)
Zero-Shot Temporal Action Localization Through Textual Guidance
by: Liberatori, Benedetta, et al.
Published: (2026)
Open-Vocabulary Temporal Action Localization using Multimodal Guidance
by: Gupta, Akshita, et al.
Published: (2024)
State-Change Learning for Prediction of Future Events in Endoscopic Videos
by: Sharma, Saurav, et al.
Published: (2025)
Action Recognition based Industrial Safety Violation Detection
by: Reddy, Surya N, et al.
Published: (2024)
Cross-Task Affinity Learning for Multitask Dense Scene Predictions
by: Sinodinos, Dimitrios, et al.
Published: (2024)
Multitasking Embedding for Embryo Blastocyst Grading Prediction (MEmEBG)
by: Angabini, Nahid Khoshk, et al.
Published: (2026)
BeLLA: End-to-End Birds Eye View Large Language Assistant for Autonomous Driving
by: Mohan, Karthik, et al.
Published: (2025)
Factored Classifier-Free Guidance
by: Xia, Tian, et al.
Published: (2025)
HQ-JEPA: Hybrid Quantum Joint-Embedding Predictive Architecture for Cross-Modal Remote Sensing Representation Learning
by: Hossain, Md Aminur, et al.
Published: (2026)
Automatic Discovery and Assessment of Interpretable Systematic Errors in Semantic Segmentation
by: Singh, Jaisidh, et al.
Published: (2024)
IAP: Invisible Adversarial Patch Attack through Perceptibility-Aware Localization and Perturbation Optimization
by: Dutta, Subrat Kishore, et al.
Published: (2025)
CrunchLLM: Multitask LLMs for Structured Business Reasoning and Outcome Prediction
by: Sadia, Rabeya Tus, et al.
Published: (2025)
World Guidance: World Modeling in Condition Space for Action Generation
by: Su, Yue, et al.
Published: (2026)
Egocentric Action-aware Inertial Localization in Point Clouds with Vision-Language Guidance
by: Zhang, Mingfang, et al.
Published: (2025)
NutritionVerse-Direct: Exploring Deep Neural Networks for Multitask Nutrition Prediction from Food Images
by: Keller, Matthew, et al.
Published: (2024)
Frequency Guidance Matters: Skeletal Action Recognition by Frequency-Aware Mixed Transformer
by: Wu, Wenhan, et al.
Published: (2024)
Efficient Multitask Dense Predictor via Binarization
by: Shang, Yuzhang, et al.
Published: (2024)
Group Diffusion Transformers are Unsupervised Multitask Learners
by: Huang, Lianghua, et al.
Published: (2024)
IPAD: Industrial Process Anomaly Detection Dataset
by: Liu, Jinfan, et al.
Published: (2024)
Cross-Domain Identity Representation for Skull to Face Matching with Benchmark DataSet
by: Prasad, Ravi Shankar, et al.
Published: (2025)
FCR: Investigating Generative AI models for Forensic Craniofacial Reconstruction
by: Prasad, Ravi Shankar, et al.
Published: (2025)
SPOT-Face: Forensic Face Identification using Attention Guided Optimal Transport
by: Prasad, Ravi Shankar, et al.
Published: (2026)
Can Multi-Modal LLMs Provide Live Step-by-Step Task Guidance?
by: Bhattacharyya, Apratim, et al.
Published: (2025)
Automatic Report Generation for Histopathology images using pre-trained Vision Transformers and BERT
by: Sengupta, Saurav, et al.
Published: (2023)
Learning Streaming Video Representation via Multitask Training
by: Yan, Yibin, et al.
Published: (2025)
Efficient Inter-Task Attention for Multitask Transformer Models
by: Bohn, Christian, et al.
Published: (2025)
Noise-Free Explanation for Driving Action Prediction
by: Zhu, Hongbo, et al.
Published: (2024)
Text-Driven Weakly Supervised OCT Lesion Segmentation with Structural Guidance
by: Yang, Jiaqi, et al.
Published: (2024)
FedSCAl: Leveraging Server and Client Alignment for Unsupervised Federated Source-Free Domain Adaptation
by: Yashwanth, M, et al.
Published: (2025)
USAM-Net: A U-Net-based Network for Improved Stereo Correspondence and Scene Depth Estimation using Features from a Pre-trained Image Segmentation network
by: Dayo, Joseph Emmanuel DL, et al.
Published: (2025)
HandDreamer: Zero-Shot Text to 3D Hand Model Generation using Corrective Hand Shape Guidance
by: Rosh, Green, et al.
Published: (2026)
EAGLE: Expert-Augmented Attention Guidance for Tuning-Free Industrial Anomaly Detection in Multimodal Large Language Models
by: Peng, Xiaomeng, et al.
Published: (2026)
ALMRR: Anomaly Localization Mamba on Industrial Textured Surface with Feature Reconstruction and Refinement
by: Qu, Shichen, et al.
Published: (2024)
A Method of Moments Embedding Constraint and its Application to Semi-Supervised Learning
by: Majurski, Michael, et al.
Published: (2024)