| Main Authors: | Shaikh, Muhammad Bilal, Islam, Syed Mohammed Shamsul, Chai, Douglas, Akhtar, Naveed |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2405.15813 |
Similar Items
Deep Learning Approaches for Human Action Recognition in Video Data
by: Xie, Yufei
Published: (2024)
Next-Generation License Plate Detection and Recognition System using YOLOv8
by: Amin, Arslan, et al.
Published: (2025)
Distinguishing Visually Similar Actions: Prompt-Guided Semantic Prototype Modulation for Few-Shot Action Recognition
by: Li, Xiaoyang, et al.
Published: (2025)
SITransformer: Shared Information-Guided Transformer for Extreme Multimodal Summarization
by: Liu, Sicheng, et al.
Published: (2024)
Context-Aware Network Based on Multi-scale Spatio-temporal Attention for Action Recognition in Videos
by: Li, Xiaoyang, et al.
Published: (2025)
Multi-Scale Spatial-Temporal Self-Attention Graph Convolutional Networks for Skeleton-based Action Recognition
by: Nakamura, Ikuo
Published: (2024)
Quantifying and Inducing Shape Bias in CNNs via Max-Pool Dilation
by: Sawada, Takito, et al.
Published: (2026)
U-Net-Like Spiking Neural Networks for Single Image Dehazing
by: Li, Huibin, et al.
Published: (2025)
Motion-Guided Semantic Alignment with Negative Prompts for Zero-Shot Video Action Recognition
by: Wang, Yiming, et al.
Published: (2026)
A Challenging Benchmark of Anime Style Recognition
by: Li, Haotang, et al.
Published: (2022)
Pointing-Based Object Recognition
by: Hajdúch, Lukáš, et al.
Published: (2026)
TAG-Head: Time-Aligned Graph Head for Plug-and-Play Fine-grained Action Recognition
by: Hassan, Imtiaz Ul, et al.
Published: (2026)
CLIP-Joint-Detect: End-to-End Joint Training of Object Detectors with Contrastive Vision-Language Supervision
by: Raoufi, Behnam, et al.
Published: (2025)
Multimodal Action Quality Assessment
by: Zeng, Ling-An, et al.
Published: (2024)
SemanticHuman-HD: High-Resolution Semantic Disentangled 3D Human Generation
by: Zheng, Peng, et al.
Published: (2024)
Learning Discriminative Spatio-temporal Representations for Semi-supervised Action Recognition
by: Wang, Yu, et al.
Published: (2024)
Visual Enhanced Depth Scaling for Multimodal Latent Reasoning
by: Han, Yudong, et al.
Published: (2026)
Light Future: Multimodal Action Frame Prediction via InstructPix2Pix
by: Zhong, Zesen, et al.
Published: (2025)
Multi-modal Sensor Fusion for Auto Driving Perception: A Survey
by: Huang, Keli, et al.
Published: (2022)
SLUM-i: Semi-supervised Learning for Urban Mapping of Informal Settlements and Data Quality Benchmarking
by: Mukhtar, Muhammad Taha, et al.
Published: (2026)
An Evaluation of a Visual Question Answering Strategy for Zero-shot Facial Expression Recognition in Still Images
by: Castrillón-Santana, Modesto, et al.
Published: (2025)
YotoR-You Only Transform One Representation
by: Villa, José Ignacio Díaz, et al.
Published: (2024)
GeoHeight-Bench: Towards Height-Aware Multimodal Reasoning in Remote Sensing
by: Hu, Xuran, et al.
Published: (2026)
Action Anticipation from SoccerNet Football Video Broadcasts
by: Dalal, Mohamad, et al.
Published: (2025)
UTAL-GNN: Unsupervised Temporal Action Localization using Graph Neural Networks
by: Badatya, Bikash Kumar, et al.
Published: (2025)
Lost in Context: The Influence of Context on Feature Attribution Methods for Object Recognition
by: Adhikari, Sayanta, et al.
Published: (2024)
Joint Learning of Depth, Pose, and Local Radiance Field for Large Scale Monocular 3D Reconstruction
by: Syed, Shahram Najam, et al.
Published: (2025)
Towards Hard and Soft Shadow Removal via Dual-Branch Separation Network and Vision Transformer
by: Liang, Jiajia
Published: (2025)
Pedestrian Detection in Low-Light Conditions: A Comprehensive Survey
by: Ghari, Bahareh, et al.
Published: (2024)
Towards a Generalizable Fusion Architecture for Multimodal Object Detection
by: Berjawi, Jad, et al.
Published: (2025)
Deep Learning-based Depth Estimation Methods from Monocular Image and Videos: A Comprehensive Survey
by: Rajapaksha, Uchitha, et al.
Published: (2024)
The Influence of Iconicity in Transfer Learning for Sign Language Recognition
by: Artiaga, Keren, et al.
Published: (2026)
WatchHAR: Real-time On-device Human Activity Recognition System for Smartwatches
by: Yeon, Taeyoung, et al.
Published: (2025)
Domain-Adaptive Pretraining Improves Primate Behavior Recognition
by: Mueller, Felix B., et al.
Published: (2025)
From Latent to Engine Manifolds: Analyzing ImageBind's Multimodal Embedding Space
by: Hamara, Andrew, et al.
Published: (2024)
A Recipe for Geometry-Aware 3D Mesh Transformers
by: Farazi, Mohammad, et al.
Published: (2024)
A Comprehensive Review of YOLO Architectures in Computer Vision: From YOLOv1 to YOLOv8 and YOLO-NAS
by: Terven, Juan, et al.
Published: (2023)
EventFormer: A Node-graph Hierarchical Attention Transformer for Action-centric Video Event Prediction
by: Su, Qile, et al.
Published: (2025)
Data Organization Matters in Multimodal Instruction Tuning: A Controlled Study of Capability Trade-offs
by: Tang, Guowei
Published: (2026)
VIAFormer: Voxel-Image Alignment Transformer for High-Fidelity Voxel Refinement
by: Fang, Tiancheng, et al.
Published: (2026)