:: Library Catalog

Bibliographic Details
Main Authors:	Shaikh, Muhammad Bilal; Islam, Syed Mohammed Shamsul; Chai, Douglas; Akhtar, Naveed
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition A.1; I.2.10
Online Access:	https://arxiv.org/abs/2405.15813

Similar Items

Deep Learning Approaches for Human Action Recognition in Video Data
by: Xie, Yufei
Published: (2024)

Next-Generation License Plate Detection and Recognition System using YOLOv8
by: Amin, Arslan, et al.
Published: (2025)

Distinguishing Visually Similar Actions: Prompt-Guided Semantic Prototype Modulation for Few-Shot Action Recognition
by: Li, Xiaoyang, et al.
Published: (2025)

SITransformer: Shared Information-Guided Transformer for Extreme Multimodal Summarization
by: Liu, Sicheng, et al.
Published: (2024)

Context-Aware Network Based on Multi-scale Spatio-temporal Attention for Action Recognition in Videos
by: Li, Xiaoyang, et al.
Published: (2025)

Multi-Scale Spatial-Temporal Self-Attention Graph Convolutional Networks for Skeleton-based Action Recognition
by: Nakamura, Ikuo
Published: (2024)

Quantifying and Inducing Shape Bias in CNNs via Max-Pool Dilation
by: Sawada, Takito, et al.
Published: (2026)

U-Net-Like Spiking Neural Networks for Single Image Dehazing
by: Li, Huibin, et al.
Published: (2025)

Motion-Guided Semantic Alignment with Negative Prompts for Zero-Shot Video Action Recognition
by: Wang, Yiming, et al.
Published: (2026)

A Challenging Benchmark of Anime Style Recognition
by: Li, Haotang, et al.
Published: (2022)

Pointing-Based Object Recognition
by: Hajdúch, Lukáš, et al.
Published: (2026)

TAG-Head: Time-Aligned Graph Head for Plug-and-Play Fine-grained Action Recognition
by: Hassan, Imtiaz Ul, et al.
Published: (2026)

CLIP-Joint-Detect: End-to-End Joint Training of Object Detectors with Contrastive Vision-Language Supervision
by: Raoufi, Behnam, et al.
Published: (2025)

Multimodal Action Quality Assessment
by: Zeng, Ling-An, et al.
Published: (2024)

SemanticHuman-HD: High-Resolution Semantic Disentangled 3D Human Generation
by: Zheng, Peng, et al.
Published: (2024)

Learning Discriminative Spatio-temporal Representations for Semi-supervised Action Recognition
by: Wang, Yu, et al.
Published: (2024)

Visual Enhanced Depth Scaling for Multimodal Latent Reasoning
by: Han, Yudong, et al.
Published: (2026)

Light Future: Multimodal Action Frame Prediction via InstructPix2Pix
by: Zhong, Zesen, et al.
Published: (2025)

Multi-modal Sensor Fusion for Auto Driving Perception: A Survey
by: Huang, Keli, et al.
Published: (2022)

SLUM-i: Semi-supervised Learning for Urban Mapping of Informal Settlements and Data Quality Benchmarking
by: Mukhtar, Muhammad Taha, et al.
Published: (2026)

An Evaluation of a Visual Question Answering Strategy for Zero-shot Facial Expression Recognition in Still Images
by: Castrillón-Santana, Modesto, et al.
Published: (2025)

YotoR-You Only Transform One Representation
by: Villa, José Ignacio Díaz, et al.
Published: (2024)

GeoHeight-Bench: Towards Height-Aware Multimodal Reasoning in Remote Sensing
by: Hu, Xuran, et al.
Published: (2026)

Action Anticipation from SoccerNet Football Video Broadcasts
by: Dalal, Mohamad, et al.
Published: (2025)

UTAL-GNN: Unsupervised Temporal Action Localization using Graph Neural Networks
by: Badatya, Bikash Kumar, et al.
Published: (2025)

Lost in Context: The Influence of Context on Feature Attribution Methods for Object Recognition
by: Adhikari, Sayanta, et al.
Published: (2024)

Joint Learning of Depth, Pose, and Local Radiance Field for Large Scale Monocular 3D Reconstruction
by: Syed, Shahram Najam, et al.
Published: (2025)

Towards Hard and Soft Shadow Removal via Dual-Branch Separation Network and Vision Transformer
by: Liang, Jiajia
Published: (2025)

Pedestrian Detection in Low-Light Conditions: A Comprehensive Survey
by: Ghari, Bahareh, et al.
Published: (2024)

Towards a Generalizable Fusion Architecture for Multimodal Object Detection
by: Berjawi, Jad, et al.
Published: (2025)

Deep Learning-based Depth Estimation Methods from Monocular Image and Videos: A Comprehensive Survey
by: Rajapaksha, Uchitha, et al.
Published: (2024)

The Influence of Iconicity in Transfer Learning for Sign Language Recognition
by: Artiaga, Keren, et al.
Published: (2026)

WatchHAR: Real-time On-device Human Activity Recognition System for Smartwatches
by: Yeon, Taeyoung, et al.
Published: (2025)

Domain-Adaptive Pretraining Improves Primate Behavior Recognition
by: Mueller, Felix B., et al.
Published: (2025)

From Latent to Engine Manifolds: Analyzing ImageBind's Multimodal Embedding Space
by: Hamara, Andrew, et al.
Published: (2024)

A Recipe for Geometry-Aware 3D Mesh Transformers
by: Farazi, Mohammad, et al.
Published: (2024)

A Comprehensive Review of YOLO Architectures in Computer Vision: From YOLOv1 to YOLOv8 and YOLO-NAS
by: Terven, Juan, et al.
Published: (2023)

EventFormer: A Node-graph Hierarchical Attention Transformer for Action-centric Video Event Prediction
by: Su, Qile, et al.
Published: (2025)

Data Organization Matters in Multimodal Instruction Tuning: A Controlled Study of Capability Trade-offs
by: Tang, Guowei
Published: (2026)

VIAFormer: Voxel-Image Alignment Transformer for High-Fidelity Voxel Refinement
by: Fang, Tiancheng, et al.
Published: (2026)