:: Library Catalog

Bibliographic Details
Main Authors:	John, Vijay; Kawanishi, Yasutomo
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2403.11616

Similar Items

View-aware Cross-modal Distillation for Multi-view Action Recognition
by: Nguyen, Trung Thanh, et al.
Published: (2025)

One-Stage Open-Vocabulary Temporal Action Detection Leveraging Temporal Multi-scale and Action Label Features
by: Nguyen, Trung Thanh, et al.
Published: (2024)

MultiTSF: Transformer-based Sensor Fusion for Human-Centric Multi-view and Multi-modal Action Recognition
by: Nguyen, Trung Thanh, et al.
Published: (2025)

MultiSensor-Home: A Wide-area Multi-modal Multi-view Dataset for Action Recognition and Transformer-based Sensor Fusion
by: Nguyen, Trung Thanh, et al.
Published: (2025)

Action Selection Learning for Multi-label Multi-view Action Recognition
by: Nguyen, Trung Thanh, et al.
Published: (2024)

Tracking Small Birds by Detection Candidate Region Filtering and Detection History-aware Association
by: Liu, Tingwei, et al.
Published: (2024)

FROSS: Faster-than-Real-Time Online 3D Semantic Scene Graph Generation from RGB-D Images
by: Hou, Hao-Yu, et al.
Published: (2025)

A Gaze-grounded Visual Question Answering Dataset for Clarifying Ambiguous Japanese Questions
by: Inadumi, Shun, et al.
Published: (2024)

Class-agnostic 3D Segmentation by Granularity-Consistent Automatic 2D Mask Tracking
by: Wang, Juan, et al.
Published: (2025)

Action Motifs: Self-Supervised Hierarchical Representation of Human Body Movements
by: Kinoshita, Genki, et al.
Published: (2026)

REACH: Hand Pose Estimation from Room Corners
by: Nakamura, Shu, et al.
Published: (2026)

Small Object Detection for Birds with Swin Transformer
by: Huo, Da, et al.
Published: (2025)

Leveraging Multi-View Weak Supervision for Occlusion-Aware Multi-Human Parsing
by: Bragagnolo, Laura, et al.
Published: (2025)

ForestMamba: Sparse Mamba with Geometry-guided Queries for 3D Forest Point Cloud Segmentation
by: Nguyen, Trung Thanh, et al.
Published: (2026)

Frame-Level Captions for Long Video Generation with Complex Multi Scenes
by: Zheng, Guangcong, et al.
Published: (2025)

FrameThinker: Learning to Think with Long Videos via Multi-Turn Frame Spotlighting
by: He, Zefeng, et al.
Published: (2025)

Perception-Oriented Video Frame Interpolation via Asymmetric Blending
by: Wu, Guangyang, et al.
Published: (2024)

Reliable Representation Learning for Incomplete Multi-View Missing Multi-Label Classification
by: Liu, Chengliang, et al.
Published: (2023)

Cross Pseudo Labeling For Weakly Supervised Video Anomaly Detection
by: Lee, Dayeon, et al.
Published: (2026)

Which Viewpoint Shows it Best? Language for Weakly Supervising View Selection in Multi-view Instructional Videos
by: Majumder, Sagnik, et al.
Published: (2024)

Reinforced Label Denoising for Weakly-Supervised Audio-Visual Video Parsing
by: Gao, Yongbiao, et al.
Published: (2024)

FaceBench: A Multi-View Multi-Level Facial Attribute VQA Dataset for Benchmarking Face Perception MLLMs
by: Wang, Xiaoqin, et al.
Published: (2025)

Multi-View Factorizing and Disentangling: A Novel Framework for Incomplete Multi-View Multi-Label Classification
by: Xie, Wulin, et al.
Published: (2025)

Improving Multi-Label Contrastive Learning by Leveraging Label Distribution
by: Chen, Ning, et al.
Published: (2025)

Narrating the Video: Boosting Text-Video Retrieval via Comprehensive Utilization of Frame-Level Captions
by: Hur, Chan, et al.
Published: (2025)

A Unified Solution to Video Fusion: From Multi-Frame Learning to Benchmarking
by: Zhao, Zixiang, et al.
Published: (2025)

Learned Rate Control for Frame-Level Adaptive Neural Video Compression via Dynamic Neural Network
by: Zhang, Chenhao, et al.
Published: (2025)

Q-Frame: Query-aware Frame Selection and Multi-Resolution Adaptation for Video-LLMs
by: Zhang, Shaojie, et al.
Published: (2025)

Improving Weakly-supervised Video Instance Segmentation by Leveraging Spatio-temporal Consistency
by: Arefi, Farnoosh, et al.
Published: (2024)

Task-Augmented Cross-View Imputation Network for Partial Multi-View Incomplete Multi-Label Classification
by: Zhao, Lian, et al.
Published: (2024)

Frame Guidance: Training-Free Guidance for Frame-Level Control in Video Diffusion Models
by: Jang, Sangwon, et al.
Published: (2025)

Frame-Voyager: Learning to Query Frames for Video Large Language Models
by: Yu, Sicheng, et al.
Published: (2024)

Multi-View Pose-Agnostic Change Localization with Zero Labels
by: Galappaththige, Chamuditha Jayanga, et al.
Published: (2024)

Information Maximization Clustering via Multi-View Self-Labelling
by: Ntelemis, Foivos, et al.
Published: (2021)

Adaptive Disentangled Representation Learning for Incomplete Multi-View Multi-Label Classification
by: Li, Quanjiang, et al.
Published: (2026)

Emerging Trends in Pseudo-Label Refinement for Weakly Supervised Semantic Segmentation with Image-Level Supervision
by: Zhang, Zheyuan, et al.
Published: (2025)

Weakly-Supervised Semantic Segmentation with Image-Level Labels: from Traditional Models to Foundation Models
by: Chen, Zhaozheng, et al.
Published: (2023)

Hybrid-Learning Video Moment Retrieval across Multi-Domain Labels
by: Cai, Weitong, et al.
Published: (2024)

Leveraging Transformers for Weakly Supervised Object Localization in Unconstrained Videos
by: Murtaza, Shakeeb, et al.
Published: (2024)

Leveraging Vision-Language Models as Weak Annotators in Active Learning
by: Nguyen, Phuong Ngoc, et al.
Published: (2026)