| Main Authors: | John, Vijay, Kawanishi, Yasutomo |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2403.11616 |
Similar Items
View-aware Cross-modal Distillation for Multi-view Action Recognition
by: Nguyen, Trung Thanh, et al.
Published: (2025)
One-Stage Open-Vocabulary Temporal Action Detection Leveraging Temporal Multi-scale and Action Label Features
by: Nguyen, Trung Thanh, et al.
Published: (2024)
MultiTSF: Transformer-based Sensor Fusion for Human-Centric Multi-view and Multi-modal Action Recognition
by: Nguyen, Trung Thanh, et al.
Published: (2025)
MultiSensor-Home: A Wide-area Multi-modal Multi-view Dataset for Action Recognition and Transformer-based Sensor Fusion
by: Nguyen, Trung Thanh, et al.
Published: (2025)
Action Selection Learning for Multi-label Multi-view Action Recognition
by: Nguyen, Trung Thanh, et al.
Published: (2024)
Tracking Small Birds by Detection Candidate Region Filtering and Detection History-aware Association
by: Liu, Tingwei, et al.
Published: (2024)
FROSS: Faster-than-Real-Time Online 3D Semantic Scene Graph Generation from RGB-D Images
by: Hou, Hao-Yu, et al.
Published: (2025)
A Gaze-grounded Visual Question Answering Dataset for Clarifying Ambiguous Japanese Questions
by: Inadumi, Shun, et al.
Published: (2024)
Class-agnostic 3D Segmentation by Granularity-Consistent Automatic 2D Mask Tracking
by: Wang, Juan, et al.
Published: (2025)
Action Motifs: Self-Supervised Hierarchical Representation of Human Body Movements
by: Kinoshita, Genki, et al.
Published: (2026)
REACH: Hand Pose Estimation from Room Corners
by: Nakamura, Shu, et al.
Published: (2026)
Small Object Detection for Birds with Swin Transformer
by: Huo, Da, et al.
Published: (2025)
Leveraging Multi-View Weak Supervision for Occlusion-Aware Multi-Human Parsing
by: Bragagnolo, Laura, et al.
Published: (2025)
ForestMamba: Sparse Mamba with Geometry-guided Queries for 3D Forest Point Cloud Segmentation
by: Nguyen, Trung Thanh, et al.
Published: (2026)
Frame-Level Captions for Long Video Generation with Complex Multi Scenes
by: Zheng, Guangcong, et al.
Published: (2025)
FrameThinker: Learning to Think with Long Videos via Multi-Turn Frame Spotlighting
by: He, Zefeng, et al.
Published: (2025)
Perception-Oriented Video Frame Interpolation via Asymmetric Blending
by: Wu, Guangyang, et al.
Published: (2024)
Reliable Representation Learning for Incomplete Multi-View Missing Multi-Label Classification
by: Liu, Chengliang, et al.
Published: (2023)
Cross Pseudo Labeling For Weakly Supervised Video Anomaly Detection
by: Lee, Dayeon, et al.
Published: (2026)
Which Viewpoint Shows it Best? Language for Weakly Supervising View Selection in Multi-view Instructional Videos
by: Majumder, Sagnik, et al.
Published: (2024)
Reinforced Label Denoising for Weakly-Supervised Audio-Visual Video Parsing
by: Gao, Yongbiao, et al.
Published: (2024)
FaceBench: A Multi-View Multi-Level Facial Attribute VQA Dataset for Benchmarking Face Perception MLLMs
by: Wang, Xiaoqin, et al.
Published: (2025)
Multi-View Factorizing and Disentangling: A Novel Framework for Incomplete Multi-View Multi-Label Classification
by: Xie, Wulin, et al.
Published: (2025)
Improving Multi-Label Contrastive Learning by Leveraging Label Distribution
by: Chen, Ning, et al.
Published: (2025)
Narrating the Video: Boosting Text-Video Retrieval via Comprehensive Utilization of Frame-Level Captions
by: Hur, Chan, et al.
Published: (2025)
A Unified Solution to Video Fusion: From Multi-Frame Learning to Benchmarking
by: Zhao, Zixiang, et al.
Published: (2025)
Learned Rate Control for Frame-Level Adaptive Neural Video Compression via Dynamic Neural Network
by: Zhang, Chenhao, et al.
Published: (2025)
Q-Frame: Query-aware Frame Selection and Multi-Resolution Adaptation for Video-LLMs
by: Zhang, Shaojie, et al.
Published: (2025)
Improving Weakly-supervised Video Instance Segmentation by Leveraging Spatio-temporal Consistency
by: Arefi, Farnoosh, et al.
Published: (2024)
Task-Augmented Cross-View Imputation Network for Partial Multi-View Incomplete Multi-Label Classification
by: Zhao, Lian, et al.
Published: (2024)
Frame Guidance: Training-Free Guidance for Frame-Level Control in Video Diffusion Models
by: Jang, Sangwon, et al.
Published: (2025)
Frame-Voyager: Learning to Query Frames for Video Large Language Models
by: Yu, Sicheng, et al.
Published: (2024)
Multi-View Pose-Agnostic Change Localization with Zero Labels
by: Galappaththige, Chamuditha Jayanga, et al.
Published: (2024)
Information Maximization Clustering via Multi-View Self-Labelling
by: Ntelemis, Foivos, et al.
Published: (2021)
Adaptive Disentangled Representation Learning for Incomplete Multi-View Multi-Label Classification
by: Li, Quanjiang, et al.
Published: (2026)
Emerging Trends in Pseudo-Label Refinement for Weakly Supervised Semantic Segmentation with Image-Level Supervision
by: Zhang, Zheyuan, et al.
Published: (2025)
Weakly-Supervised Semantic Segmentation with Image-Level Labels: from Traditional Models to Foundation Models
by: Chen, Zhaozheng, et al.
Published: (2023)
Hybrid-Learning Video Moment Retrieval across Multi-Domain Labels
by: Cai, Weitong, et al.
Published: (2024)
Leveraging Transformers for Weakly Supervised Object Localization in Unconstrained Videos
by: Murtaza, Shakeeb, et al.
Published: (2024)
Leveraging Vision-Language Models as Weak Annotators in Active Learning
by: Nguyen, Phuong Ngoc, et al.
Published: (2026)