:: Library Catalog

Bibliographic Details
Main Authors:	Luo, Yanan; Yi, Jinhui; Farha, Yazan Abu; Wolter, Moritz; Gall, Juergen
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2407.09431

Similar Items

Gated Temporal Diffusion for Stochastic Long-Term Dense Anticipation
by: Zatsarynna, Olga, et al.
Published: (2024)

MANTA: Diffusion Mamba for Efficient and Effective Stochastic Long-Term Dense Anticipation
by: Zatsarynna, Olga, et al.
Published: (2025)

Looking into the Unknown: Exploring Action Discovery for Segmentation of Known and Unknown Actions
by: Spurio, Federico, et al.
Published: (2025)

MV-Match: Multi-View Matching for Domain-Adaptive Identification of Plant Nutrient Deficiencies
by: Yi, Jinhui, et al.
Published: (2024)

Video-Panda: Parameter-efficient Alignment for Encoder-free Video-Language Models
by: Yi, Jinhui, et al.
Published: (2024)

Fréchet Wavelet Distance: A Domain-Agnostic Metric for Image Generation
by: Veeramacheneni, Lokesh, et al.
Published: (2023)

Identifying Spatio-Temporal Drivers of Extreme Events
by: Eddin, Mohamad Hakam Shams, et al.
Published: (2024)

LC-SLab -- An Object-based Deep Learning Framework for Large-scale Land Cover Classification from Satellite Imagery and Sparse In-situ Labels
by: Leonhardt, Johannes, et al.
Published: (2025)

CamC2V: Context-aware Controllable Video Generation
by: Denninger, Luis, et al.
Published: (2025)

Video Panels for Long Video Understanding
by: Doorenbos, Lars, et al.
Published: (2025)

Using Visual Anomaly Detection for Task Execution Monitoring
by: Thoduka, Santosh, et al.
Published: (2021)

Learning a Neural Association Network for Self-supervised Multi-Object Tracking
by: Li, Shuai, et al.
Published: (2024)

ADA-Track++: End-to-End Multi-Camera 3D Multi-Object Tracking with Alternating Detection and Association
by: Ding, Shuxiao, et al.
Published: (2024)

Hierarchical Vector Quantization for Unsupervised Action Segmentation
by: Spurio, Federico, et al.
Published: (2024)

StableMamba: Distillation-free Scaling of Large SSMs for Images and Videos
by: Suleman, Hamid, et al.
Published: (2024)

Skeleton Motion Words for Unsupervised Skeleton-Based Temporal Action Segmentation
by: Gökay, Uzay, et al.
Published: (2025)

FlowNar: Scalable Streaming Narration for Long-Form Videos
by: Zhong, Zeyun, et al.
Published: (2026)

A Survey on Deep Learning Techniques for Action Anticipation
by: Zhong, Zeyun, et al.
Published: (2023)

Self-Intersection-Aware 3D Human Motion Generation Using an Efficient Human Sphere Proxy
by: Herrmann, Pascal, et al.
Published: (2026)

A Multimodal Handover Failure Detection Dataset and Baselines
by: Thoduka, Santosh, et al.
Published: (2024)

Enhancing Video-Based Robot Failure Detection Using Task Knowledge
by: Thoduka, Santosh, et al.
Published: (2025)

MixANT: Observation-dependent Memory Propagation for Stochastic Dense Action Anticipation
by: Wasim, Syed Talal, et al.
Published: (2025)

SyncVP: Joint Diffusion for Synchronous Multi-Modal Video Prediction
by: Pallotta, Enrico, et al.
Published: (2025)

Privacy-Preserving Semantic Segmentation from Ultra-Low-Resolution RGB Inputs
by: Huang, Xuying, et al.
Published: (2025)

Improving action segmentation via explicit similarity measurement
by: Aouaidjia, Kamel, et al.
Published: (2025)

REVEAL: Relation-based Video Representation Learning for Video-Question-Answering
by: Chaybouti, Sofian, et al.
Published: (2025)

GroupMamba: Efficient Group-Based Visual State Space Model
by: Shaker, Abdelrahman, et al.
Published: (2024)

STRIVE: Structured Spatiotemporal Exploration for Reinforcement Learning in Video Question Answering
by: Bahrami, Emad, et al.
Published: (2026)

Towards Generalizing Temporal Action Segmentation to Unseen Views
by: Bahrami, Emad, et al.
Published: (2025)

RiverMamba: A State Space Model for Global River Discharge and Flood Forecasting
by: Eddin, Mohamad Hakam Shams, et al.
Published: (2025)

Sequence-Adaptive Video Prediction in Continuous Streams using Diffusion Noise Optimization
by: Azar, Sina Mokhtarzadeh, et al.
Published: (2025)

EgoControl: Controllable Egocentric Video Generation via 3D Full-Body Poses
by: Pallotta, Enrico, et al.
Published: (2025)

TFNet: Exploiting Temporal Cues for Fast and Accurate LiDAR Semantic Segmentation
by: Li, Rong, et al.
Published: (2023)

Massively Multi-Person 3D Human Motion Forecasting with Scene Context
by: Mueller, Felix B, et al.
Published: (2024)

Global-Aware Monocular Semantic Scene Completion with State Space Models
by: Li, Shijie, et al.
Published: (2025)

TQD-Track: Temporal Query Denoising for 3D Multi-Object Tracking
by: Ding, Shuxiao, et al.
Published: (2025)

Forecast-PEFT: Parameter-Efficient Fine-Tuning for Pre-trained Motion Forecasting Models
by: Wang, Jifeng, et al.
Published: (2024)

BIKED++: A Multimodal Dataset of 1.4 Million Bicycle Image and Parametric CAD Designs
by: Regenwetter, Lyle, et al.
Published: (2024)

Spatio-temporal Decoupled Knowledge Compensator for Few-Shot Action Recognition
by: Qu, Hongyu, et al.
Published: (2026)

TadML: A fast temporal action detection with Mechanics-MLP
by: Deng, Bowen, et al.
Published: (2022)