:: Library Catalog

Bibliographic Details
Main Authors:	Benavent-Lledo, Manuel; Bacharidis, Konstantinos; Papoutsakis, Konstantinos; Argyros, Antonis; Garcia-Rodriguez, Jose
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2601.22039

Similar Items

Action Anticipation at a Glimpse: To What Extent Can Multimodal Cues Replace Video?
by: Benavent-Lledo, Manuel, et al.
Published: (2025)

Anticipating Object State Changes in Long Procedural Videos
by: Manousaki, Victoria, et al.
Published: (2024)

Vision-Based Mistake Analysis in Procedural Activities: A Review of Advances and Challenges
by: Bacharidis, Konstantinos, et al.
Published: (2025)

Enhancing Action Recognition by Leveraging the Hierarchical Structure of Actions and Textual Context
by: Benavent-Lledo, Manuel, et al.
Published: (2024)

Recognizing Unseen States of Unknown Objects by Leveraging Knowledge Graphs
by: Gouidis, Filippos, et al.
Published: (2023)

Fusing Domain-Specific Content from Large Language Models into Knowledge Graphs for Enhanced Zero Shot Object State Classification
by: Gouidis, Filippos, et al.
Published: (2024)

Text-driven Online Action Detection
by: Benavent-Lledo, Manuel, et al.
Published: (2025)

AIFloodSense: A Global Aerial Imagery Dataset for Semantic Segmentation and Understanding of Flooded Environments
by: Simantiris, Georgios, et al.
Published: (2025)

ENACT: Entropy-based Clustering of Attention Input for Reducing the Computational Needs of Object Detection Transformers
by: Savathrakis, Giorgos, et al.
Published: (2024)

A vision-based framework for human behavior understanding in industrial assembly lines
by: Papoutsakis, Konstantinos, et al.
Published: (2024)

OCCAM: Class-Agnostic, Training-Free, Prior-Free and Multi-Class Object Counting
by: Spanakis, Michail, et al.
Published: (2026)

D-PoSE: Depth as an Intermediate Representation for 3D Human Pose and Shape Estimation
by: Vasilikopoulos, Nikolaos, et al.
Published: (2024)

Detecting Facial Image Manipulations with Multi-Layer CNN Models
by: Montejano, Alejandro Marco, et al.
Published: (2024)

Y-MAP-Net: Real-time depth, normals, segmentation, multi-label captioning and 2D human pose in RGB images
by: Qammaz, Ammar, et al.
Published: (2024)

Multimodal Large Models Are Effective Action Anticipators
by: Wang, Binglu, et al.
Published: (2025)

Combining Facial Videos and Biosignals for Stress Estimation During Driving
by: Valergaki, Paraskevi, et al.
Published: (2026)

Enhancing Monocular 3D Hand Reconstruction with Learned Texture Priors
by: Karvounas, Giorgos, et al.
Published: (2025)

A Survey on Deep Learning Techniques for Action Anticipation
by: Zhong, Zeyun, et al.
Published: (2023)

Visual WetlandBirds Dataset: Bird Species Identification and Behavior Recognition in Videos
by: Rodriguez-Juan, Javier, et al.
Published: (2025)

Action-Guided Attention for Video Action Anticipation
by: Tai, Tsung-Ming, et al.
Published: (2026)

Multi-task Learning For Joint Action and Gesture Recognition
by: Spathis, Konstantinos, et al.
Published: (2025)

A Single Image and Multimodality Is All You Need for Novel View Synthesis
by: Javadi, Amirhosein, et al.
Published: (2026)

Interaction Region Visual Transformer for Egocentric Action Anticipation
by: Roy, Debaditya, et al.
Published: (2022)

Human Action Anticipation: A Survey
by: Lai, Bolin, et al.
Published: (2024)

Vision-Language-Action (VLA) Models: Concepts, Progress, Applications and Challenges
by: Sapkota, Ranjan, et al.
Published: (2025)

Chain-of-Frames: Advancing Video Understanding in Multimodal LLMs via Frame-Aware Reasoning
by: Ghazanfari, Sara, et al.
Published: (2025)

From Recognition to Prediction: Leveraging Sequence Reasoning for Action Anticipation
by: Liu, Xin, et al.
Published: (2024)

Intention Action Anticipation Model with Guide-Feedback Loop Mechanism
by: Ma, Zongnan, et al.
Published: (2024)

Bidirectional Action Sequence Learning for Long-term Action Anticipation with Large Language Models
by: Sato, Yuji, et al.
Published: (2025)

Semantically Guided Action Anticipation
by: Diko, Anxhelo, et al.
Published: (2024)

Complementarity-Supervised Spectral-Band Routing for Multimodal Emotion Recognition
by: Huang, Zhexian, et al.
Published: (2026)

Multi-level and Multi-modal Action Anticipation
by: Kim, Seulgi, et al.
Published: (2025)

LaViDa: A Large Diffusion Language Model for Multimodal Understanding
by: Li, Shufan, et al.
Published: (2025)

EEE-Bench: A Comprehensive Multimodal Electrical And Electronics Engineering Benchmark
by: Li, Ming, et al.
Published: (2024)

Zero-Shot Generative De-identification: Inversion-Free Flow for Privacy-Preserving Skin Image Analysis
by: Moutselos, Konstantinos, et al.
Published: (2026)

Zero-shot Segmentation of Skin Conditions: Erythema with Edit-Friendly Inversion
by: Moutselos, Konstantinos, et al.
Published: (2025)

An Organism Starts with a Single Pix-Cell: A Neural Cellular Diffusion for High-Resolution Image Synthesis
by: Elbatel, Marawan, et al.
Published: (2024)

MixANT: Observation-dependent Memory Propagation for Stochastic Dense Action Anticipation
by: Wasim, Syed Talal, et al.
Published: (2025)

See It Before You Grab It: Deep Learning-based Action Anticipation in Basketball
by: Roy, Arnau Barrera, et al.
Published: (2025)

Vision and Intention Boost Large Language Model in Long-Term Action Anticipation
by: Cao, Congqi, et al.
Published: (2025)