| Main Authors: | Benavent-Lledo, Manuel, Bacharidis, Konstantinos, Papoutsakis, Konstantinos, Argyros, Antonis, Garcia-Rodriguez, Jose |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2601.22039 |
Similar Items
Action Anticipation at a Glimpse: To What Extent Can Multimodal Cues Replace Video?
by: Benavent-Lledo, Manuel, et al.
Published: (2025)
Anticipating Object State Changes in Long Procedural Videos
by: Manousaki, Victoria, et al.
Published: (2024)
Vision-Based Mistake Analysis in Procedural Activities: A Review of Advances and Challenges
by: Bacharidis, Konstantinos, et al.
Published: (2025)
Enhancing Action Recognition by Leveraging the Hierarchical Structure of Actions and Textual Context
by: Benavent-Lledo, Manuel, et al.
Published: (2024)
Recognizing Unseen States of Unknown Objects by Leveraging Knowledge Graphs
by: Gouidis, Filipos, et al.
Published: (2023)
Fusing Domain-Specific Content from Large Language Models into Knowledge Graphs for Enhanced Zero Shot Object State Classification
by: Gouidis, Filippos, et al.
Published: (2024)
Text-driven Online Action Detection
by: Benavent-Lledo, Manuel, et al.
Published: (2025)
AIFloodSense: A Global Aerial Imagery Dataset for Semantic Segmentation and Understanding of Flooded Environments
by: Simantiris, Georgios, et al.
Published: (2025)
ENACT: Entropy-based Clustering of Attention Input for Reducing the Computational Needs of Object Detection Transformers
by: Savathrakis, Giorgos, et al.
Published: (2024)
A vision-based framework for human behavior understanding in industrial assembly lines
by: Papoutsakis, Konstantinos, et al.
Published: (2024)
OCCAM: Class-Agnostic, Training-Free, Prior-Free and Multi-Class Object Counting
by: Spanakis, Michail, et al.
Published: (2026)
D-PoSE: Depth as an Intermediate Representation for 3D Human Pose and Shape Estimation
by: Vasilikopoulos, Nikolaos, et al.
Published: (2024)
Detecting Facial Image Manipulations with Multi-Layer CNN Models
by: Montejano, Alejandro Marco, et al.
Published: (2024)
Y-MAP-Net: Real-time depth, normals, segmentation, multi-label captioning and 2D human pose in RGB images
by: Qammaz, Ammar, et al.
Published: (2024)
Multimodal Large Models Are Effective Action Anticipators
by: Wang, Binglu, et al.
Published: (2025)
Combining Facial Videos and Biosignals for Stress Estimation During Driving
by: Valergaki, Paraskevi, et al.
Published: (2026)
Enhancing Monocular 3D Hand Reconstruction with Learned Texture Priors
by: Karvounas, Giorgos, et al.
Published: (2025)
A Survey on Deep Learning Techniques for Action Anticipation
by: Zhong, Zeyun, et al.
Published: (2023)
Visual WetlandBirds Dataset: Bird Species Identification and Behavior Recognition in Videos
by: Rodriguez-Juan, Javier, et al.
Published: (2025)
Action-Guided Attention for Video Action Anticipation
by: Tai, Tsung-Ming, et al.
Published: (2026)
Multi-task Learning For Joint Action and Gesture Recognition
by: Spathis, Konstantinos, et al.
Published: (2025)
A Single Image and Multimodality Is All You Need for Novel View Synthesis
by: Javadi, Amirhosein, et al.
Published: (2026)
Interaction Region Visual Transformer for Egocentric Action Anticipation
by: Roy, Debaditya, et al.
Published: (2022)
Human Action Anticipation: A Survey
by: Lai, Bolin, et al.
Published: (2024)
Vision-Language-Action (VLA) Models: Concepts, Progress, Applications and Challenges
by: Sapkota, Ranjan, et al.
Published: (2025)
Chain-of-Frames: Advancing Video Understanding in Multimodal LLMs via Frame-Aware Reasoning
by: Ghazanfari, Sara, et al.
Published: (2025)
From Recognition to Prediction: Leveraging Sequence Reasoning for Action Anticipation
by: Liu, Xin, et al.
Published: (2024)
Intention Action Anticipation Model with Guide-Feedback Loop Mechanism
by: Ma, Zongnan, et al.
Published: (2024)
Bidirectional Action Sequence Learning for Long-term Action Anticipation with Large Language Models
by: Sato, Yuji, et al.
Published: (2025)
Semantically Guided Action Anticipation
by: Diko, Anxhelo, et al.
Published: (2024)
Complementarity-Supervised Spectral-Band Routing for Multimodal Emotion Recognition
by: Huang, Zhexian, et al.
Published: (2026)
Multi-level and Multi-modal Action Anticipation
by: Kim, Seulgi, et al.
Published: (2025)
LaViDa: A Large Diffusion Language Model for Multimodal Understanding
by: Li, Shufan, et al.
Published: (2025)
EEE-Bench: A Comprehensive Multimodal Electrical And Electronics Engineering Benchmark
by: Li, Ming, et al.
Published: (2024)
Zero-Shot Generative De-identification: Inversion-Free Flow for Privacy-Preserving Skin Image Analysis
by: Moutselos, Konstantinos, et al.
Published: (2026)
Zero-shot Segmentation of Skin Conditions: Erythema with Edit-Friendly Inversion
by: Moutselos, Konstantinos, et al.
Published: (2025)
An Organism Starts with a Single Pix-Cell: A Neural Cellular Diffusion for High-Resolution Image Synthesis
by: Elbatel, Marawan, et al.
Published: (2024)
MixANT: Observation-dependent Memory Propagation for Stochastic Dense Action Anticipation
by: Wasim, Syed Talal, et al.
Published: (2025)
See It Before You Grab It: Deep Learning-based Action Anticipation in Basketball
by: Roy, Arnau Barrera, et al.
Published: (2025)
Vision and Intention Boost Large Language Model in Long-Term Action Anticipation
by: Cao, Congqi, et al.
Published: (2025)