Library Catalog

Bibliographic Details
Main Authors:	Fadaei, Amir Hosein; Dehaqani, Mohammad-Reza A.
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2502.07277

Similar Items

Beyond still images: Temporal features and input variance resilience
by: Fadaei, Amir Hosein, et al.
Published: (2023)

SpikeReg: Energy-Efficient 3D Deformable Medical Image Registration with Spiking Neural Networks
by: Barzili, Ali Mikaeili, et al.
Published: (2026)

Wise-SrNet: A Novel Architecture for Enhancing Image Classification by Learning Spatial Resolution of Feature Maps
by: Rahimzadeh, Mohammad, et al.
Published: (2021)

Understanding Counting Mechanisms in Large Language and Vision-Language Models
by: Hasani, Hosein, et al.
Published: (2025)

Spatiotemporal Learning with Context-aware Video Tubelets for Ultrasound Video Analysis
by: Li, Gary Y., et al.
Published: (2025)

Improving 3D Few-Shot Segmentation with Inference-Time Pseudo-Labeling
by: Mozafari, Mohammad, et al.
Published: (2024)

ST-SimDiff: Balancing Spatiotemporal Similarity and Difference for Efficient Video Understanding with MLLMs
by: Luo, Bingjun, et al.
Published: (2026)

Physics Context Builders: A Modular Framework for Physical Reasoning in Vision-Language Models
by: Balazadeh, Vahid, et al.
Published: (2024)

Using Deep Convolutional Neural Networks to Detect Rendered Glitches in Video Games
by: Ling, Carlos Garcia, et al.
Published: (2024)

Towards Neuro-Symbolic Video Understanding
by: Choi, Minkyu, et al.
Published: (2024)

Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis
by: Menapace, Willi, et al.
Published: (2024)

High Resolution Flood Extent Detection Using Deep Learning with Random Forest Derived Training Labels
by: Nuriddinov, Azizbek, et al.
Published: (2026)

Memory-Efficient Continual Learning Object Segmentation for Long Video
by: Nazemi, Amir, et al.
Published: (2023)

A Survey: Spatiotemporal Consistency in Video Generation
by: Yin, Zhiyu, et al.
Published: (2025)

VideoGameQA-Bench: Evaluating Vision-Language Models for Video Game Quality Assurance
by: Taesiri, Mohammad Reza, et al.
Published: (2025)

Extracting Overlapping Microservices from Monolithic Code via Deep Semantic Embeddings and Graph Neural Network-Based Soft Clustering
by: Ziabakhsh, Morteza, et al.
Published: (2025)

Enhancing Few-Shot Image Classification through Learnable Multi-Scale Embedding and Attention Mechanisms
by: Askari, Fatemeh, et al.
Published: (2024)

MSDNet: Multi-Scale Decoder for Few-Shot Semantic Segmentation via Transformer-Guided Prototyping
by: Fateh, Amirreza, et al.
Published: (2024)

Video-CCAM: Enhancing Video-Language Understanding with Causal Cross-Attention Masks for Short and Long Videos
by: Fei, Jiajun, et al.
Published: (2024)

Brand Visibility in Packaging: A Deep Learning Approach for Logo Detection, Saliency-Map Prediction, and Logo Placement Analysis
by: Hosseini, Alireza, et al.
Published: (2024)

IPFormer-VideoLLM: Enhancing Multi-modal Video Understanding for Multi-shot Scenes
by: Liang, Yujia, et al.
Published: (2025)

Understanding Multimodal Deep Neural Networks: A Concept Selection View
by: Shang, Chenming, et al.
Published: (2024)

Understanding Distributed Representations of Concepts in Deep Neural Networks without Supervision
by: Chang, Wonjoon, et al.
Published: (2023)

Language-guided Recursive Spatiotemporal Graph Modeling for Video Summarization
by: Park, Jungin, et al.
Published: (2025)

Enhancing Long Video Understanding via Hierarchical Event-Based Memory
by: Cheng, Dingxin, et al.
Published: (2024)

Graph-Attention Network with Adversarial Domain Alignment for Robust Cross-Domain Facial Expression Recognition
by: Ghaedi, Razieh, et al.
Published: (2025)

StabStitch++: Unsupervised Online Video Stitching with Spatiotemporal Bidirectional Warps
by: Nie, Lang, et al.
Published: (2025)

Learning Spatiotemporal Sensitivity in Video LLMs via Counterfactual Reinforcement Learning
by: Du, Dazhao, et al.
Published: (2026)

Deep Video Discovery: Agentic Search with Tool Use for Long-form Video Understanding
by: Zhang, Xiaoyi, et al.
Published: (2025)

Video Panels for Long Video Understanding
by: Doorenbos, Lars, et al.
Published: (2025)

Understanding Generative AI Capabilities in Everyday Image Editing Tasks
by: Taesiri, Mohammad Reza, et al.
Published: (2025)

Self-Supervised Learning for Endoscopic Video Analysis
by: Hirsch, Roy, et al.
Published: (2023)

Uncovering Grounding IDs: How External Cues Shape Multimodal Binding
by: Hasani, Hosein, et al.
Published: (2025)

WhisperNetV2: SlowFast Siamese Network For Lip-Based Biometrics
by: Zakeri, Abdollah, et al.
Published: (2024)

VIA: Unified Spatiotemporal Video Adaptation Framework for Global and Local Video Editing
by: Gu, Jing, et al.
Published: (2024)

Deep Neural Networks Fused with Textures for Image Classification
by: Bera, Asish, et al.
Published: (2023)

CaC: Advancing Video Reward Models via Hierarchical Spatiotemporal Concentrating
by: Wang, Jiyuan, et al.
Published: (2026)

Personalized Video Summarization by Multimodal Video Understanding
by: Chen, Brian, et al.
Published: (2024)

Object-Shot Enhanced Grounding Network for Egocentric Video
by: Feng, Yisen, et al.
Published: (2025)

Deep Learning-Driven Multimodal Detection and Movement Analysis of Objects in Culinary
by: Ishat, Tahoshin Alam, et al.
Published: (2025)