| Main Authors: | Fadaei, Amir Hosein, Dehaqani, Mohammad-Reza A. |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2502.07277 |
Similar Items
Beyond still images: Temporal features and input variance resilience
by: Fadaei, Amir Hosein, et al.
Published: (2023)
SpikeReg: Energy-Efficient 3D Deformable Medical Image Registration with Spiking Neural Networks
by: Barzili, Ali Mikaeili, et al.
Published: (2026)
Wise-SrNet: A Novel Architecture for Enhancing Image Classification by Learning Spatial Resolution of Feature Maps
by: Rahimzadeh, Mohammad, et al.
Published: (2021)
Understanding Counting Mechanisms in Large Language and Vision-Language Models
by: Hasani, Hosein, et al.
Published: (2025)
Spatiotemporal Learning with Context-aware Video Tubelets for Ultrasound Video Analysis
by: Li, Gary Y., et al.
Published: (2025)
Improving 3D Few-Shot Segmentation with Inference-Time Pseudo-Labeling
by: Mozafari, Mohammad, et al.
Published: (2024)
ST-SimDiff: Balancing Spatiotemporal Similarity and Difference for Efficient Video Understanding with MLLMs
by: Luo, Bingjun, et al.
Published: (2026)
Physics Context Builders: A Modular Framework for Physical Reasoning in Vision-Language Models
by: Balazadeh, Vahid, et al.
Published: (2024)
Using Deep Convolutional Neural Networks to Detect Rendered Glitches in Video Games
by: Ling, Carlos Garcia, et al.
Published: (2024)
Towards Neuro-Symbolic Video Understanding
by: Choi, Minkyu, et al.
Published: (2024)
Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis
by: Menapace, Willi, et al.
Published: (2024)
High Resolution Flood Extent Detection Using Deep Learning with Random Forest Derived Training Labels
by: Nuriddinov, Azizbek, et al.
Published: (2026)
Memory-Efficient Continual Learning Object Segmentation for Long Video
by: Nazemi, Amir, et al.
Published: (2023)
A Survey: Spatiotemporal Consistency in Video Generation
by: Yin, Zhiyu, et al.
Published: (2025)
VideoGameQA-Bench: Evaluating Vision-Language Models for Video Game Quality Assurance
by: Taesiri, Mohammad Reza, et al.
Published: (2025)
Extracting Overlapping Microservices from Monolithic Code via Deep Semantic Embeddings and Graph Neural Network-Based Soft Clustering
by: Ziabakhsh, Morteza, et al.
Published: (2025)
Enhancing Few-Shot Image Classification through Learnable Multi-Scale Embedding and Attention Mechanisms
by: Askari, Fatemeh, et al.
Published: (2024)
MSDNet: Multi-Scale Decoder for Few-Shot Semantic Segmentation via Transformer-Guided Prototyping
by: Fateh, Amirreza, et al.
Published: (2024)
Video-CCAM: Enhancing Video-Language Understanding with Causal Cross-Attention Masks for Short and Long Videos
by: Fei, Jiajun, et al.
Published: (2024)
Brand Visibility in Packaging: A Deep Learning Approach for Logo Detection, Saliency-Map Prediction, and Logo Placement Analysis
by: Hosseini, Alireza, et al.
Published: (2024)
IPFormer-VideoLLM: Enhancing Multi-modal Video Understanding for Multi-shot Scenes
by: Liang, Yujia, et al.
Published: (2025)
Understanding Multimodal Deep Neural Networks: A Concept Selection View
by: Shang, Chenming, et al.
Published: (2024)
Understanding Distributed Representations of Concepts in Deep Neural Networks without Supervision
by: Chang, Wonjoon, et al.
Published: (2023)
Language-guided Recursive Spatiotemporal Graph Modeling for Video Summarization
by: Park, Jungin, et al.
Published: (2025)
Enhancing Long Video Understanding via Hierarchical Event-Based Memory
by: Cheng, Dingxin, et al.
Published: (2024)
Graph-Attention Network with Adversarial Domain Alignment for Robust Cross-Domain Facial Expression Recognition
by: Ghaedi, Razieh, et al.
Published: (2025)
StabStitch++: Unsupervised Online Video Stitching with Spatiotemporal Bidirectional Warps
by: Nie, Lang, et al.
Published: (2025)
Learning Spatiotemporal Sensitivity in Video LLMs via Counterfactual Reinforcement Learning
by: Du, Dazhao, et al.
Published: (2026)
Deep Video Discovery: Agentic Search with Tool Use for Long-form Video Understanding
by: Zhang, Xiaoyi, et al.
Published: (2025)
Video Panels for Long Video Understanding
by: Doorenbos, Lars, et al.
Published: (2025)
Understanding Generative AI Capabilities in Everyday Image Editing Tasks
by: Taesiri, Mohammad Reza, et al.
Published: (2025)
Self-Supervised Learning for Endoscopic Video Analysis
by: Hirsch, Roy, et al.
Published: (2023)
Uncovering Grounding IDs: How External Cues Shape Multimodal Binding
by: Hasani, Hosein, et al.
Published: (2025)
WhisperNetV2: SlowFast Siamese Network For Lip-Based Biometrics
by: Zakeri, Abdollah, et al.
Published: (2024)
VIA: Unified Spatiotemporal Video Adaptation Framework for Global and Local Video Editing
by: Gu, Jing, et al.
Published: (2024)
Deep Neural Networks Fused with Textures for Image Classification
by: Bera, Asish, et al.
Published: (2023)
CaC: Advancing Video Reward Models via Hierarchical Spatiotemporal Concentrating
by: Wang, Jiyuan, et al.
Published: (2026)
Personalized Video Summarization by Multimodal Video Understanding
by: Chen, Brian, et al.
Published: (2024)
Object-Shot Enhanced Grounding Network for Egocentric Video
by: Feng, Yisen, et al.
Published: (2025)
Deep Learning-Driven Multimodal Detection and Movement Analysis of Objects in Culinary
by: Ishat, Tahoshin Alam, et al.
Published: (2025)