:: Library Catalog

Bibliographic Details
Main authors:	Zhang, Xian, Wu, Zexi, Li, Zinuo, Xu, Hongming, Gong, Luqi, Boussaid, Farid, Werghi, Naoufel, Bennamoun, Mohammed
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online access:	https://arxiv.org/abs/2510.02778

Similar items

Watch and Listen: Understanding Audio-Visual-Speech Moments with Multimodal LLM
by: Li, Zinuo, et al.
Published: (2025)

Controllable Complex Human Motion Video Generation via Text-to-Skeleton Cascades
by: Taghipour, Ashkan, et al.
Published: (2026)

Generalized Closed-form Formulae for Feature-based Subpixel Alignment in Patch-based Matching
by: Jospin, Laurent Valentin, et al.
Published: (2021)

Faster Image2Video Generation: A Closer Look at CLIP Image Embedding's Impact on Spatio-Temporal Cross-Attentions
by: Taghipour, Ashkan, et al.
Published: (2024)

LatentMove: Towards Complex Human Movement Video Generation
by: Taghipour, Ashkan, et al.
Published: (2025)

3D Brain and Heart Volume Generative Models: A Survey
by: Liu, Yanbin, et al.
Published: (2022)

STEER: Structured Event Evidence for Video Reasoning via Multi-Objective Reinforcement Learning
by: Li, Zinuo, et al.
Published: (2026)

AquaticCLIP: A Vision-Language Foundation Model for Underwater Scene Analysis
by: Alawode, Basit, et al.
Published: (2025)

CPLIP: Zero-Shot Learning for Histopathology with Comprehensive Vision-Language Alignment
by: Javed, Sajid, et al.
Published: (2024)

SVR-GS: Spatially Variant Regularization for Probabilistic Masks in 3D Gaussian Splatting
by: Taghipour, Ashkan, et al.
Published: (2025)

Dynamic Neural Surfaces for Elastic 4D Shape Representation and Analysis
by: Nizamani, Awais, et al.
Published: (2025)

Hybrid Transformer-Mamba Architecture for Weakly Supervised Volumetric Medical Segmentation
by: Lyu, Yiheng, et al.
Published: (2025)

Auxiliary Tasks Enhanced Dual-affinity Learning for Weakly Supervised Semantic Segmentation
by: Xu, Lian, et al.
Published: (2024)

Adaptive Keyframe Sampling for Long Video Understanding
by: Tang, Xi, et al.
Published: (2025)

BENet: A Cross-domain Robust Network for Detecting Face Forgeries via Bias Expansion and Latent-space Attention
by: Liu, Weihua, et al.
Published: (2024)

Advancing Histopathology with Deep Learning Under Data Scarcity: A Decade in Review
by: Obeid, Ahmad, et al.
Published: (2024)

Box It to Bind It: Unified Layout Control and Attribute Binding in T2I Diffusion Models
by: Taghipour, Ashkan, et al.
Published: (2024)

Multi-Modal Attention Networks for Enhanced Segmentation and Depth Estimation of Subsurface Defects in Pulse Thermography
by: Salah, Mohammed, et al.
Published: (2025)

AdaFocus: Adaptive Relevance-Diversity Sampling with Zero-Cache Look-back for Efficient Long Video Understanding
by: Yang, Xiao, et al.
Published: (2026)

AdaFlow: Efficient Long Video Editing via Adaptive Attention Slimming And Keyframe Selection
by: Zhang, Shuheng, et al.
Published: (2025)

Multi-Resolution Pathology-Language Pre-training Model with Text-Guided Visual Representation
by: Albastaki, Shahad, et al.
Published: (2025)

DynaPURLS: Dynamic Refinement of Part-Aware Representations for Skeleton-Based Zero-Shot Action Recognition
by: Zhu, Jingmin, et al.
Published: (2025)

Fact or Fake? Assessing the Role of Deepfake Detectors in Multimodal Misinformation Detection
by: Sagar, A S M Sharifuzzaman, et al.
Published: (2026)

A Riemannian Approach for Spatiotemporal Analysis and Generation of 4D Tree-shaped Structures
by: Khanam, Tahmina, et al.
Published: (2024)

A Riemannian Framework for the Elastic Analysis of the Spatiotemporal Variability in the Shape and Structure of Tree-like 4D Objects
by: Khanam, Tahmina, et al.
Published: (2025)

SkeletonContext: Skeleton-side Context Prompt Learning for Zero-Shot Skeleton-based Action Recognition
by: Wang, Ning, et al.
Published: (2026)

Video Anomaly Detection in 10 Years: A Survey and Outlook
by: Abdalla, Moshira, et al.
Published: (2024)

UIFormer: A Unified Transformer-based Framework for Incremental Few-Shot Object Detection and Instance Segmentation
by: Zhang, Chengyuan, et al.
Published: (2024)

SPARROW: Learning Spatial Precision and Temporal Referential Consistency in Pixel-Grounded Video MLLMs
by: Alansari, Mohamad, et al.
Published: (2026)

RobMOT: Robust 3D Multi-Object Tracking by Observational Noise and State Estimation Drift Mitigation on LiDAR PointCloud
by: Nagy, Mohamed, et al.
Published: (2024)

Towards Accurate State Estimation: Kalman Filter Incorporating Motion Dynamics for 3D Multi-Object Tracking
by: Nagy, Mohamed, et al.
Published: (2025)

Admitting Ignorance Helps the Video Question Answering Models to Answer
by: Li, Haopeng, et al.
Published: (2025)

Rethinking Memory Design in SAM-Based Visual Object Tracking
by: Alansari, Mohamad, et al.
Published: (2025)

STING-BEE: Towards Vision-Language Model for Real-World X-ray Baggage Security Inspection
by: Velayudhan, Divya, et al.
Published: (2025)

Query-Conditioned Evidential Keyframe Sampling for MLLM-Based Long-Form Video Understanding
by: Wang, Yiheng, et al.
Published: (2026)

A Robust Adversary Detection-Deactivation Method for Metaverse-oriented Collaborative Deep Learning
by: Li, Pengfei, et al.
Published: (2023)

FOCUS: Efficient Keyframe Selection for Long Video Understanding
by: Zhu, Zirui, et al.
Published: (2025)

AdaSpark: Adaptive Sparsity for Efficient Long-Video Understanding
by: Li, Handong, et al.
Published: (2026)

DrawVideo: Generating Long Video from Storyboard Keyframe Sketches
by: Xu, Chuanzhi, et al.
Published: (2026)

Decomposing Queries into Tool Calls for Long-Video Keyframe Retrieval
by: Shlapentokh-Rothman, Michal, et al.
Published: (2026)