| Main authors: | Zhang, Xian; Wu, Zexi; Li, Zinuo; Xu, Hongming; Gong, Luqi; Boussaid, Farid; Werghi, Naoufel; Bennamoun, Mohammed |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online access: | https://arxiv.org/abs/2510.02778 |
Similar items
Watch and Listen: Understanding Audio-Visual-Speech Moments with Multimodal LLM
by: Li, Zinuo, et al.
Published: (2025)
Controllable Complex Human Motion Video Generation via Text-to-Skeleton Cascades
by: Taghipour, Ashkan, et al.
Published: (2026)
Generalized Closed-form Formulae for Feature-based Subpixel Alignment in Patch-based Matching
by: Jospin, Laurent Valentin, et al.
Published: (2021)
Faster Image2Video Generation: A Closer Look at CLIP Image Embedding's Impact on Spatio-Temporal Cross-Attentions
by: Taghipour, Ashkan, et al.
Published: (2024)
LatentMove: Towards Complex Human Movement Video Generation
by: Taghipour, Ashkan, et al.
Published: (2025)
3D Brain and Heart Volume Generative Models: A Survey
by: Liu, Yanbin, et al.
Published: (2022)
STEER: Structured Event Evidence for Video Reasoning via Multi-Objective Reinforcement Learning
by: Li, Zinuo, et al.
Published: (2026)
AquaticCLIP: A Vision-Language Foundation Model for Underwater Scene Analysis
by: Alawode, Basit, et al.
Published: (2025)
CPLIP: Zero-Shot Learning for Histopathology with Comprehensive Vision-Language Alignment
by: Javed, Sajid, et al.
Published: (2024)
SVR-GS: Spatially Variant Regularization for Probabilistic Masks in 3D Gaussian Splatting
by: Taghipour, Ashkan, et al.
Published: (2025)
Dynamic Neural Surfaces for Elastic 4D Shape Representation and Analysis
by: Nizamani, Awais, et al.
Published: (2025)
Hybrid Transformer-Mamba Architecture for Weakly Supervised Volumetric Medical Segmentation
by: Lyu, Yiheng, et al.
Published: (2025)
Auxiliary Tasks Enhanced Dual-affinity Learning for Weakly Supervised Semantic Segmentation
by: Xu, Lian, et al.
Published: (2024)
Adaptive Keyframe Sampling for Long Video Understanding
by: Tang, Xi, et al.
Published: (2025)
BENet: A Cross-domain Robust Network for Detecting Face Forgeries via Bias Expansion and Latent-space Attention
by: Liu, Weihua, et al.
Published: (2024)
Advancing Histopathology with Deep Learning Under Data Scarcity: A Decade in Review
by: Obeid, Ahmad, et al.
Published: (2024)
Box It to Bind It: Unified Layout Control and Attribute Binding in T2I Diffusion Models
by: Taghipour, Ashkan, et al.
Published: (2024)
Multi-Modal Attention Networks for Enhanced Segmentation and Depth Estimation of Subsurface Defects in Pulse Thermography
by: Salah, Mohammed, et al.
Published: (2025)
AdaFocus: Adaptive Relevance-Diversity Sampling with Zero-Cache Look-back for Efficient Long Video Understanding
by: Yang, Xiao, et al.
Published: (2026)
AdaFlow: Efficient Long Video Editing via Adaptive Attention Slimming And Keyframe Selection
by: Zhang, Shuheng, et al.
Published: (2025)
Multi-Resolution Pathology-Language Pre-training Model with Text-Guided Visual Representation
by: Albastaki, Shahad, et al.
Published: (2025)
DynaPURLS: Dynamic Refinement of Part-Aware Representations for Skeleton-Based Zero-Shot Action Recognition
by: Zhu, Jingmin, et al.
Published: (2025)
Fact or Fake? Assessing the Role of Deepfake Detectors in Multimodal Misinformation Detection
by: Sagar, A S M Sharifuzzaman, et al.
Published: (2026)
A Riemannian Approach for Spatiotemporal Analysis and Generation of 4D Tree-shaped Structures
by: Khanam, Tahmina, et al.
Published: (2024)
A Riemannian Framework for the Elastic Analysis of the Spatiotemporal Variability in the Shape and Structure of Tree-like 4D Objects
by: Khanam, Tahmina, et al.
Published: (2025)
SkeletonContext: Skeleton-side Context Prompt Learning for Zero-Shot Skeleton-based Action Recognition
by: Wang, Ning, et al.
Published: (2026)
Video Anomaly Detection in 10 Years: A Survey and Outlook
by: Abdalla, Moshira, et al.
Published: (2024)
UIFormer: A Unified Transformer-based Framework for Incremental Few-Shot Object Detection and Instance Segmentation
by: Zhang, Chengyuan, et al.
Published: (2024)
SPARROW: Learning Spatial Precision and Temporal Referential Consistency in Pixel-Grounded Video MLLMs
by: Alansari, Mohamad, et al.
Published: (2026)
RobMOT: Robust 3D Multi-Object Tracking by Observational Noise and State Estimation Drift Mitigation on LiDAR PointCloud
by: Nagy, Mohamed, et al.
Published: (2024)
Towards Accurate State Estimation: Kalman Filter Incorporating Motion Dynamics for 3D Multi-Object Tracking
by: Nagy, Mohamed, et al.
Published: (2025)
Admitting Ignorance Helps the Video Question Answering Models to Answer
by: Li, Haopeng, et al.
Published: (2025)
Rethinking Memory Design in SAM-Based Visual Object Tracking
by: Alansari, Mohamad, et al.
Published: (2025)
STING-BEE: Towards Vision-Language Model for Real-World X-ray Baggage Security Inspection
by: Velayudhan, Divya, et al.
Published: (2025)
Query-Conditioned Evidential Keyframe Sampling for MLLM-Based Long-Form Video Understanding
by: Wang, Yiheng, et al.
Published: (2026)
A Robust Adversary Detection-Deactivation Method for Metaverse-oriented Collaborative Deep Learning
by: Li, Pengfei, et al.
Published: (2023)
FOCUS: Efficient Keyframe Selection for Long Video Understanding
by: Zhu, Zirui, et al.
Published: (2025)
AdaSpark: Adaptive Sparsity for Efficient Long-Video Understanding
by: Li, Handong, et al.
Published: (2026)
DrawVideo: Generating Long Video from Storyboard Keyframe Sketches
by: Xu, Chuanzhi, et al.
Published: (2026)
Decomposing Queries into Tool Calls for Long-Video Keyframe Retrieval
by: Shlapentokh-Rothman, Michal, et al.
Published: (2026)