| Main Authors: | Lu, Hui, Yu, Yi, Lu, Shijian, Rajan, Deepu, Ng, Boon Poh, Kot, Alex C., Jiang, Xudong |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2511.17929 |
Similar Items
From Pretrain to Pain: Adversarial Vulnerability of Video Foundation Models Without Task Knowledge
by: Lu, Hui, et al.
Published: (2025)
When Robots Obey the Patch: Universal Transferable Patch Attacks on Vision-Language-Action Models
by: Lu, Hui, et al.
Published: (2025)
One-Shot Action Recognition via Multi-Scale Spatial-Temporal Skeleton Matching
by: Yang, Siyuan, et al.
Published: (2023)
Vid-Morp: Video Moment Retrieval Pretraining from Unlabeled Videos in the Wild
by: Bao, Peijun, et al.
Published: (2024)
Next-Frame Feature Prediction for Multimodal Deepfake Detection and Temporal Localization
by: Anshul, Ashutosh, et al.
Published: (2025)
LLMs Meet VLMs: Boost Open Vocabulary Object Detection with Fine-grained Descriptors
by: Jin, Sheng, et al.
Published: (2024)
OpenTAD: A Unified Framework and Comprehensive Study of Temporal Action Detection
by: Liu, Shuming, et al.
Published: (2025)
LiquidTAD: Efficient Temporal Action Detection via Parallel Liquid-Inspired Temporal Relaxation
by: Sun, Zepeng, et al.
Published: (2026)
SpoT-Mamba: Learning Long-Range Dependency on Spatio-Temporal Graphs with Selective State Spaces
by: Choi, Jinhyeok, et al.
Published: (2024)
E.M.Ground: A Temporal Grounding Vid-LLM with Holistic Event Perception and Matching
by: Nie, Jiahao, et al.
Published: (2026)
MambaDETR: Query-based Temporal Modeling using State Space Model for Multi-View 3D Object Detection
by: Ning, Tong, et al.
Published: (2024)
Know-Show: Benchmarking Video-Language Models on Spatio-Temporal Grounded Reasoning
by: Sugandhika, Chinthani, et al.
Published: (2025)
VOST-SGG: VLM-Aided One-Stage Spatio-Temporal Scene Graph Generation
by: Sugandhika, Chinthani, et al.
Published: (2025)
Universal Adversarial Attacks against Closed-Source MLLMs via Target-View Routed Meta Optimization
by: Lu, Hui, et al.
Published: (2026)
Temporal-Guided Spiking Neural Networks for Event-Based Human Action Recognition
by: Yang, Siyuan, et al.
Published: (2025)
Backdoor Attacks against No-Reference Image Quality Assessment Models via a Scalable Trigger
by: Yu, Yi, et al.
Published: (2024)
Large Language Models Meet Contrastive Learning: Zero-Shot Emotion Recognition Across Languages
by: Zou, Heqing, et al.
Published: (2025)
Unlearnable Examples Detection via Iterative Filtering
by: Yu, Yi, et al.
Published: (2024)
RayMamba: Ray-Aligned Serialization for Long-Range 3D Object Detection
by: Lu, Cheng, et al.
Published: (2026)
Cross-Domain Few-Shot Segmentation via Multi-view Progressive Adaptation
by: Nie, Jiahao, et al.
Published: (2026)
SP-Mamba: Spatial-Perception State Space Model for Unsupervised Medical Anomaly Detection
by: Pan, Rui, et al.
Published: (2025)
TE-TAD: Towards Full End-to-End Temporal Action Detection via Time-Aligned Coordinate Expression
by: Kim, Ho-Joong, et al.
Published: (2024)
Reducing Object Hallucination in LVLMs via Emphasizing Image-negative Tokens
by: Shen, Meng, et al.
Published: (2026)
Purify Unlearnable Examples via Rate-Constrained Variational Autoencoders
by: Yu, Yi, et al.
Published: (2024)
Graph-Mamba: Towards Long-Range Graph Sequence Modeling with Selective State Spaces
by: Wang, Chloe, et al.
Published: (2024)
TeRFS: Temporal-Evolving Radio Field Synthesis
by: Zhang, Pengyang, et al.
Published: (2026)
MMRel: Benchmarking Relation Understanding in Multi-Modal Large Language Models
by: Nie, Jiahao, et al.
Published: (2024)
Towards Model Resistant to Transferable Adversarial Examples via Trigger Activation
by: Yu, Yi, et al.
Published: (2025)
MTL-UE: Learning to Learn Nothing for Multi-Task Learning
by: Yu, Yi, et al.
Published: (2025)
SimBase: A Simple Baseline for Temporal Video Grounding
by: Bao, Peijun, et al.
Published: (2024)
Open-Vocabulary Object Detection via Language Hierarchy
by: Huang, Jiaxing, et al.
Published: (2024)
Pano-NeRF: Synthesizing High Dynamic Range Novel Views with Geometry from Sparse Low Dynamic Range Panoramic Images
by: Lu, Zhan, et al.
Published: (2023)
Weakly Supervised Monocular 3D Detection with a Single-View Image
by: Jiang, Xueying, et al.
Published: (2024)
From Pixels to Gigapixels: Bridging Local Inductive Bias and Long-Range Dependencies with Pixel-Mamba
by: Qiu, Zhongwei, et al.
Published: (2024)
TSkel-Mamba: Temporal Dynamic Modeling via State Space Model for Human Skeleton-based Action Recognition
by: Liu, Yanan, et al.
Published: (2025)
Robust and Transferable Backdoor Attacks Against Deep Image Compression With Selective Frequency Prior
by: Yu, Yi, et al.
Published: (2024)
Situational Scene Graph for Structured Human-centric Situation Understanding
by: Sugandhika, Chinthani, et al.
Published: (2024)
ActivityForensics: A Comprehensive Benchmark for Localizing Manipulated Activity in Videos
by: Bao, Peijun, et al.
Published: (2026)
VideoMamba: Spatio-Temporal Selective State Space Model
by: Park, Jinyoung, et al.
Published: (2024)
ChessMamba: Structure-Aware Interleaving of State Spaces for Change Detection in Remote Sensing Images
by: Ding, Lei, et al.
Published: (2025)