:: Library Catalog

Bibliographic Details
Main Authors:	Yu, Yating; Cao, Congqi; Wang, Zhaoying; Meng, Weihua; Li, Jie; Li, Yuxin; Wei, Zihao; Shen, Zhongpei; Zhang, Jiajun
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2511.00613

Similar Items

Learning to Generalize without Bias for Open-Vocabulary Action Recognition
by: Yu, Yating, et al.
Published: (2025)

Vision and Intention Boost Large Language Model in Long-Term Action Anticipation
by: Cao, Congqi, et al.
Published: (2025)

Prototypical Learning Guided Context-Aware Segmentation Network for Few-Shot Anomaly Detection
by: Jiang, Yuxin, et al.
Published: (2025)

SRVAU-R1: Enhancing Video Anomaly Understanding via Reflection-Aware Learning
by: Zhao, Zihao, et al.
Published: (2026)

Autoregressive Denoising Score Matching is a Good Video Anomaly Detector
by: Zhang, Hanwen, et al.
Published: (2025)

SAW-Bench: Learning Situated Awareness in the Real World
by: Li, Chuhan, et al.
Published: (2026)

OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?
by: Li, Yifei, et al.
Published: (2025)

Building a Multi-modal Spatiotemporal Expert for Zero-shot Action Recognition with CLIP
by: Yu, Yating, et al.
Published: (2024)

Task-Adapter: Task-specific Adaptation of Image Models for Few-shot Action Recognition
by: Cao, Congqi, et al.
Published: (2024)

UniG2U-Bench: Do Unified Models Advance Multimodal Understanding?
by: Wen, Zimo, et al.
Published: (2026)

Multi-View Reconstruction with Global Context for 3D Anomaly Detection
by: Sun, Yihan, et al.
Published: (2025)

Advancing Adaptive Multi-Stage Video Anomaly Reasoning: A Benchmark Dataset and Method
by: Huang, Chao, et al.
Published: (2026)

InkStream: Real-time GNN Inference on Streaming Graphs via Incremental Update
by: Wu, Dan, et al.
Published: (2023)

Task-Adapter++: Task-specific Adaptation with Order-aware Alignment for Few-shot Action Recognition
by: Cao, Congqi, et al.
Published: (2025)

ESOM: Efficiently Understanding Streaming Video Anomalies with Open-world Dynamic Definitions
by: Liu, Zihao, et al.
Published: (2026)

AgencyBench: Benchmarking the Frontiers of Autonomous Agents in 1M-Token Real-World Contexts
by: Li, Keyu, et al.
Published: (2026)

StreamingBench: Assessing the Gap for MLLMs to Achieve Streaming Video Understanding
by: Lin, Junming, et al.
Published: (2024)

No Need For Real Anomaly: MLLM Empowered Zero-Shot Video Anomaly Detection
by: Dai, Zunkai, et al.
Published: (2026)

ABC-Bench: Benchmarking Agentic Backend Coding in Real-World Development
by: Yang, Jie, et al.
Published: (2026)

AD-Bench: A Real-World, Trajectory-Aware Advertising Analytics Benchmark for LLM Agents
by: Hu, Lingxiang, et al.
Published: (2026)

DentalBench: Benchmarking and Advancing LLMs Capability for Bilingual Dentistry Understanding
by: Zhu, Hengchuan, et al.
Published: (2025)

Towards Real-World HDR Video Reconstruction: A Large-Scale Benchmark Dataset and A Two-Stage Alignment Network
by: Shu, Yong, et al.
Published: (2024)

Towards Aerial Collaborative Stereo: Real-Time Cross-Camera Feature Association and Relative Pose Estimation for UAVs
by: Wang, Zhaoying, et al.
Published: (2024)

VAU-R1: Advancing Video Anomaly Understanding via Reinforcement Fine-Tuning
by: Zhu, Liyun, et al.
Published: (2025)

FinMCP-Bench: Benchmarking LLM Agents for Real-World Financial Tool Use under the Model Context Protocol
by: Zhu, Jie, et al.
Published: (2026)

ConsintBench: Evaluating Language Models on Real-World Consumer Intent Understanding
by: Li, Xiaozhe, et al.
Published: (2025)

Retina‐Like Neuromorphic Visual Sensor for Sensing Broad‐Spectrum Ultraviolet Light (Advanced Optical Materials 35/2024)
by: Xi, Zhaoying, et al.
Published: (2024)

CauCLIP: Bridging the Sim-to-Real Gap in Surgical Video Understanding via Causality-Inspired Vision-Language Modeling
by: He, Yuxin, et al.
Published: (2026)

Hawk: Learning to Understand Open-World Video Anomalies
by: Tang, Jiaqi, et al.
Published: (2024)

RiskCueBench: Benchmarking Anticipatory Reasoning from Early Risk Cues in Video-Language Models
by: Luo, Sha, et al.
Published: (2026)

VEU-Bench: Towards Comprehensive Understanding of Video Editing
by: Li, Bozheng, et al.
Published: (2025)

ComBench: A Repo-level Real-world Benchmark for Compilation Error Repair
by: Li, Jia, et al.
Published: (2026)

Can LLMs Understand Time Series Anomalies?
by: Zhou, Zihao, et al.
Published: (2024)

Can Unified Generation and Understanding Models Maintain Semantic Equivalence Across Different Output Modalities?
by: Jiang, Hongbo, et al.
Published: (2026)

Context-Aware Probabilistic Modeling with LLM for Multimodal Time Series Forecasting
by: Yao, Yueyang, et al.
Published: (2025)

Rethinking Metrics and Benchmarks of Video Anomaly Detection
by: Liu, Zihao, et al.
Published: (2025)

WorldModelBench: Judging Video Generation Models As World Models
by: Li, Dacheng, et al.
Published: (2025)

ProAgentBench: Evaluating LLM Agents for Proactive Assistance with Real-World Data
by: Tang, Yuanbo, et al.
Published: (2026)

RTV-Bench: Benchmarking MLLM Continuous Perception, Understanding and Reasoning through Real-Time Video
by: Xun, Shuhang, et al.
Published: (2025)

DecompileBench: A Comprehensive Benchmark for Evaluating Decompilers in Real-World Scenarios
by: Gao, Zeyu, et al.
Published: (2025)