Library Catalog

Bibliographic Details
Main Authors:	Tzachor, Issar; Samuel, Dvir; Ben-Ari, Rami
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2602.08099

Similar Items

Fast Autoregressive Video Diffusion and World Models with Temporal Cache Compression and Sparse Attention
by: Samuel, Dvir, et al.
Published: (2026)

Find your Needle: Small Object Image Retrieval via Multi-Object Attention Optimization
by: Green, Michael, et al.
Published: (2025)

EffoVPR: Effective Foundation Model Utilization for Visual Place Recognition
by: Tzachor, Issar, et al.
Published: (2024)

Retrieval-Augmented Gaussian Avatars: Improving Expression Generalization
by: Levy, Matan, et al.
Published: (2026)

Where's Waldo: Diffusion Features for Personalized Segmentation and Retrieval
by: Samuel, Dvir, et al.
Published: (2024)

BlobGEN-Vid: Compositional Text-to-Video Generation with Blob Video Representations
by: Feng, Weixi, et al.
Published: (2025)

OmnimatteZero: Fast Training-free Omnimatte with Pre-trained Video Diffusion Models
by: Samuel, Dvir, et al.
Published: (2025)

Set Features for Anomaly Detection
by: Cohen, Niv, et al.
Published: (2023)

AdaVid: Adaptive Video-Language Pretraining
by: Patel, Chaitanya, et al.
Published: (2025)

Place-it-R1: Unlocking Environment-aware Reasoning Potential of MLLM for Video Object Insertion
by: Gu, Bohai, et al.
Published: (2026)

InstructVid2Vid: Controllable Video Editing with Natural Language Instructions
by: Qin, Bosheng, et al.
Published: (2023)

SMART: Shot-Aware Multimodal Video Moment Retrieval with Audio-Enhanced MLLM
by: Yu, An, et al.
Published: (2025)

VidPrism: Heterogeneous Mixture of Experts for Image-to-Video Transfer
by: Lin, Rui, et al.
Published: (2026)

VidComposition: Can MLLMs Analyze Compositions in Compiled Videos?
by: Tang, Yolo Y., et al.
Published: (2024)

RelightVid: Temporal-Consistent Diffusion Model for Video Relighting
by: Fang, Ye, et al.
Published: (2025)

EdgeVidSum: Real-Time Personalized Video Summarization at the Edge
by: Mujtaba, Ghulam, et al.
Published: (2025)

VidLBEval: Benchmarking and Mitigating Language Bias in Video-Involved LVLMs
by: Yang, Yiming, et al.
Published: (2025)

SafeVid: Toward Safety Aligned Video Large Multimodal Models
by: Wang, Yixu, et al.
Published: (2025)

VidTwin: Video VAE with Decoupled Structure and Dynamics
by: Wang, Yuchi, et al.
Published: (2024)

VidEvent: A Large Dataset for Understanding Dynamic Evolution of Events in Videos
by: Liang, Baoyu, et al.
Published: (2025)

VidSketch: Hand-drawn Sketch-Driven Video Generation with Diffusion Control
by: Jiang, Lifan, et al.
Published: (2025)

Vid-SME: Membership Inference Attacks against Large Video Understanding Models
by: Li, Qi, et al.
Published: (2025)

Ambiguity-Restrained Text-Video Representation Learning for Partially Relevant Video Retrieval
by: Cho, CH, et al.
Published: (2025)

UniVid: Pyramid Diffusion Model for High Quality Video Generation
by: Xiao, Xinyu, et al.
Published: (2026)

VidCtx: Context-aware Video Question Answering with Image Models
by: Goulas, Andreas, et al.
Published: (2024)

Task-Specific Adaptation with Restricted Model Access
by: Levy, Matan, et al.
Published: (2025)

OTT-Vid: Optimal Transport Temporal Token Compression for Video Large Language Models
by: Kang, Minseok, et al.
Published: (2026)

VidDoS: Universal Denial-of-Service Attack on Video-based Large Language Models
by: Tang, Duoxun, et al.
Published: (2026)

VidLaDA: Bidirectional Diffusion Large Language Models for Efficient Video Understanding
by: He, Zhihao, et al.
Published: (2026)

FreeVA: Offline MLLM as Training-Free Video Assistant
by: Wu, Wenhao
Published: (2024)

VidTok: A Versatile and Open-Source Video Tokenizer
by: Tang, Anni, et al.
Published: (2024)

CounterVid: Counterfactual Video Generation for Mitigating Action and Temporal Hallucinations in Video-Language Models
by: Poppi, Tobia, et al.
Published: (2026)

VidEgoThink: Assessing Egocentric Video Understanding Capabilities for Embodied AI
by: Cheng, Sijie, et al.
Published: (2024)

SurgVidLM: Towards Multi-grained Surgical Video Understanding with Large Language Model
by: Wang, Guankun, et al.
Published: (2025)

Vid-Freeze: Protecting Images from Malicious Image-to-Video Generation via Temporal Freezing
by: Chowdhury, Rohit, et al.
Published: (2025)

V-Stylist: Video Stylization via Collaboration and Reflection of MLLM Agents
by: Yue, Zhengrong, et al.
Published: (2025)

REGen: Multimodal Retrieval-Embedded Generation for Long-to-Short Video Editing
by: Xu, Weihan, et al.
Published: (2025)

Multi-Scale Temporal Difference Transformer for Video-Text Retrieval
by: Wang, Ni, et al.
Published: (2024)

SciVid: Cross-Domain Evaluation of Video Models in Scientific Applications
by: Hasson, Yana, et al.
Published: (2025)

EM-Vid: Training-Free Entity-Centric Memory for Efficient and Consistent Multi-Shot Video Generation
by: Vandersanden, Jente, et al.
Published: (2026)