Library Catalog

Bibliographic Details
Main Authors:	Ruan, Chi; Zhao, Jiying; Chen, Wenhu
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2502.20622

Similar Items

RTGen: Generating Region-Text Pairs for Open-Vocabulary Object Detection
by: Chen, Fangyi, et al.
Published: (2024)

QuickVideo: Real-Time Long Video Understanding with System Algorithm Co-Design
by: Schneider, Benjamin, et al.
Published: (2025)

Real-Time Oriented Object Detection Transformer in Remote Sensing Images
by: Ding, Zeyu, et al.
Published: (2026)

LW-DETR: A Transformer Replacement to YOLO for Real-Time Detection
by: Chen, Qiang, et al.
Published: (2024)

RT-DETRv2: Improved Baseline with Bag-of-Freebies for Real-Time Detection Transformer
by: Lv, Wenyu, et al.
Published: (2024)

Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers
by: Ren, Weiming, et al.
Published: (2025)

RF-DETR: Neural Architecture Search for Real-Time Detection Transformers
by: Robinson, Isaac, et al.
Published: (2025)

Livatar-1: Real-Time Talking Heads Generation with Tailored Flow Matching
by: Liu, Haiyang, et al.
Published: (2025)

YOLO-IOD: Towards Real Time Incremental Object Detection
by: Zhang, Shizhou, et al.
Published: (2025)

UAV-Assisted Real-Time Disaster Detection Using Optimized Transformer Model
by: Jankovic, Branislava, et al.
Published: (2025)

Le-DETR: Revisiting Real-Time Detection Transformer with Efficient Encoder Design
by: Huang, Jiannan, et al.
Published: (2026)

PixelWorld: How Far Are We from Perceiving Everything as Pixels?
by: Lyu, Zhiheng, et al.
Published: (2025)

Real-Time Deepfake Detection in the Real-World
by: Cavia, Bar, et al.
Published: (2024)

YOLOv10: Real-Time End-to-End Object Detection
by: Wang, Ao, et al.
Published: (2024)

Learning Motion Blur Robust Vision Transformers for Real-Time UAV Tracking
by: Wu, You, et al.
Published: (2024)

Real-Time 3D Object Detection with Inference-Aligned Learning
by: Zhao, Chenyu, et al.
Published: (2025)

A Real-Time Framework for Domain-Adaptive Underwater Object Detection with Image Enhancement
by: Wen, Junjie, et al.
Published: (2024)

CF-DETR: Coarse-to-Fine Transformer for Real-Time Object Detection
by: Shin, Woojin, et al.
Published: (2025)

Kosmos-G: Generating Images in Context with Multimodal Large Language Models
by: Pan, Xichen, et al.
Published: (2023)

ABC: Achieving Better Control of Multimodal Embeddings using VLMs
by: Schneider, Benjamin, et al.
Published: (2025)

Real-Time Detection of Electronic Components in Waste Printed Circuit Boards: A Transformer-Based Approach
by: Mohsin, Muhammad, et al.
Published: (2024)

Learning an Adaptive and View-Invariant Vision Transformer for Real-Time UAV Tracking
by: Wu, You, et al.
Published: (2024)

KV-Tracker: Real-Time Pose Tracking with Transformers
by: Taher, Marwan, et al.
Published: (2025)

RealCam: Real-Time Novel-View Video Generation with Interactive Camera Control
by: Xu, Youcan, et al.
Published: (2026)

ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation
by: Ren, Weiming, et al.
Published: (2024)

RT-DETRv4: Painlessly Furthering Real-Time Object Detection with Vision Foundation Models
by: Liao, Zijun, et al.
Published: (2025)

Real-time Transformer-based Open-Vocabulary Detection with Efficient Fusion Head
by: Zhao, Tiancheng, et al.
Published: (2024)

DiT as Real-Time Rerenderer: Streaming Video Stylization with Autoregressive Diffusion Transformer
by: Lyu, Hengye, et al.
Published: (2026)

Context Forcing: Consistent Autoregressive Video Generation with Long Context
by: Chen, Shuo, et al.
Published: (2026)

When Every Millisecond Counts: Real-Time Anomaly Detection via the Multimodal Asynchronous Hybrid Network
by: Xiao, Dong, et al.
Published: (2025)

RTMap: Real-Time Recursive Mapping with Change Detection and Localization
by: Du, Yuheng, et al.
Published: (2025)

Real-Time Indoor Object Detection based on hybrid CNN-Transformer Approach
by: Laidoudi, Salah Eddine, et al.
Published: (2024)

UniVideo: Unified Understanding, Generation, and Editing for Videos
by: Wei, Cong, et al.
Published: (2025)

Test-Time Intensity Consistency Adaptation for Shadow Detection
by: Zhu, Leyi, et al.
Published: (2024)

Style-Adaptive Detection Transformer for Single-Source Domain Generalized Object Detection
by: Han, Jianhong, et al.
Published: (2025)

Helios: Real Real-Time Long Video Generation Model
by: Yuan, Shenghai, et al.
Published: (2026)

CogDoc: Towards Unified thinking in Documents
by: Xu, Qixin, et al.
Published: (2025)

Starve to Perceive: Taming Lazy Perception in VLMs with Constrained Visual Bandwidth
by: Wu, Yuhuan, et al.
Published: (2026)

VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation
by: Ren, Weiming, et al.
Published: (2024)

ROMA: Run-Time Object Detection To Maximize Real-Time Accuracy
by: Lee, JunKyu, et al.
Published: (2022)