:: Library Catalog

Bibliographic Details
Main Authors:	Liao, Pan; Yang, Feng; Wu, Di; Yu, Jinwen; Zhu, Yuhua; Zhao, Wenhui; Zhang, Dingwen
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2601.06550

Similar Items

FastTrackTr: Towards Fast Multi-Object Tracking with Transformers
by: Liao, Pan, et al.
Published: (2024)

DecoderTracker: Decoder-Only Method for Multiple-Object Tracking
by: Pan, Liao, et al.
Published: (2023)

StereoMV2D: A Sparse Temporal Stereo-Enhanced Framework for Robust Multi-View 3D Object Detection
by: Wu, Di, et al.
Published: (2025)

HV-BEV: Decoupling Horizontal and Vertical Feature Sampling for Multi-View 3D Object Detection
by: Wu, Di, et al.
Published: (2024)

MonoDETRNext: Next-Generation Accurate and Efficient Monocular 3D Object Detector
by: Liao, Pan, et al.
Published: (2024)

Visual Object Tracking on Multi-modal RGB-D Videos: A Review
by: Zhu, Xue-Feng, et al.
Published: (2022)

Motion State: A New Benchmark Multiple Object Tracking
by: Feng, Yang, et al.
Published: (2023)

DocLayLLM: An Efficient Multi-modal Extension of Large Language Models for Text-rich Document Understanding
by: Liao, Wenhui, et al.
Published: (2024)

MambaDFuse: A Mamba-based Dual-phase Model for Multi-modality Image Fusion
by: Li, Zhe, et al.
Published: (2024)

PoseStreamer: A Multi-modal Framework for 3D Tracking of Unseen Moving Objects
by: Yang, Huiming, et al.
Published: (2025)

OCC-MLLM-Alpha: Empowering Multi-modal Large Language Model for the Understanding of Occluded Objects with Self-Supervised Test-Time Learning
by: Yang, Shuxin, et al.
Published: (2024)

Awesome Multi-modal Object Tracking
by: Zhang, Chunhui, et al.
Published: (2024)

Training-Free Semantic Multi-Object Tracking with Vision-Language Models
by: Bonat, Laurence, et al.
Published: (2026)

Multi-modal Deep Learning
by: Yuhua, Chen
Published: (2024)

Adaptive Perception for Unified Visual Multi-modal Object Tracking
by: Hu, Xiantao, et al.
Published: (2025)

Multi-Granularity Language-Guided Training for Multi-Object Tracking
by: Li, Yuhao, et al.
Published: (2024)

View-Centric Multi-Object Tracking with Homographic Matching in Moving UAV
by: Ji, Deyi, et al.
Published: (2024)

Empowering Segmentation Ability to Multi-modal Large Language Models
by: Yang, Yuqi, et al.
Published: (2024)

Unified Generative and Discriminative Training for Multi-modal Large Language Models
by: Chow, Wei, et al.
Published: (2024)

CrossTracker: Robust Multi-modal 3D Multi-Object Tracking via Cross Correction
by: Gu, Lipeng, et al.
Published: (2024)

Beyond MOT: Semantic Multi-Object Tracking
by: Li, Yunhao, et al.
Published: (2024)

Vision-Motion-Reference Alignment for Referring Multi-Object Tracking via Multi-Modal Large Language Models
by: Lv, Weiyi, et al.
Published: (2025)

DiffusionTrack: Diffusion Model For Multi-Object Tracking
by: Luo, Run, et al.
Published: (2023)

Q-Ground: Image Quality Grounding with Large Multi-modality Models
by: Chen, Chaofeng, et al.
Published: (2024)

4D-Bench: Benchmarking Multi-modal Large Language Models for 4D Object Understanding
by: Zhu, Wenxuan, et al.
Published: (2025)

Visual Language Tracking with Multi-modal Interaction: A Robust Benchmark
by: Li, Xuchen, et al.
Published: (2024)

ViewSAM: Learning View-aware Cross-modal Semantics for Weakly Supervised Cross-view Referring Multi-Object Tracking
by: Ge, Jiawei, et al.
Published: (2026)

MemorySAM: Memorize Modalities and Semantics with Segment Anything Model 2 for Multi-modal Semantic Segmentation
by: Liao, Chenfei, et al.
Published: (2025)

Universal Video Temporal Grounding with Generative Multi-modal Large Language Models
by: Li, Zeqian, et al.
Published: (2025)

Adapting Multi-modal Large Language Model to Concept Drift From Pre-training Onwards
by: Yang, Xiaoyu, et al.
Published: (2024)

Multi-modal Attribute Prompting for Vision-Language Models
by: Liu, Xin, et al.
Published: (2024)

DualFocus: Integrating Macro and Micro Perspectives in Multi-modal Large Language Models
by: Cao, Yuhang, et al.
Published: (2024)

Q-Doc: Benchmarking Document Image Quality Assessment Capabilities in Multi-modal Large Language Models
by: Huang, Jiaxi, et al.
Published: (2025)

LLMRA: Multi-modal Large Language Model based Restoration Assistant
by: Jin, Xiaoyu, et al.
Published: (2024)

InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models
by: Wei, Cong, et al.
Published: (2024)

FOLDER: Accelerating Multi-modal Large Language Models with Enhanced Performance
by: Wang, Haicheng, et al.
Published: (2025)

MMLF: Multi-modal Multi-class Late Fusion for Object Detection with Uncertainty Estimation
by: Yang, Qihang, et al.
Published: (2024)

A Survey of Deep Learning Based Radar and Vision Fusion for 3D Object Detection in Autonomous Driving
by: Wu, Di, et al.
Published: (2024)

VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models
by: Xu, Weiye, et al.
Published: (2025)

Real-time Multi-modal Object Detection and Tracking on Edge for Regulatory Compliance Monitoring
by: Lim, Jia Syuen, et al.
Published: (2023)