:: Library Catalog

Bibliographic Details
Main Authors:	Fu, Luxuan; Liu, Chong; Yang, Bisheng; Dong, Zhen
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2601.10551

Similar Items

SVII-3D: Advancing Roadside Infrastructure Inventory with Decimeter-level 3D Localization and Comprehension from Sparse Street Imagery
by: Liu, Chong, et al.
Published: (2026)

Lumen: Unleashing Versatile Vision-Centric Capabilities of Large Multimodal Models
by: Jiao, Yang, et al.
Published: (2024)

ME-CPT: Multi-Task Enhanced Cross-Temporal Point Transformer for Urban 3D Change Detection
by: Zhang, Luqi, et al.
Published: (2025)

2.5D Object Detection for Intelligent Roadside Infrastructure
by: Polley, Nikolai, et al.
Published: (2025)

DPG-CD: Depth-Prior-Guided Cross-Modal Joint 2D-3D Change Detection
by: Zhang, Luqi, et al.
Published: (2026)

GAGS: Granularity-Aware Feature Distillation for Language Gaussian Splatting
by: Peng, Yuning, et al.
Published: (2024)

SpatialLLM: From Multi-modality Data to Urban Spatial Intelligence
by: Chen, Jiabin, et al.
Published: (2025)

GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing
by: Fang, Rongyao, et al.
Published: (2025)

NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models
by: Zhou, Gengze, et al.
Published: (2024)

Unleashing the Intrinsic Visual Representation Capability of Multimodal Large Language Models
by: Li, Hengzhuang, et al.
Published: (2025)

Multimodal HD Mapping for Intersections by Intelligent Roadside Units
by: Chen, Zhongzhang, et al.
Published: (2025)

RoadSceneVQA: Benchmarking Visual Question Answering in Roadside Perception Systems for Intelligent Transportation System
by: Guan, Runwei, et al.
Published: (2025)

VistaDream: Sampling multiview consistent images for single-view scene reconstruction
by: Wang, Haiping, et al.
Published: (2024)

OmniPT: Unleashing the Potential of Large Vision Language Models for Pedestrian Tracking and Understanding
by: Fu, Teng, et al.
Published: (2025)

RoScenes: A Large-scale Multi-view 3D Dataset for Roadside Perception
by: Zhu, Xiaosu, et al.
Published: (2024)

An Empirical Study on Configuring In-Context Learning Demonstrations for Unleashing MLLMs' Sentimental Perception Capability
by: Wu, Daiqing, et al.
Published: (2025)

SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models
by: Yue, Tongtian, et al.
Published: (2024)

Explicitly Guided Information Interaction Network for Cross-modal Point Cloud Completion
by: Xu, Hang, et al.
Published: (2024)

FreeReg: Image-to-Point Cloud Registration Leveraging Pretrained Diffusion Models and Monocular Depth Estimators
by: Wang, Haiping, et al.
Published: (2023)

Infrastructure-Centric World Models: Bridging Temporal Depth and Spatial Breadth for Roadside Perception
by: Meng, Siyuan, et al.
Published: (2026)

Speak While Watching: Unleashing TRUE Real-Time Video Understanding Capability of Multimodal Large Language Models
by: Lin, Junyan, et al.
Published: (2026)

Abstract 3D Perception for Spatial Intelligence in Vision-Language Models
by: Liu, Yifan, et al.
Published: (2025)

LifelongPR: Lifelong point cloud place recognition based on sample replay and prompt learning
by: Zou, Xianghong, et al.
Published: (2025)

HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding
by: Tao, Chenxin, et al.
Published: (2024)

RoCo-Sim: Enhancing Roadside Collaborative Perception through Foreground Simulation
by: Du, Yuwen, et al.
Published: (2025)

MoniRefer: A Real-world Large-scale Multi-modal Dataset based on Roadside Infrastructure for 3D Visual Grounding
by: Yang, Panquan, et al.
Published: (2025)

Expert Knowledge-Guided Decision Calibration for Accurate Fine-Grained Tree Species Classification
by: Long, Chen, et al.
Published: (2026)

DeepAAT: Deep Automated Aerial Triangulation for Fast UAV-based Mapping
by: Chen, Zequan, et al.
Published: (2024)

Multi-Turn Adaptive Prompting Attack on Large Vision-Language Models
by: Choi, In Chong, et al.
Published: (2026)

Accurate Cooperative Localization Utilizing LiDAR-equipped Roadside Infrastructure for Autonomous Driving
by: Jiang, Yuze, et al.
Published: (2024)

Evaluating Graphical Perception Capabilities of Vision Transformers
by: Poonam, Poonam, et al.
Published: (2026)

CORP: A Multi-Modal Dataset for Campus-Oriented Roadside Perception Tasks
by: Wang, Beibei, et al.
Published: (2024)

CHOICE: Benchmarking the Remote Sensing Capabilities of Large Vision-Language Models
by: An, Xiao, et al.
Published: (2024)

Reliable-loc: Robust sequential LiDAR global localization in large-scale street scenes based on verifiable cues
by: Zou, Xianghong, et al.
Published: (2024)

SaliencyI2PLoc: saliency-guided image-point cloud localization using contrastive learning
by: Li, Yuhao, et al.
Published: (2024)

VL4Gaze: Unleashing Vision-Language Models for Gaze Following
by: Wang, Shijing, et al.
Published: (2025)

Unleashing Vision-Language Semantics for Deepfake Video Detection
by: Zhu, Jiawen, et al.
Published: (2026)

OSMLoc: Single Image-Based Visual Localization in OpenStreetMap with Fused Geometric and Semantic Guidance
by: Liao, Youqi, et al.
Published: (2024)

RopeBEV: A Multi-Camera Roadside Perception Network in Bird's-Eye-View
by: Jia, Jinrang, et al.
Published: (2024)

Unlocking the Capabilities of Large Vision-Language Models for Generalizable and Explainable Deepfake Detection
by: Yu, Peipeng, et al.
Published: (2025)