| Main Authors: | Fu, Luxuan, Liu, Chong, Yang, Bisheng, Dong, Zhen |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2601.10551 |
Similar Items
SVII-3D: Advancing Roadside Infrastructure Inventory with Decimeter-level 3D Localization and Comprehension from Sparse Street Imagery
by: Liu, Chong, et al.
Published: (2026)
Lumen: Unleashing Versatile Vision-Centric Capabilities of Large Multimodal Models
by: Jiao, Yang, et al.
Published: (2024)
ME-CPT: Multi-Task Enhanced Cross-Temporal Point Transformer for Urban 3D Change Detection
by: Zhang, Luqi, et al.
Published: (2025)
2.5D Object Detection for Intelligent Roadside Infrastructure
by: Polley, Nikolai, et al.
Published: (2025)
DPG-CD: Depth-Prior-Guided Cross-Modal Joint 2D-3D Change Detection
by: Zhang, Luqi, et al.
Published: (2026)
GAGS: Granularity-Aware Feature Distillation for Language Gaussian Splatting
by: Peng, Yuning, et al.
Published: (2024)
SpatialLLM: From Multi-modality Data to Urban Spatial Intelligence
by: Chen, Jiabin, et al.
Published: (2025)
GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing
by: Fang, Rongyao, et al.
Published: (2025)
NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models
by: Zhou, Gengze, et al.
Published: (2024)
Unleashing the Intrinsic Visual Representation Capability of Multimodal Large Language Models
by: Li, Hengzhuang, et al.
Published: (2025)
Multimodal HD Mapping for Intersections by Intelligent Roadside Units
by: Chen, Zhongzhang, et al.
Published: (2025)
RoadSceneVQA: Benchmarking Visual Question Answering in Roadside Perception Systems for Intelligent Transportation System
by: Guan, Runwei, et al.
Published: (2025)
VistaDream: Sampling multiview consistent images for single-view scene reconstruction
by: Wang, Haiping, et al.
Published: (2024)
OmniPT: Unleashing the Potential of Large Vision Language Models for Pedestrian Tracking and Understanding
by: Fu, Teng, et al.
Published: (2025)
RoScenes: A Large-scale Multi-view 3D Dataset for Roadside Perception
by: Zhu, Xiaosu, et al.
Published: (2024)
An Empirical Study on Configuring In-Context Learning Demonstrations for Unleashing MLLMs' Sentimental Perception Capability
by: Wu, Daiqing, et al.
Published: (2025)
SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models
by: Yue, Tongtian, et al.
Published: (2024)
Explicitly Guided Information Interaction Network for Cross-modal Point Cloud Completion
by: Xu, Hang, et al.
Published: (2024)
FreeReg: Image-to-Point Cloud Registration Leveraging Pretrained Diffusion Models and Monocular Depth Estimators
by: Wang, Haiping, et al.
Published: (2023)
Infrastructure-Centric World Models: Bridging Temporal Depth and Spatial Breadth for Roadside Perception
by: Meng, Siyuan, et al.
Published: (2026)
Speak While Watching: Unleashing TRUE Real-Time Video Understanding Capability of Multimodal Large Language Models
by: Lin, Junyan, et al.
Published: (2026)
Abstract 3D Perception for Spatial Intelligence in Vision-Language Models
by: Liu, Yifan, et al.
Published: (2025)
LifelongPR: Lifelong point cloud place recognition based on sample replay and prompt learning
by: Zou, Xianghong, et al.
Published: (2025)
HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding
by: Tao, Chenxin, et al.
Published: (2024)
RoCo-Sim: Enhancing Roadside Collaborative Perception through Foreground Simulation
by: Du, Yuwen, et al.
Published: (2025)
MoniRefer: A Real-world Large-scale Multi-modal Dataset based on Roadside Infrastructure for 3D Visual Grounding
by: Yang, Panquan, et al.
Published: (2025)
Expert Knowledge-Guided Decision Calibration for Accurate Fine-Grained Tree Species Classification
by: Long, Chen, et al.
Published: (2026)
DeepAAT: Deep Automated Aerial Triangulation for Fast UAV-based Mapping
by: Chen, Zequan, et al.
Published: (2024)
Multi-Turn Adaptive Prompting Attack on Large Vision-Language Models
by: Choi, In Chong, et al.
Published: (2026)
Accurate Cooperative Localization Utilizing LiDAR-equipped Roadside Infrastructure for Autonomous Driving
by: Jiang, Yuze, et al.
Published: (2024)
Evaluating Graphical Perception Capabilities of Vision Transformers
by: Poonam, Poonam, et al.
Published: (2026)
CORP: A Multi-Modal Dataset for Campus-Oriented Roadside Perception Tasks
by: Wang, Beibei, et al.
Published: (2024)
CHOICE: Benchmarking the Remote Sensing Capabilities of Large Vision-Language Models
by: An, Xiao, et al.
Published: (2024)
Reliable-loc: Robust sequential LiDAR global localization in large-scale street scenes based on verifiable cues
by: Zou, Xianghong, et al.
Published: (2024)
SaliencyI2PLoc: saliency-guided image-point cloud localization using contrastive learning
by: Li, Yuhao, et al.
Published: (2024)
VL4Gaze: Unleashing Vision-Language Models for Gaze Following
by: Wang, Shijing, et al.
Published: (2025)
Unleashing Vision-Language Semantics for Deepfake Video Detection
by: Zhu, Jiawen, et al.
Published: (2026)
OSMLoc: Single Image-Based Visual Localization in OpenStreetMap with Fused Geometric and Semantic Guidance
by: Liao, Youqi, et al.
Published: (2024)
RopeBEV: A Multi-Camera Roadside Perception Network in Bird's-Eye-View
by: Jia, Jinrang, et al.
Published: (2024)
Unlocking the Capabilities of Large Vision-Language Models for Generalizable and Explainable Deepfake Detection
by: Yu, Peipeng, et al.
Published: (2025)