Bibliographic Details
Main Authors:	Xia, Zhongyi; Wu, Tianzhao
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2409.01133

Similar Items

Language-Based Depth Hints for Monocular Depth Estimation
by: Auty, Dylan, et al.
Published: (2024)

Advancing Depth Anything Model for Unsupervised Monocular Depth Estimation in Endoscopy
by: Li, Bojian, et al.
Published: (2024)

Focusable Monocular Depth Estimation
by: Du, Yuxin, et al.
Published: (2026)

BRIDGE -- Building Reinforcement-Learning Depth-to-Image Data Generation Engine for Monocular Depth Estimation
by: Liu, Dingning, et al.
Published: (2025)

Shedding Light on Depth: Explainability Assessment in Monocular Depth Estimation
by: Cirillo, Lorenzo, et al.
Published: (2025)

Can Multimodal Large Language Models Truly Understand Small Objects?
by: Han, Fujun, et al.
Published: (2026)

CLIP Can Understand Depth
by: Kim, Sohee, et al.
Published: (2024)

WorDepth: Variational Language Prior for Monocular Depth Estimation
by: Zeng, Ziyao, et al.
Published: (2024)

DepthDark: Robust Monocular Depth Estimation for Low-Light Environments
by: Zeng, Longjian, et al.
Published: (2025)

Always Clear Depth: Robust Monocular Depth Estimation under Adverse Weather
by: Jiang, Kui, et al.
Published: (2025)

See in Depth: Training-Free Surgical Scene Segmentation with Monocular Depth Priors
by: Yang, Kunyi, et al.
Published: (2025)

SD-VLM: Spatial Measuring and Understanding with Depth-Encoded Vision-Language Models
by: Chen, Pingyi, et al.
Published: (2025)

ECoDepth: Effective Conditioning of Diffusion Models for Monocular Depth Estimation
by: Patni, Suraj, et al.
Published: (2024)

SM4Depth: Seamless Monocular Metric Depth Estimation across Multiple Cameras and Scenes by One Model
by: Liu, Yihao, et al.
Published: (2024)

Argus: Leveraging Multiview Images for Improved 3-D Scene Understanding With Large Language Models
by: Xu, Yifan, et al.
Published: (2025)

Can Multimodal Large Language Models Understand Pathologic Movements? A Pilot Study on Seizure Semiology
by: Zhang, Lina, et al.
Published: (2026)

AuxDepthNet: Real-Time Monocular 3D Object Detection with Depth-Sensitive Features
by: Zhang, Ruochen, et al.
Published: (2025)

UMono: Physical Model Informed Hybrid CNN-Transformer Framework for Underwater Monocular Depth Estimation
by: Wang, Jian, et al.
Published: (2024)

AgriChat: A Multimodal Large Language Model for Agriculture Image Understanding
by: Boudiaf, Abderrahmene, et al.
Published: (2026)

Can Large Language Models Understand Symbolic Graphics Programs?
by: Qiu, Zeju, et al.
Published: (2024)

Real-time Accident Anticipation for Autonomous Driving Through Monocular Depth-Enhanced 3D Modeling
by: Liao, Haicheng, et al.
Published: (2024)

Can Vision-Language Models Understand Construction Workers? An Exploratory Study
by: Bui, Hieu, et al.
Published: (2026)

Can Vision Language Models Understand Mimed Actions?
by: Cho, Hyundong, et al.
Published: (2025)

Deep Neighbor Layer Aggregation for Lightweight Self-Supervised Monocular Depth Estimation
by: Boya, Wang, et al.
Published: (2023)

II-Bench: An Image Implication Understanding Benchmark for Multimodal Large Language Models
by: Liu, Ziqiang, et al.
Published: (2024)

Can Large Multimodal Models Understand Agricultural Scenes? Benchmarking with AgroMind
by: Li, Qingmei, et al.
Published: (2025)

Can Large Vision-Language Models Detect Images Copyright Infringement from GenAI?
by: Xu, Qipan, et al.
Published: (2025)

Understanding Counting Mechanisms in Large Language and Vision-Language Models
by: Hasani, Hosein, et al.
Published: (2025)

Adaptive Discrete Disparity Volume for Self-supervised Monocular Depth Estimation
by: Ren, Jianwei
Published: (2024)

Text as Images: Can Multimodal Large Language Models Follow Printed Instructions in Pixels?
by: Li, Xiujun, et al.
Published: (2023)

A Critical Synthesis of Uncertainty Quantification and Foundation Models in Monocular Depth Estimation
by: Landgraf, Steven, et al.
Published: (2025)

Review of Hallucination Understanding in Large Language and Vision Models
by: Ho, Zhengyi, et al.
Published: (2025)

On Large Visual Language Models for Medical Imaging Analysis: An Empirical Study
by: Van, Minh-Hao, et al.
Published: (2024)

S3MOT: Monocular 3D Object Tracking with Selective State Space Model
by: Yan, Zhuohao, et al.
Published: (2025)

FreDSNet: Joint Monocular Depth and Semantic Segmentation with Fast Fourier Convolutions
by: Berenguel-Baeta, Bruno, et al.
Published: (2022)

In-Depth and In-Breadth: Pre-training Multimodal Language Models Customized for Comprehensive Chart Understanding
by: Fan, Wan-Cyuan, et al.
Published: (2025)

MOSABench: Multi-Object Sentiment Analysis Benchmark for Evaluating Multimodal Large Language Models Understanding of Complex Image
by: Song, Shezheng, et al.
Published: (2024)

EndoGMDE: Generalizable Monocular Depth Estimation with Mixture of Low-Rank Experts for Diverse Endoscopic Scenes
by: Shao, Liangjing, et al.
Published: (2025)

No Pose Estimation? No Problem: Pose-Agnostic and Instance-Aware Test-Time Adaptation for Monocular Depth Estimation
by: Sung, Mingyu, et al.
Published: (2025)

$D^3$-RSMDE: 40$\times$ Faster and High-Fidelity Remote Sensing Monocular Depth Estimation
by: Wang, Ruizhi, et al.
Published: (2026)