| Main Authors: | Xia, Zhongyi, Wu, Tianzhao |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2409.01133 |
Similar Items
Language-Based Depth Hints for Monocular Depth Estimation
by: Auty, Dylan, et al.
Published: (2024)
Advancing Depth Anything Model for Unsupervised Monocular Depth Estimation in Endoscopy
by: Li, Bojian, et al.
Published: (2024)
Focusable Monocular Depth Estimation
by: Du, Yuxin, et al.
Published: (2026)
BRIDGE -- Building Reinforcement-Learning Depth-to-Image Data Generation Engine for Monocular Depth Estimation
by: Liu, Dingning, et al.
Published: (2025)
Shedding Light on Depth: Explainability Assessment in Monocular Depth Estimation
by: Cirillo, Lorenzo, et al.
Published: (2025)
Can Multimodal Large Language Models Truly Understand Small Objects?
by: Han, Fujun, et al.
Published: (2026)
CLIP Can Understand Depth
by: Kim, Sohee, et al.
Published: (2024)
WorDepth: Variational Language Prior for Monocular Depth Estimation
by: Zeng, Ziyao, et al.
Published: (2024)
DepthDark: Robust Monocular Depth Estimation for Low-Light Environments
by: Zeng, Longjian, et al.
Published: (2025)
Always Clear Depth: Robust Monocular Depth Estimation under Adverse Weather
by: Jiang, Kui, et al.
Published: (2025)
See in Depth: Training-Free Surgical Scene Segmentation with Monocular Depth Priors
by: Yang, Kunyi, et al.
Published: (2025)
SD-VLM: Spatial Measuring and Understanding with Depth-Encoded Vision-Language Models
by: Chen, Pingyi, et al.
Published: (2025)
ECoDepth: Effective Conditioning of Diffusion Models for Monocular Depth Estimation
by: Patni, Suraj, et al.
Published: (2024)
SM4Depth: Seamless Monocular Metric Depth Estimation across Multiple Cameras and Scenes by One Model
by: Liu, Yihao, et al.
Published: (2024)
Argus: Leveraging Multiview Images for Improved 3-D Scene Understanding With Large Language Models
by: Xu, Yifan, et al.
Published: (2025)
Can Multimodal Large Language Models Understand Pathologic Movements? A Pilot Study on Seizure Semiology
by: Zhang, Lina, et al.
Published: (2026)
AuxDepthNet: Real-Time Monocular 3D Object Detection with Depth-Sensitive Features
by: Zhang, Ruochen, et al.
Published: (2025)
UMono: Physical Model Informed Hybrid CNN-Transformer Framework for Underwater Monocular Depth Estimation
by: Wang, Jian, et al.
Published: (2024)
AgriChat: A Multimodal Large Language Model for Agriculture Image Understanding
by: Boudiaf, Abderrahmene, et al.
Published: (2026)
Can Large Language Models Understand Symbolic Graphics Programs?
by: Qiu, Zeju, et al.
Published: (2024)
Real-time Accident Anticipation for Autonomous Driving Through Monocular Depth-Enhanced 3D Modeling
by: Liao, Haicheng, et al.
Published: (2024)
Can Vision-Language Models Understand Construction Workers? An Exploratory Study
by: Bui, Hieu, et al.
Published: (2026)
Can Vision Language Models Understand Mimed Actions?
by: Cho, Hyundong, et al.
Published: (2025)
Deep Neighbor Layer Aggregation for Lightweight Self-Supervised Monocular Depth Estimation
by: Boya, Wang, et al.
Published: (2023)
II-Bench: An Image Implication Understanding Benchmark for Multimodal Large Language Models
by: Liu, Ziqiang, et al.
Published: (2024)
Can Large Multimodal Models Understand Agricultural Scenes? Benchmarking with AgroMind
by: Li, Qingmei, et al.
Published: (2025)
Can Large Vision-Language Models Detect Images Copyright Infringement from GenAI?
by: Xu, Qipan, et al.
Published: (2025)
Understanding Counting Mechanisms in Large Language and Vision-Language Models
by: Hasani, Hosein, et al.
Published: (2025)
Adaptive Discrete Disparity Volume for Self-supervised Monocular Depth Estimation
by: Ren, Jianwei
Published: (2024)
Text as Images: Can Multimodal Large Language Models Follow Printed Instructions in Pixels?
by: Li, Xiujun, et al.
Published: (2023)
A Critical Synthesis of Uncertainty Quantification and Foundation Models in Monocular Depth Estimation
by: Landgraf, Steven, et al.
Published: (2025)
Review of Hallucination Understanding in Large Language and Vision Models
by: Ho, Zhengyi, et al.
Published: (2025)
On Large Visual Language Models for Medical Imaging Analysis: An Empirical Study
by: Van, Minh-Hao, et al.
Published: (2024)
S3MOT: Monocular 3D Object Tracking with Selective State Space Model
by: Yan, Zhuohao, et al.
Published: (2025)
FreDSNet: Joint Monocular Depth and Semantic Segmentation with Fast Fourier Convolutions
by: Berenguel-Baeta, Bruno, et al.
Published: (2022)
In-Depth and In-Breadth: Pre-training Multimodal Language Models Customized for Comprehensive Chart Understanding
by: Fan, Wan-Cyuan, et al.
Published: (2025)
MOSABench: Multi-Object Sentiment Analysis Benchmark for Evaluating Multimodal Large Language Models Understanding of Complex Image
by: Song, Shezheng, et al.
Published: (2024)
EndoGMDE: Generalizable Monocular Depth Estimation with Mixture of Low-Rank Experts for Diverse Endoscopic Scenes
by: Shao, Liangjing, et al.
Published: (2025)
No Pose Estimation? No Problem: Pose-Agnostic and Instance-Aware Test-Time Adaptation for Monocular Depth Estimation
by: Sung, Mingyu, et al.
Published: (2025)
$D^3$-RSMDE: 40$\times$ Faster and High-Fidelity Remote Sensing Monocular Depth Estimation
by: Wang, Ruizhi, et al.
Published: (2026)