Vista Equipo: :: Library Catalog

Guardado en:

Detalles Bibliográficos
Autores principales:	Xia, Linhan, Liu, Junbang, Wu, Tong
Formato:	Preprint
Publicado:	2024
Materias:	Computer Vision and Pattern Recognition
Acceso en línea:	https://arxiv.org/abs/2403.01370
Etiquetas:	Agregar Etiqueta Sin Etiquetas, Sea el primero en etiquetar este registro!

_version_	1866917701178359808
author	Xia, Linhan Liu, Junbang Wu, Tong
author_facet	Xia, Linhan Liu, Junbang Wu, Tong
contents	This research presents a novel depth estimation algorithm based on a Transformer-encoder architecture, tailored for the NYU and KITTI Depth Dataset. This research adopts a transformer model, initially renowned for its success in natural language processing, to capture intricate spatial relationships in visual data for depth estimation tasks. A significant innovation of the research is the integration of a composite loss function that combines Structural Similarity Index Measure (SSIM) with Mean Squared Error (MSE). This combined loss function is designed to ensure the structural integrity of the predicted depth maps relative to the original images (via SSIM) while minimizing pixel-wise estimation errors (via MSE). This research approach addresses the challenges of over-smoothing often seen in MSE-based losses and enhances the model's ability to predict depth maps that are not only accurate but also maintain structural coherence with the input images. Through rigorous training and evaluation using the NYU Depth Dataset, the model demonstrates superior performance, marking a significant advancement in single-image depth estimation, particularly in complex indoor and traffic environments.
format	Preprint
id	arxiv_https___arxiv_org_abs_2403_01370
institution	arXiv
publishDate	2024
record_format	arxiv
spellingShingle	Depth Estimation Algorithm Based on Transformer-Encoder and Feature Fusion Xia, Linhan Liu, Junbang Wu, Tong Computer Vision and Pattern Recognition This research presents a novel depth estimation algorithm based on a Transformer-encoder architecture, tailored for the NYU and KITTI Depth Dataset. This research adopts a transformer model, initially renowned for its success in natural language processing, to capture intricate spatial relationships in visual data for depth estimation tasks. A significant innovation of the research is the integration of a composite loss function that combines Structural Similarity Index Measure (SSIM) with Mean Squared Error (MSE). This combined loss function is designed to ensure the structural integrity of the predicted depth maps relative to the original images (via SSIM) while minimizing pixel-wise estimation errors (via MSE). This research approach addresses the challenges of over-smoothing often seen in MSE-based losses and enhances the model's ability to predict depth maps that are not only accurate but also maintain structural coherence with the input images. Through rigorous training and evaluation using the NYU Depth Dataset, the model demonstrates superior performance, marking a significant advancement in single-image depth estimation, particularly in complex indoor and traffic environments.
title	Depth Estimation Algorithm Based on Transformer-Encoder and Feature Fusion
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2403.01370

Ejemplares similares