Saved in:
| Main Authors: | , |
|---|---|
| Format: | Preprint |
| Published: |
2024
|
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2501.01993 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
| _version_ | 1866911357138370560 |
|---|---|
| author | Du, Alexander Liu, Xiujin |
| author_facet | Du, Alexander Liu, Xiujin |
| contents | This paper proposes PoseLecTr, a graph-based encoder-decoder framework that integrates a novel Legendre convolution with attention mechanisms for six-degree-of-freedom (6-DOF) object pose estimation from monocular RGB images. Conventional learning-based approaches predominantly rely on grid-structured convolutions, which can limit their ability to model higher-order and long-range dependencies among image features, especially in cluttered or occluded scenes. PoseLecTr addresses this limitation by constructing a graph representation from image features, where spatial relationships are explicitly modeled through graph connectivity. The proposed framework incorporates a Legendre convolution layer to improve numerical stability in graph convolution, together with spatial-attention and self-attention distillation to enhance feature selection. Experiments conducted on the LINEMOD, Occluded LINEMOD, and YCB-VIDEO datasets demonstrate that our method achieves competitive performance and shows consistent improvements across a wide range of objects and scene complexities. |
| format | Preprint |
| id |
arxiv_https___arxiv_org_abs_2501_01993 |
| institution | arXiv |
| publishDate | 2024 |
| record_format | arxiv |
| spellingShingle | A Novel Convolution and Attention Mechanism-based Model for 6D Object Pose Estimation Du, Alexander Liu, Xiujin Computer Vision and Pattern Recognition Machine Learning This paper proposes PoseLecTr, a graph-based encoder-decoder framework that integrates a novel Legendre convolution with attention mechanisms for six-degree-of-freedom (6-DOF) object pose estimation from monocular RGB images. Conventional learning-based approaches predominantly rely on grid-structured convolutions, which can limit their ability to model higher-order and long-range dependencies among image features, especially in cluttered or occluded scenes. PoseLecTr addresses this limitation by constructing a graph representation from image features, where spatial relationships are explicitly modeled through graph connectivity. The proposed framework incorporates a Legendre convolution layer to improve numerical stability in graph convolution, together with spatial-attention and self-attention distillation to enhance feature selection. Experiments conducted on the LINEMOD, Occluded LINEMOD, and YCB-VIDEO datasets demonstrate that our method achieves competitive performance and shows consistent improvements across a wide range of objects and scene complexities. |
| title | A Novel Convolution and Attention Mechanism-based Model for 6D Object Pose Estimation |
| topic | Computer Vision and Pattern Recognition Machine Learning |
| url | https://arxiv.org/abs/2501.01993 |