| Main Authors: | Ge, Chengjie; Fu, Xueyang; He, Peng; Wang, Kunyu; Cao, Chengzhi; Zha, Zheng-Jun |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | Computer Vision and Pattern Recognition |
| Online Access: | https://arxiv.org/abs/2503.19721 |
| _version_ | 1866909560225136640 |
|---|---|
| author | Ge, Chengjie; Fu, Xueyang; He, Peng; Wang, Kunyu; Cao, Chengzhi; Zha, Zheng-Jun |
| author_facet | Ge, Chengjie; Fu, Xueyang; He, Peng; Wang, Kunyu; Cao, Chengzhi; Zha, Zheng-Jun |
| contents | Leveraging its robust linear global modeling capability, Mamba has notably excelled in computer vision. Despite its success, existing Mamba-based vision models have overlooked the nuances of event-driven tasks, especially in video reconstruction. Event-based video reconstruction (EBVR) demands spatial translation invariance and close attention to local event relationships in the spatio-temporal domain. Unfortunately, conventional Mamba algorithms apply static window partitions and standard reshape scanning methods, leading to significant losses in local connectivity. To overcome these limitations, we introduce EventMamba--a specialized model designed for EBVR tasks. EventMamba innovates by incorporating random window offset (RWO) in the spatial domain, moving away from the restrictive fixed partitioning. Additionally, it features a new consistent traversal serialization approach in the spatio-temporal domain, which maintains the proximity of adjacent events both spatially and temporally. These enhancements enable EventMamba to retain Mamba's robust modeling capabilities while significantly preserving the spatio-temporal locality of event data. Comprehensive testing on multiple datasets shows that EventMamba markedly enhances video reconstruction, drastically improving computation speed while delivering superior visual quality compared to Transformer-based methods. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2503_19721 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| title | EventMamba: Enhancing Spatio-Temporal Locality with State Space Models for Event-Based Video Reconstruction |
| topic | Computer Vision and Pattern Recognition |
| url | https://arxiv.org/abs/2503.19721 |
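
The abstract names two ideas without giving their details: a random window offset (RWO) in place of a fixed window partition, and a traversal order that keeps neighbouring events adjacent in the serialized sequence. The sketch below is only an illustration of those two ideas in plain Python and is not the authors' implementation. The function names `shifted_window_ids` and `serpentine_order`, the uniform offset sampling, and the back-and-forth (serpentine) scan are all assumptions made for this example.

```python
import random


def shifted_window_ids(height, width, win, rng):
    """Assign each pixel to a win x win window whose grid is shifted by a
    random offset (dy, dx). Hypothetical helper, not taken from the paper."""
    dy, dx = rng.randrange(win), rng.randrange(win)
    ids = [
        [((y + dy) // win, (x + dx) // win) for x in range(width)]
        for y in range(height)
    ]
    return ids, (dy, dx)


def serpentine_order(frames, height, width):
    """Visit every (t, y, x) cell so that consecutive cells are always
    direct neighbours in time or space. Assumed scan pattern for illustration."""
    order = []
    rows_done = 0
    for t in range(frames):
        # Reverse the row direction on every other time slice.
        ys = range(height) if t % 2 == 0 else range(height - 1, -1, -1)
        for y in ys:
            # Reverse the column direction on every other visited row.
            xs = range(width) if rows_done % 2 == 0 else range(width - 1, -1, -1)
            order.extend((t, y, x) for x in xs)
            rows_done += 1
    return order


if __name__ == "__main__":
    ids, offset = shifted_window_ids(6, 8, 4, random.Random(0))
    print("offset:", offset)
    print("first tokens:", serpentine_order(2, 2, 3)[:6])
```

A plain reshape (row-major) scan jumps from the end of one row to the start of the next, so tokens that are adjacent in the sequence can be far apart in the frame. The serpentine scan avoids that jump, which is one simple way to read the locality claim in the abstract.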