| Main Author: | Nguyen, Tu |
|---|---|
| Format: | Preprint |
| Published: | 2019 |
| Subjects: | Computer Vision and Pattern Recognition; Machine Learning; Image and Video Processing |
| Online Access: | https://arxiv.org/abs/1910.11030 |
| _version_ | 1866912633686327296 |
|---|---|
| author | Nguyen, Tu |
| author_facet | Nguyen, Tu |
| contents | This extended abstract describes our solution for the Traffic4Cast Challenge 2019. The task requires modeling both fine-grained (pixel-level) and coarse (region-level) spatial structure while preserving temporal relationships across long sequences. Building on Conv-LSTM ideas, we introduce a tile-aware, cascaded-memory Conv-LSTM augmented with cross-frame additive attention and a memory-flexible training scheme: frames are sampled per spatial tile so the model learns tile-local dynamics and per-tile memory cells can be updated sparsely, paged, or compressed to scale to large maps. We provide a compact theoretical analysis (tight softmax/attention Lipschitz bound and a tiling error lower bound) explaining stability and the memory-accuracy tradeoffs, and empirically demonstrate improved scalability and competitive forecasting performance on large-scale traffic heatmaps. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_1910_11030 |
| institution | arXiv |
| publishDate | 2019 |
| record_format | arxiv |
| title | Spatiotemporal Tile-based Attention-guided LSTMs for Traffic Video Prediction |
| topic | Computer Vision and Pattern Recognition; Machine Learning; Image and Video Processing |
| url | https://arxiv.org/abs/1910.11030 |
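
The abstract in this record mentions two mechanisms that are easier to picture with a small example: sampling frames per spatial tile, and additive attention across frames. The following is a minimal sketch under stated assumptions, not the author's implementation. The function names (`tile_boxes`, `sample_tile_clip`, `additive_attention`), the square tile size, and the scalar projection weights `w_q` and `w_k` are all hypothetical simplifications chosen for illustration; the paper itself may define these steps differently.

```python
import math
import random


def tile_boxes(height, width, tile):
    """Return (top, left, bottom, right) boxes that cover a height x width map.

    Assumption: square tiles of side `tile`; edge tiles are clipped to the map.
    """
    return [
        (top, left, min(top + tile, height), min(left + tile, width))
        for top in range(0, height, tile)
        for left in range(0, width, tile)
    ]


def sample_tile_clip(frames, box, clip_len, rng):
    """Crop one tile out of a random contiguous window of `clip_len` frames.

    `frames` is a list of 2-D lists (one heatmap per time step).
    """
    top, left, bottom, right = box
    start = rng.randrange(len(frames) - clip_len + 1)
    return [
        [row[left:right] for row in frame[top:bottom]]
        for frame in frames[start:start + clip_len]
    ]


def additive_attention(query, keys, w_q, w_k, v):
    """Additive (Bahdanau-style) attention weights over a list of frame keys.

    score_t = v . tanh(w_q * query + w_k * key_t), followed by a softmax.
    Assumption: w_q and w_k are scalars rather than learned matrices.
    """
    scores = [
        sum(vi * math.tanh(w_q * qi + w_k * ki)
            for vi, qi, ki in zip(v, query, key))
        for key in keys
    ]
    peak = max(scores)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Toy usage: five 4x6 frames, 2x2 tiles, clips of three frames.
rng = random.Random(0)
frames = [[[t * 100 + r * 10 + c for c in range(6)] for r in range(4)]
          for t in range(5)]
boxes = tile_boxes(4, 6, tile=2)
clip = sample_tile_clip(frames, boxes[0], clip_len=3, rng=rng)
weights = additive_attention([0.1, 0.2], [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]],
                             w_q=1.0, w_k=1.0, v=[1.0, -1.0])
```

Each sampled clip contains only one tile's pixels, which is what would let a model train on tile-local dynamics without holding the full map in memory; the attention weights then indicate how much each past frame contributes to the current step.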