Bibliographic Details
Main Author: Nguyen, Tu
Format: Preprint
Published: 2019
Subjects: Computer Vision and Pattern Recognition; Machine Learning; Image and Video Processing
Online Access: https://arxiv.org/abs/1910.11030
author Nguyen, Tu
contents This extended abstract describes our solution for the Traffic4Cast Challenge 2019. The task requires modeling both fine-grained (pixel-level) and coarse (region-level) spatial structure while preserving temporal relationships across long sequences. Building on Conv-LSTM ideas, we introduce a tile-aware, cascaded-memory Conv-LSTM augmented with cross-frame additive attention and a memory-flexible training scheme. Frames are sampled per spatial tile so that the model learns tile-local dynamics, and per-tile memory cells can be updated sparsely, paged, or compressed to scale to large maps. We provide a compact theoretical analysis (a tight softmax/attention Lipschitz bound and a lower bound on the tiling error) that explains stability and the memory-accuracy tradeoffs, and we empirically demonstrate improved scalability and competitive forecasting performance on large-scale traffic heatmaps.
format Preprint
id arxiv_https___arxiv_org_abs_1910_11030
institution arXiv
publishDate 2019
record_format arxiv
title Spatiotemporal Tile-based Attention-guided LSTMs for Traffic Video Prediction
topic Computer Vision and Pattern Recognition
Machine Learning
Image and Video Processing
url https://arxiv.org/abs/1910.11030
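
The abstract mentions per-tile frame sampling and cross-frame additive attention without giving details. The following is a minimal sketch of both ideas, not the author's implementation. It assumes non-overlapping square tiles whose size divides the map evenly, and it assumes the attention takes the common Bahdanau-style additive form, score_t = v . tanh(W_q q + W_k k_t). All function and parameter names (split_into_tiles, sample_tile_clip, additive_attention, w_q, w_k, v) are hypothetical and chosen only for illustration.

```python
import math
import random


def split_into_tiles(height, width, tile):
    """Return (top, left) corners of non-overlapping tile x tile patches.

    Assumption: height and width are multiples of `tile`.
    """
    return [(r, c) for r in range(0, height, tile) for c in range(0, width, tile)]


def sample_tile_clip(frames, corner, tile, length, rng):
    """Pick a random temporal window and crop one spatial tile from each frame.

    `frames` is a list of 2-D lists indexed as [time][row][column].
    """
    start = rng.randrange(len(frames) - length + 1)
    top, left = corner
    return [
        [row[left:left + tile] for row in frame[top:top + tile]]
        for frame in frames[start:start + length]
    ]


def additive_attention(query, keys, w_q, w_k, v):
    """Additive attention weights over a list of key vectors (one per frame).

    Assumed form: score_t = v . tanh(W_q query + W_k key_t),
    weights = softmax(scores).
    """
    def matvec(matrix, vector):
        return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

    q_proj = matvec(w_q, query)
    scores = []
    for key in keys:
        k_proj = matvec(w_k, key)
        hidden = [math.tanh(a + b) for a, b in zip(q_proj, k_proj)]
        scores.append(sum(a * b for a, b in zip(v, hidden)))

    # Subtract the maximum score for a numerically stable softmax.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

In a training loop, one would draw a corner from split_into_tiles, crop a clip with sample_tile_clip, and feed per-frame features of that clip as keys to additive_attention. How the resulting weights enter the cascaded-memory Conv-LSTM is not specified in this record and is left out of the sketch.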