Staff View: Spatiotemporal Tile-based Attention-guided LSTMs for Traffic Video Prediction :: Library Catalog

Bibliographic Details
Main Author:	Nguyen, Tu
Format:	Preprint
Published:	2019
Subjects:	Computer Vision and Pattern Recognition Machine Learning Image and Video Processing
Online Access:	https://arxiv.org/abs/1910.11030

_version_	1866912633686327296
author	Nguyen, Tu
author_facet	Nguyen, Tu
contents	This extended abstract describes our solution for the Traffic4Cast Challenge 2019. The task requires modeling both fine-grained (pixel-level) and coarse (region-level) spatial structure while preserving temporal relationships across long sequences. Building on Conv-LSTM ideas, we introduce a tile-aware, cascaded-memory Conv-LSTM augmented with cross-frame additive attention and a memory-flexible training scheme: frames are sampled per spatial tile so the model learns tile-local dynamics and per-tile memory cells can be updated sparsely, paged, or compressed to scale to large maps. We provide a compact theoretical analysis (tight softmax/attention Lipschitz bound and a tiling error lower bound) explaining stability and the memory-accuracy tradeoffs, and empirically demonstrate improved scalability and competitive forecasting performance on large-scale traffic heatmaps.
format	Preprint
id	arxiv_https___arxiv_org_abs_1910_11030
institution	arXiv
publishDate	2019
record_format	arxiv
title	Spatiotemporal Tile-based Attention-guided LSTMs for Traffic Video Prediction
topic	Computer Vision and Pattern Recognition Machine Learning Image and Video Processing
url	https://arxiv.org/abs/1910.11030

Similar Items