Staff View: Tokenizing Semantic Segmentation with Run Length Encoding :: Library Catalog

Bibliographic Details
Main Authors:	Singh, Abhineet; Rozeboom, Justin; Ray, Nilanjan
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2602.21627

_version_	1866911593427632128
author	Singh, Abhineet; Rozeboom, Justin; Ray, Nilanjan
author_facet	Singh, Abhineet; Rozeboom, Justin; Ray, Nilanjan
contents	This paper presents a new unified approach to semantic segmentation in both images and videos by using language modeling to output the masks as sequences of discrete tokens. We use run length encoding (RLE) to discretize the segmentation masks, and adapt the Pix2Seq framework to learn autoregressive models to output these tokens. We propose novel tokenization strategies to compress the lengths of the token sequences to make it practicable to extend this approach to videos. We also show how instance information can be incorporated into the tokenization process to perform panoptic segmentation. We evaluate our models on two domain-specific datasets to demonstrate their competitiveness with the state of the art in certain scenarios, in spite of being severely bottlenecked by our limited computational resources. We supplement these analyses by proposing several promising approaches to foster future competitiveness in general-purpose applications, and facilitate this by making our code and models publicly available.
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_21627
institution	arXiv
publishDate	2026
record_format	arxiv
title	Tokenizing Semantic Segmentation with Run Length Encoding
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2602.21627
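
Note on the contents field: the abstract describes discretizing segmentation masks with run length encoding (RLE) so that a model can output them as token sequences. The following is a minimal sketch of that general idea for a single binary mask, assuming row-major flattening and (start, length) runs. The function names are hypothetical and this is not the authors' tokenization scheme, which the record does not specify.

```python
def rle_encode(mask):
    """Encode a binary mask (list of rows) as (start, length) runs.

    Assumption: the mask is flattened in row-major order and only
    foreground (non-zero) runs are recorded.
    """
    flat = [int(bool(v)) for row in mask for v in row]
    runs = []
    start = None
    for i, v in enumerate(flat):
        if v and start is None:
            start = i                          # a foreground run begins
        elif not v and start is not None:
            runs.append((start, i - start))    # the run ended at i - 1
            start = None
    if start is not None:                      # run reaches the last pixel
        runs.append((start, len(flat) - start))
    return runs


def runs_to_tokens(runs):
    """Flatten runs into one integer sequence: start, length, start, length, ..."""
    return [value for run in runs for value in run]


def rle_decode(runs, height, width):
    """Rebuild the binary mask from (start, length) runs."""
    flat = [0] * (height * width)
    for start, length in runs:
        for i in range(start, start + length):
            flat[i] = 1
    return [flat[r * width:(r + 1) * width] for r in range(height)]


mask = [
    [0, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
]
runs = rle_encode(mask)
tokens = runs_to_tokens(runs)
restored = rle_decode(runs, 3, 4)
```

Here the 12-pixel mask reduces to three runs, that is, six integer tokens, and decoding the runs restores the original mask. In this sketch the token count grows with the number of runs rather than the number of pixels.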