Bibliographic Details
Main Authors: Tiwary, Piyush, Ahuja, Utkarsh, Sani, Depanshu, Jayagopal, Aishwarya, Gubbi, Sagar, Venugopalan, Subhashini, Talekar, Alok, Rajan, Vaibhav
Format: Preprint
Published: 2026
Online Access: https://arxiv.org/abs/2605.16179
Summary:
  • Agricultural landscape segmentation in the Global South is challenging because these landscapes are characterized by fragmented plots, high intra-class variance, and scarce labeled training data. Multimodal Large Language Models (MLLMs) have driven recent advances in segmentation, but current approaches face critical context-length bottlenecks and a domain alignment gap in understanding satellite features. We address these limitations with MAgSeg, a novel, decoder-free MLLM segmentation approach. MAgSeg is architecturally efficient: it enables standard MLLMs to segment complex smallholder agricultural landscapes from high-resolution satellite imagery without auxiliary vision decoders. We introduce a novel instruction-tuning data format designed for scalable fine-tuning and post-training on high-resolution satellite imagery; it lets MAgSeg learn from the global context of the image while generating text tokens for only a patch within the image. Extensive evaluations on datasets spanning three countries in the Global South demonstrate that MAgSeg significantly outperforms state-of-the-art MLLM baselines, offering a scalable solution for mapping smallholder agricultural environments.
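The summary describes an instruction-tuning format in which the model sees the whole image but emits text tokens for only one patch. The record does not give the actual format, so the following is only a rough sketch of that general idea under stated assumptions: the `<image>` placeholder, the prompt wording, the row-by-row serialization of class ids, and the helper names `make_patch_sample` and `patch_origins` are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple


def make_patch_sample(mask: List[List[int]], top: int, left: int, size: int) -> Dict[str, str]:
    """Build one hypothetical instruction-tuning record for a single patch.

    `mask` is a 2D grid of class ids covering the full image. The prompt
    refers to the whole image; the target serializes only the chosen patch.
    """
    height, width = len(mask), len(mask[0])
    if not (0 <= top <= height - size and 0 <= left <= width - size):
        raise ValueError("patch must lie fully inside the image")

    # Assumed prompt wording: the full image is attached, one patch is named.
    prompt = (
        f"<image> The image is {height}x{width} cells. "
        f"Label the {size}x{size} patch whose top-left cell is ({top}, {left})."
    )

    # The target covers only the patch: one line of class ids per patch row,
    # so output length depends on the patch size, not the image size.
    rows = [
        " ".join(str(c) for c in mask[r][left:left + size])
        for r in range(top, top + size)
    ]
    return {"prompt": prompt, "target": "\n".join(rows)}


def patch_origins(height: int, width: int, size: int) -> List[Tuple[int, int]]:
    """Top-left corners of non-overlapping patches that tile the image."""
    return [
        (t, l)
        for t in range(0, height - size + 1, size)
        for l in range(0, width - size + 1, size)
    ]


# Toy 4x4 label grid with three made-up classes (0, 1, 2).
toy_mask = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 0, 0],
    [2, 2, 0, 0],
]
samples = [make_patch_sample(toy_mask, t, l, 2) for t, l in patch_origins(4, 4, 2)]
```

In this sketch every record shares the same global image input, while the target length is fixed by the patch size; that is one way to read how a patch-wise format could sidestep the context-length bottleneck the summary mentions.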