Bibliographic Details
Main Authors: Shui, Zhongyi, Li, Honglin, Ji, Xiaozhong, Zhang, Ye, Yang, Zijiang, Zhu, Chenglu, Sun, Yuxuan, Yao, Kai, He, Conghui, Tan, Cheng
Format: Preprint
Published: 2026
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2603.07098
_version_ 1866910044962947072
author Shui, Zhongyi
Li, Honglin
Ji, Xiaozhong
Zhang, Ye
Yang, Zijiang
Zhu, Chenglu
Sun, Yuxuan
Yao, Kai
He, Conghui
Tan, Cheng
contents Nucleus detection in histopathology is pivotal for a wide range of clinical applications. Existing approaches either regress nuclear proxy maps that require complex post-processing, or employ dense anchors or queries that introduce severe foreground-background imbalance. In this work, we reformulate nucleus detection as next-point prediction, wherein a multimodal large language model is developed to directly output foreground nucleus centroids from the input image. The model is trained in two stages. In the supervised learning stage, we propose spatial-aware soft supervision to relax strict centroid matching and a chain-of-visual-thought strategy to incorporate visual priors that facilitate coordinate prediction. In the reinforcement fine-tuning stage, we design a distribution matching reward, low-variance group filtering, and fine-grained advantage shaping to further improve the model's detection quality. Extensive experiments on nine widely used benchmarks demonstrate the superiority of our method. Code will be released soon.
format Preprint
id arxiv_https___arxiv_org_abs_2603_07098
institution arXiv
publishDate 2026
record_format arxiv
title NuNext: Reframing Nucleus Detection as Next-Point Detection
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2603.07098