All Scholarly Works

NuCLS: A scalable crowdsourcing approach and dataset for nucleus classification and segmentation in breast cancer

Ahmed Alhusseiny MD, Baystate Health

Author Department

Pathology

Document Type

Article, Peer-reviewed

Publication Date

5-2022

Abstract

Background: Deep learning enables accurate high-resolution mapping of cells and tissue structures that can serve as the foundation of interpretable machine-learning models for computational pathology. However, generating adequate labels for these structures is a critical barrier, given the time and effort required from pathologists.

Results: This article describes a novel collaborative framework for engaging crowds of medical students and pathologists to produce quality labels for cell nuclei. We used this approach to produce the NuCLS dataset, containing >220,000 annotations of cell nuclei in breast cancers. This builds on prior work labeling tissue regions to produce an integrated tissue region- and cell-level annotation dataset for training that is the largest such resource for multi-scale analysis of breast cancer histology. This article presents data and analysis results for single and multi-rater annotations from both non-experts and pathologists. We present a novel workflow that uses algorithmic suggestions to collect accurate segmentation data without the need for laborious manual tracing of nuclei. Our results indicate that even noisy algorithmic suggestions do not adversely affect pathologist accuracy and can help non-experts improve annotation quality. We also present a new approach for inferring truth from multiple raters and show that non-experts can produce accurate annotations for visually distinctive classes.

Conclusions: This study is the most extensive systematic exploration of the large-scale use of wisdom-of-the-crowd approaches to generate data for computational pathology applications.

Keywords: breast cancer; crowdsourcing; deep learning; nucleus classification; nucleus segmentation.

Recommended Citation

Amgad M, Atteya LA, Hussein H, Mohammed KH, Hafiz E, Elsebaie MAT, Alhusseiny AM, AlMoslemany MA, Elmatboly AM, Pappalardo PA, Sakr RA, Mobadersany P, Rachid A, Saad AM, Alkashash AM, Ruhban IA, Alrefai A, Elgazar NM, Abdulkarim A, Farag AA, Etman A, Elsaeed AG, Alagha Y, Amer YA, Raslan AM, Nadim MK, Elsebaie MAT, Ayad A, Hanna LE, Gadallah A, Elkady M, Drumheller B, Jaye D, Manthey D, Gutman DA, Elfandy H, Cooper LAD. NuCLS: A scalable crowdsourcing approach and dataset for nucleus classification and segmentation in breast cancer. Gigascience. 2022 May 17;11:giac037. doi: 10.1093/gigascience/giac037.

PMID

35579553
