Evaluation of a novel deep learning-based classifier for perifissural nodules

Daiwei Han; Marjolein Heuvelmans; Mieneke Rook; Monique Dorrius; Luutsen van Houten; Noah Waterfield Price; Lyndsey C Pickup; Petr Novotny; Matthijs Oudkerk; Jerome Declerck; Fergus Gleeson; Peter van Ooijen; Rozemarijn Vliegenthart

doi:10.1007/s00330-020-07509-x

Eur Radiol. 2021 Jun;31(6):4023-4030. doi: 10.1007/s00330-020-07509-x. Epub 2020 Dec 2.

Affiliations

¹ University Medical Center Groningen, Department of Radiology, University of Groningen, Groningen, The Netherlands.
² University Medical Center Groningen, Department of Epidemiology, University of Groningen, Groningen, The Netherlands. m.a.heuvelmans@umcg.nl.
³ Department of Pulmonology, Medisch Spectrum Twente, Enschede, The Netherlands. m.a.heuvelmans@umcg.nl.
⁴ Department of Radiology, Martini Ziekenhuis, Groningen, The Netherlands.
⁵ Optellum Ltd, Oxford, UK.
⁶ Faculty of Medical Sciences, University of Groningen, Groningen, The Netherlands.
⁷ Institute for Diagnostic Accuracy, Groningen, The Netherlands.
⁸ National Consortium of Intelligent Medical Imaging, Oxford University, Oxford, Great Britain, UK.
⁹ University Medical Center Groningen, Department of Radiotherapy, University of Groningen, Groningen, The Netherlands.

Abstract

Objectives: To evaluate the performance of a novel convolutional neural network (CNN) for the classification of typical perifissural nodules (PFN).

Methods: Chest CT data from two centers in the UK and The Netherlands (1668 unique nodules, 1260 individuals) were collected. Pulmonary nodules were classified into subtypes, including "typical PFNs" on-site, and were reviewed by a central clinician. The dataset was divided into a training/cross-validation set of 1557 nodules (1103 individuals) and a test set of 196 nodules (158 individuals). For the test set, three radiologically trained readers classified the nodules into three nodule categories: typical PFN, atypical PFN, and non-PFN. The consensus of the three readers was used as reference to evaluate the performance of the PFN-CNN. Typical PFNs were considered as positive results, and atypical PFNs and non-PFNs were grouped as negative results. PFN-CNN performance was evaluated using the ROC curve, confusion matrix, and Cohen's kappa.
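The grouping and agreement measure described in the Methods can be illustrated with a minimal sketch. It assumes each nodule carries one of the three category labels; the function names and the six example labels are hypothetical, made up for illustration, and are not study data or the authors' code.

```python
from collections import Counter


def binarize(label):
    """Typical PFN counts as positive; atypical PFN and non-PFN as negative."""
    return 1 if label == "typical PFN" else 0


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal frequencies per category.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in set(freq_a) | set(freq_b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels for six nodules (illustration only, not study data).
reader_labels = ["typical PFN", "atypical PFN", "non-PFN",
                 "typical PFN", "non-PFN", "non-PFN"]
cnn_labels = ["typical PFN", "non-PFN", "non-PFN",
              "non-PFN", "non-PFN", "typical PFN"]

reader_binary = [binarize(label) for label in reader_labels]
cnn_binary = [binarize(label) for label in cnn_labels]
kappa = cohens_kappa(reader_binary, cnn_binary)
print(f"kappa = {kappa:.2f}")
```

Binarizing before computing kappa mirrors the positive/negative grouping in the Methods; whether the reported kappa values were computed on two or three categories is not stated in the abstract.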

Results: Internal validation yielded a mean AUC of 91.9% (95% CI 90.6-92.9) with 78.7% sensitivity and 90.4% specificity. For the test set, the reader consensus rated 45/196 (23%) of nodules as typical PFN. The classifier-reader agreement (κ = 0.62-0.75) was similar to the inter-reader agreement (κ = 0.64-0.79). Area under the ROC curve was 95.8% (95% CI 93.3-98.4), with a sensitivity of 95.6% (95% CI 84.9-99.5), and specificity of 88.1% (95% CI 81.8-92.8).
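The test-set sensitivity and specificity can be reproduced arithmetically with a short sketch. The abstract reports only 45 typical PFNs among 196 nodules; the confusion-matrix counts below (43 true positives, 2 false negatives, 133 true negatives, 18 false positives) are an assumption inferred from the reported percentages and are not stated in the abstract.

```python
def sensitivity(tp, fn):
    """Fraction of reference-positive nodules that the classifier flags."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of reference-negative nodules that the classifier rejects."""
    return tn / (tn + fp)


# Reported in the abstract: 45 of 196 test nodules were typical PFN by consensus.
n_total = 196
n_positive = 45
n_negative = n_total - n_positive  # 151 atypical PFN or non-PFN

# Assumed counts, inferred from the reported 95.6% and 88.1% (not given directly).
tp, fn = 43, 2
tn, fp = 133, 18
assert tp + fn == n_positive and tn + fp == n_negative

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
print(f"sensitivity = {sens:.1%}")  # 95.6%
print(f"specificity = {spec:.1%}")  # 88.1%
```

Under these assumed counts, the classifier would miss 2 of the 45 typical PFNs and flag 18 of the 151 other nodules as typical PFN.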

Conclusion: The PFN-CNN showed excellent performance in classifying typical PFNs. Its agreement with radiologically trained readers is within the range of inter-reader agreement. Thus, the CNN-based system has potential in clinical and screening settings to rule out perifissural nodules and increase reader efficiency.

Key points:
• Agreement between the PFN-CNN and radiologically trained readers is within the range of inter-reader agreement.
• The CNN model for the classification of typical PFNs achieved an AUC of 95.8% (95% CI 93.3-98.4) with 95.6% (95% CI 84.9-99.5) sensitivity and 88.1% (95% CI 81.8-92.8) specificity compared to the consensus of three readers.

Keywords: Deep learning; Solitary pulmonary nodule; Tomography, X-ray computed.

MeSH terms

Deep Learning*
Humans
Lung Neoplasms*
Multiple Pulmonary Nodules*
Netherlands
Solitary Pulmonary Nodule* / diagnostic imaging

Grants and funding

17189/EIT Health