Abstract
Artificial intelligence (AI) tools may assist breast screening mammography programmes, but limited evidence supports their generalisability to new settings. This retrospective study used a three-year dataset (1/04/2016-31/03/2019) from a UK regional screening programme. The performance of a commercially available breast screening AI algorithm was assessed with a pre-specified and a site-specific decision threshold to evaluate whether its performance was transferable to a new clinical site. The dataset consisted of women who attended routine screening (50-70 years), excluding technical recalls, self-referrals, and those with a previous mastectomy, complex physical requirements or without the four standard image views. In total, 55,916 screening attendees (mean age, 60 ± 6 [SD] years) met the inclusion criteria. The pre-specified threshold resulted in high recall rates (48.3%; 21,929/45,444), which reduced to 13.0% (5,896/45,444) following threshold calibration, closer to the observed service level (5.0%; 2,774/55,916). Recall rates also increased approximately three-fold following a software upgrade on the mammography equipment, requiring per-software version thresholds. Using software-specific thresholds, the AI algorithm would have recalled 277/303 (91.4%) screen-detected cancers and 47/138 (34.1%) interval cancers. AI performance and thresholds should be validated for new clinical settings before deployment, while quality assurance systems should monitor AI performance for consistency.
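The percentages quoted in the abstract follow directly from the reported counts. As a minimal sketch of that arithmetic (the dictionary keys and the `rate_percent` helper are illustrative names, not taken from the study), each rate can be recomputed as numerator over denominator, rounded to one decimal place:

```python
def rate_percent(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


# Counts as reported in the abstract; the key names are hypothetical labels.
counts = {
    "ai_recall_pre_specified_threshold": (21_929, 45_444),
    "ai_recall_calibrated_threshold": (5_896, 45_444),
    "observed_service_recall": (2_774, 55_916),
    "screen_detected_cancers_recalled": (277, 303),
    "interval_cancers_recalled": (47, 138),
}

rates = {name: rate_percent(n, d) for name, (n, d) in counts.items()}

for name, value in rates.items():
    print(f"{name}: {value}%")
```

Note that the AI recall rates use a denominator of 45,444, whereas the observed service recall rate uses all 55,916 attendees, so the two sets of figures are not computed over the same population.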
Original language | English |
---|---|
Article number | e220146 |
Journal | Radiology: Artificial Intelligence |
Volume | 5 |
Issue number | 3 |
Early online date | 22 Mar 2023 |
Publication status | E-pub ahead of print - 22 Mar 2023 |