
AAPM task group report 273: Recommendations on best practices for AI and machine learning for computer-aided diagnosis in medical imaging

dc.contributor.authorHadjiiski, Lubomir
dc.contributor.authorCha, Kenny
dc.contributor.authorChan, Heang-Ping
dc.contributor.authorDrukker, Karen
dc.contributor.authorMorra, Lia
dc.contributor.authorNäppi, Janne J.
dc.contributor.authorSahiner, Berkman
dc.contributor.authorYoshida, Hiroyuki
dc.contributor.authorChen, Quan
dc.contributor.authorDeserno, Thomas M.
dc.contributor.authorGreenspan, Hayit
dc.contributor.authorHuisman, Henkjan
dc.contributor.authorHuo, Zhimin
dc.contributor.authorMazurchuk, Richard
dc.contributor.authorPetrick, Nicholas
dc.contributor.authorRegge, Daniele
dc.contributor.authorSamala, Ravi
dc.contributor.authorSummers, Ronald M.
dc.contributor.authorSuzuki, Kenji
dc.contributor.authorTourassi, Georgia
dc.contributor.authorVergara, Daniel
dc.contributor.authorArmato, Samuel G.
dc.date.accessioned2023-03-03T21:09:23Z
dc.date.available2024-03-03 16:09:21
dc.date.available2023-03-03T21:09:23Z
dc.date.issued2023-02
dc.identifier.citationHadjiiski, Lubomir; Cha, Kenny; Chan, Heang-Ping; Drukker, Karen; Morra, Lia; Näppi, Janne J.; Sahiner, Berkman; Yoshida, Hiroyuki; Chen, Quan; Deserno, Thomas M.; Greenspan, Hayit; Huisman, Henkjan; Huo, Zhimin; Mazurchuk, Richard; Petrick, Nicholas; Regge, Daniele; Samala, Ravi; Summers, Ronald M.; Suzuki, Kenji; Tourassi, Georgia; Vergara, Daniel; Armato, Samuel G. (2023). "AAPM task group report 273: Recommendations on best practices for AI and machine learning for computer-aided diagnosis in medical imaging." Medical Physics 50(2): e1-e24.
dc.identifier.issn0094-2405
dc.identifier.issn2473-4209
dc.identifier.urihttps://hdl.handle.net/2027.42/175903
dc.description.abstractRapid advances in artificial intelligence (AI) and machine learning, and specifically in deep learning (DL) techniques, have enabled broad application of these methods in health care. The promise of the DL approach has spurred further interest in computer-aided diagnosis (CAD) development and applications using both “traditional” machine learning methods and newer DL-based methods. We use the term CAD-AI to refer to this expanded clinical decision support environment that uses traditional and DL-based AI methods. Numerous studies have been published to date on the development of machine learning tools for computer-aided, or AI-assisted, clinical tasks. However, most of these machine learning models are not ready for clinical deployment. It is of paramount importance to ensure that a clinical decision support tool undergoes proper training and rigorous validation of its generalizability and robustness before adoption for patient care in the clinic. To address these important issues, the American Association of Physicists in Medicine (AAPM) Computer-Aided Image Analysis Subcommittee (CADSC) is charged, in part, to develop recommendations on practices and standards for the development and performance assessment of computer-aided decision support systems. The committee has previously published two opinion papers on the evaluation of CAD systems and issues associated with user training and quality assurance of these systems in the clinic. With machine learning techniques continuing to evolve and CAD applications expanding to new stages of the patient care process, the current task group report considers the broader issues common to the development of most, if not all, CAD-AI applications and their translation from the bench to the clinic. The goal is to bring attention to the proper training and validation of machine learning algorithms that may improve their generalizability and reliability and accelerate the adoption of CAD-AI systems for clinical decision support.
dc.publisherUniversity of Warwick
dc.publisherWiley Periodicals, Inc.
dc.subject.othermachine learning
dc.subject.otherAI
dc.subject.otherbest practices
dc.subject.otherCAD
dc.subject.otherdecision support systems
dc.subject.otherimage analysis
dc.subject.othermedical imaging
dc.subject.othermodel development
dc.subject.otherreference standards
dc.titleAAPM task group report 273: Recommendations on best practices for AI and machine learning for computer-aided diagnosis in medical imaging
dc.typeArticle
dc.rights.robotsIndexNoFollow
dc.subject.hlbsecondlevelMedicine (General)
dc.subject.hlbtoplevelHealth Sciences
dc.description.peerreviewedPeer Reviewed
dc.description.bitstreamurlhttp://deepblue.lib.umich.edu/bitstream/2027.42/175903/1/mp16188_am.pdf
dc.description.bitstreamurlhttp://deepblue.lib.umich.edu/bitstream/2027.42/175903/2/mp16188.pdf
dc.identifier.doi10.1002/mp.16188
dc.identifier.sourceMedical Physics
dc.identifier.citedreferenceHarrell FE, Lee KL, Mark DB. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med. 1996; 15: 361 - 387.
dc.identifier.citedreferenceRodriguez-Ruiz A, Lang K, Gubern-Merida A, et al. Stand-Alone Artificial Intelligence for Breast Cancer Detection in Mammography: comparison With 101 Radiologists. J Natl Cancer Inst. 2019; 111: 916 - 922.
dc.identifier.citedreferenceChan HP, Doi K, Vyborny CJ, et al. Improvement in radiologists’ detection of clustered microcalcifications on mammograms - the potential of computer-aided diagnosis. Invest Radiol. 1990; 25: 1102 - 1110.
dc.identifier.citedreferenceHadjiiski LM, Chan H-P, Sahiner B, et al. Breast Masses: computer-aided Diagnosis with Serial Mammograms. Radiology. 2006; 240: 343 - 356.
dc.identifier.citedreferenceBeiden SV, Wagner RF, Doi K, et al. Independent versus sequential reading in ROC studies of computer-assist modalities: analysis of components of variance. Acad Radiol. 2002; 9: 1036 - 1043.
dc.identifier.citedreferenceMetz CE. Some practical issues of experimental design and data analysis in radiological ROC studies. Invest Radiol. 1989; 24: 234 - 245.
dc.identifier.citedreferenceU.S. Food and Drug Administration. Guidance for industry and FDA staff: computer-assisted detection devices applied to radiology images and radiology device data – premarket notification [510(k)] submissions. 2012. Accessed Nov. 21, 2017. http://www.fda.gov/downloads/MedicalDevices/DeviceRegulationandGuidance/GuidanceDocuments/UCM187294.pdf
dc.identifier.citedreferenceU.S. Food and Drug Administration. Guidance for industry and FDA staff: clinical performance assessment: considerations for computer-assisted detection devices applied to radiology images and radiology device data - premarket approval (PMA) and premarket notification [510(k)] submissions. 2012. Accessed Nov. 21, 2017. http://www.fda.gov/downloads/MedicalDevices/DeviceRegulationandGuidance/GuidanceDocuments/UCM187315.pdf
dc.identifier.citedreferenceSamuelson FW, Abbey CK. The Reproducibility of Changes in Diagnostic Figures of Merit Across Laboratory and Clinical Imaging Reader Studies. Acad Radiol. 2017; 24: 1436 - 1446.
dc.identifier.citedreferenceGallas BD, Chen W, Cole E, et al. Impact of prevalence and case distribution in lab-based diagnostic imaging studies. J Med Imaging. 2019; 6: 015501.
dc.identifier.citedreferenceWagner RF, Metz CE, Campbell G. Assessment of medical imaging systems and computer aids: a tutorial review. Acad Radiol. 2007; 14: 723 - 748.
dc.identifier.citedreferenceObuchowski NA. New methodological tools for multiple-reader ROC studies. Radiology. 2007; 243: 10 - 12.
dc.identifier.citedreferenceChan H-P, Sahiner B, Wagner RF, Petrick N. Classifier design for computer-aided diagnosis: effects of finite sample size on the mean performance of classical and neural network classifiers. Med Phys. 1999; 26: 2654 - 2668.
dc.identifier.citedreferenceSahiner B, Chan H-P, Petrick N, Wagner RF, Hadjiiski L. Feature selection and classifier performance in computer-aided diagnosis: the effect of finite sample size. Med Phys. 2000; 27: 1509 - 1522.
dc.identifier.citedreferenceBouthillier X, Laurent C, Vincent P, Unreproducible research is reproducible. in 36th International Conference on Machine Learning, ICML 2019, 2019. pp. 1150 - 1159.
dc.identifier.citedreferenceMcDermott M, Wang S, Marinsek N, Ranganath R, Ghassemi M, Foschini L. Reproducibility in machine learning for health. arXiv preprint arXiv:1907.01463, 2019.
dc.identifier.citedreferenceGoodman SN, Fanelli D, Ioannidis JPA. What does research reproducibility mean? Sci Transl Med. 2016; 8: 341ps12.
dc.identifier.citedreferenceGal Y, Ghahramani Z, Dropout as a Bayesian approximation: representing model uncertainty in deep learning. in 33rd International Conference on Machine Learning, ICML 2016, 2016, pp. 1651 - 1660.
dc.identifier.citedreferenceKendall A, Gal Y. What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision? In: Guyon I, ed. Advances in Neural Information Processing Systems 30. 2017.
dc.identifier.citedreferenceRobinson R, Valindria VV, Bai W, et al. Automated quality control in image segmentation: application to the UK Biobank cardiovascular magnetic resonance imaging study. J Cardiovasc Magn Reson. 2019; 21: 18.
dc.identifier.citedreferenceYang Y, Guo X, Pan Y, Shi P, Lv H, Ma T, Uncertainty Quantification in Medical Image Segmentation with Multi-decoder U-Net. arXiv preprint arXiv:2109.07045, 2021.
dc.identifier.citedreferenceRezaei M, Näppi J, Bischl B, Yoshida H. Bayesian uncertainty estimation for detection of long-tail and unseen conditions in abdominal images. Proc of SPIE Medical Imaging. 2022; 12033: 1203311.
dc.identifier.citedreferenceSalahuddin Z, Woodruff HC, Chatterjee A, Lambin P. Transparency of deep neural networks for medical image analysis: a review of interpretability methods. Comput Biol Med. 2022; 140: 105111.
dc.identifier.citedreferenceReyes M, Meier R, Pereira S, et al. On the Interpretability of Artificial Intelligence in Radiology: challenges and Opportunities. Radiol Artif Intell. 2020; 2: e190043 - e190043.
dc.identifier.citedreferenceSamek W, Binder A, Montavon G, Lapuschkin S, Mueller K-R. Evaluating the Visualization of What a Deep Neural Network Has Learned. IEEE Trans Neural Netw Learn Syst. 2017; 28: 2660 - 2673.
dc.identifier.citedreferenceZhou B, Khosla A, Lapedriza A, Oliva A, Torralba A, Learning Deep Features for Discriminative Localization. in IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016). 2016, p. 2921 - 2929.
dc.identifier.citedreferenceSelvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D. Grad-CAM: visual Explanations from Deep Networks via Gradient-based Localization. in 2017 IEEE International Conference on Computer Vision. 2017, p. 618 - 626.
dc.identifier.citedreferenceWang H, Wang Z, Du M, et al. Score-CAM: score-weighted visual explanations for convolutional neural networks. in IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2020. pp. 111 - 119.
dc.identifier.citedreferenceBarnett AJ, Schwartz FR, Tao C, et al. A case-based interpretable deep learning model for classification of mass lesions in digital mammography. Nat Mach Intell. 2021; 3: 1061.
dc.identifier.citedreferenceArun N, Gaw N, Singh P, et al. Assessing the Trustworthiness of Saliency Maps for Localizing Abnormalities in Medical Imaging. Radiol Artif Intell. 2021; 3: e200267 - e200267.
dc.identifier.citedreferenceRibeiro MT, Singh S, Guestrin C. Why Should I Trust You? Explaining the Predictions of Any Classifier. KDD’16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016. 1135 - 1144.
dc.identifier.citedreferenceChan HP, Samala RK, Hadjiiski LM, Zhou C. Deep Learning in Medical Image Analysis. In: Lee G, Fujita H, eds. Deep Learning in Medical Image Analysis: Challenges and Applications. 2020: 3 - 21.
dc.identifier.citedreferenceChan H-P, Hadjiiski LM, Samala RK. Computer-aided diagnosis in the era of deep learning. Med Phys. 2020; 47: e218 - e227.
dc.identifier.citedreferenceFreer TW, Ulissey MJ. Screening mammography with computer-aided detection: prospective study of 12,860 patients in a community breast center. Radiology. 2001; 220: 781 - 786.
dc.identifier.citedreferenceHelvie MA, Hadjiiski L, Makariou E, et al. Sensitivity of noncommercial computer-aided detection system for mammographic breast cancer detection: pilot clinical trial. Radiology. 2004; 231: 208 - 214.
dc.identifier.citedreferenceBirdwell RL, Bandodkar P, Ikeda DM. Computer-aided detection with screening mammography in a university hospital setting. Radiology. 2005; 236: 451 - 457.
dc.identifier.citedreferenceDean JC, Ilvento CC. Improved cancer detection using computer-aided detection with diagnostic and screening mammography: prospective study of 104 cancers. Am J Roentgenol. 2006; 187: 20 - 28.
dc.identifier.citedreferenceMorton MJ, Whaley DH, Brandt KR, Amrami KK. Screening mammograms: interpretation with computer-aided detection - Prospective evaluation. Radiology. 2006; 239: 375 - 383.
dc.identifier.citedreferenceGilbert FJ, Astley SM, Gillan MG, et al. CADET II: a prospective trial of computer-aided detection (CAD) in the UK Breast Screening Programme. J Clin Oncol. 2008; 26: 508.
dc.identifier.citedreferenceRegge D, Monica PD, Galatola G, et al. Efficacy of Computer-aided Detection as a Second Reader for 6-9-mm Lesions at CT Colonography: multicenter Prospective Trial. Radiology. 2013; 266: 168 - 176.
dc.identifier.citedreferenceConcato J, Shah N, Horwitz RI. Randomized, controlled trials, observational studies, and the hierarchy of research designs. N Engl J Med. 2000; 342: 1887 - 1892.
dc.identifier.citedreferenceGur D, Sumkin JH, Rockette HE, et al. Changes in breast cancer detection and mammography recall rates after the introduction of a computer-aided detection system. J Natl Cancer Inst. 2004; 96: 185 - 190.
dc.identifier.citedreferenceFenton JJ, Abraham L, Taplin SH, et al. Effectiveness of Computer-Aided Detection in Community Mammography Practice. J Natl Cancer Inst. 2011; 103: 1152 - 1161.
dc.identifier.citedreferenceGromet M. Comparison of computer-aided detection to double reading of screening mammograms: review of 231,221 mammograms. Am J Roentgenol. 2008; 190: 854 - 859.
dc.identifier.citedreferenceLehman CD, Wellman RD, Buist DSM, et al. Diagnostic Accuracy of Digital Screening Mammography With and Without Computer-Aided Detection. JAMA Intern Med. 2015; 175: 1828 - 1837.
dc.identifier.citedreferenceCruz Rivera S, Liu X, Chan A-W, et al. Guidelines for clinical trial protocols for interventions involving artificial intelligence: the SPIRIT-AI extension. Nat Med. 2020; 26: 1351 - 1363.
dc.identifier.citedreferenceLiu X, Rivera SC, Moher D, et al. Reporting guidelines for clinical trial reports for interventions involving artificial intelligence: the CONSORT-AI extension. Nat Med. 2020; 26: 1364 - 1374.
dc.identifier.citedreferenceRoberts M, Driggs D, Thorpe M, et al. Common pitfalls and recommendations for using machine learning to detect and prognosticate for COVID-19 using chest radiographs and CT scans. Nat Mach Intell. 2021; 3: 199 - 217.
dc.identifier.citedreferenceNagendran M, Chen Y, Lovejoy CA, et al. Artificial intelligence versus clinicians: systematic review of design, reporting standards, and claims of deep learning studies. BMJ. 2020; 368: 1 - 7.
dc.identifier.citedreferenceAggarwal R, Sounderajah V, Martin G, et al. Diagnostic accuracy of deep learning in medical imaging: a systematic review and meta-analysis. NPJ Dig Med. 2021; 4: 65 - 65.
dc.identifier.citedreferenceKim DW, Jang HY, Kim KW, Shin Y, Park S. Design characteristics of studies reporting the performance of artificial intelligence algorithms for diagnostic analysis of medical images: results from recently published papers. Korean J Radiol. 2019; 20: 405 - 410.
dc.identifier.citedreferencePetrick N, Sahiner B, Armato SG, et al. Evaluation of computer-aided detection and diagnosis systems. Med Phys. 2013; 40: 087001.
dc.identifier.citedreferenceHuo ZM, Summers RM, Paquerault S, et al. Quality assurance and training procedures for computer-aided detection and diagnosis systems in clinical use. Med Phys. 2013; 40: 077001.
dc.identifier.citedreferenceCohen JF, Korevaar DA, Altman DG, et al. STARD 2015 guidelines for reporting diagnostic accuracy studies: explanation and elaboration. BMJ Open. 2016; 6: e012799.
dc.identifier.citedreferenceTrost J. Statistically nonrepresentative stratified sampling: a sampling technique for qualitative studies. Qual Sociol. 1986; 9: 54 - 57.
dc.identifier.citedreferenceEtikan I, Musa SA, Alkassim RS. Comparison of convenience sampling and purposive sampling. Am J Theor Appl. 2016; 5: 1 - 4.
dc.identifier.citedreferencePan I, Agarwal S, Merck D. Generalizable inter-Institutional classification of abnormal chest radiographs using efficient convolutional neural networks. J Digit Imaging. 2019; 32: 888 - 896.
dc.identifier.citedreferenceZech JR, Badgeley MA, Liu M, Costa AB, Titano JJ, Oermann EK. Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: a cross-sectional study. PLoS Med. 2018; 15: e1002683.
dc.identifier.citedreferenceFeng X, Bernard ME, Hunter T, Chen Q. Improving accuracy and robustness of deep convolutional neural network based thoracic OAR segmentation. Phys Med Biol. 2020; 65: 07NT01.
dc.identifier.citedreferenceLiu XX, Faes L, Kale AU, et al. A comparison of deep learning performance against health-care professionals in detecting diseases from medical imaging: a systematic review and meta-analysis. Lancet Dig Health. 2019; 1: E271 - E297.
dc.identifier.citedreferenceMoons KGM, Altman DG, Reitsma JB, et al. Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD): explanation and Elaboration. Ann Intern Med. 2015; 162: W1 - W73.
dc.identifier.citedreferenceWhitney HM, Li H, Ji Y, Liu P, Giger ML. Harmonization of radiomic features of breast lesions across international DCE-MRI datasets. J Med Imaging. 2020; 7: 012707.
dc.identifier.citedreferenceNishikawa RM, Giger ML, Doi K, et al. Effect of case selection on the performance of computer-aided detection schemes. Med Phys. 1994; 21: 265 - 269.
dc.identifier.citedreferenceNishikawa RM, Yarusso LM. Variations in measured performance of CAD schemes due to database composition and scoring protocol. In: Hanson KM, ed. Medical Imaging 1998: Image Processing, Pts 1 and 2. 1998: 840 - 844.
dc.identifier.citedreferenceArmato SG, Roberts RY, McNitt-Gray MF, et al. The lung image database consortium (LIDC): ensuring the integrity of expert-defined “truth”. Acad Radiol. 2007; 14: 1455 - 1463.
dc.identifier.citedreferenceClark KW, Gierada DS, Marquez G, et al. Collecting 48,000 CT Exams for the Lung Screening Study of the National Lung Screening Trial. J Digit Imaging. 2009; 22: 667 - 680.
dc.identifier.citedreferenceWilkinson MD, Dumontier M, Aalbersberg IJ, et al. The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data. 2016; 15: 160018.
dc.identifier.citedreferenceSummary of the HIPAA Privacy Rule. https://www.hhs.gov/hipaa/for-professionals/privacy/laws-regulations/index.html
dc.identifier.citedreferenceInternational Compilation of Human Research Standards. 2021 Edition. Compiled by Office for Human Research Protections, Office of the Assistant Secretary for Health, U.S. Department of Health and Human Services. https://www.hhs.gov/sites/default/files/ohrp-international-compilation-2021.pdf
dc.identifier.citedreferenceRoberts H, Cowls J, Morley J, Taddeo M, Wang V, Floridi L. The Chinese approach to artificial intelligence: an analysis of policy, ethics, and regulation. AI & Society. 2021; 36: 59 - 77.
dc.identifier.citedreferenceGong M, Wang S, Wang L, et al. Evaluation of privacy risks of patients’ data in China: case study. JMIR Med Inform. 2020; 8: e13046.
dc.identifier.citedreferencePinhao K, R MM, Twenty reasons why GDPR compliance does not exempt companies from adjusting to the LGPD. in International Bar Association, 2021, https://www.ibanet.org/article/0634B90E-98DE-40E6-953F-2F63CB481F02
dc.identifier.citedreferenceLarson DB, Magnus DC, Lungren MP, Shah NH, Langlotz CP. Ethics of Using and Sharing Clinical Imaging Data for Artificial Intelligence: a Proposed Framework. Radiology. 2020; 295: 675 - 682.
dc.identifier.citedreferenceGeis JR, Brady AP, Wu CC, et al. Ethics of Artificial Intelligence in Radiology: summary of the Joint European and North American Multisociety Statement. J Am Coll Radiol. 2019; 16: 1516 - 1521.
dc.identifier.citedreferenceAryanto KYE, Oudkerk M, van Ooijen PMA. Free DICOM de-identification tools in clinical research: functioning and safety of patient privacy. Eur Radiol. 2015; 25: 3685 - 3695.
dc.identifier.citedreferenceRobinson JD. Beyond the DICOM Header: additional Issues in Deidentification. Am J Roentgenol. 2014; 203: W658 - W664.
dc.identifier.citedreferenceBuolamwini J, Gebru T. Gender Shades: intersectional Accuracy Disparities in Commercial Gender Classification. In: Friedler SA, Wilson C, eds. Proceedings of the 1st Conference on Fairness, Accountability and Transparency. PMLR: Proceedings of Machine Learning Research; 2018: 77 - 91.
dc.identifier.citedreferenceLiu Y, Jain A, Eng C, et al. A deep learning system for differential diagnosis of skin diseases. Nat Med. 2020; 26: 900 - 908.
dc.identifier.citedreferenceGichoya JW, Banerjee I, Bhimireddy AR, et al. AI recognition of patient race in medical imaging: a modelling study. Lancet Digital Health. 2022; 4: E406 - E414.
dc.identifier.citedreferenceShrestha S, Das S. Exploring gender biases in ML and AI academic research through systematic literature review. Front Artif Intell. 2022; 5: 976838 - 976838.
dc.identifier.citedreferenceDankwa-Mullan I, Weeraratne D. Artificial Intelligence and Machine Learning Technologies in Cancer Care: addressing Disparities, Bias, and Data Diversity. Cancer Discov. 2022; 12: 1423 - 1427.
dc.identifier.citedreferenceChan H-P, Lo SCB, Sahiner B, Lam KL, Helvie MA. Computer-aided detection of mammographic microcalcifications: pattern recognition with an artificial neural network. Med Phys. 1995; 22: 1555 - 1567.
dc.identifier.citedreferenceKrizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems. 2012: 1097 - 1105.
dc.identifier.citedreferenceGoodfellow IJ, Pouget-Abadie J, Mirza M, et al. Generative Adversarial Nets. arXiv:1406.2661v1 2014.
dc.identifier.citedreferenceFrid-Adar M, Klang E, Amitai M, Goldberger J, Greenspan H, Synthetic data augmentation using gan for improved liver lesion classification. in 15th IEEE International Symposium on Biomedical Imaging (ISBI), Washington, DC. 2018, pp. 289 - 293.
dc.identifier.citedreferenceCha KH, Petrick N, Pezeshk A, et al. Evaluation of data augmentation via synthetic images for improved breast mass detection on mammograms using deep learning. J Med Imaging (Bellingham Wash). 2020; 7: 012703 - 012703.
dc.identifier.citedreferenceHagiwara A, Fujita S, Ohno Y, Aoki S. Variability and Standardization of Quantitative Imaging Monoparametric to Multiparametric Quantification, Radiomics, and Artificial Intelligence. Invest Radiol. 2020; 55: 601 - 616.
dc.identifier.citedreferenceGraham B. Kaggle diabetic retinopathy detection competition report. University of Warwick; 2015.
dc.identifier.citedreferenceRobinson K, Li H, Lan L, Schacht D, Giger M. Radiomics robustness assessment and classification evaluation: a two-stage method demonstrated on multivendor FFDM. Med Phys. 2019; 46: 2145 - 2156.
dc.identifier.citedreferenceBaessler B, Weiss K, dos Santos DP. Robustness and Reproducibility of Radiomics in Magnetic Resonance Imaging A Phantom Study. Invest Radiol. 2019; 54: 221 - 228.
dc.identifier.citedreferenceMali SA, Ibrahim A, Woodruff HC, et al. Making Radiomics More Reproducible across Scanner and Imaging Protocol Variations: a Review of Harmonization Methods. J Pers Med. 2021; 11: 842.
dc.identifier.citedreferenceGallardo-Estrella L, Lynch DA, Prokop M, et al. Normalizing computed tomography data reconstructed with different filter kernels: effect on emphysema quantification. Eur Radiol. 2016; 26: 478 - 486.
dc.identifier.citedreferenceLiu M, Maiti P, Thomopoulos S, et al. Style Transfer Using Generative Adversarial Networks for Multi-site MRI Harmonization. in International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI). 2021. pp. 313 - 322.
dc.identifier.citedreferenceRai R, Holloway LC, Brink C, et al. Multicenter evaluation of MRI-based radiomic features: a phantom study. Med Phys. 2020; 47: 3054 - 3063.
dc.identifier.citedreferenceFortin J-P, Parker D, Tunc B, et al. Harmonization of multi-site diffusion tensor imaging data. Neuroimage. 2017; 161: 149 - 170.
dc.identifier.citedreferenceOrlhac F, Frouin F, Nioche C, Ayache N, Buvat I. Validation of a Method to Compensate Multicenter Effects Affecting CT Radiomics. Radiology. 2019; 291: 52 - 58.
dc.identifier.citedreferenceNakahara T, Daisaki H, Yamamoto Y, et al. Use of a digital phantom developed by QIBA for harmonizing SUVs obtained from the state-of-the-art SPECT/CT systems: a multicenter study. EJNMMI Res. 2017; 7: 53.
dc.identifier.citedreferenceKeller H, Shek T, Driscoll B, Xu Y, et al. Noise-Based Image Harmonization Significantly Increases Repeatability and Reproducibility of Radiomics Features in PET Images: a Phantom Study. Tomography. 2022; 8: 1113 - 1128.
dc.identifier.citedreferenceRevesz G, Kundel HL, Bonitatibus M. The effect of verification on the assessment of imaging techniques. Invest Radiol. 1983; 18: 194 - 198.
dc.identifier.citedreferenceMiller DP, O’Shaughnessy KF, Wood SA, Castellino RA. Gold standards and expert panels: a pulmonary nodule case study with challenges and solutions. In: Chakraborty DP, Eckstein MP, eds. Medical Imaging 2004: Image Perception, Observer Performance, and Technology Assessment. 2004: 173 - 184.
dc.identifier.citedreferenceJiang Y. A Monte Carlo simulation method to understand expert-panel consensus truth and double readings. Medical Image Perception Conference XII. The University of Iowa, Iowa City, IA; 2007.
dc.identifier.citedreferenceLi Q, Doi K. Comparison of typical evaluation methods for computer-aided diagnostic schemes: monte Carlo simulation study. Med Phys. 2007; 34: 871 - 876.
dc.identifier.citedreferenceArmato SG, Roberts RY, Kocherginsky M, et al. Assessment of Radiologist Performance in the Detection of Lung Nodules: dependence on the Definition of “Truth”. Acad Radiol. 2009; 16: 28 - 38.
dc.identifier.citedreferenceZhou C, Chan H-P, Chughtai A, et al. Variabilities in Reference Standard by Radiologists and Performance Assessment in Detection of Pulmonary Embolism in CT Pulmonary Angiography. J Digit Imaging. 2019; 32: 1089 - 1096.
dc.identifier.citedreferenceSahiner B, Chan H-P, Hadjiiski LM, et al. Effect of CAD on radiologists’ detection of lung nodules on thoracic CT scans: analysis of an observer performance study by nodule size. Acad Radiol. 2009; 16: 1518 - 1530.
dc.identifier.citedreferenceWenzel A, Hintze H. The choice of gold standard for evaluating tests for caries diagnosis. Dentomaxillofac Radiol. 1999; 28: 132 - 136.
dc.identifier.citedreferenceLehmann TM. From plastic to gold: a unified classification scheme for reference standards in medical image processing. In: Sonka M, Fitzpatrick JM, eds. Medical Imaging 2002: Image Processing, Vol 1–3. 2002: 1819 - 1827.
dc.identifier.citedreferenceLi F, Engelmann R, Armato SG, MacMahon H. Computer-Aided Nodule Detection System: results in an Unselected Series of Consecutive Chest Radiographs. Acad Radiol. 2015; 22: 475 - 480.
dc.identifier.citedreferenceYankelevitz DF, Henschke CI. Does 2-year stability imply that pulmonary nodules are benign. Am J Roentgenol. 1997; 168: 325 - 328.
dc.identifier.citedreferenceLitjens GJS, Barentsz JO, Karssemeijer N, Huisman HJ. Clinical evaluation of a computer-aided diagnosis system for determining cancer aggressiveness in prostate MRI. Eur Radiol. 2015; 25: 3187 - 3199.
dc.identifier.citedreferenceDREAM. The digital mammography dream challenge. ( 2017 ). https://www.synapse.org/Digital_Mammography_DREAM_challenge
dc.identifier.citedreferenceMcKinney SM, Sieniek M, Godbole V, et al. International evaluation of an AI system for breast cancer screening. Nature. 2020; 577: 89 - 94.
dc.identifier.citedreferenceMeyer CR, Johnson TD, McLennan G, et al. Evaluation of lung MDCT nodule annotation across radiologists and methods. Acad Radiol. 2006; 13: 1254 - 1265.
dc.identifier.citedreferenceTan J, Pu J, Zheng B, Wang X, Leader JK. Computerized comprehensive data analysis of Lung Imaging Database Consortium (LIDC). Med Phys. 2010; 37: 3802 - 3808.
dc.identifier.citedreferenceYan K, Wang X, Lu L, Summers RM. DeepLesion: automated mining of large-scale lesion annotations and universal lesion detection with deep learning. J Med Imaging. 2018; 5: 036501.
dc.identifier.citedreferenceOakden-Rayner L. Exploring Large-scale Public Medical Image Datasets. Acad Radiol. 2020; 27: 106 - 112.
dc.identifier.citedreferenceBluemke DA, Moy L, Bredella MA, et al. Assessing Radiology Research on Artificial Intelligence: a Brief Guide for Authors, Reviewers, and Readers-From the Radiology Editorial Board. Radiology. 2020; 294: 487 - 489.
dc.identifier.citedreferenceGoel S, Sharma Y, Jauer M-L, Deserno TM. WeLineation: crowdsourcing delineations for reliable ground truth estimation. Medical Imaging 2020: Imaging Informatics for Healthcare, Research, and Applications. 2020; 11318: 113180C.
dc.identifier.citedreferenceNguyen TB, Wang S, Anugu V, et al. Distributed Human Intelligence for Colonic Polyp Classification in Computer-aided Detection for CT Colonography. Radiology. 2012; 262: 824 - 833.
dc.identifier.citedreferenceJauer M-L, Goel S, Sharma Y, et al. STAPLE performance assessed on crowdsourced sclera segmentations. Medical Imaging 2020: Imaging Informatics for Healthcare, Research, and Applications. 2020; 11318: 113180K.
dc.identifier.citedreferenceBadano A, Graff CG, Badal A, et al. Evaluation of Digital Breast Tomosynthesis as Replacement of Full-Field Digital Mammography Using an In Silico Imaging Trial. JAMA Network Open. 2018; 1: e185474 - e185474.
dc.identifier.citedreferenceAbadi E, Segars WP, Chalian H, Samei E. Virtual Imaging Trials for Coronavirus Disease (COVID-19). Am J Roentgenol. 2021; 216: 362 - 368.
dc.identifier.citedreferenceSamala RK, Chan H-P, Hadjiiski LM, Helvie MA, Richter CD. Generalization error analysis for deep convolutional neural network with transfer learning in breast cancer diagnosis. Phys Med Biol. 2020; 65: 105002.
dc.identifier.citedreferenceRajchl M, Lee MCH, Oktay O, et al. DeepCut: object Segmentation From Bounding Box Annotations Using Convolutional Neural Networks. IEEE Trans Med Imaging. 2017; 36: 674 - 683.
dc.identifier.citedreferenceWarfield SK, Zou KH, Wells WM. Simultaneous truth and performance level estimation (STAPLE): an algorithm for the validation of image segmentation. IEEE Trans Med Imaging. 2004; 23: 903 - 921.
dc.identifier.citedreferencePetrick N, Sahiner B, Chan H-P, Helvie MA, Paquerault S, Hadjiiski LM. Breast cancer detection: evaluation of a mass-detection algorithm for computer-aided diagnosis - Experience in 263 patients. Radiology. 2002; 224: 217 - 224.
dc.identifier.citedreferenceKallergi M, Carney GM, Gaviria J. Evaluating the performance of detection algorithms in digital mammography. Med Phys. 1999; 26: 267 - 275.
dc.identifier.citedreferenceZhou SK, Greenspan H, Davatzikos C, et al. A Review of Deep Learning in Medical Imaging: imaging Traits, Technology Trends, Case Studies With Progress Highlights, and Future Promises. Proc IEEE. 2021; 109: 820 - 838.
dc.identifier.citedreferenceGur D, Wagner RF, Chan H-P. On the repeated use of databases for testing incremental improvement of computer-aided detection schemes. Acad Radiol. 2004; 11: 103 - 105.
dc.identifier.citedreferenceFukunaga K. Introduction to statistical pattern recognition. 2nd ed. Academic Press; 1990.
dc.identifier.citedreferenceEfron B. Estimating the error rate of a prediction rule - improvement on cross-validation. J Am Statist Assoc. 1983; 78: 316 - 331.
dc.identifier.citedreferenceSahiner B, Chan H-P, Hadjiiski L. Classifier performance prediction for computer-aided diagnosis using a limited dataset. Med Phys. 2008; 35: 1559 - 1570.
dc.identifier.citedreferenceBland JM, Altman DG. Statistics Notes: bootstrap resampling methods. BMJ. 2015; 350: h2622.
dc.identifier.citedreferenceSamala RK, Chan HP, Hadjiiski L, Helvie MA. Risks of feature leakage and sample size dependencies in deep feature extraction for breast mass classification. Med Phys. 2021; 48: 2827 - 2837.
dc.identifier.citedreferenceRussell S, Norvig P. Artificial intelligence: a modern approach. 4th ed. Pearson; 2020.
dc.identifier.citedreferenceBishop C. Pattern recognition and machine learning. Springer; 2006.
dc.identifier.citedreferenceWinston P. Artificial Intelligence. 3rd ed. Addison-Wesley; 1993.
dc.identifier.citedreferenceJaiswal A, Babu AR, Zadeh MZ, Banerjee D, Makedon F. A Survey on Contrastive Self-Supervised Learning. Technologies. 2021; 9.
dc.identifier.citedreferenceLi J, Zhao G, Tao Y, et al. Multi -task contrastive learning for automatic CT and X-ray diagnosis of COVID-19. Pattern Recognit. 2021; 114: 107848.
dc.identifier.citedreferenceNappi JJ, Tachibana R, Hironaka T, Yoshida H. Electronic cleansing by unpaired contrastive learning in non-cathartic laxative-free CT colonography. Proc SPIE Med Imaging. 2022; 12037: 120370S.
dc.identifier.citedreferenceTajbakhsh N, Hu YF, Cao JL, et al. Surrogate Supervision For Medical Image Analysis: effective Deep Learning From Limited Quantities of Labeled Data. in 2019 IEEE 16th International Symposium on Biomedical Imaging. 2019; p. 1251 - 1255.
dc.identifier.citedreferenceTachibana R, Nappi JJ, Hironaka T, Yoshida H. Self-Supervised adversarial learning with a limited dataset for electronic cleansing in computed tomographic colonography: a preliminary feasibility study. Cancers. 2022; 14: 4125.
dc.identifier.citedreferenceZhou Z, Sodha V, Siddiquee MMR, et al. Models Genesis: generic Autodidactic Models for 3D Medical Image Analysis. In: Shen D, et al., eds. Medical Image Computing and Computer Assisted Intervention - MICCAI 2019, Part IV. 2019: 384 - 393.
dc.identifier.citedreferenceBeiden S, Campbell G, Meier K, Wagner R. The problem of ROC analysis without truth: the EM algorithm and the information matrix. Proc SPIE Medical Imaging. 2000; 3981: 126 - 134.
dc.identifier.citedreferenceCheplygina V, de Bruijne M, Pluim JPW. Not-so-supervised: a survey of semi-supervised, multi-instance, and transfer learning in medical image analysis. Med Image Anal. 2019; 54: 280 - 296.
dc.identifier.citedreferenceLeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015; 521: 436 - 444.
dc.identifier.citedreferenceYosinski J, Clune J, Bengio Y, Lipson H. How transferable are features in deep neural networks? In: Ghahramani Z, ed. Advances in Neural Information Processing Systems. 2014: 3320 - 3328.
dc.identifier.citedreferenceBar Y, Diamant I, Wolf L, Lieberman S, Konen E, Greenspan H. Chest pathology detection using deep learning with non-medical training. in 2015 IEEE 12th International Symposium on Biomedical Imaging, 2015, p. 294 - 297.
dc.identifier.citedreferenceShin HC, Roth HR, Gao MC, et al. Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning. IEEE Trans Med Imaging. 2016; 35: 1285 - 1298.
dc.identifier.citedreferenceDiamant I, Bar Y, Geva O. Chest Radiograph Pathology Categorization via Transfer Learning. In: Zhou SK, Greenspan H, Shen D, eds. Deep Learning for Medical Image Analysis. 2017: 299 - 320.
dc.identifier.citedreferenceSamala RK, Chan H-P, Hadjiiski L, Helvie MA, Wei J, Cha K. Mass detection in digital breast tomosynthesis: deep convolutional neural network with transfer learning from mammography. Med Phys. 2016; 43: 6654 - 6666.
dc.identifier.citedreferenceTajbakhsh N, Shin JY, Gurudu SR, et al. Convolutional Neural Networks for Medical Image Analysis: full Training or Fine Tuning? IEEE Trans Med Imaging. 2016; 35: 1299 - 1312.
dc.identifier.citedreferenceYang J, Huang X, He Y, et al. Reinventing 2D Convolutions for 3D Images. IEEE J Biomed Health Inform. 2021; 25: 3009 - 3018.
dc.identifier.citedreferenceTachibana R, Nappi JJ, Ota J, et al. Deep Learning Electronic Cleansing for Single- and Dual-Energy CT Colonography. Radiographics. 2018; 38: 2034 - 2050.
dc.identifier.citedreferenceSamala RK, Chan H-P, Hadjiiski L, Helvie MA, Richter CD, Cha KH. Breast Cancer Diagnosis in Digital Breast Tomosynthesis: effects of Training Sample Size on Multi-Stage Transfer Learning Using Deep Neural Nets. IEEE Trans Med Imaging. 2019; 38: 686 - 696.
dc.identifier.citedreferenceMei X, Liu Z, Robson PM, et al. RadImageNet: an Open Radiologic Deep Learning Research Dataset for Effective Transfer Learning. Radiol Artif Intell. 2022; 4: e210315.
dc.identifier.citedreferenceHeker M, Greenspan H, Joint Liver Lesion Segmentation and Classification via Transfer Learning. arXiv preprint arXiv:2004.12352, 2020.
dc.identifier.citedreferenceCaruana R. Multitask learning. Learning to learn. Springer; 1998: 95 - 133.
dc.identifier.citedreferenceSamala RK, Chan H-P, Hadjiiski LM, Helvie MA, Cha KH, Richter CD. Multi-task transfer learning deep convolutional neural network: application to computer-aided diagnosis of breast cancer on mammograms. Phys Med Biol. 2017; 62: 8894 - 8908.
dc.identifier.citedreferenceGoodman SN. Toward evidence-based medical statistics. 1: the P value fallacy. Ann Intern Med. 1999; 130: 995 - 1004.
dc.identifier.citedreferenceQuinonero-Candela J, Sugiyama M, Schwaighofer A, Lawrence ND. Dataset Shift in Machine Learning. ACM Digital Library. The MIT Press; 2009: 1 - 248.
dc.identifier.citedreferenceCastro DC, Walker I, Glocker B, Causality matters in medical imaging. arXiv preprint arXiv:1912.08142, 2019.
dc.identifier.citedreferenceCsurka G. A Comprehensive Survey on Domain Adaptation for Visual Applications. In: Csurka G, ed. Domain Adaptation in Computer Vision Applications. Springer; 2017: 1 - 35.
dc.identifier.citedreferenceKamnitsas K, Baumgartner C, Ledig C, et al. Unsupervised domain adaptation in brain lesion segmentation with adversarial networks. In: Information Processing in Medical Imaging (IPMI). 2017: 597 - 609.
dc.identifier.citedreferenceFrangi AF, Tsaftaris SA, Prince JL. Simulation and Synthesis in Medical Imaging. IEEE Trans Med Imaging. 2018; 37: 673 - 679.
dc.identifier.citedreferenceMahmood F, Chen R, Durr NJ. Unsupervised Reverse Domain Adaptation for Synthetic Medical Images via Adversarial Training. IEEE Trans Med Imaging. 2018; 37: 2572 - 2581.
dc.identifier.citedreferenceShin HC, Tenenholtz NA, Rogers JK. Medical Image Synthesis for Data Augmentation and Anonymization Using Generative Adversarial Networks. In: Gooya A, ed. Simulation and Synthesis in Medical Imaging. 2018: 1 - 11.
dc.identifier.citedreferenceSandfort V, Yan K, Pickhardt PJ, Summers RM. Data augmentation using generative adversarial networks (CycleGAN) to improve generalizability in CT segmentation tasks. Sci Rep. 2019; 9: 16884.
dc.identifier.citedreferenceMcMahan HB, Moore E, Ramage D, Hampson S, Arcas BAY. Communication-Efficient Learning of Deep Networks from Decentralized Data. In: Singh A, Zhu J, eds. Artificial Intelligence and Statistics. 2017: 1273 - 1282.
dc.identifier.citedreferenceKonecny J, McMahan HB, Yu FX, Richtarik P, Suresh AT, Bacon D, Federated learning: strategies for improving communication efficiency. arXiv preprint arXiv:1610.05492, 2016.
dc.identifier.citedreferenceChang K, Balachandar N, Lam C, et al. Distributed deep learning networks among institutions for medical imaging. J Am Med Inform Assoc. 2018; 25: 945 - 954.
dc.identifier.citedreferenceRieke N, Hancox J, Li W, et al. The future of digital health with federated learning. Npj Dig Med. 2020; 3: 119.
dc.identifier.citedreferenceLi X, Gu Y, Dvornek N, Staib LH, Ventola P, Duncan JS. Multi-site fMRI analysis using privacy-preserving federated learning and domain adaptation: aBIDE results. Med Image Anal. 2020; 65: 101765.
dc.identifier.citedreferenceMcClure P, Zheng CY, Kaczmarzyk JR. Distributed Weight Consolidation: a Brain Segmentation Case Study. In: Bengio S, ed. Advances in Neural Information Processing Systems 31. 2018: 4093 - 4103.
dc.identifier.citedreferenceKairouz P, McMahan HB, Avent B, et al. Advances and open problems in federated learning. arXiv preprint arXiv:1912.04977, 2019.
dc.identifier.citedreferenceGrossberg S. Adaptive Resonance Theory: how a brain learns to consciously attend, learn, and recognize a changing world. Neural Netw. 2013; 37: 1 - 47.
dc.identifier.citedreferenceParisi GI, Kemker R, Part JL, Kanan C, Wermter S. Continual lifelong learning with neural networks: a review. Neural Netw. 2019; 113: 54 - 71.
dc.identifier.citedreferenceFrench RM. Catastrophic forgetting in connectionist networks. Trends Cogn Sci. 1999; 3: 128 - 135.
dc.identifier.citedreferenceGoodfellow IJ, Mirza M, Xiao D, Courville A, Bengio Y, An empirical investigation of catastrophic forgetting in gradient-based neural networks. arXiv preprint arXiv:1312.6211, 2013.
dc.identifier.citedreferenceMetz CE. ROC methodology in radiologic imaging. Invest Radiol. 1986; 21: 720 - 733.
dc.identifier.citedreferenceChakraborty DP, Winter LHL. Free-response methodology - alternate analysis and a new observer-performance experiment. Radiology. 1990; 174: 873 - 881.
dc.identifier.citedreferenceGallas BD, Chan H-P, D’Orsi CJ, et al. Evaluating Imaging and Computer-aided Detection and Diagnosis Devices at the FDA. Acad Radiol. 2012; 19: 463 - 477.
dc.identifier.citedreferenceDoi K, MacMahon H, Katsuragawa S, Nishikawa RM, Jiang YL. Computer-aided diagnosis in radiology: potential and pitfalls. Eur J Radiol. 1999; 31: 97 - 109.
dc.identifier.citedreferenceHarrell FE, Califf RM, Pryor DB, Lee KL, Rosati RA. Evaluating the yield of medical tests. J Am Med Assoc. 1982; 247: 2543 - 2546.
dc.identifier.citedreferenceMantel N. Evaluation of survival data and two new rank order statistics arising in its consideration. Cancer Chemotherap Rep. 1966; 50: 163 - 170.
dc.identifier.citedreferenceTherasse P, Arbuck SG, Eisenhauer EA, et al. New Guidelines to Evaluate the Response to Treatment in Solid Tumors. J Natl Cancer Inst. 2000; 92: 205 - 216.
dc.identifier.citedreferenceEisenhauer EA, Therasse P, Bogaerts J, et al. New response evaluation criteria in solid tumours: revised RECIST guideline (version 1.1). Eur J Cancer. 2009; 45: 228 - 247.
dc.identifier.citedreferenceCohen PR. Empirical methods for artificial intelligence. The MIT Press; 1995.
dc.identifier.citedreferenceZhou X-H, Obuchowski NA, McClish DK. Statistical methods in diagnostic medicine. Wiley; 2002.
dc.identifier.citedreferenceSchober P, Bossers SM, Schwarte LA. Statistical Significance Versus Clinical Importance of Observed Effect Sizes: what Do P Values and Confidence Intervals Really Represent? Anesth Analg. 2018; 126: 1068 - 1072.
dc.identifier.citedreferenceAickin M, Gensler H. Adjusting for multiple testing when reporting research results: the Bonferroni vs Holm methods. Am J Public Health. 1996; 86: 726 - 728.
dc.identifier.citedreferenceRajpurkar P, Irvin J, Ball RL, et al. Deep learning for chest radiograph diagnosis: a retrospective comparison of the CheXNeXt algorithm to practicing radiologists. Plos Med. 2018; 15: e1002686.
dc.identifier.citedreferenceEsteva A, Kuprel B, Novoa RA, et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature. 2017; 542: 115 - 118.
dc.working.doiNO
dc.owningcollnameInterdisciplinary and Peer-Reviewed

