Canadian Medical Association Journal – Evaluation of machine learning solutions in medicine
KEY POINTS
Evaluation of machine-learned systems is a multifaceted process that encompasses internal validation, clinical validation, clinical outcomes evaluation, implementation research and postimplementation evaluation.
Approaches to clinical validation include comparing model performance with that of clinician experts and deploying systems silently so that predictions can be compared with actual patient outcomes; clinical outcomes can be evaluated through randomized controlled trials, cohort studies, interrupted time series analyses and before-and-after studies.
Implementation research includes qualitative and quantitative components and formative assessments, and is attentive to the context in which the system is being deployed; evaluation frameworks can help teams structure their studies and analyses.
Postimplementation evaluation is necessary to monitor for and account for threats to system performance after deployment, which may necessitate retraining and recalibration of machine-learned systems.
A multidisciplinary team comprising data scientists, clinician experts and implementation scientists (with qualitative and quantitative expertise) can help ensure that a comprehensive evaluation is undertaken before, during and after deployment.
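
As an illustration of two of the points above, the comparison of silent-deployment predictions with actual patient outcomes and the postimplementation check for whether recalibration may be needed can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a method prescribed by the article: the function names, the 0.5 decision threshold and the 0.05 tolerance are hypothetical choices, and a real evaluation would use measures and thresholds agreed on by the multidisciplinary team.

```python
from typing import Dict, Sequence


def discrimination_summary(
    y_true: Sequence[int],
    y_prob: Sequence[float],
    threshold: float = 0.5,  # assumed cut-off, chosen only for illustration
) -> Dict[str, float]:
    """Compare silent-mode predictions with observed binary outcomes."""
    tp = fp = tn = fn = 0
    for outcome, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and outcome == 1:
            tp += 1
        elif predicted == 1 and outcome == 0:
            fp += 1
        elif predicted == 0 and outcome == 0:
            tn += 1
        else:
            fn += 1
    # Guard against empty classes so the sketch never divides by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return {"sensitivity": sensitivity, "specificity": specificity}


def calibration_in_the_large(
    y_true: Sequence[int], y_prob: Sequence[float]
) -> float:
    """Mean predicted risk minus observed event rate.

    Values far from 0 suggest the system over- or under-predicts risk.
    """
    n = len(y_true)
    if n == 0:
        raise ValueError("at least one observation is required")
    return sum(y_prob) / n - sum(y_true) / n


def needs_recalibration(
    y_true: Sequence[int],
    y_prob: Sequence[float],
    tolerance: float = 0.05,  # assumed tolerance, not taken from the article
) -> bool:
    """Flag a monitoring window whose calibration gap exceeds the tolerance."""
    return abs(calibration_in_the_large(y_true, y_prob)) > tolerance
```

In use, a team might run `discrimination_summary` on predictions logged during silent deployment, then apply `needs_recalibration` to successive postdeployment windows (for example, monthly) and review any window that is flagged before deciding whether to retrain or recalibrate.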