Radiologists, machine learning scientists, industry experts, and policymakers are increasingly confronted with ethical considerations related to the development and implementation of AI tools in healthcare. Among these ethical challenges is the imperative to identify and mitigate sources of unwanted bias that may be reflected in AI algorithms. Here are a few relevant points to consider when striving to create and use fair AI algorithms for clinical practice.
Can AI algorithms be biased?
The presence of human biases, both conscious and unconscious, is well known. Similarly, if care is not taken, AI algorithms can explicitly and implicitly encode those same biases. When AI models identify statistical patterns in human-generated training data, it is no surprise that our biases can be reflected in these algorithms. If these biases go undetected before AI tools are implemented in clinical practice, they can lead to harmful results. Therefore, recognizing potential sources of bias in AI algorithms is critical to ensuring their safe use.
How do biases make their way into AI algorithms?
Unwanted bias may be incorporated unwittingly into AI models at points throughout an algorithm’s lifecycle — including the creation of training datasets, selection of model architecture, and refinement of the algorithm post-deployment. While the potential to introduce significant biases exists in each of these phases, I will focus primarily on examples of bias affecting the training data.
Data points used to train AI algorithms are drawn from the results of human decisions, so these algorithms may reflect the effects of historical or systemic inequities. An article published in Science last year illustrated how an AI algorithm used in population health management reflected systemic inequities embedded in the delivery of care, and demonstrated the unfair outcomes that followed.1 The algorithm in the study used healthcare costs as a proxy for disease complexity, assuming that patients with the highest healthcare expenditures would benefit most from certain interventions. However, because of unequal access to care, healthcare costs for Black patients tend to be lower than those for white patients with comparable disease complexity. As a result, if this algorithm's output were used to guide treatment decisions, Black patients would receive comparatively less care, reinforcing existing inequities.
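To make the proxy problem concrete, here is a minimal sketch with invented numbers (the patient names, burden scores, and costs are hypothetical illustrations, not data from the study): ranking patients by historical cost rather than by disease burden pushes a patient with lower spending, but equal need, down the priority list.

```python
# Illustrative sketch (all values invented): prioritizing patients for
# extra care by predicted healthcare cost instead of actual disease burden.

patients = [
    # (patient_id, disease_burden_score, annual_cost_usd)
    ("patient_1", 7, 12_000),  # high burden, higher historical spending
    ("patient_2", 7, 7_000),   # same burden, lower spending due to access barriers
    ("patient_3", 4, 10_000),  # lower burden, moderate spending
]

# Proxy ranking: highest cost first (the approach critiqued in the study)
by_cost = sorted(patients, key=lambda p: -p[2])

# Burden ranking: what the care-management program actually intends to target
by_burden = sorted(patients, key=lambda p: -p[1])

print([p[0] for p in by_cost])    # patient_2 drops below the lower-burden patient_3
print([p[0] for p in by_burden])  # patient_2 ties for greatest need
```

Under the cost proxy, patient_2, whose lower spending reflects access barriers rather than better health, is ranked below a patient with less need.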
Underrepresentation of a given sub-population in the training data is another potential source of bias, and one that is particularly relevant to healthcare. When training an AI algorithm, fewer data points generally lead to less accurate predictions. If not explicitly accounted for, the routines used to train machine learning models typically optimize performance over the entire population, so majority and minority sub-populations influence the model unequally. As a result, performance on underserved patients, who have historically faced greater challenges accessing care, including imaging, may be silently deprioritized unless corrective measures are taken, leaving accuracy for these patients worse than for the general population.
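A toy calculation (all counts invented for illustration) shows how an aggregate metric can mask this effect: overall accuracy looks strong even though the minority subgroup fares far worse.

```python
# Hypothetical sketch: a model's aggregate accuracy can hide poor
# performance on an underrepresented subgroup. All numbers are invented.

def accuracy(pairs):
    """Fraction of (prediction, label) pairs that match."""
    return sum(pred == label for pred, label in pairs) / len(pairs)

# 90 majority-group cases, mostly predicted correctly
majority = [(1, 1)] * 85 + [(0, 1)] * 5
# 10 minority-group cases, predicted poorly
minority = [(1, 1)] * 6 + [(0, 1)] * 4

overall = accuracy(majority + minority)  # 0.91 -- looks acceptable
per_group = {
    "majority": accuracy(majority),      # ~0.94
    "minority": accuracy(minority),      # 0.60 -- hidden by the aggregate
}
```

Reporting only the 91% overall figure would conceal the 60% accuracy on the minority subgroup, which is why stratified evaluation matters.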
What are some strategies to mitigate algorithmic bias?
Before we can effectively assess AI algorithms for potential bias, we should strive for a better understanding of the factors that lead a model to a given output. We also need to agree on measurable, relevant fairness metrics. Neither goal is without its challenges, including the black-box nature of many AI models and the existence of multiple competing definitions of fairness. As ongoing work addresses these tasks, we can begin establishing practices to identify and mitigate bias. These include:
- Processing the data to address biases before moving forward with training
- Embedding techniques to ensure fairness is built into the model development process
- Assessing the algorithm’s outputs for bias and fairness before deployment by validating against diverse data sets from multiple institutions and geographic locales
- Monitoring algorithms after deployment to ensure they function as expected in actual clinical practices with heterogeneous imaging equipment and patient populations
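As one illustration of the first practice, processing the data before training, a reweighing scheme in the style of Kamiran and Calders assigns each training sample a weight so that subgroup membership and outcome label appear statistically independent. The helper below is a hypothetical sketch under simplified assumptions, not a drop-in clinical tool.

```python
from collections import Counter

def reweighing_weights(groups, labels):
    """Per-sample weights (reweighing-style pre-processing) chosen so that
    group membership and label look statistically independent after
    weighting: weight(g, y) = P(g) * P(y) / P(g, y)."""
    n = len(labels)
    g_count = Counter(groups)
    y_count = Counter(labels)
    gy_count = Counter(zip(groups, labels))
    return [
        (g_count[g] / n) * (y_count[y] / n) / (gy_count[(g, y)] / n)
        for g, y in zip(groups, labels)
    ]

# Tiny invented example: group "A" is overrepresented among positive labels,
# so its positive samples are down-weighted and its negatives up-weighted.
groups = ["A", "A", "A", "B"]
labels = [1, 1, 0, 0]
weights = reweighing_weights(groups, labels)
print(weights)  # [0.75, 0.75, 1.5, 0.5]
```

These weights would then be passed to a training routine that accepts per-sample weights, so each (group, label) cell contributes in proportion to what independence would predict.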
Just as radiologists perform quality assurance on all our imaging modalities, we should incorporate bias evaluation into the routine, ongoing assessment of AI algorithms. A culture of AI fairness should ensure that vulnerable or underrepresented populations remain adequately protected. Our ability to provide the best possible care to patients depends on it.