Dostoevsky's Piano

A good way to understand how AI models can be biased in their outputs is through the case of Amazon’s failed AI recruiting tool. In 2014, Amazon began developing a tool designed to screen candidates' resumes and serve as the first-pass reviewer. The hope was that it would eventually be able to select the top candidates from a pool of applications without human intervention. However, the tool ended up discriminating against female applicants.

How does this happen? At a fundamental level, most machine learning models of this kind learn by analyzing a dataset where the outcomes are already known. Typically, there is a set of “training data” used to train the model, along with some held-out “testing data” used to evaluate its performance. In the case of resumes, the model is trained on a set of resumes labeled as either “accepted” or “rejected.” It then attempts to identify patterns in the resumes that predict these outcomes.
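To make the training and testing split concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the resumes are reduced to a single made-up feature (years of experience), the labels are invented, and the "model" is just a cutoff chosen to reproduce the training labels. Real screening models are far more complex, but the overall loop is the same.

```python
# Hypothetical toy data: (years_of_experience, label), 1 = accepted, 0 = rejected.
resumes = [(years, 1 if years >= 5 else 0) for years in range(12)] * 5

# Hold out every fifth labeled example as testing data; train on the rest.
test = resumes[::5]
train = [r for i, r in enumerate(resumes) if i % 5 != 0]

def accuracy(threshold, data):
    """Share of examples where 'years >= threshold' matches the known label."""
    correct = sum((years >= threshold) == bool(label) for years, label in data)
    return correct / len(data)

# "Training": pick the cutoff that best reproduces the training labels.
best_threshold = max(range(13), key=lambda t: accuracy(t, train))

# "Testing": check how well the learned pattern holds on held-out examples.
test_accuracy = accuracy(best_threshold, test)
print(best_threshold, test_accuracy)
```

The model never sees a definition of a "good" candidate. It only finds whatever rule best matches the labels it was given, which is why the quality of those labels matters so much.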

In Amazon's case, the model picked up on patterns in the training data, in particular that most accepted applications came from men. As a result, the model effectively learned that men must simply be better suited for the roles being filled. This is absurd, but it highlights the limitations of these models: they reproduce the patterns in their data without understanding them. As Emily Bender and her co-authors put it, such models are “stochastic parrots.”
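A toy sketch in Python shows how a skewed hiring history can turn into a learned penalty. The resumes, labels, and scoring rule below are all hypothetical and deliberately simplistic. This is not Amazon's system, only an illustration of a model that scores words by how often resumes containing them were accepted in the past.

```python
from collections import Counter

# Hypothetical historical decisions: (resume text, 1 = accepted, 0 = rejected).
# The labels are skewed on purpose to mimic a history of mostly male hires.
history = [
    ("captain chess club python developer", 1),
    ("python developer team lead", 1),
    ("java developer chess club", 1),
    ("captain women's chess club python developer", 0),
    ("women's coding society java developer", 0),
    ("java developer team lead", 1),
]

accepted, seen = Counter(), Counter()
for text, label in history:
    for word in set(text.split()):
        seen[word] += 1
        accepted[word] += label

# Learned "signal" per word: acceptance rate among past resumes containing it.
acceptance_rate = {word: accepted[word] / seen[word] for word in seen}

def score(text):
    """Average learned acceptance rate over the words the model has seen."""
    rates = [acceptance_rate[w] for w in text.split() if w in acceptance_rate]
    return sum(rates) / len(rates) if rates else 0.5

plain = score("python developer chess club")
flagged = score("python developer women's chess club")
print(acceptance_rate["women's"], plain, flagged)
```

The two resumes being scored differ by a single word, yet the second one scores lower. Nothing in the code mentions gender. The penalty comes entirely from the historical labels.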

We can expand on the idea that models can learn in unforeseen ways. For instance, if a model is trained exclusively on data from individuals in Western nations, it might identify misleading patterns. Suppose the model finds that applicants with shorter names are more likely to be hired. If the names in its Western training data tend to be shorter than many names common in, say, India or the Middle East, the model ends up arbitrarily discriminating against people with long names because it learned this odd pattern from its training data. We can agree that the length of someone's name has no bearing on their ability in a role, but the model hasn't been trained on sufficiently diverse data, or been tuned, to correct for this pattern.
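The name-length scenario can be sketched the same way. The Python below uses invented names and outcomes, and the "model" is a single length cutoff fitted to the training labels. It is meant only to show how an accidental pattern in narrow training data carries over to applicants the data never covered.

```python
# Hypothetical training data from one region: (name, hired).
# Ability is not recorded; name length only happens to line up with the outcome.
training = [
    ("Ann Lee", 1), ("Tom Fox", 1), ("Joe Kim", 1), ("Sam Roy", 1),
    ("Eve Hart", 1), ("Jonathan Whitfield", 0),
    ("Christopher Blackwood", 0), ("Maximilian Abernathy", 0),
]

def errors(cutoff):
    """Training examples misclassified by the rule 'hire if len(name) <= cutoff'."""
    return sum((len(name) <= cutoff) != bool(hired) for name, hired in training)

# Fit the smallest cutoff that best reproduces the training labels.
cutoff = min(range(1, 40), key=errors)

def predict(name):
    """Apply the learned rule to a new applicant."""
    return len(name) <= cutoff

# New applicants, including names longer than anything common in the training data.
applicants = ["Ann Lee", "Venkataraman Subramanian", "Abdulrahman Al-Mansouri"]
for name in applicants:
    print(name, predict(name))
```

The rule fits the training data perfectly, so nothing looks wrong during training. The problem only appears when the model meets applicants whose names fall outside the range it was trained on.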

Moving beyond resumes, a major bias in current AI models stems from the language of content available worldwide. Approximately 59% of all websites are in English, and nearly 66% of content used for training language models is also in English. This dominance of English means that models are likely to develop unintended patterns, treating habits of English as if they applied to every language.

The differences between Japanese and English illustrate this issue well. English is not very context-dependent: most sentences state their subject explicitly and follow subject-verb-object (SVO) order. In contrast, Japanese is highly context-dependent, allowing subjects or objects to be omitted frequently. Japanese also doesn’t follow SVO order and instead uses subject-object-verb (SOV) order. This seemingly small structural difference has reverberating effects that we aren’t able to predict well. And this doesn’t even cover the fact that Japanese maintains an honorific system that shapes the speech patterns between two individuals based on their relationship.

Congrats! You now have a better understanding of model bias. However, addressing this issue is more complex than simply adjusting the model to ignore certain data points or patterns. When training on datasets with millions or billions of data points, how do you identify the "right" ones to modify?