As if we didn't have enough in-person racism to deal with, it turns out the chatbots might be racist too. Troubling reporting from the Washington Post revealed that one of the most widely used data sets for training AI chatbots contains a ton of right-wing content.
For folks unfamiliar with artificial intelligence, AI programs like ChatGPT can't literally think for themselves. Instead, companies feed AI programs a massive amount of data scraped from all over the internet. The AI uses this data set to mimic human thought. So if your robot friend starts trying to share 9/11 conspiracy theories, chances are the data set had a little too much Alex Jones in it.
And that, my friends, is precisely where the problem begins. According to the Washington Post investigation, the news websites in one of the most widely used AI data sets include a ton of far-right and non-reputable sources. The data set in question is Google's C4 data set, which powers some of the largest AI models in the world, including Facebook's and Google's.
So where exactly are they getting their news from? Well, Breitbart is definitely on the list. The Russian state propaganda website RT.com is also on there, alongside Vdare.com, the site of the anti-immigration group VDARE.
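To make the scraping problem concrete, here is a minimal sketch of how a training pipeline could screen out documents from known bad domains before they ever reach a model. This is purely illustrative: the domain names come from the Post's reporting, but the data structure and filter logic are assumptions for the example, not how C4 was actually built (the Post's point is precisely that this kind of filtering largely didn't happen).

```python
# Hypothetical sketch: filtering scraped documents by a domain blocklist.
# The blocklisted domains come from the article; everything else here is
# an illustrative assumption, not Google's actual C4 pipeline.
from urllib.parse import urlparse

BLOCKLIST = {"breitbart.com", "rt.com", "vdare.com"}

def domain_of(url):
    """Naively extract the source domain from a URL."""
    host = urlparse(url).netloc.lower()
    # Strip a leading "www." so www.rt.com matches rt.com.
    return host[4:] if host.startswith("www.") else host

def filter_corpus(docs):
    """Keep only documents whose source domain is not blocklisted."""
    return [d for d in docs if domain_of(d["url"]) not in BLOCKLIST]

corpus = [
    {"url": "https://www.rt.com/news/123", "text": "..."},
    {"url": "https://example.org/article", "text": "..."},
]
print([d["url"] for d in filter_corpus(corpus)])
# Only the example.org document survives the filter.
```

A real pipeline would need far more than a three-domain set, which is part of the problem: with billions of automatically scraped pages, nobody hand-checks what gets in.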
You don't have to take our word for whether Breitbart is pushing racism. In 2016, right-wing commentator Ben Shapiro expressed disdain for the website, saying it pushed "white ethno-nationalism" content. And if you're too far right for Ben Shapiro... you might want to start asking some tough questions. A massive concern is that AI programs don't always cite their sources, which means you could ask an AI a question and never know the answer is coming from a right-wing site spewing hate.
MSNBC's Sarah Posner, who covers the right, called attention to just how dangerous having these inputs in the algorithm can be:
Anyone who has searched the web for information on a topic knows that it can sometimes land them on a site spewing bigoted content or disinformation. The building blocks of chatbots have been scraped from the same internet. An offended user can navigate away from a toxic site in disgust. But because the data collection for LLMs is automated, such content gets included in the "instruction" for them. So if an LLM includes information from sites like Breitbart and VDare, which publish transphobic, anti-immigrant and racist content, that information — or disinformation — could be incorporated in a chatbot's responses to your questions or requests for help.
The problem with AI (other than the inevitable day it takes over the world) is that it's a product of our own biases and judgments. And until we get a much better handle on that, or at least put up better guardrails, the racist chatbots might be here to stay.