Defeating Racism in Artificial Intelligence

Like a dog or a child, AI systems are black boxes. You don’t really know what goes on inside their brains. You train them all the same. Repetition. You feed in your inputs or commands and tell them what the expected result should be and given a large enough sample set, hopefully the Neural Net or pet learns the pattern and can predict new correct outputs for never-before-seen inputs.

AI is here, it’s real, its very effectively at dealing with some problems that old style programming can’t deal with and it’s sticking around.

If victims of systematic or subtle racism currently have a hard time proving existing non IT systems are racist (such as job interviewing, flagging down a taxi) they may have an even greater problem with upcoming AI systems and ignorant (aka wilfully complicit) companies who will pretend AI takes racism off the table.

I foresee Mortgage websites where you fill in your details to get mortgage offers and car dealerships websites all trained with racist data collected from the 70s and onwards where the data could be in places racially biased. Obviously we won’t tell the AI what race the applicants are, we won’t even ask on the form, but like a human with good intuition, the computer will do what it needs to, to hit its targets, and the guy named Jamal gets the red cross while Richard gets the green tick. The computer just imitates the patterns it’s seen before.

Race can be inferred from name, location, occupation, college names and many other subtle clues. Even if you can’t see it, doesn’t mean the computer can’t either.

I wasn’t going to write on this subject until I saw a fascinating talk at Scala Exchange 2017. There’s this program called Matryoshka that can analyse decision trees. The woman who gave the talk showed us the complete list of passengers on the Titanic. Their names, gender, age, some other bits and finally, whether they survived or died. Matryoshka allowed her to see which factors on the input played the largest part in deciding the final outcome. Were the women and children prioritised? It’s quite simple really, you group the data by a field (eg Name) and look at the how close to 50% the outcomes were. I kind of lie, It actually gets really tricky with non binary data and dozens of discrete inputs but the point is, it can be understood and built to a certain point. Certainly by people smarter than me.

When it comes to race and ensuring equality, you achieve nothing if you sweep race related questions under the table. You must capture everyone’s race and then retrospectively look back to see if a system is racist or not. This is the practice used by the UK government when you apply for jobs there.

Matryoshka and similar tools could be a decisive tool in helping us to understand the AI we build, to keep it transparent and going forwards to ensure we build the kind of fair, racism-free systems we all want to see.

Disclaimer: I have not actually worked on or studied AI, nor have I actually used Matryoshka! I just sit in conferences and feel the sound waves bounce off my forehead.