Better, Less-Stereotyped Word Vectors - ConceptNet Blog

Bias and Disenfranchisement

Conversational interfaces learn from the data they are given, and all datasets based on human communication encode bias. In 2016, researchers at Boston University and Microsoft found what they characterized as “extremely sexist” patterns in a widely used set of Word2Vec word vectors trained on Google News text, covering a vocabulary of about three million words and phrases.[18] Among other things, they found that occupations the vectors associated with “male” included Maestro, Skipper, Protégé and Philosopher, while those associated with “female” included Homemaker, Nurse, Receptionist and Librarian. This is more than a hypothetical risk for organizations: pretrained vectors like these are used as building blocks in search algorithms, recommendation engines, and other common applications such as ad targeting and audience segmentation. Organizations building chatbots on common datasets must investigate potential bias and design for it up front to avoid alienating and disenfranchising customers and consumers. The good news is that these and other researchers are working on methods to audit predictive models for bias.
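To make "associated with male" or "associated with female" concrete, here is a minimal sketch of one way such an association can be measured: define a gender axis as the vector for "she" minus the vector for "he", then score each occupation word by its cosine with that axis. This follows the spirit of projecting words onto a she/he direction, but it is an illustration under stated assumptions, not the researchers' exact procedure. The 4-dimensional vectors below are hypothetical toy values invented for this example; they are not taken from the Google News model, whose vectors have 300 dimensions. With real data you would first load pretrained vectors (for instance with gensim's `KeyedVectors.load_word2vec_format`), a step omitted here.

```python
import numpy as np

# HYPOTHETICAL toy vectors, invented for illustration only.
# They are NOT values from the Google News word2vec model.
TOY_VECTORS = {
    "he":          np.array([1.0, 0.1, 0.0, 0.0]),
    "she":         np.array([-1.0, 0.1, 0.0, 0.0]),
    "maestro":     np.array([0.6, 0.2, 0.7, 0.1]),
    "philosopher": np.array([0.4, 0.9, 0.1, 0.2]),
    "nurse":       np.array([-0.5, 0.3, 0.2, 0.9]),
    "librarian":   np.array([-0.4, 0.8, 0.1, 0.3]),
}
OCCUPATIONS = ["maestro", "philosopher", "nurse", "librarian"]


def unit(v: np.ndarray) -> np.ndarray:
    """Scale a vector to length 1."""
    return v / np.linalg.norm(v)


def gender_direction(vectors: dict) -> np.ndarray:
    """Assumed gender axis for this sketch: the normalized difference she - he."""
    return unit(vectors["she"] - vectors["he"])


def gender_scores(vectors: dict, words: list) -> dict:
    """Cosine of each word with the gender axis.

    Positive scores lean toward "she", negative scores toward "he".
    """
    g = gender_direction(vectors)
    return {w: float(np.dot(unit(vectors[w]), g)) for w in words}


if __name__ == "__main__":
    scores = gender_scores(TOY_VECTORS, OCCUPATIONS)
    # Print occupations from most "he"-leaning to most "she"-leaning.
    for word, score in sorted(scores.items(), key=lambda kv: kv[1]):
        print(f"{word:12s} {score:+.3f}")
```

Because the toy vectors were chosen by hand, the ranking they produce says nothing about real data; the point is only the mechanism. An audit of a real model would run the same scoring over a long list of occupation words and look for words that land far from zero.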

Stereotypes, Gender Bias, AI
Brigitte Bellan, June 29, 2017
Conceptnet, Open Data, Word Vector