Art Intelligence

Data Is What We Make of It—For Good or Ill

Catherine D’Ignazio standing in front of fluted building pillars. D’Ignazio shows how data can expose inequity—or contribute to it.
Written by Noah Roy

A few years ago, if you used an online program to translate the phrase “She won a Nobel Prize” from English into Hungarian, which has gender-neutral pronouns, and then back into English, the result would be “He won a Nobel Prize.” Why? Because the program automatically assumed that if English needed a pronoun, it had to be masculine.

Do it now with, say, Google Translate, and the phrase comes back simply as “She won a Nobel Prize.” Granted, the earlier error was embarrassing, but someone learned a lesson.

That’s one problem with all the data flowing around us: implicit biases in how it’s collected and used. But some people are trying to fix that.

One of those people is Catherine D’Ignazio, J97. She is an assistant professor of urban science and planning at the Massachusetts Institute of Technology and co-author, with Lauren F. Klein, of the recent book Data Feminism. It’s a deep dive into the world of data and how it intersects with feminism, bias, and justice.

D’Ignazio isn’t only trying to fix biases; she is actively working to harness data in the service of justice. She and her colleagues run a project that helps nonprofits working to combat femicide in the Americas, and she has also led a reproductive justice hackathon and designed global news recommendation systems. Her work extends to art, too: on February 11, she will give a public lecture at the Art Datathon hosted by Tufts.

Some definitions are in order. “Data is information that is systematically collected and tabulated,” says D’Ignazio. A single fact is not data until it is grouped with other observations of the same kind.

Feminism, she says, involves three things. “First, a belief in gender equality. Second, a political commitment, because if you believe all genders are equal, you are committed to taking action to realize that belief. Third, an intellectual legacy, which means learning from all the research and amazing feminist work that has come before.”

Data can be problematic in many ways. Take facial recognition systems. Four years ago, Joy Buolamwini, a colleague of D’Ignazio’s at MIT, tested facial recognition software on herself, a Black woman, and the system failed even to detect her face. When she put on a white mask, it was recognized instantly. “For dark-skinned women in particular, error rates were around 35%,” says D’Ignazio.

How did this happen? It turns out that the dataset the software used to learn what human faces look like was “extremely pale and male,” she says. Even before the apparently biased algorithms were built, someone decided what data to collect in the first place, and what not to. “Data and algorithms are not human, but they are the products of these collective human decisions, and they reflect our biases as a society,” she says.

Dialogue in 2,000 screenplays, broken down by gender. On the left, films in which 100% of the words are spoken by men; on the right, films in which 100% are spoken by women. Graphic: Matt Daniels for The Pudding

In response, many companies have assembled more diverse datasets to train facial recognition software. But that’s not enough, says D’Ignazio. “It’s also about thinking through who these technologies serve. Who are they being developed for?” She notes that Black communities are among the most heavily surveilled in the United States, so it is not clear that a better facial recognition dataset would actually benefit Black women.

“Are these outcomes ultimately in the service of justice and rebalancing inequality, or will they be used disproportionately to the detriment of some communities?” she asks. “You can have a beautifully representative dataset that ultimately gets deployed in ways that are very harmful to communities.”

Investigating Algorithms

D’Ignazio says it is especially important to understand this complexity in areas such as artificial intelligence, which she says now permeates many aspects of our lives, from social work to medicine to law. “One of our principles is to challenge power,” she says. One way to do that is to audit the algorithms at the core of AI programs.

She commends the journalists and computational social scientists who are “at the forefront of this work to scrutinize algorithms in the public interest.” They don’t have access to the source code of proprietary programs, but they can carefully evaluate an AI system by its outputs and expose its flaws.


For example, a few years ago, reporters at ProPublica found that risk-assessment software that many judges rely on in sentencing, which predicts how likely a defendant is to reoffend, was deeply flawed. It was presented as free of bias, but it was in fact heavily biased against Black defendants. In one comparison, the program recommended more lenient treatment for white defendants with violent records than for Black defendants with nonviolent ones, yet reporters found that, after release, the white defendants were more likely than the Black defendants to return to prison.
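Because auditors like these usually cannot see a tool’s code, the check rests on outputs alone. One minimal way to sketch that kind of black-box audit, assuming you know each person’s group, the tool’s prediction, and the later outcome, is to compute error rates separately for each group. The records and the function name `error_rates_by_group` below are hypothetical and invented for illustration; this is not ProPublica’s actual data, code, or full methodology.

```python
from collections import defaultdict

# Hypothetical records: (group, predicted_high_risk, reoffended).
# Invented for illustration only; not real defendants or real scores.
records = [
    ("A", True, False), ("A", True, True), ("A", False, False), ("A", True, False),
    ("B", False, True), ("B", False, False), ("B", True, True), ("B", False, True),
]

def error_rates_by_group(rows):
    """Return {group: (false_positive_rate, false_negative_rate)} from outputs only."""
    counts = defaultdict(lambda: {"fp": 0, "tn": 0, "fn": 0, "tp": 0})
    for group, predicted, actual in rows:
        c = counts[group]
        if predicted and not actual:
            c["fp"] += 1  # flagged high risk, did not reoffend
        elif not predicted and not actual:
            c["tn"] += 1  # flagged low risk, did not reoffend
        elif not predicted and actual:
            c["fn"] += 1  # flagged low risk, did reoffend
        else:
            c["tp"] += 1  # flagged high risk, did reoffend
    rates = {}
    for group, c in counts.items():
        negatives = c["fp"] + c["tn"]  # people who did not reoffend
        positives = c["fn"] + c["tp"]  # people who did reoffend
        fpr = c["fp"] / negatives if negatives else float("nan")
        fnr = c["fn"] / positives if positives else float("nan")
        rates[group] = (fpr, fnr)
    return rates

for group, (fpr, fnr) in sorted(error_rates_by_group(records).items()):
    print(f"group {group}: false positive rate {fpr:.2f}, false negative rate {fnr:.2f}")
# group A: false positive rate 0.67, false negative rate 0.00
# group B: false positive rate 0.00, false negative rate 0.67
```

Overall accuracy can look similar across groups while the kinds of mistakes differ sharply, which is why the sketch reports false positives and false negatives side by side rather than a single score.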

D’Ignazio says auditing AI is about “pulling out the technical internals, putting them in the public arena, and saying, ‘We should have this conversation together.’ It can’t just be a Big Tech decision. It also can’t just be a big government decision. We all need to come together and have a public process on this.”

Data can also be a powerful way to expose inequality. Here data visualization plays a key role, helping us grasp information more deeply than written descriptions can.

In Data Feminism, D’Ignazio and Klein include a striking graphic showing how much of the dialogue in 2,000 screenplays is spoken by men and how much by women. It shows how thoroughly men dominate on-screen conversation, and seeing this presented visually is more effective than reading about it.
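The arithmetic behind a chart like this is simple: count the words each speaker says, group speakers by gender, and divide by the total. The sketch below assumes a script that has already been split into lines tagged with the speaker’s gender, which is the hard, judgment-laden step. The sample lines and the function name `dialogue_share` are hypothetical, and this is not The Pudding’s actual pipeline.

```python
from collections import Counter

# Hypothetical script: (speaker_gender, line_of_dialogue). Invented for illustration.
script = [
    ("man", "We leave at dawn."),
    ("woman", "Not without the map."),
    ("man", "Then find it fast, because the tide turns at six."),
    ("man", "Go."),
]

def dialogue_share(lines):
    """Return each gender's fraction of all spoken words in one script."""
    words = Counter()
    for gender, text in lines:
        words[gender] += len(text.split())  # naive whitespace word count
    total = sum(words.values())
    return {gender: count / total for gender, count in words.items()}

for gender, share in sorted(dialogue_share(script).items()):
    print(f"{gender}: {share:.0%} of words")
# man: 79% of words
# woman: 21% of words
```

Repeating this per film and plotting each film as a point along a 0 to 100% axis gives the kind of distribution the graphic shows; the categories themselves still reflect whoever labeled the speakers.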

“We have this perceptual and cognitive power: we can take in a lot of varied information with our eyes,” says D’Ignazio. She credits a psychology class at Tufts with Professor Holly Taylor as her first introduction to the importance of visualization.

Restless Learner

D’Ignazio may focus on data now, but at Tufts she majored in international relations. “I’ve always loved foreign languages, and learning about and understanding different cultures,” she says. After graduating, she worked in technology at her father’s company and started programming.

Soon she was working at a startup, learning Java and Perl scripting, and then headed off on her own as a freelance programmer. Always a restless learner, she added an MA in Media Arts and Computer-Driven Art, then started teaching in addition to her work in programming and art making.

But holding three jobs became untenable after the birth of her first child. D’Ignazio went back to school, this time at the MIT Media Lab, for another master’s degree.

An example of effective data visualization, according to Catherine D’Ignazio: maternity and paternity leave policies by country. Image: The Women’s Atlas (Oxford, UK: Myriad Editions, 2018). Used with permission of the author, Joni Seager.

She brought those threads together at Emerson College, where she was an assistant professor of data visualization and civic media in the Department of Journalism. Some students were intimidated by numbers and data, but D’Ignazio taught them to be more confident in their skills and to be skeptical of data provided by others.

In 2019, she joined the Department of Urban Studies and Planning at MIT, where she now also teaches and directs the Data + Feminism Lab, “which uses data and computational methods to work toward gender and racial equality, particularly with regard to space and place,” she says.

Data Tools for Good

One lab project, Data Against Feminicide, works with community and nonprofit organizations that monitor gender-based violence in the United States, Latin America, and Canada. The project is a collaboration with Rahul Bhargava, an associate professor at Northeastern University.

Project members interview staff at the nonprofits about the methods they use to collect data, and then “build tools and techniques to support them and reduce the effort involved in data collection, which is often very manual, lots of copying and pasting, but also very emotionally challenging,” says D’Ignazio.

The groups try out the data tools, which are built on artificial intelligence and machine learning but have simple user interfaces, “and they give us ideas, then we iterate, build new tools, and release them,” she says. “We think of it as participatory AI, where the community tells us at every step what the features should be or what we should do next.”

She and Bhargava have also developed a suite of tools called DataBasic, which they use to train journalists, nonprofits, librarians, community organizations, and artists to work with data, with an emphasis on basic data analysis and visualization techniques. “We also help people question their faith in numbers,” she says. All too often, nontechnical people treat numbers as objective, even though the way they are produced often is not.

She insists that we all need to understand how data comes into the world: who collects it, and why. “What is the set of interests at play? They don’t have to be nefarious. It’s just that data is collected with a specific purpose in mind, so it will pick up some things and ignore others.”

A few years ago, business publications called data “the new oil,” and that’s apt, says D’Ignazio, even if not in the way it was originally meant. There are certainly parallels between oil and data: we talk about data mining, data refining, data as a vital resource, and data as a source of new power and wealth.

“I think power is at the core,” says D’Ignazio. “If you look at the companies that have the most money, they are the ones that have the ability to collect, store, maintain, analyze, and publish very large datasets: Google, Facebook, Microsoft. It really is a way in which power is very unevenly concentrated.”

That’s why she also works on data literacy. “If data is power, how can we democratize that power and put that power in the hands of a more dispersed group of actors, rather than very big, elite corporations and only well-resourced governments?”

Taylor McNeil can be reached at taylor.mcneil@tufts.edu.

