
Chapter One: Bring Back the Bodies

Why do data science and visualization need feminism? Because bodies are missing from the data we collect, from the decisions made about their analysis and display, and from the field of data science as a whole. Bringing back the bodies is how we can right this power imbalance.

Published on Nov 01, 2018

This chapter is a draft. The final version of Data Feminism will be published by the MIT Press in 2019. Please email Catherine and/or Lauren for permission to cite this manuscript draft.


When Serena Williams disappeared from Instagram in early September, 2017, her six million followers thought they knew what had happened. Several months earlier, in March of that year, Williams had accidentally announced her pregnancy to the world via a bathing suit selfie and a caption that was hard to misinterpret: “20 weeks.” Now, they assumed, her baby had finally arrived.  

But then they waited, and waited some more. Two weeks later, Williams finally reappeared on Instagram, announcing the birth of her daughter and inviting her followers to watch a video that welcomed Alexis Olympia Ohanian Jr. to the world. A montage of baby bump pics, interspersed with clips of a pregnant Williams playing tennis and of cute conversations with her husband, Reddit cofounder Alexis Ohanian, segued into the shot that her fans had been waiting for: the first of baby Olympia. "So we're leaving the hospital," Williams narrates. "It's been a long time. We had a lot of complications. But look who we got!" The scene fades to white and ends with a set of stats: Olympia's date of birth, her birth weight, and her number of grand slam titles: 1. (Williams, as it turned out, was already eight weeks pregnant when she won the Australian Open earlier that year.)

Williams’s Instagram followers were, for the most part, enchanted. But a fair number of her followers-- many of them Black women like Williams herself-- fixated on the comment she’d made as she was heading home from the hospital with her baby girl. Those “complications” that Williams mentioned-- they’d had them too.  

On Williams’s Instagram feed, the evidence was anecdotal--women posting about their own experiences of childbirth gone horribly wrong. But a few months later, Williams returned to social media--Facebook, this time--armed with data. Citing a 2017 study from the US Centers for Disease Control and Prevention (CDC), Williams wrote that “Black women are over 3 times more likely than white women to die from pregnancy- or childbirth-related causes.”

A Facebook post by Serena Williams responding to her Instagram followers who had shared their stories of pregnancy and childbirth-related complications with her.

Credit: Serena Williams

Source: https://www.facebook.com/SerenaWilliams/videos/10156086135726834/

While these disparities were well known to Black women-led reproductive justice groups like SisterSong, the Black Mamas Matter Alliance, and Raising Our Sisters Everywhere, as well as to feminist scholars across a range of disciplines, Williams helped to shine a national spotlight on them. And she wasn't the only one. A few months earlier, Nina Martin of the investigative journalism outfit ProPublica, working with Renee Montagne of NPR, had reported on the same phenomenon. “Nothing Protects Black Women From Dying in Pregnancy and Childbirth,” the headline read. In addition to the study cited by Williams, Martin and Montagne cited a second study, from 2016, which showed that neither education nor income level--the factors usually invoked to account for healthcare outcomes that diverge along racial lines--affected the fates of Black women giving birth. On the contrary, the data showed that Black women with college degrees suffered more severe complications of pregnancy and childbirth than white women without high school diplomas.

But what were these complications, more precisely? And how many women had actually died as a result? ProPublica couldn’t find out, and neither could USA Today, which took up the issue a year later to see what, after a year of increased attention and advocacy, had changed. What they found was that there was still no national system for tracking complications sustained in pregnancy and childbirth, even as similar systems have long been in place for tracking teen pregnancy, hip replacements, and heart attacks. They also found that there is still no reporting mechanism for ensuring that hospitals follow national childbirth safety standards, as is required for both hip surgery and cardiac care. “Our maternal data is embarrassing,” stated Stacie Geller, a professor of obstetrics and gynecology at the University of Illinois, when asked for comment. The chief of the CDC’s Maternal and Infant Health Branch, William Callaghan, made the significance of this “embarrassing” data even clearer: “What we choose to measure is a statement of what we value in health,” he explained. We might edit his statement to add: it’s a measure of who we value in health, too.

 


 

The lack of data about maternal health outcomes, and its impact on matters of life and death, underscores how it is people who are ultimately affected by the choices we make in our practices of data collection, analysis, and communication. More than that, it’s almost always the bodies of those who have been disempowered by forces they cannot control--sexism, racism, or classism, or, more likely, the intersection of all three--that experience the most severe consequences of these choices. Serena Williams acknowledged this exact phenomenon when asked by Glamour magazine about the statistics she cited in her Facebook post. “If I wasn’t who I am, it could have been me,” she said, referring to the fact that she had to demand that her medical team perform additional tests in order to diagnose her own postnatal complications. Because she was Serena Williams, 23-time grand slam champion, they listened. But, she told Glamour, “that’s not fair.”

It is absolutely not fair. But without a significant intervention into our current data practices, this unfairness--and many other inequities with issues of power and privilege at their core--will continue to get worse. Stopping that downward spiral is the real reason we wrote this book. We wrote it because we are data scientists and data feminists, and we think that data science and the fields that rely upon it stand to learn significantly from feminist writing, thinking, scholarship, and action.1 As we explain in Why Data Science Needs Feminism, feminism isn’t only about women. It isn’t even only about issues of gender. Feminism is about power--about who has it, and who doesn’t. In a world in which data is power, and that power is wielded unequally, feminism can help us better understand how it operates and how it can be challenged. As data feminists--a group that includes women, men, non-binary and genderqueer people, and everyone else--we can take steps, together, towards a more just and equal world.

A good starting point is to understand how power operates on bodies and through them. “But!” you might say. “Data science is premised on things like objectivity and neutrality! And those things have nothing to do with bodies!” That is precisely the point. Data science, as it is generally understood in the world today, appears to have very little to do with bodies. But that is a fundamental misconception about the field, and about data more generally. Even though we don’t see the bodies that data science relies upon, it most certainly relies upon them. It relies upon them as the sources of data, and it relies upon them to make decisions about data. As we discuss in more depth in a few pages, it even relies on them to decide what concepts like “objective” and “neutral” really mean. And when not all bodies are represented in those decisions--as in the case of the federal and state legislatures that might fund data collection on maternal mortality--well, that’s when problems arise.

What kind of problems? Structural ones. Structural problems are systemic in nature, rather than traceable to a specific point (or person) of origin. It might be counterintuitive to think that individual bodies can help expose structural problems, but that’s precisely what the past several decades--centuries, even--of feminist activism and critical thought have allowed us to see. Many of the problems that individual people face are the result of larger systems of power, but those systems remain invisible until the people affected bring them to light. In a contemporary context, we might easily cite the #MeToo movement as an example of how individual experiences, taken together, reveal a larger structural problem of sexual harassment and assault. We might also cite the fact that the movement’s founder was a Black woman, Tarana Burke, whose contributions have largely been overshadowed by the more famous white women who joined in only after the initial--and therefore most dangerous--work had already taken place.

Burke’s marginalization in the #MeToo movement is only one datapoint in a long line of Black women who have stood on the vanguard of feminist advocacy work, only to have their contributions subsumed by white feminists after the fact. This is a structural problem too. It’s the result of several intersecting differentials of power--differentials of power that must be made visible and acknowledged before they can be challenged and changed.

To be clear, there are already a significant number of data scientists, designers, policymakers, educators, and journalists, among others, who share our goal of using data to challenge inequality and help change the world. These include the educators who are introducing data science students to real-world problems in health, economic development, the environment, and more, as part of the Data Science for Social Good initiative; the growing number of organizations, like DataKind, Tactical Tech, and the Engine Room, that are working to strengthen the capacity of the civil sector to work with data; newsrooms like ProPublica and the Markup that use data to hold Big Tech accountable; and public information startups like MuckRock, which streamlines public records requests into reusable databases. Even a commercial design firm, Periscopic, has chosen the tagline “Do Good With Data.” We agree that data can do good in the world. But we can only do good with data if we acknowledge the inequalities that are embedded in the data practices that we ourselves rely upon. And this is where the bodies come back in.

In the rest of this chapter, we explain how it’s people and their bodies who are missing from our current data practices. Bodies are missing from the data we collect; bodies are extracted into corporate databases; and bodies are absent from the field of data science. Even more, it’s the bodies with the most power that are ever present, albeit invisibly, in the products of data science. Each of these is a problem, because without these bodies present in the field of data science, the power differentials currently embedded in the field will continue to spread. It’s by bringing back these bodies--into discussions about data collection, about the goals of our work, and about the decisions we make along the way--that a new approach to data science, one we call data feminism, begins to come into view.

Bodies uncounted, undercounted, silenced

One person already attuned to certain things missing from data science, and to the power differentials responsible for those gaps, is artist, designer, and educator Mimi Onuoha. Her project, Missing Data Sets, is a list of precisely that: descriptions of data sets that you would expect to already exist in the world, because they describe urgent social issues and unmet social needs, but in reality, do not. These include “People excluded from public housing because of criminal records,” “Mobility for older adults with physical disabilities or cognitive impairments,” and “Measurements for global web users that take into account shared devices and VPNs.” These data sets are missing for a number of reasons, Onuoha explains in her artist statement, many relating to issues of power. By compiling a list of the data that are missing from our “otherwise data-saturated” world, she states, we can “reveal our hidden social biases and indifferences.”

Onuoha’s list of missing datasets includes “People excluded from public housing because of criminal records,” “Mobility for older adults with physical disabilities or cognitive impairments,” and “Measurements for global web users that take into account shared devices and VPNs.” By hosting the project on GitHub, Onuoha allows visitors to the site to suggest additional missing datasets that she might include.

Credit: Mimi Onuoha

Source: https://github.com/MimiOnuoha/missing-datasets




The lack of data about women who die in childbirth makes Onuoha’s point plain. In the absence of U.S. government-mandated action or federal funding, ProPublica had to resort to crowdsourcing to find out the names of the estimated 700 to 900 U.S. women who died in childbirth in 2016. So far, they’ve identified only 134. Or, for another example: in 1998, youth of color in Roxbury, Boston, were sick and tired of inhaling polluted air. They led a march demanding clean air and better data collection, which led to the creation of the AirBeat community monitoring project. Just south of the U.S. border, in Mexico, a single anonymous woman is compiling the most comprehensive dataset on femicides--gender-related killings. The woman, who goes by the name "Princesa," has logged 3,920 cases of femicide since 2016. Her work provides the most up-to-date information on the subject for Mexican journalists and legislators--information that, in turn, has inspired those journalists to report on the subject, and has compelled those legislators to act.

Princesa has undertaken this important data collection effort because women's deaths are being neglected and going uncounted by the local, regional, and federal governments of Mexico. But it’s not much better anywhere else. It is The Washington Post and The Guardian US, not the U.S. federal government, that currently compile the most comprehensive national count of police killings of citizens in the United States. Yet it’s powerful institutions like the federal government that, more often than not, control the terms of data collection--for several of the reasons that Onuoha’s Missing Data Sets points us towards. In the present moment, in which the most powerful form of evidence is data--a fact we may find troubling, but which is increasingly true--the things that we do not or cannot collect data about are very often perceived to be things that do not exist at all.

Even when the data are collected, however, they still may not be disaggregated or analyzed in terms of the categories that make issues of inequality apparent. This is, in part, what is responsible for the lack of data on maternal mortality in the United States. The official U.S. death certificate has included (since 2003) a checkbox indicating whether the person who died, if female, was pregnant at the time of death or within the year before it. But it would take a researcher already interested in racial disparities in healthcare to combine those data with the data collected on race before the “three times more likely” statistic that Serena Williams cited in her Facebook post could be revealed.
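To make that disaggregation step concrete, here is a minimal sketch, in Python, of the kind of analysis such a researcher would need to perform. The file names, column names, and group labels below are hypothetical, and the CDC’s actual methodology is far more involved; the point is simply that the disparity becomes visible only once deaths flagged by the pregnancy checkbox are joined with the race recorded on those same certificates and with birth counts for each group.

```python
import pandas as pd

# Hypothetical extract of death-certificate records: one row per death, with the
# post-2003 pregnancy checkbox and the race recorded on the certificate.
deaths = pd.read_csv("death_certificates.csv")   # columns: race, pregnancy_checkbox
births = pd.read_csv("births_by_race.csv")       # columns: race, live_births

# Keep only the deaths flagged as pregnancy-related.
maternal_deaths = (
    deaths[deaths["pregnancy_checkbox"] == "yes"]
    .groupby("race")
    .size()
    .rename("maternal_deaths")
)

# Maternal mortality ratio: deaths per 100,000 live births, disaggregated by race.
rates = births.set_index("race").join(maternal_deaths)
rates["mmr"] = rates["maternal_deaths"] / rates["live_births"] * 100_000

# The disparity appears only when the groups are compared directly.
disparity = rates.loc["Black", "mmr"] / rates.loc["white", "mmr"]
print(rates[["mmr"]])
print(f"Black women are {disparity:.1f}x more likely to die from "
      "pregnancy- or childbirth-related causes than white women.")
```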

As feminist geographer Joni Seager states, "If data are not available on a topic, no informed policy will be formulated; if a topic is not evident in standardized databases, then, in a self-fulfilling cycle, it is assumed to be unimportant." Princesa's femicide map is an outlier: a case in which a private citizen stood up and took action on behalf of the bodies that were going uncounted. ProPublica solicited stories and trawled Facebook groups and private crowdfunding sites in order to compile its list of the women who would otherwise go uncounted and unnamed. But this work is precarious, in that it relies upon the will of individuals or the sustained attention of news organizations in order to take place. In the case of Princesa, the work is more precarious still, in that it places her and her family at risk of physical harm.

Sometimes, however, it’s the subjects of data collection who can find themselves in harm’s way. When power in the collection environment is not distributed equally, those who fear reprisal have strong reasons not to come forward. Collecting data on the locations of undocumented immigrants in the United States, for example, could on the one hand be used to direct additional resources to them; on the other hand, it could send ICE officials to their doors. A similar paradox of exposure is evident among transgender people. Journalist Mona Chalabi has written about the challenges of collecting reliable data on the size of the transgender population in the U.S. Among other reasons, this is because transgender people are afraid to come forward for fear of violence or other harms. And so many choose to stay silent, leading to a set of statistics that does not accurately reflect the population it seeks to represent.

There is no universal solution to the problem of uncounted, undercounted, and silenced bodies. But that’s precisely why it’s so important to listen to, and take our cues from, the communities that we as data scientists, and data feminists, seek to support. Because these communities are disproportionately those of women, people of color, and other marginalized groups, it’s also of crucial importance to recognize how data and power, far too often, easily and insidiously align. Bringing the bodies back into our discussions and decisions about what data gets collected, by whom, and why, is one crucial way in which data science can benefit from feminist thought. It’s people and their bodies who can tell us what data will help improve lives, and what data will harm them.2

Bodies extracted for science, surveillance, and selling

Far too often, the problem is not that bodies go uncounted or undercounted, or that their existence or their interests go unacknowledged, but the reverse: that their information is enthusiastically scooped up for the narrow purposes of our data-collecting institutions. For example, in 2012, The New York Times published an explosive article by Charles Duhigg, "How Companies Learn Your Secrets," which soon became the stuff of legend in data and privacy circles. Duhigg describes how Andrew Pole, a data scientist working at Target, synthesized customers’ purchasing histories with the timeline of those purchases in order to detect whether a customer might be pregnant. (Evidently, pregnancy is the second major life event, after leaving for college, that determines whether a casual shopper will become a customer for life). Pole’s algorithm was so accurate that he could not only identify the pregnant customers, but also predict their due dates.

But then Target turned around and put this algorithm into action by sending discount coupons to pregnant customers. Win-win. Or so they thought, until a Minneapolis teenager's dad saw the coupons for maternity clothes that she was getting in the mail, and marched into his local Target to read the manager the riot act. Why was his daughter getting coupons for pregnant women when she was only a teen?!

It turned out that the young woman was, indeed, pregnant. Pole's algorithm informed Target before the teenager informed her father. Evidently, there are approximately twenty-five common products, including unscented lotion and large bags of cotton balls, that, when analyzed together, can predict whether or not a customer is pregnant and, if so, when they are due to give birth. But in the case of the Minneapolis teen, the win-win quickly became a lose-lose: Target lost a potential customer, and the pregnant teenager lost something far worse--her privacy over information related to her own body and her health. In this way, Target’s pregnancy prediction model helps to illustrate another reason why bodies must be brought back to the data science table: without the ability of individuals and communities to shape the terms of their own data collection, their bodies can be mined and their data can be extracted far too easily--and by powerful institutions that rarely have their best interests at heart.
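Duhigg’s article does not disclose the details of Pole’s actual model, which has never been made public. But a minimal sketch of the general technique he describes--scoring a purchase history against a small set of signal products--shows how little machinery such an inference requires. The products, weights, and threshold below are invented for illustration only:

```python
# A toy "pregnancy prediction" score of the kind Duhigg describes. The signal
# products, weights, and threshold here are hypothetical, not Target's model.
PREGNANCY_SIGNALS = {
    "unscented lotion": 0.3,
    "cotton balls (large bag)": 0.25,
    "prenatal vitamins": 0.9,
    "zinc supplement": 0.2,
    "scent-free soap": 0.25,
}

def pregnancy_score(purchase_history: list[str]) -> float:
    """Sum the weights of any signal products in a shopper's recent purchases."""
    return sum(PREGNANCY_SIGNALS.get(item, 0.0) for item in purchase_history)

shopper = ["cotton balls (large bag)", "unscented lotion", "prenatal vitamins"]
if pregnancy_score(shopper) > 1.0:  # hypothetical threshold
    print("Flag shopper for maternity coupon mailing")
```

The timing of those purchases, which Pole reportedly combined with the purchases themselves, is what would allow a retailer to estimate a due date as well.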

At root, this is another question of power, along with a question of priorities and resources-- financial ones. Data collection and analysis can be prohibitively expensive. At Facebook's newest data center in New Mexico, the electrical cost alone is estimated at $31 million annually. Only corporations like Target, along with well-resourced governments and elite research universities, have the resources to collect, store, maintain, and analyze data at the highest levels. It’s the flip side of the lack of data on maternal health outcomes. Put crudely, there is no profit to be made collecting data on the women who are dying, but there is significant profit in knowing whether women are pregnant.

Data has been called “the new oil” for, among other things, its untapped potential for profit and its value once it’s processed and refined. But just as the original oil barons were able to use that profit to wield outsized power in the world--think of John D. Rockefeller, J. Paul Getty, or, more recently, the Koch brothers--so too do the Targets of the world use their data capital to consolidate control over their customers. But it’s not petroleum that’s extracted in this case; it’s data that’s extracted from people and communities with minimal consent. This basic fact creates a profound asymmetry between who is collecting, storing, analyzing, and visualizing data, and whose information is collected, stored, analyzed, and visualized. The values that drive this extraction of data represent the interests and priorities of the universities, governments, and corporations that are dominated by elite white men. We name these values the three S’s: science (universities), surveillance (governments), and selling (corporations).3

In the case of Target and the pregnant teen, the originating charge from the marketing department to Andrew Pole was: "If we wanted to figure out if a customer is pregnant, even if she didn’t want us to know, can you do that?" But did the teenager have access to her purchasing data? No. Did she or her parents have a hand in formulating any of the questions that Target might wish to ask of its millions of records of consumer purchases? No. Did they even know that their family’s purchasing data was being recorded and analyzed? No no no. They were not invited to the design table, even though it was their personal data that was laid out on (corporate) display. Instead, it was Target--a company currently valued at $32 billion--that determined what data to collect, and what questions to ask of it.

The harms inflicted by this asymmetry don't only have to do with personal exposure and embarrassment, but also with the systematic monitoring, control, and punishment of the people and groups who hold less power in society. For example, Paola Villareal's data analysis for the ACLU reveals clear racial disparities in the City of Boston's approach to policing marijuana-related offenses. (Additional analyses have found this phenomenon to hold true in cities across the United States.) In Automating Inequality, Virginia Eubanks provides another example of how the asymmetrical relationship between data-collecting institutions and the people about whom they collect data plays out. The Allegheny County Office of Children, Youth, and Families, in Pennsylvania, employs an algorithmic model to predict the risk of child abuse. Additional methods of detecting child abuse would seem to be a good thing. But the problem with this particular model, as with most predictive algorithms in use in the world today, is that it has been designed unreflexively. In this case, the problem is rooted in the fact that the model takes into account every data source it can get. For wealthier parents, who can more easily access private health care and mental health services, there is simply not that much data. But for poor parents, who primarily access public resources, the model scoops up records from child welfare services, drug and alcohol treatment programs, mental health services, jail records, Medicaid histories, and so on. Because there is far more data about poor parents, they are oversampled in the model and disproportionately targeted for intervention. The model “confuse[s] parenting while poor with poor parenting,” Eubanks explains--with the most profound of consequences.
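A toy illustration--emphatically not the actual Allegheny Family Screening Tool, whose inputs and weights are more complex--can make Eubanks’s point concrete. If a model treats “more records in public systems” as “more risk,” then families who rely on public services are scored as riskier by construction. The data sources and families below are invented for the example:

```python
# A toy risk score illustrating oversampling, not the actual Allegheny model.
# Families who rely on public services leave more records, so they score higher.
PUBLIC_RECORD_SOURCES = [
    "child_welfare_referrals",
    "drug_alcohol_treatment",
    "public_mental_health_services",
    "jail_records",
    "medicaid_history",
]

def naive_risk_score(family_records: dict) -> int:
    """Count how many public data sources contain any records about a family."""
    return sum(1 for source in PUBLIC_RECORD_SOURCES if family_records.get(source, 0) > 0)

# A wealthier family using private health care and therapy leaves almost no trace
# in public systems; a poorer family with the same needs leaves many.
wealthier_family = {"child_welfare_referrals": 0}
poorer_family = {
    "child_welfare_referrals": 1,
    "public_mental_health_services": 3,
    "medicaid_history": 1,
}

print(naive_risk_score(wealthier_family))  # 0 -> screened out
print(naive_risk_score(poorer_family))     # 3 -> flagged for intervention
```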

Ensuring that bodies are not simply viewed as a resource, like oil, that can be “extracted” and “refined,” is another way that data feminism can intervene in our current data practices. Like the process of data collection, this process of extracting bodies is one that disproportionately impacts women, people of color, low-income people, and others who are more often subject to power rather than in possession of it. And it’s another place where bringing the bodies back into discussions about data collection, and its consequences, can begin to challenge and transform the unequal systems that we presently face.

Bodies absent from data work

One place where these conversations need to be happening is in the field of data science itself. It’s no surprise to observe that women and people of color are underrepresented in data science, just as they are in STEM fields as a whole. The surprising thing is that the problem is getting worse. According to a research report published by the American Association of University Women in 2015, women made up 35% of the computing and mathematical workforce in 1990, but that percentage had dropped to 26% by 2013.4 They are being pushed out as “data analysts” are rebranded as “data scientists,” in order to make room for more highly valued and more highly compensated men.5 Later in the book, we identify this as a “privilege hazard”: a situation in which discrimination becomes hard-coded into so-called "intelligent systems” because the people doing the coding are the most privileged--and therefore the least well equipped--to acknowledge and account for inequity.6

This privilege hazard can rear its head in harmful ways. For example, in 2016, MIT Media Lab graduate student Joy Buolamwini, founder of the Algorithmic Justice League, was experimenting with software libraries for the Aspire Mirror project, which used computer vision software to overlay inspirational images (like a favored animal or an admired celebrity) onto a reflection of the user’s face. She would open up her computer and run some code that she’d written, built on a free JavaScript library that used her computer's built-in camera to detect the contours of her face. The problem wasn’t a bug in Buolamwini’s code. It was more basic than that: the software had a really hard time detecting her face in front of the camera. Buolamwini has dark skin. While her computer’s camera picked up her lighter-skinned colleague’s face immediately, it took much longer to pick up Buolamwini’s face, when it did at all. Even then, her nose was sometimes identified as her mouth. What was going on?

Joy Buolamwini had to resort to wearing a white mask to get a computer vision algorithm to detect her face. Many facial detection algorithms have only been trained on pale and male faces.

Credit: Joy Buolamwini

Source: https://medium.com/mit-media-lab/the-algorithmic-justice-league-3cc4131c5148

Permissions: Pending

What was going on was this: facial analysis technology, which uses machine learning approaches, learns how to detect faces from existing collections of images--data that are used to train, validate, and test the models that are then deployed. These datasets are constructed in advance, in order to present any particular learning algorithm with a representative sample of the kinds of things it might encounter in the real world. But problems arise very quickly when the biases that already exist in the world are replicated in these datasets. Upon digging into the benchmarking data for facial analysis algorithms, Buolamwini learned that they consisted of 78% male faces and 84% pale faces--sharply at odds with a global population that is roughly half female and majority non-pale.7 How could such an oversight have happened? Easily, when most engineering teams have 1) few women or people of color; and 2) no training to think about #1 as a problem.
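A minimal sketch of this kind of audit, assuming hypothetical file and column names, shows why both the composition of a benchmark and a model’s disaggregated performance need to be checked before gaps like these become visible:

```python
import pandas as pd

# Hypothetical benchmark metadata: one row per face image, annotated with
# perceived gender and skin type, plus whether the deployed model detected the face.
benchmark = pd.read_csv("benchmark_faces.csv")   # columns: gender, skin_type, detected

# Step 1: audit the composition of the benchmark itself.
composition = benchmark.groupby(["gender", "skin_type"]).size() / len(benchmark)
print(composition)        # e.g. darker-skinned women may be badly underrepresented

# Step 2: disaggregate performance instead of reporting a single overall accuracy.
detection_rates = benchmark.groupby(["gender", "skin_type"])["detected"].mean()
print(detection_rates)    # gaps between groups are invisible in the aggregate number
```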

Oversights like this happen more often than you might think, and with a wide range of consequences. Consider a craze that (briefly) swept the internet in early 2018. In order to promote awareness of its growing number of digitized museum collections, Google released a new feature for its Arts and Culture app. You could take a selfie, upload the image, and the app would find the face from among its millions of digitized artworks that looked the most like you. All over Facebook, Twitter, and Instagram, people were posting side-by-side shots of themselves and their artwork matches: the Mona Lisa, American Gothic, or a Vermeer, for instance.

Well, white people were. Because most of the museums whose collections Google had helped to digitize were located in the U.S. and Europe, most featured artworks from the Western canon. And because most artworks from the Western canon feature white people, the white users of the Arts and Culture app found really good matches for their faces. But some Asian users of the app, for example, found themselves matched with one of only a handful of portraits of Asian people included in those collections.

On Twitter, the response to this inadequacy was tellingly resigned. One user, @pitchaya, whose Tweet was quoted in a digg.com article on the subject, tweeted sarcastically: “If you do that whole Google Arts & Culture app portrait comparison as an Asian male, it gives you one of 5-6 portraits that hardly resembles you but, hey, looks Asian enough.” Another user, @rgan0, also quoted in the piece, called out Google directly: “The Google Arts and Culture app thinks I look like a “Beautiful [Japanese] Woman”! :p get more Asian faces in your art database, Google.”

And if the disparities of representation in Western art museums weren’t enough of a problem, some Arts and Culture app users worried about something more insidious taking place. In order to upload their images for analysis, users had to agree to allow Google to access those images. Were their images also being stored for future internal research? Was Google secretly using crowdsourcing to improve the training data for its own facial recognition software, or for the NSA? A short-lived internet uproar ensued, ending only when Google updated the user agreement to say: “Google won’t use data from your photo for any other purpose and will only store your photo for the time it takes to search for matches.”

But what if they had been? The art selfie conspiracy theorists weren’t actually too far from reality, given that earlier that year, Amazon had briefly been contracted by the Orlando Police Department to use its own proprietary facial recognition software, trained on its own proprietary data, to help the police automatically identify suspects in real time. How representative was Amazon’s training, benchmarking, or validation data? Was it more or less representative than the data that Buolamwini explored in her research? There was no way to know. And while a best match of 44% between Asian Arts and Culture app users and Terashima Shimei’s Beautiful Woman (the painting @rgan0 matched with) might earn RTs of solidarity on Twitter, a best match of 44% between a suspected criminal and a random person identified through traffic camera footage--the image source for the Amazon project--could send an innocent person to jail.

Who any particular system is designed for, and who that system is designed by, are both issues that matter deeply. They matter because the biases these systems encode, and often unintentionally amplify, remain unseen and unaddressed--that is, until someone like Buolamwini literally has to face them. What’s more, without women and people of color more involved in the coding and design process, the new research questions that might yield groundbreaking results don’t even get asked--because the people who would ask them aren’t in the room. As the examples of facial analysis technology and the Google Arts and Culture app help to show, there is a much higher likelihood that biases will be designed into data systems when the bodies of the systems’ designers represent only the dominant group.

Bodies invisible: The view from nowhere is always a view from somewhere

So far, we’ve shown how bringing the bodies back into data science can help expose the inequities in the scope and contents of our data sets, as in the example of the hundreds of unnamed U.S. women who die in childbirth each year. We’ve also shown how bringing back the bodies can help avoid their data being mined without their consent, as in the example of the Minneapolis teenager who Target identified as pregnant. And we’ve also shown how bringing bodies that are more representative of the population into the field of data science can help avert the increasing number of racist, sexist data products that are inadvertently released into the world, as in the example of the Google Arts and Culture app, or of the facial recognition software that is the focus of Joy Buolamwini’s research. (We’ll have more to say about some of the worst applications of computer vision, like state surveillance, in the chapters to come).

But there are other bodies that need to be brought back into the field of data science not because they’re not yet represented, but for the exact opposite reason: they are overrepresented in the field. They are so overrepresented that their identities and their actions are simply assumed to be the default. An example that Yanni Loukissas includes in his book All Data Are Local makes this point crystal clear: Marya McQuirter, a former historian at the Smithsonian Institution’s National Museum of African American History and Culture, recalls searching the Smithsonian’s internal catalog for the terms "black" and "white.” Searching the millions of catalog entries for “black” yielded a rich array of objects related to Black people, Black culture, and Black history in the US: the civil rights movement, the jazz era, the history of enslavement, and so on. But searching for “white” yielded only white-colored visual art. Almost nothing showed up relating to the history of white people in the United States.

McQuirter, who is Black, knew the reason why: in the United States, it’s white people and their bodies who occupy the “default” position. Their existence seems so normal that they go unremarked upon. They need not be categorized because--it is, again, assumed--most people are like them. This is how the perspective of only one group of bodies, the most dominant and powerful group, becomes invisibly embedded in a larger system, whether it’s a system of classification, as in the case of McQuirter’s catalog search; a system of surveillance, as in the case of Amazon and the Orlando police; or a system of knowledge, as reflected in a data visualization--as we’ll now explain.

Whose perspective are we seeing when we see a visualization like this one of global shipping routes?

Time-based visualization of global shipping routes designed by Kiln based on data from the UCL Energy Institute.

Credit: Website created by Duncan Clark & Robin Houston from Kiln. Data compiled by Julia Schaumeier & Tristan Smith from the UCL EI. The website also includes a soundtrack: Bach’s Goldberg Variations played by Kimiko Ishizaka.

Source: https://www.shipmap.org/

We are not seeing any particular person's perspective when we look at this map (unless, perhaps, you are an astronaut on the space station wearing weird blue glasses that make all the continents blue). In terms of visualization design, this is for good reason: it is precisely this impossible, totalizing view that makes any particular visualization so dazzling and seductive, so rhetorically powerful, and so persuasive.8 This image appears to show us the “big picture” of the entire world. Because we do not see the designers of this image, nor can we detect any visual indicators of human involvement, the image appears truthful, accurate, and free of bias.

This is what feminist philosopher Donna Haraway describes as “the god trick.” By the “god” part, Haraway refers to how data is often presented as though it inhabits an omniscient, godlike perspective. But the “trick” is that the bodies who helped to create the visualization--whether through providing the underlying data, collecting it, processing it, or designing the image that you see--have themselves been rendered invisible. There are no bodies in the image anymore.

Haraway terms this “the view from nowhere.” But the view from nowhere is always a view from somewhere: the view from the default. Sometimes this view comes into focus when we consider what isn’t revealed, as in the case of McQuirter’s search query. But when we do not remind ourselves to ask what we are not seeing, and who we are not seeing--well, that is the most serious body issue of all. It’s serious because all images and interactions, the data they are based on, and the knowledge they produce, come from bodies. As a result, this knowledge is necessarily incomplete. It’s also necessarily culturally, politically, and historically circumscribed. Pretending otherwise entails a belief in what sociologist Ruha Benjamin, in Race After Technology: Abolitionist Tools for the New Jim Code, describes as the "imagined objectivity of data and technology”--imagined, because it’s not objectivity at all.

To be clear: this does not mean that there is no value in data or technology. What it means for data science is this: if we truly care about objectivity in our work, we must pay close attention to whose perspective is assumed to be the default. Almost always, this perspective is that of elite white men, since they occupy the most privileged position in the field, as they do in our society overall. Because they occupy this position, they rarely find their dominance challenged, their neutrality called into question, or their perspectives open to debate. Their privilege renders their bodies invisible--in datasets, in algorithms, and in visualizations, as in their everyday lives.

Ever heard of the phrase, “History is written by the victors”? It’s the same sort of idea. Both in the writing of history and in our work with data, we can learn so much more-- and we can get closer to some sort of truth-- if we bring together as many bodies and perspectives as we can. And when it comes to bringing these bodies back into data science, feminism becomes increasingly instructive, as the rest of the chapters in this book explain.

In On Rational, Scientific, Objective Viewpoints from Mythical, Imaginary, Impossible Standpoints, we build on Haraway's notion of the god trick, exploring some reasons why emotion has been kept out of data science as a field, and what we think emotion can, in fact, contribute. We talk about emotional data, among data of many other forms, in What Gets Counted Counts--a chapter that emphasizes the importance of thinking through each and every one of the choices we make when collecting and classifying data. The next chapter, Unicorns, Janitors, Ninjas, Wizards, and Rock Stars, challenges the assumption that data scientists are lone rangers who wrangle meaning from mess. Instead, we show how working with communities and embracing multiple perspectives can lead to a more detailed picture of the problem at hand. This argument is continued in The Numbers Don’t Speak for Themselves, in which we show how much of today’s work involving “Big Data” prioritizes size over context. In contrast, feminist projects connect data back to their sources, pointing out the biases and power differentials in their collection environments that may be obscuring their meaning. We turn to the contexts and communities that ensure that the work of data science can take place in Show Your Work, a chapter that centers on issues of labor. In The Power Chapter, it’s, well, power, privilege, and structural inequality that we take up and explore. Teach Data Like an Intersectional Feminist provides a series of examples of how to implement the lessons of the previous chapters in classrooms, workshops, and offices, so that we can train the next generations of data feminists. And in Now Let's Multiply, we speculate about other approaches that might enrich a conversation about data science, its uses, and its limits.

 


There is growing discussion about the uses and limits of data science, especially when it comes to questions of ethics and values. But so far, feminist thinking hasn’t directed the conversation as it might. As a starting point, let’s take the language that is increasingly employed to discuss questions of ethics in data and the algorithms that they support, such as the computer vision and predictive policing algorithms we’ve described just above. The emerging best practices in the field of data ethics involve orienting algorithmic work around concepts like "bias," and values like "fairness, accountability, and transparency." This is a promising development, especially as conversations about data and ethics enter the mainstream, and funding mechanisms for research on the topic proliferate. But there is an additional opportunity to reframe the discussion before it gathers too much speed, so that its orienting concepts do not inadvertently perpetuate an unjust status quo.

Consider this chart, which takes up Benjamin’s prompt to reconsider the “imagined objectivity of data and technology” and develops an alternative set of orienting concepts for the field. These concepts have legacies in intersectional feminist activism, collective organizing, and critical thought, and they are unabashedly explicit in how they work towards justice:

Concepts Which Uphold “Imagined Objectivity” (because they locate the source of the problem in individuals or technical systems) → Intersectional Feminist Concepts Which Strengthen Real Objectivity (because they acknowledge structural power differentials and work towards dismantling them)

Ethics → Justice

Bias → Oppression

Fairness → Equity

Accountability → Co-liberation

Transparency → Reflexivity

Understanding algorithms → Understanding history, culture, and context

The concept of "bias," for example, locates the source of inequity in the behavior of individuals (e.g. a prejudiced person) or in the outputs of a technical system (e.g. a system that favors white people or men). Under this conceptual model, a technical goal might be to create an "unbiased" system. First we would design the system and use data to tune its parameters; then we would test for any biases in the results. We could even define a metric for what counts as more "fair," and then optimize for it.
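In code, that workflow might look something like the following minimal sketch. The dataset, column names, and the particular fairness check (comparing selection rates across groups, one narrow definition sometimes called demographic parity) are assumptions chosen for illustration, not a recommendation:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical dataset: two features, a protected attribute, and an outcome label.
df = pd.read_csv("applicants.csv")    # columns: feature_1, feature_2, group, label
X = df[["feature_1", "feature_2"]]
y = df["label"]

# Step 1: design a system and use data to tune its parameters.
# (For brevity, this sketch evaluates on the same data it was trained on.)
model = LogisticRegression().fit(X, y)
df["prediction"] = model.predict(X)

# Step 2: test for biases that result, here by comparing selection rates per group.
selection_rates = df.groupby("group")["prediction"].mean()
print(selection_rates)
print("Disparate impact ratio:", selection_rates.min() / selection_rates.max())

# Step 3: define what counts as "fair" (e.g. equal selection rates) and optimize
# for it, for instance by adjusting decision thresholds. What none of these steps
# can do is account for the history and context that produced the data.
```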

But this entire approach is flawed, like the imagined objectivity that shaped it. Just as Benjamin cautions against imagining that data and technology are objective, we must caution ourselves against locating the problems associated with “biased” data and algorithms in technical systems alone. This is a danger that computer scientists have noted in relation to high-stakes domains like criminal justice, where hundreds of years of history, politics, and economics, not to mention the complexities of contemporary culture, are distilled into black-boxed algorithms that determine the course of people’s lives. In this context, computer scientist Ben Green warns about the narrowness of computationally conceived fairness, writing that "computer scientists who support criminal justice reform ought to proceed thoughtfully, ensuring that their efforts are driven by clear alignment with the goals of justice rather than a zeitgeist of technological solutionism." And in keynoting the Data Justice Conference in 2018, design theorist Sasha Costanza-Chock challenged the audience to expand their concept of ethics to one of justice--in particular, restorative justice, which recognizes and accounts for the harms of the past. We do not all arrive in the present moment with equal power and privilege. When "fairness" is a value that does not acknowledge context or history, it fails to account for the systematic nature of the “unfairness” perpetrated by certain groups on other groups for centuries.

Does this make fairness political? Emphatically yes, because all systems are political. In fact, the appeal to avoid politics is a familiar move by which those in power uphold the status quo. The ability to make that appeal is also a privilege, one held only by those whose existence does not challenge that same status quo. Rather than designing algorithms that are "color blind," Costanza-Chock says, we should be designing algorithms that are just. This means shifting from ahistorical notions of fairness to a model of equity--one that takes time, history, and differential power into account. Researcher Seeta Peña Gangadharan, co-lead of the Our Data Bodies project, states, "The question is not 'How do we make automated systems fairer?' but rather to think about how we got here. How might we recover that ability to collectively self determine?"

This is why bias (in individuals, in data sets, or in algorithms) is not a strong enough concept in which to anchor ideas about equity and justice. In writing about the creation of New York’s Welfare Management System in the early 1970s, for example, Virginia Eubanks explains: "These early big data systems were built on a specific understanding of what constitutes discrimination: personal bias." The solution at the time was to remove the humans from the loop, and it remains so today: without potentially bad--in this case, racist--apples, there would be less discrimination. But this line of thinking illustrates what Robin DiAngelo would call the "’new’ racism": the belief that racism is due to individual bad actors rather than to structures or systems. In relation to welfare management, this often means replacing social workers--often women of color, with empathy, flexibility, and listening skills--with an automated system that applies a set of rigid criteria, no matter what the circumstances.

Bias is not a problem that can be fixed after the fact. Instead, we must look to understand and design systems that address oppression at the structural level. Oppression, as defined by the comic artist Robot Hugs, is what happens "when prejudice and discrimination is supported and encouraged by the world around you. It is when you are harmed or not helped by government, community or society at large because of your identity," they explain. And while the research and energy emerging around algorithmic accountability is promising, why should we settle for retroactive audits of potentially flawed systems if we could design for co-liberation from the start? Here co-liberation doesn't mean "free the data," but rather "free the people." And the people in question are not only those with less power, but also those with relative privilege (like data scientists, designers, researchers, and educators; like ourselves) who play a role in upholding oppressive systems. Poet and community organizer Tawana Petty defines what co-liberation means in relation to anti-racism in the U.S.: "We need whites to firmly believe that their liberation, their humanity is also dependent upon the destruction of racism and the dismantling of white supremacy." The same goes for gender--men are often not even thought to have a gender, let alone prompted to think about how unequal gender relations seep into our institutions and artifacts and harm all of us. In these situations, it is not enough to do audits after the fact. We should be able to dream of data-driven systems that position co-liberation as their primary design goal.

Designing data sets and data systems that dismantle oppression and work towards justice, equity, and co-liberation requires new tools in our collective toolbox. We have some good starting points: building more understandable algorithms, for instance, is a laudable and worthy research goal. And yet, what we need to explain and account for are not only the inner workings of machine learning, but also the history, culture, and context that lead to discriminatory outputs in the first place. Did you know, for example, that the concept of homophily, which provides the rationale for most contemporary network clustering algorithms, in fact derives from 1950s-era models of housing segregation? (If not, we recommend you read Wendy Chun.) Or, for another example, did you know that the “Lena” image used to test most image processing algorithms is the centerfold from the November 1972 issue of Playboy, cropped demurely at the shoulders? (If not, Jacob Gaboury is the one to consult on the subject.) These are not merely bits of trivia to be pulled out to impress dinner party guests. On the contrary, they have very real implications for the design of algorithms, and for their use.

How might we design a network clustering algorithm that does not perpetuate segregation, but actively strives to bring communities together? (This is a question that Chun is pursuing in her current research.) How might we ensure that the selection of test data is never relegated to happenstance? (This is how the “Lena” image, whose selection and persistence carried sexism into the field of image processing, is so often explained away.) The first step requires transparency in our methods, as well as the reflexivity to understand how our own identities, our communities, and our domains of expertise are part of the problem. But they can also be part of the solution.

When we start to ask questions like: "Whose bodies are benefiting from data science?" "Whose bodies are harmed?" "How can we use data science to design for a more just and equitable future?" and "By whose values will we re-make the world?" we are drawing from data feminism. It’s data feminism that we describe in the rest of this book. It’s what can help us understand how power and privilege operate in the present moment, and how they might be rebalanced in the future.

Comments
Francis Harvey:

Another important point: more broadly considered, we make choices, we have responsibilities, we have values. All three have multiple dimensions that we (should) consider in our actions.

Francis Harvey:

True, and important. The argument to pursue (as with the breast pump) is maybe the question of how institutions and elite men find, use and abuse their power?

Francis Harvey:

What is made invisible and what is veiled and what we choose not to see are what undermines claims to objectivity. The last point, which involves (self-)reflection, is equally of importance.

Francis Harvey:

invisible like the royal “we” for the queen? or institutionalised as, for example, a corporation?

Francis Harvey:

Valuable insight, yet the _misinformed_ policy that arises from a lack of a data or uncritically considered data is, from experience, potentially even far worse

Kathleen Kenny:

Works by Indigenous Data Sovereignty movements could be useful here too. But maybe you have included cites about this in another chapter?

https://fnigc.ca/sites/default/files/docs/indigenous_data_sovereignty_toward_an_agenda_11_2016.pdf

https://nnigovernance.arizona.edu/good-data-practices-indigenous-data-sovereignty?fbclid=IwAR3FVotCxC34JUQPnlTKbNxYcaaFkt5KKD_l0S1ZG9XW31G07alD5O5ubAQ

Goda Klumbyte:

The work of Helena Suarez Val can be useful here too. She analyses the tracking of feminicides in Uruguay and around Latin America in general. https://warwick.ac.uk/fac/cross_fac/cim/people/helena-suarez-val/, https://sites.google.com/view/feminicidiouruguay

Nikki Stevens:

to be fair, the image itself didn’t encode sexism into the field - the image’s choice and persistence was a manifestation of the sexism of those involved.

Nikki Stevens:

perhaps i missed a clear definition of how you’re using the word “bias.” This sentence implies that it can be fixed “before” but not after. Depending on the use it either cannot be fixed at all (because it’s a philosophical problem) or - as I would argue with biased algs - it can be fixed after the fact with a complete technical restructuring or reevaluation.

Nikki Stevens:

there’s some sexism coded here - the user doesn’t seem to want to be compared to a woman either. Worth calling out the gender implications of this face match (even in a footnote)

Nikki Stevens:

technically, no codebase is bug-free, though I do understand the point. Perhaps something like “her code was written to the library’s specifications, yet…”

Nikki Stevens:

also LGBTQIA2S+ folks

Nikki Stevens:

we don’t yet know what “reflexive design” would entail. I’d love to see a clearer - and more pointed - word here. The Allegheny algorithm is racist.

Pratyusha Kalluri:

I would encourage including a section at the end of this chapter (and every chapter!) that bullet points some concrete ways of enacting the chapter title “bringing back the bodies”. Such a list would not need to claim to be comprehensive but could leave the reader with tangible starting points to move from theory to better data science practices!

Pratyusha Kalluri:

Want to leave this comment somewhere: I would love to see more of Mimi and Mother Cyborg’s “A People’s Guide to AI” incorporated! They discuss this so well! https://www.alliedmedia.org/peoples-ai

Pratyusha Kalluri:

“that we give it”? I think current wording reinforces a false and fear-based narrative that current AI systems literally search the web for anything they can find about you, and while that could become more common in the future the public deserves to know the current reality, un-exaggerated.

Diane Mermigas:

One of the more formidable, but elusive elements of your premise is conditioning recipients to more astutely filter and evaluate information and its sources so that tightly focused data is put to better use. It is part of the vast “rebalancing” of power, privilege, systems, structures and other factors you so doggedly pursue. You deal with many critical concepts in this book that have broad implications - which can be both a blessing and a curse. Off to a great start.

Aristea Fotopoulou:

I am not sure I understand this table or the division between real and imagined objectivity. Also as someone who writes about feminist ethics I am troubled to see that a preoccupation with ethics is considered un-real and uncontextual or individual-focused. I generally don’t get this table, sorry I won’t join in the enthusiasm of other commentators!

Lauren Klein:

Thanks for voicing your concerns, Aristea. You’re right that feminist ethics is an important conversation, and we’ll be sure to acknowledge that in our expansion of this table as we revise.

Aristea Fotopoulou:

Could expand on what Benjamin adds to the debate about imaginaries of big data?

Aristea Fotopoulou:

It is not entirely clear to me how/why you get back to the argument about the necessity of data feminism - and whether the book will recount instances of data feminism at a later stage. Also, I would like to see some further clarification as to how the examples provided above are about bodies rather than well, structural inequality.

Aristea Fotopoulou:

I find the jump from maternal health to women who are dying quite abrupt - surely there are many nuances and aspects in maternal health and illness.

Aristea Fotopoulou:

These are great examples. I wonder if it would be useful to signal the US focus?

Aristea Fotopoulou:

Great examples.

Elizabeth Losh:

Usually readers learning about future chapters get at least a paragraph per chapter as a preview to the work. I realize that you want to challenge academic prose style (and its long windedness), but you might want to show how your themes represent both related and contrasting strains of thought.

Elizabeth Losh:

Todd Presner’s early work on aerial perspective and GIS might be useful here, since he places it in a European intellectual heritage.

Lauren Klein:

Would love to include Jacque’s work here!

Elizabeth Losh:

Jacque Wernimont’s work on life counts and death counts seems obviously important for this chapter, which joins them together in the concept of “maternal mortality”

Elizabeth Losh:

Because there are a few projects about missing or murdered indigenous women, it might be useful to call out how different actors respond to a perceived lack in data reporting.

Elizabeth Losh:

Because you are calling out another case of metadata activism with #MeToo (as opposed to the dataviz activism that seems to be the assumed subject of this book), it looks like you need to say something about “raw data” (and its mythologies) sooner in the text.

Elizabeth Losh:

Like others, I am not sure if the rhetorical address is calling too much attention to itself. Maybe you want to think about emphasizing questions about your potential audience (data scientists AND feminists) more in your introduction.

Elizabeth Losh:

I wonder if more could be said about the difference between the mothering stats and the sports stats, since the baby book stats are obviously feminized and sports stats are obviously masculinized, as a way to make connections in this chapter. Perhaps look at Jill Walker Rettberg on mother apps in _Seeing Ourselves Through Technology_?

Lauren Klein:

Good point! We’ll see if we can work in an additional analysis along these lines.

Momin M. Malik:

maybe “model outputs”? These are different kinds of bias, though: interpersonal bias, sampling bias, and biased estimation (which may or may not matter, but there’s also unequal distribution of errors, which can be called a bias but doesn’t have a formal label). You somewhat go through those in the remainder of the paragraph, since the lack of clarity and multiplicity is your entire point, but also add in institutional bias, which is missing in this sentence. Maybe make this multiplicity more explicit?
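
To make the “unequal distribution of errors” point concrete, here is a minimal sketch in Python (hypothetical labels, predictions, and group memberships; numpy assumed) showing how an overall error rate can hide very different error rates across groups:

```python
import numpy as np

# Hypothetical true labels, model predictions, and group membership for 10 people.
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0, 0, 1])
group  = np.array(["A", "A", "B", "B", "A", "A", "B", "A", "B", "B"])

# Error rate computed separately for each group.
for g in ["A", "B"]:
    mask = group == g
    error_rate = np.mean(y_true[mask] != y_pred[mask])
    print(f"group {g}: error rate = {error_rate:.2f}")

# The overall rate averages across groups and hides that group B
# bears all of the mistakes in this toy example.
print(f"overall: error rate = {np.mean(y_true != y_pred):.2f}")
```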

Momin M. Malik:

See also Keiran Healy, “The Performativity of Networks.” Network models create the reality they purport to describe. I try to do an empirical version of this critique in my own work, https://www.mominmalik.com/malik_chapter2.pdf

Momin M. Malik:

See also “Shirley” from “Shirley cards”: https://www.cjc-online.ca/index.php/journal/article/view/2196, which I’ve linked to facial recognition in presentations but I’m sure somebody has done systematically. See also Vox’s video on it, https://www.youtube.com/watch?v=d16LNHIEJzs.

Momin M. Malik:

Great point! There’s also a long history here: I think it’s Elizabeth Yale who works on how, since the Middle Ages, automation has been intimately connected to the desire to replace lower classes who could rise up, and on how fears of the lower classes have been projected onto automation taking over.

Momin M. Malik:

While you can’t include my hearsay in the book, I can confirm this is a problem with all the attempted technical solutions to fairness. They all preserve fairness as an aspect of the data as it is; none of them has the ability to incorporate historical inequity and injustice (understandable, since it would have to be quantified to include in a model, but it reveals the limitations of technical fixes).

Momin M. Malik:

I believe Ben has a paper about this with Lily Hu as well, who also has fantastic work on this topic. She has a paper, not out quite yet I think, critiquing the counterfactual framework for fairness (i.e., “if this person were white, would they have received the same score?”, as though we could “toggle” race as separate from everything else).

Momin M. Malik:

This could be a few things, but I think “mathematical formalism” or “modeling the world” would be better in this cell than algorithms

Lauren Klein:

Great suggestions!

Momin M. Malik:

Maybe use the example of the reaction to Sonia Sotomayor’s “wise Latina” comment?

Lauren Klein:

I like this suggestion!

Momin M. Malik:

Did that come from Haraway or Thomas Nagel? It sounds like you’re attributing it to Haraway, and if it did indeed come from Nagel and Haraway is critiquing it or using it as a critique, this phrasing is misleading. Maybe even “Haraway critiques this so-called ‘view from nowhere.’”

Momin M. Malik:

Couple of things here.

First, pet peeve, I don’t like how people in machine learning casually conflate “algorithm” and “model.” It’s common usage to say “train an algorithm” but I would say that it’s an algorithm that trains a model. Anyway, until I publish on this you can only go with common usage, but this is an issue for you below when you say “algorithmic model.” That brings in confusion. Are these algorithms? Models? Data? Are there algorithms that are not (statistical) models? (Yes, the vast majority of algorithms.) Are there models that are not algorithms? (Yes, models can be abstracted away from implementing algorithms, although to be used they need some implementation.) Are there data that are not models? (Debatable, but all data is theory-laden, as in “experimenter’s regress”. Harry Collins, 1981, “Son of Seven Sexes: The Social Destruction of a Physical Phenomenon.”) Having some breakdown/explanation of terms might help.

Second, the anecdote is compelling, but how many people were similarly targeted who didn’t turn out to be pregnant? How many people were predicted to be pregnant, but who had purchased pregnancy kits 9 months prior? (Which is a strong signal) Out of people *without* such strong, obvious signals, how many identifications were accurate?

Third, language. “Infer” is a technical term in statistics, and refers to “statistical significance.” Specifically, statistics posits a hypothetical underlying “truth” that produces data with some noise, designs functions (“estimators”) that can take in data and reverse-engineer properties of that underlying “truth”, which is called “estimation”. Inference is about whether the signal we get from data is strong enough to make conclusions about that underlying “truth.”

“Inference” is sometimes used in machine learning in a more colloquial way, and used to describe exactly the tasks that are statistical “estimation.”

I would recommend: “detect” pregnant customers. Later down: “when analyzed together, can detect whether or not a customer is pregnant, and if so, give a prediction of when they are due to give birth.” Instead of “pregnancy prediction algorithm”, perhaps “pregnancy model”.
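
To illustrate the terminology being drawn out here (not Target’s actual system, whose data and model are not public), a minimal sketch in Python with scikit-learn, using made-up purchase features: the fitting *algorithm* inside .fit() produces a *model*, and that model then *detects* (classifies) new cases rather than performing statistical inference in the technical sense:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical training data: each row is a customer's purchase counts for
# [unscented lotion, prenatal vitamins, cotton balls]; the label records
# whether the customer later registered for a baby shower.
X_train = np.array([[3, 2, 5], [0, 0, 1], [4, 1, 6], [1, 0, 0], [2, 3, 4], [0, 0, 2]])
y_train = np.array([1, 0, 1, 0, 1, 0])

# The *algorithm* (the optimizer run by .fit) estimates coefficients;
# the object it returns is the fitted *model*.
model = LogisticRegression().fit(X_train, y_train)

# The model then *detects* likely pregnancy for a new customer and attaches
# a score; no claim about statistical significance is being made.
new_customer = np.array([[2, 2, 3]])
print(model.predict(new_customer))        # predicted class
print(model.predict_proba(new_customer))  # class probabilities
```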

Momin M. Malik:

I suggest explicitly giving examples of how listening to and reading narratives, forming coalitions, and consuming theory produced by marginalized people are alternative ways to have reliable, “true” knowledge about experiences outside of our own that do not reduce to needing data.

Momin M. Malik:

Same point; this is a great descriptive point, but what is the prescription? Accept this state of affairs, and gather “counter-data”? (To take a term from Morgan Currie, Britt S Paris, Irene Pasquetto and Jennifer Pierre, “The conundrum of police officer-involved homicides: Counter-data in Los Angeles County”, Big Data & Society, https://dx.doi.org/10.1177/2053951716663566) I would say the strategy should be a mix. For police homicides, one would be to believe black people when they say the police are a force of terrorism in their communities, and have been since police forces first formed. Maybe forming coalitions will require “proving” racism to non-black people with data, but as data feminists we should recognize both that this may be necessary and that it is still wrong.

Momin M. Malik:

Chemist turned philosopher of science Michael Polanyi is quite influential in history of science and in STS. He wrote this in _The Tacit Dimension_ (1966):
“The declared aim of modern science is to establish a strictly detached, objective knowledge. Any falling short of this ideal is accepted only as a temporary imperfection, which we must aim at eliminating. But suppose that tacit thought forms an indispensable part of all knowledge, then the ideal of eliminating all personal elements of knowledge would, in effect, aim at the destruction of all knowledge. The ideal of exact science would turn out to be fundamentally misleading and possibly a source of devastating fallacies.”

This remains an important critique of the view that science should (or even can) be “objective.” Lorraine Daston and Peter Galison, on the other hand, have a whole book about how objectivity came to mean what it does.

Momin M. Malik:

Recalling the Toni Morrison quote, and “your demand for statistical proof is racist” in my comment for the Introduction: the black women who reached out to Williams know what happens to them. But Williams still had to cite a statistic to be credible. We can acknowledge that this is the state of things, but still critique it. Maybe data is power, and we should try to share this power, but also, maybe data should not be power. This also applies to something you write further down, “In the present moment, in which the most powerful form of evidence is data--a fact we may find troubling, but is increasingly true”, but must it be true?

Lauren Klein:

Good point. This is an issue we’re hoping to address with more nuance in the revision.

Yanni Loukissas:

Is this the right word?

Francis Harvey:

agreed. Does ‘it’ refer to data or the “god”?

Momin M. Malik:

+1. Extracting market value from bodies?

Yanni Loukissas:

This is another permutation of the strange metaphor. Are the bodies extracted or is the data extracted? I’m not sure that either one is right…

Yanni Loukissas:

I wonder if there is a missed opportunity here to address all the metaphors around data and how they mislead us from learning about where data come from?

Lauren Klein:

We thought we addressed this here and in the intro, but perhaps we’ll need to make it more explicit, since it sounds like it’s not coming through to all readers.

Yanni Loukissas:

Mixed metaphors here. I don’t specifically object to the use of the term “bodies,” which I know has a history in gender and race studies, but let’s talk about the specifics of how bodies are datafied, rather than using the terminology of industry (i.e. mining, extraction and so forth).

Lauren Klein:

Datafication is also a term that has been developed in response to industry. We hope that our implicit critique of these terms—and datafication, too, which we discuss in the intro— comes through.

Yanni Loukissas:

Perhaps I’m overthinking this sentence, but it seems to imply that bodies inherently hold information that is waiting to be extracted. This obscures processes of datafication and how they torque bodies to conform to preexisting categories. See Sorting Things Out for more on these processes.

Momin M. Malik:

Agree about the relevance of Bowker and Star’s book, which I think is incredibly relevant to modeling and which I don’t see connected to it often enough, although I’m not sure if this is the right place to bring it in.

Aristea Fotopoulou:

I wouldn’t make this argument as yet because the example is mainly about race.

Lauren Klein:

Hm. I do see this example as one of class as well, so it seems like we’ll need to clarify that a bit more.

Aristea Fotopoulou:

Would it be possible to compare with or indicate what happens elsewhere in the world (non-US)?

Aristea Fotopoulou:

I am wondering about the links provided in the manuscript - I am sure you have thought about this carefully - how is this going to work with the printed copy? Are these going to appear as footnotes?

Lauren Klein:

The links will appear in the ebook version but not in the print version. We will have more substantial footnotes in the final version, though.

Heather Krause:

I might consider removing this sentence. It’s likely that they did have to agree at some point. It would have been done in a totally terrible manner, without truly informed consent - but they probably agreed somewhere. And leaving this loophole weakens the entire argument. For example, in the EU now people definitely have to agree - but the first two points are still valid in the EU. I’m not saying that this third sentence is invalid, but I think it weakens the argument.

Yanni Loukissas:

I could use some more elaboration on this. Christine Borgman argues that all “alleged evidence” is data. How are you characterizing data here: as a specific kind of evidence?

Yanni Loukissas:

To further Shannon’s point, it is not so much that people and bodies are missing. We just don’t explicitly acknowledge their importance to data practices: what kinds of bodies are creating contemporary data systems and how are those systems handling —categorizing, classifying, and otherwise “torquing” (see Star and Bowker) bodies?

Lauren Klein:

Thanks. This seems to be a sticking point for a lot of people. We’ll need to be more explicit about how we are using the term “bodies” in a conceptual sense.

Yanni Loukissas:

I think it might be worth considering, as well, how “challenging” data can help change the world. Data will always be collected unevenly, with more data being created and processed by institutions in power. Can you also give readers the conceptual tools to question the dominance of data collection over other ways of knowing?

Marian Dörk:

to me this chapter is primarily about making the case that data feminism is not some kind of abstract theory, but that it is practically concerned with power and privilege impacting the bodily experience of many people in the physical/social/cultural world - so as concrete as it can be, in my view. while i found the emphasis on the term ‘bodies’ a bit problematic at times, i can see the value of making this point early on in the book to establish that data and algorithms do not belong to some kind of disembodied domain, but that the politics of our data (analysis) have concrete consequences. in this vein, i would recommend sharpening the chapter a bit more on this theme. maybe Haraway’s godtrick and Chun’s work on homophily could be mentioned in a later chapter? (though they really need to be in the book)

Marian Dörk:

maybe it’s a personal bias, but i am not a big fan of extended footnotes that serve as a secondary argument, when that argument is often actually more important. to me these references and their contributions could/should be woven into the main thread.

Marian Dörk:

to me this paragraph does not flow too well after the intricate recognition of intersecting differentials… maybe close previous paragraph with acknowledgement that these differentials also pervade data practices?

Marian Dörk:

to me this reads as if this subsumption was deliberate. my impression - in good faith - is that Burke’s marginalization was exclusively structural in that white feminists have a privileged, more powerful platform

Lauren Klein:

While I take your point, I’m not sure that Burke’s marginalization (or any) could be said to be “exclusively structural.” Certainly the nineteenth century (which I’ve studied in detail) has many examples of white women explicitly excluding black women from their organizing projects. I’m not sure of the specific actors involved here, but I’d be hesitant not to suggest a possibility that personal politics were also at play.

Marian Dörk:

i am not sure whether your argument really needs this/such interjection. i assume that you might have many readers who are familiar with data practices that involve the quantification of bodies: from historic examples in phrenology and criminology to more contemporary cases in medicine, quantified self, surveillance, etc…

Marian Dörk:

i found this recent study that looked at types of complications as well as the factor of race - may be worth including: https://www.hcup-us.ahrq.gov/reports/statbriefs/sb243-Severe-Maternal-Morbidity-Delivery-Trends-Disparities.jsp

Zara Rahman:

👏🏾👏🏾👏🏾

Zara Rahman:

Aren’t you saying earlier in this chapter though that ‘objectivity’ is impossible? As in - isn’t it, ‘if we truly care about accuracy’ or something else, then ‘we must pay close attention…’ ?

Momin M. Malik:

Agreed. I think you should say, what is objectivity supposed to achieve? And then use those target values, rather than objectivity itself. I’m sure there’s tons of literature about this, I only know critiques of objectivity as a coherent goal in science.

Zara Rahman:

As per my comment above - I feel like this is a little overly-simplified when it comes to the complexity of the problems described here.

Os Keyes:

Agreed. I wrote a paper on this precise topic and technology that makes clear the problem is the technology, not its representativeness https://ironholds.org/resources/papers/agr_paper.pdf

Zara Rahman:

This feels like a slightly over-simplified two-step solution - to my understanding, even if you solve both of these issues, one huge problem (which some think of as a good thing!) when it comes to getting training data for facial recognition systems is that there’s not enough high-quality data of non-Caucasian faces to use for training data. This historical issue can’t be solved with just more diverse system designers – it’s a bigger systemic issue that is more about coverage/spread of digital technologies IMHO.

Yanni Loukissas:

Agreed. Moreover, I’m not sure it is useful to think about this as a global problem. Facial recognition algorithms, like all algorithms, are built and used in contexts that matter. In other words, they are deemed to fail or succeed in highly localized ways that change over time. Buolamwini’s story, set at an elite technical university in the United States during a period of high (public) racial tension, seems like good evidence of that. Can we see this as a story about the particular ways in which algorithms are enrolled in the way we think about our bodies, rather than just a problem of objective correctness on a global scale?

Zara Rahman:

Hypothetically – would it have been any better if Target had carried out focus groups/done co-design with teenagers + parents, but still ultimately with the same ‘originating charge’? (ie. isn’t the problem here that they started with an inappropriate design question, rather than that they weren’t collaborative at the design table?)

Zara Rahman:

This seems like a very capitalist approach to understanding the value of data! I think it’s worth mentioning this understanding of it - but worth also critiquing it a little?

Momin M. Malik:

I was surprised not to see what I saw as a natural extension: talking about how exploitative extraction of nonrenewable resources has caused a global catastrophe.

Zara Rahman:

What do you mean by data ‘at the highest levels’? If it’s storing huge amounts of data, it might be worth stating that explicitly (though it’s worth noting that the financial cost of data storage continues to drop) - or is it collecting data from vast swaths of the population?

Zara Rahman:

in case you want any, more examples/case studies are here: http://civicus.org/thedatashift/learning-zone-2/case-studies/

Zara Rahman:

a new approach to “working with data” ?

Zara Rahman:

i feel like data scientists who have trained in statistical/scientific methods would be (i hope!) among the first to recognise statistical biases – whereas people who know less about data, might say things like this but perhaps more about ‘data’ rather than ‘data science’. Is calling it ‘data science’ here potentially putting non-data-scientists off from understanding that these points apply more broadly, too?

Momin M. Malik:

A tangential comment on this to say that data scientists may, in fact, not recognize or maybe may not care about bias. I have been impressed with a vast literature of statisticians being thoughtful and reflective about insurmountable, fundamental limitations in statistics (although not to the point of being critical, or losing faith in statistics). Machine learning, from which data science comes out more immediately (from what I observe), is re-discovering some of these critiques, but slowly. In machine learning (and even modern statistics), we are told, “classical statistics cared about unbiased estimators. But we now recognize that sometimes, biased estimators can predict better.” The “bias” here is a technical word, and means “inaccuracy” more than anything like institutional or interpersonal bias (although inaccuracies have implications for bias when it comes to demographic study). The point being, being “unbiased” is already dismissed in technical terms towards an instrumentalist goal (I can give literature that expands on this, this is a pretty important point but understudied and under-acknowledged), and I feel like this affects how people think about other sources of bias as well.
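
A minimal sketch of the technical sense of “bias” described above, using simulated data in Python with scikit-learn (an illustration only, not anything from the book): ridge regression is a biased estimator, yet with many correlated predictors and few observations it typically predicts held-out data better than unbiased ordinary least squares:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

# Simulate 60 observations of 40 highly correlated predictors.
n, p = 60, 40
base = rng.normal(size=(n, 1))
X = base + 0.1 * rng.normal(size=(n, p))
beta = rng.normal(size=p)
y = X @ beta + rng.normal(size=n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

ols = LinearRegression().fit(X_tr, y_tr)   # unbiased estimator
ridge = Ridge(alpha=10.0).fit(X_tr, y_tr)  # biased (shrunken) estimator

print("OLS test MSE:  ", mean_squared_error(y_te, ols.predict(X_te)))
print("Ridge test MSE:", mean_squared_error(y_te, ridge.predict(X_te)))
# The biased estimator usually wins on prediction error in this setting,
# which is the narrow, technical sense of "bias" at issue here.
```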

Rebecca Michelson:

I think there is something to be said about the inequities in data literacy as well. People are constantly pressured to accept jargon-filled terms and conditions related to data privacy and tech use. Most people are actively giving away lots of data without any understanding of the implications—but also without much of a real choice, in order to use most ubiquitous technologies.

Rebecca Michelson:

Yes- I was thinking of this example earlier. I’m glad you use it in this book!

Rebecca Michelson:

A related issue here seems to be increased surveillance and lack of data privacy/protection which puts marginalized groups at risk.

Rebecca Michelson:

I like how you already illustrated this point with Christine Darden’s story in the introduction.

Amanda Makulec:

The time gap between when Burke originally coined the #metoo phrase and when it was commandeered for wider use and popularized by white feminists is significant here. There’s increased name recognition of her contributions, but the ten year gap between the founding of her movement and nonprofit and the launch of #metoo is worthy of note. https://www.nytimes.com/2017/10/20/us/me-too-movement-tarana-burke.html

Lauren Klein:

Glad to know about this phrase. We discuss another project of Nafus’s, the Atlas of Caregiving, in the Labor chapter.

Yoehan Oh:

I found one article that addresses a very similar concern in terms of “undone science,” which might be referenced here, though it deals not with racial, gender, or feminist issues, but with general inequality issues.

Nafus, Dawn. "Exploration or Algorithm? The Undone Science Before the Algorithms." Cultural Anthropology 33, no. 3 (2018): 368–374. https://doi.org/10.14506/ca33.3.03

“In these projects, there was a good deal of undone science that got done precisely because the goal was not specifically to end in algorithm design. Clues about the sources of stress or illness were surfaced. In science and technology studies, the concept of undone science points to the choices made about which research questions are asked and which go underinvestigated, such as the many unasked questions in environmental health (Frickel et al. 2010). Even if the new knowledge we were creating was social and cultural, not necessarily scientific, this notion encourages us to think about how critique can take the form of knowledge production that opens up or elaborates a particular line of inquiry, rather than simply identifying problems with current technical systems. Both these projects pointed to undone science that needed doing. They surfaced alternative lines of inquiry by appropriating datasets that were originally designed to fit very different categories and by giving the subjects of that data the opportunity to reframe and reconsider its meaning.”

James Scott-Brown:

Gregory Piatetsky is skeptical of this claim: https://www.kdnuggets.com/2014/05/target-predict-teen-pregnancy-inside-story.html

James Scott-Brown:

Femicides are specifically killings of women or girls because they are female/on account of their gender.

In contrast, ‘gender-related killings’ is a much broader category that could include someone who was killed because they were a man, or transgender or non-binary.

Lauren Klein:

Oops! Thanks for catching that!

James Scott-Brown:

In this screenshot, the Facebook chat bar in the bottom-right corner obscures some of the text of Serena’s message.

Yoehan Oh:

This list is fascinating! And I hope that the screen capture date is noted in the image caption. As the strikethrough on the first item shows, this list is not fixed, but keeps being updated.

Yoehan Oh:

As with the comment on “you” above, I think “we” as a group needs to be used more specifically. Whose work is “our work”? I don’t think it means the two authors’ work. Does it mean society at large, including data scientists? And the usage of “one ‘we’ call data feminism” adds more confusion in identifying who “we” are.

Lauren Klein:

Good point. We need to clarify both the times when we use “we” to refer to Lauren and Catherine, and also when we use “we” to refer to the field of data science, which we’re hoping to carry along with us in this journey.

Yoehan Oh:

I think the authors need to make explicit what they have in mind when saying “data science.” It is a somewhat vague term, so depending on the definition it can point to an encompassing enterprise or to a specific set of fields. I think the authors might mean the former. Without a proper definition, it risks reifying and objectifying “the” data science. And in the Introduction I found the authors already gave an example of data visualizations, which “the data science” underlies.

Momin M. Malik:

Agree. See my comment on the Introduction about defining data science.

Yoehan Oh:

The sentence structure of “Because (…), but (…)” is confusing. I think the authors might mean “Even though (…), (…).”

Yoehan Oh:

I find it difficult to imagine what you are trying to describe. How can data science rely upon bodies (or body data?) “to make decisions about data”?

Rebecca Michelson:

It could be helpful to name a few concepts related to biases in data and research.

Yoehan Oh:

I think it would be helpful, and would make your point more precise, if some stereotypical examples that render data science as irrelevant to bodies were presented after this sentence.

Yoehan Oh:

Is there any bibliographic information for this (perhaps a book)?

Marian Dörk:

+1

Also the footnote mentioning “Now let’s multiply” needs bibliographic info.

Lauren Klein:

A nice counterpoint to work in: https://www.smithsonianmag.com/history/remembering-howard-university-librarian-who-decolonized-way-books-were-catalogued-180970890/

Shannon Mattern:

Perhaps acknowledge Genevieve Yue, too, who published a concurrent piece on the “China Girl": https://www.academia.edu/15365886/China_Girls_on_the_Margins_of_Film

Lauren Klein:

Yes!

Shannon Mattern:

Some readers might imagine exceptions, like weather data: a weather map registers environmental forces, not people! But even weather data is, of course, harvested via instruments *designed by people*, modeled using software *designed by people*, impacted by climatic forces transformed by humankind.

Lauren Klein:

Good point. We should elaborate this a bit more.

Shannon Mattern:

Maybe, to complete the shipping map example, you could close by telling us what bodies we *could* be seeing if this map were rendered at different scales: dockworkers, pilots, ship officers and engineers, truck drivers, gas station attendants, diner waitstaff, etc.

Lauren Klein:

Yes! We actually do this in the “Show Your Work” chapter, but I think you’re right about the point you make elsewhere that we should not repeat examples in the book, even if we elaborate them later.

Shannon Mattern:

Great, punchy sentence :)

Shannon Mattern:

This is a big concern in mapping, too. When does rendering something visible also render it vulnerable? This Twitter thread offers a great list of examples: https://twitter.com/shannonmattern/status/1052731087317815296 Given that cartographers and their partners in environmental resource management, endangered species advocacy, archaeology, etc. have *long* been thinking about the “paradox of exposure,” perhaps their work is worth an endnote?

Lauren Klein:

For sure. We intend to amp up our references to critical cartography, and our references in general, in the final version.

Shannon Mattern:

The phrasing here is a bit awkward. Perhaps instead: “…within a year of death, we’d need a researcher who was already interested in racial disparities in healthcare, who could then combine those data with data collected on race, to reveal the “three times more likely” stat that Williams cited in her Facebook post.” ?

Gabriela Rodriguez Beron:

It also seems that, everywhere, maps of femicides come from private citizens. There is an issue with the collection of data inside justice departments, and with patriarchal visions of what a feminicide is.

Shannon Mattern:

Really glad to see that you’re incorporating examples from the art and design worlds!

Lauren Klein:

We love Mimi’s work, and it was a major source of inspiration for us.

Shannon Mattern:

Again, I would say that there are notable exceptions, including data-driven medicine and anything that employs biometrics (e.g., customs and immigration). Datafied bodies seem to be front and center in these fields — in practice, in public perception and media representations of that practice, etc.

Michelle Doerr:

Could you expand on your reasoning behind data-driven medicine being an exception?

Shannon Mattern:

I’d say precision medicine and other biometric applications are key exceptions here.

Lauren Klein:

Thanks for calling this out. Our point here wasn’t that data science doesn’t operate on human bodies, or affect them; but rather that data and DS don’t always consider the whole person (and context) behind the data— even in applications like health (where medical data derives mostly from white men) and certainly biometrics. A helpful clarification to make as we elaborate what we mean by “bodies” in this chapter.

Jaron Heard:

I found this very powerful.

Jaron Heard:

I’d love to see an example a couple of orders of magnitude higher here. I think this under-represents the financial investment in data collection and analysis by companies like Facebook. And thinking about this example in particular, one might think that only a fraction of the costs of the data center are related to collection and analysis.

Lauren Klein:

Good point. Now I’m wondering if the stats about the “big five” being worth more than the (pre-Brexit) UK GDP made it into this draft anywhere.

Jaron Heard:

Missing a period. (Also, I have another organization for your list 😉)

Rebecca Michelson:

In addition to considering naming more organizations, it might be relevant to mention the civic tech field and groups like Code for America.

Anne Pollock:

The chapter as a whole is really rich, and the language refreshingly accessible. As you revise, there are a few fundamental elements that should be better articulated: handling the heterogeneity of examples; the stakes of “bringing back bodies;” and the simultaneity of the promise and peril of visibility.

First, the heterogeneity. There is a disconcerting flattening, in the chapter, between African American women’s life-threatening experiences during childbirth and Asian American Twitter gripes about getting lousy matches on a game. Not all racisms are equally deadly. This isn’t to suggest that you should cut any of the examples, but you should treat the differences among them with more care. Especially since you specifically credit Black feminism as an inspiration, it’s worth thinking through how anti-Black racism in particular is foundational to the United States, and that centring Black women has a distinctive value for intersectional feminisms.

Second, the bodies. As you surely already know, bodies are not simply givens about which data can be straight-forwardly extracted. For example, for Haraway, the location that a view from somewhere comes from is not just physical but also social and historical. Knowledge-makers learn how to see with the assistance of technologies ranging from microscopes to taxonomical categories, and so situated knowledges are products of minds as well as eyeballs. If you are committed to the idea that bodies are themselves at stake, you need to make clear what paying attention to bodies gives you that, say, listening to Black women doesn’t.

Third, the promise and perils of data. The beginning of the chapter seems to operate on the assumption that more data is a good thing, but later in the chapter, we learn that being overexposed to surveillance is also a problem. How might you hold that tension throughout the chapter, rather than treating the elements in turn?

Lauren Klein:

Thanks so much for these broad comments, Anne, which I’m just seeing now. I appreciate each of the issues you raise, and they’re ones we’ll take to heart (and mind and typing hands) as we revise.

Anne Pollock:

As of now, the chapter doesn’t really come full circle. Might either de-emphasize Serena Williams’ birth story at the opening or come back to it at the end?

Momin M. Malik:

Agree.

Anne Pollock:

:) Here’s Ruha! Might still cite her other work above, but glad to see it here.

Anne Pollock:

This wording leaves it ambiguous whether Haraway herself argues that the view from nowhere is always a view from somewhere or whether that’s your addition - reword to make clear that it’s the former.

Lauren Klein:

Noted.

Anne Pollock:

I would caution against naturalizing this assumption. It is perfectly possible for someone of one race to resemble someone of another race. The decision to sort faces into races in art as in life is social and historical not simply physical.

Lauren Klein:

Good point. We should say something more like: “But *some* Asian users of the app…”

Anne Pollock:

Would recommend wording with more care - you are painting with a pretty broad brush here.

Lauren Klein:

Yes. Point taken.

Anne Pollock:

I don’t understand why this is bodies at the table rather than people. To cite an aphorism that Ruha Benjamin is fond of quoting, if you aren’t at the table, you are on the table. (Speaking of which, #citeblackwomen, including Ruha Benjamin if you haven’t already. Including the question she highlights: "Why am I in such demand as a research subject, when no one wants me as a patient?”)

Shannon Mattern:

I think I have a similar question. While I do realize that struggles over inequality and exertions of power do often play out on individual, physical bodies, I also wonder if bodies are the right “unit” for this chapter. The term “bodies” seems to suggest that non-compliant data subjects are most powerfully represented as their phenotypical, empirically observable selves. What about subjectivities, interiorities, personal histories? What alternative term might capture the totality of the human subject — encompassing both the body and all the ineffable stuff that shapes it?

Or, maybe you simply need to provide a capacious definition of “bodies” early in the chapter, to explain that you’re not talking only about the corporeal.

Anne Pollock:

Do we know this? Did they boycott Target or something?

Anne Pollock:

Why people and their bodies? Is there a separation meant to be implied?

Daniel Kopf:

Agreed. This chapter is terrific. But I find the use of bodies rather than people confusing throughout.

Anne Pollock:

Might nuance this with reference to Steve Epstein’s inclusion and difference paradigm

Yoehan Oh:

The expression "not all bodies are represented in those decisions" seems too unqualified. Rather, it makes sense that not *all* seven billion bodies can be represented under the time and budgetary constraints of data practices. I think "not all kinds of bodies are represented" or "a homogeneous kind of body is represented" might sound more realistic.

Anne Pollock:

The second person is tricky. I would never say this. Who, exactly, is your imagined “you”?

Lauren Klein:

Good point. We’ve been discussing this throughout the writing process, and will likely add something about our use of the second person in the intro.

I’d welcome other thoughts about this issue from others as well. On the one hand, I like the informality of the direct address. On the other hand, it can’t help but imply certain things about our imagined audience.

Anne Pollock:

This is surely a fundamental issue discussed throughout the book as a whole, but I’ll flag here that I am sceptical about the implied cause-and-effect. We have lots of data about racial health disparities — how can we justify hope that more data would help to ameliorate them?

Shannon Mattern:

Agreed. I think it’s important to acknowledge here, especially in the intro, and throughout the book, that data practices are only one element in an assemblage of services, infrastructures, etc., that, via their *own* intersectional interactions, have the power to exacerbate or mitigate inequities. Perhaps you could frame data practices as worthy of singling out here because they “index” these other sectors?

Anne Pollock:

This is an awesome phrasing that might merit a closer reading.

Anne Pollock:

Well put

Shannon Mattern:

Agreed!

Anne Pollock:

What is the basis for describing the announcement as accidental? The fact that she took it down quickly might suggest a change of heart about disclosure rather than accidental disclosure, no? Distracting and unnecessary.

Lauren Klein:

I read an interview where I thought she said it was an accident, but I will check and confirm.

Carol Chiodo:

Even though you cite Wendy Chun and Jacob Gaboury, I think you can briefly synthesize their conclusions as a means of further underscoring how history, culture and context play into the argument you are making. Unpack this a little more - it will give the questions you pose in the next two paragraphs a little more punch.

Catherine D'Ignazio:

Thank you Carol — excellent suggestion.

Carol Chiodo:

You probably have already read Sara Banet-Weiser’s work on popular feminism and popular misogyny (her book is entitled “Empowered”). Her take on feminism’s commitment to visibility in the public sphere and the repercussions of that commitment in an online environment gave me pause. I’d love to hear what you two think.

Zara Rahman:

I really love this table, too!

Yanni Loukissas:

I also found this really helpful. Great addition.

Sarah Yerima:

On a structural note, I think having the chapter outlines/guides in the introductory chapter would be useful. Not only would it provide your readers with a map to the rest of the text, but it would also provide you, the authors, with the opportunity to dig more deeply into “data feminism,” your central concept.

Catherine D'Ignazio:

Thanks Sarah - this is a great point. This chapter and the introduction were previously combined which I believe is how the chapter outlines ended up here.

Nick Lally:

Yes!

Francis Harvey:

Indeed. A political-economic consideration can help deepen this anchoring.