
Chapter 7 – Co-Designing for Counterdata Science

Published on Nov 02, 2022

What does it look like to design technology in the service of liberation? Let us start with a counterfactual – what it does not look like. Here is an imagined brief news article from WIRED Magazine for an alternate version of this project that I could have done:

MIT Professor Uses AI To Solve Gender Violence

MIT Professor Catherine D'Ignazio has developed an advanced automated system using artificial intelligence and machine learning to sift through massive amounts of news and social media data to detect articles about feminicide - the gender-related killings of women and girls. Based at MIT, her project creates a centralized and comprehensive global archive of feminicide at a scale never before seen. According to D'Ignazio, the machine learning classifier she trained can detect news articles about feminicide with an accuracy of 92%.

– WIRED

What is the takeaway from this (fictional) blurb? The hypothetical system I built is, from a technical perspective, incredibly sophisticated. The database is large – "massive" in fact. It is global, and thus ostensibly captures "all the data" in a comprehensive fashion. It is framed as an authoritative central repository. The project is credited to me, individually, and my affiliation with an elite institution is mentioned several times. This narrative works well for my own social capital; as an academic I'm incentivized towards individual impact and collaboration is viewed suspiciously by tenure committees. This narrative works well for my institution – MIT saves the world again! Thank you, MIT. This narrative works well for funders who are eager to "solve" complicated social problems like gender-related violence with efficient and scalable technologies rather than investing in people and relations. This narrative works well for white supremacy (as well as settler colonialism, patriarchy, and imperialism) since it reinforces white saviorism both from a benevolent white individual and from a white-dominated institution.

So what does this narrative erase and exclude? The labor of the organizations – often women-led and Indigenous-led and queer-led – already doing the actual work of collecting these data, usually in a more situated and culturally appropriate way. It erases and excludes the asymmetry of resource allocation in the space. It erases and excludes the agency of families, communities and social movements. The technological and social difficulty of monitoring feminicide across culture, language, and race/ethnicity/class. It obscures the scale of the missing data that we produce by monitoring from afar, without grounded context. In other words, it erases all of the rich complexity of grassroots data activism practices that I've spent the past six chapters telling you about. It's important to remember that the academy, like the data economy, is set up for and incentivizes extraction.

So, how do we design against (or in the midst of) these structural barriers? This chapter is a case study in participatory design for counterdata science and action. Throughout this book I have talked about the Data Against Feminicide team, and our work interviewing and learning from grassroots data activists. In this chapter I want to explore one very specific aspect of that work: our attempt to co-design, develop and deploy tools that support activists' informatic practices. How can we design interactive technologies to support and sustain the labor of counterdata science about feminicide? This chapter details the Data Against Feminicide team's attempt to answer that question in community, working collaboratively with the activists themselves. Guided by data feminism, we made many intentional decisions along the way to center its principles such as elevating emotion and embodiment, embracing pluralism, considering context, and challenging power.

I led this chapter with the heroic counterfactual because such hero stories are often what we academics and technologists are pressured to tell - in academia, at conferences, to existing and potential funders. In these stories, the community is always lacking in one way or another; the academic or the technologist, in turn, always fills that void, usually with some kind of tech. The reality of what our team did is much more humble, more relational and also more fraught. It was a sound process, and a process I am proud of, and a community with whom I continue to be in relation. But it was riddled with learnings and fumblings, and it isn't finished. It has left me with many questions about the role of academics and technology in activist labor. I want to tell you the vision for what our tools might enable in the hands of human rights activists, and yet I want us to reflect carefully about the structural barriers that exist to actually putting research and technology development in the service of movements for liberation.

Design for counterdata science

As a case study in co-designing tools to support counterdata science, this chapter is in dialogue with the field of human-computer interaction (HCI), a subfield of computer science that studies how people use digital systems as well as the design of novel interactive technologies to meet the needs of individuals and groups. In the tech industry, this is typically called user experience (UX) design. Prior work in HCI helped to orient our design process and surface key debates and tensions. HCI has increasingly been discussing questions of participatory design, social justice, and the role of technology in social change. One of the things I appreciate about the HCI community is how scholars and designers have been trying to integrate critical and urgent ideas – such as feminism, or critical race theory, or intersectionality, or feminist solidarity, or Afrofuturism, or decolonial methods – into the process and products of design1. For example, feminist HCI, a framework originally elaborated by Shaowen Bardzell in 2010, draws from feminist theory to suggest key qualities that can challenge patriarchal approaches to computing and contribute to more liberatory design practices2. In fact, feminist HCI directly influenced me and Lauren F. Klein as we articulated the data feminism principles (see Chapter 8). Thus, HCI designer-researchers don't only study what elements should be on a screen or where this or that button should go3. They additionally look at theories and values (how to approach and conceptualize a design situation), and process concerns (how to equitably design digital technologies) and impact concerns (how design artifacts affect their information environments and their users) and more. Outside of academia, there are communities of practitioners like the Design Justice Network that are creating similar spaces of convergence for liberatory ideas and digital design practices4.

There is a small but growing body of work in HCI that addresses gender-related violence. Recent work by Teresa Almeida and her colleagues has pointed towards the need for HCI researchers to pay greater attention to gender-related violence as a public health issue5. A number of projects have focused on designing safe and supportive digital spaces for women and LGBTQ+ people, on violence prevention, and on designing with impacted populations6. One study in this vein of prevention and direct support to impacted individuals has addressed the topic of feminicide. In response to high rates of feminicide in Brazil, Silveira, dos Santos and da Maia designed a web application to aid Brazilian women in abusive relationships in getting support7. In a 2021 paper, sociologist Renee Shelby analyzes the rise of safety technologies that seek to prevent gender violence by mobilizing responses from peers rather than from law enforcement. One example is a wearable panic button which notifies friends and community when pressed by a person in danger of sexual assault. While Shelby lauds the "abolitionist sensibility" of such tech to bypass the state, she critiques these tools for "perpetuating gendered rape myths, commercializing assault, and disproportionately placing the burden of prevention on women."8 Thus, there are tensions in using tech to address gender-related violence, including (1) whether such tools may introduce "carceral creep", i.e. fortify punitive and state-based responses to violence that perpetuate mass incarceration, and (2) whether they focus on individual behavioral changes that, in order to cope with unjust systems, reduce rights and access9.

Other HCI studies have focused less on individual-scale designs and more on digital methods that support activism and collective action around gender-related violence. For example, Morgan Vigil-Hayes and colleagues showed that MMIW (Missing and Murdered Indigenous Women) consciousness-raising represented a significant amount of Native American activist use of Twitter in 201610. Several years later, Angelika Strohmeyer and colleagues collaborated with sex workers in North East England to commemorate lives lost to violence by co-organizing a march and reflecting on the use of digital technology during it and thereafter11. The case study I describe in this chapter is situated in this latter vein of work which does not provide direct services to women or people who experience violence, but rather to support activists who have already created their own digital strategies to combat this structural problem. In our case, these are the data activists who are working in concert with social movements to visibilize feminicide as a public issue through the production of counterdata12.

Data activists have specific informatic needs. Recent case studies, both inside and outside of HCI, have highlighted the growth in data activism as well as the informatic needs and practices of citizens and residents undertaking it. The growing literature on data activism shows that counterdata production may be undertaken by activists, journalists, nonprofit organizations, librarians, citizens, and other groups13. Data activism does not only happen outside of mainstream institutions. For example, Pine and Liboiron have shown that data activism may also come from insiders and experts looking to reframe political problems (see Chapter 3 for more on measurement as political action14). Research on violence as the object of data practices has explored how citizen organizers mobilize the affective and narrative potential of data through “agonistic data practices” and scholar-activists recording fatal violence have also reflected on their own practices, including the data challenges and vicarious trauma of recording homicide15. In a 2017 paper, Adriana Alvarado García and colleagues discuss data about feminicide and sex crimes against children as areas where human rights organizations attempt to combat data gaps through community-based data practices in Mexico16. They describe existing data practices and speculate about how HCI may work to support these practices using design, including addressing infrastructure concerns, designing for safety, and supporting community data production and circulation.

Before I describe what we did, there is a final related body of work out of HCI that I want to lift up for our conversation which relates to the efforts to think through the standpoints of academics vis-a-vis activists and vice versa. Work in HCI that explores social justice, participatory methods, feminist and decolonial approaches to design has also had to navigate the ethics – and ethical pitfalls – of design which aims for transformative change17. This often intersects with work in participatory design (PD), which is also called "co-design", and has a rich history of using participatory methods for developing democratically controlled software and technological systems. One recent and salient contribution to this literature is a 2022 paper called "On Activism and Academia" in which Débora de Castro Leal and colleagues untangle some of the tensions and possibilities of working at the intersection of academia and activism. They distinguish between action-oriented research and activist research. While the former seeks to directly address social problems, "activist research aims to understand the causes of oppression, inequality and violence. It works directly with collectives of people who are affected by these conditions."18 While the latter effort may sound noble in intention, de Castro Leal and colleagues warn against what they call community fetishism – the tendency for academics to reap career benefits from working with marginal groups. They also enumerate the various ways in which academics may fail to do the "good" they aim for – and, in fact, may perpetuate harm against activist communities through extractive practices, invalidation of community know-how, and a focus on narrow research products over process. These and other potential harms arise, no surprise, because of the ways in which oppressive forces – white supremacy, patriarchy, and colonialism to name a few – permeate all aspects of Western academia. 
As a result, de Castro Leal and colleagues discuss ways that working with activists can help researchers recognize those forces as well as challenge them within academia itself.

One goal of the present case study is to begin to consolidate design for data activism as an area in HCI, to think through how we may infrastructure counterdata science practices not just with digital tools and computation, but also with ethical frameworks that help us navigate some of the complexities of working across sectors (such as activism and academia) and contexts (geographic, cultural, racial). To that end I will show how we integrated data feminism's principles into designing for counterdata science, but also how principles – just like good intentions – can only get you so far. Practice and design and relationships are messy and specific and don't always conform to general principles. That said, the friction is something to embrace because the friction is where the actual work happens19.

Resources and relationships are design questions

This case study emerges from the ongoing work of Data Against Feminicide, the South-North collaboration led by myself, Silvana Fumega and Helena Suárez Val. We are situated in academia, civil society and activism. I am a professor and Helena is completing her PhD. Silvana is the Director of Research for the Latin American Initiative for Open Data (ILDA), a nonprofit organization that works on open data across the region. And Helena founded and runs Feminicidio Uruguay, an activist website that produces data about feminicide in Uruguay. As I mentioned earlier, our coming together happened fortuitously and has since bloomed into collaborative work and friendships forged mostly on Zoom. Data Against Feminicide, as we have seen, has three goals: to foster an international community of practice that thinks critically about feminicide data; to develop digital tools to support and sustain the work of activists; and, where appropriate, to help standardize the production of feminicide data.

This chapter is about the second pillar of our collaboration – the development of digital tools and technologies to support the counterdata production work of activists and civil society organizations. The way that I think about this is that the community-building work that we do is about co-creating grassroots social infrastructure around anti-feminicide data activism and the tool development is about co-creating grassroots technical infrastructure. (And the third pillar – standardization – is in essence a combination of social and technical infrastructure.) In both cases – social and technical – the infrastructure is deliberately small-scale and consists of tools and relations that support and sustain (not outsource, not automate, not centralize) the difficult labor of activist data production about feminicide.

Participatory design often gets discussed in terms of artifacts and outputs – what we can see or touch or use. This might include process outputs like mock-ups and sketches, or product outputs like digital apps and data visualizations. Yet there is less documentation and examination of the intentional decisions made around resources, governance and relations to engineer healthy setup conditions for a co-design project. These are like the soil from which co-design for liberation may grow (or be stunted). These draw from Mariam Asad's formulation of "prefigurative design" in which she asserts that redistributing resources and transforming social relationships are key opportunities for academics to engage in justice-oriented research20. In that spirit, I want to tell you about three foundational decisions about resources and relationships that we made guided by the principles of data feminism.

First, we worked collaboratively and in community with grassroots activists. This aligns with the data feminism principle embrace pluralism which asserts that the most complete knowledge comes from synthesizing multiple perspectives, with priority given to local, Indigenous, and experiential ways of knowing. This was a very intentional decision that runs counter to where the money is concentrated for technologies related to gender violence – namely, in carceral technologies that bolster police and law enforcement budgets21. These technologies are deployed without community consultation, effectively centralizing the singular, monolithic perspective of state surveillance. For example, the Markup recently reported on police around the globe who are looking to predict domestic violence with algorithms22. And in a stunning example of carceral fortification around gender-related violence, US-based corporation Honeywell partnered with the Indian city of Bengaluru on a $67 million project to reduce sexual violence in the city through the creation of "an integrated Command & Control Center to manage a state-of-the-art video system that features more than 7,000 video cameras deployed at more than 3,000 locations across the city."23 The system uses facial recognition, drones and apps in the service of detecting and prosecuting sexual harassment and gender-related violence. There are so many alarming aspects of this project that it should be the subject of a whole other case study, but one question I will posit here is this: Instead of handing over $67M to police – who have consistently proved themselves not only ineffectual and inept in responding to gender-related violence but who are also often perpetrators of it – what if such millions had been given to the grassroots organizations that are already providing expert community defense, preventative services, and access to support and healing for survivors?24

The Data Against Feminicide project sought to place our own modest time and resources in coalition and alliance with such community defense, not least because Helena is herself a data activist and brings this political orientation and lived experience into the collaboration. Thus, in contrast to typical research, the “we” that undertakes this project is not disinterested. That is, we “have allowed [ourselves] to become interested” and to use “solidarity as a method” to engage with activist data practices around feminicide and feminicide data25. This is in line with recent calls from senior scholars in participatory design for researchers to "team up with partners to fight for shared political goals"26. Our shared commitment is to collaborative knowledge production and technological development, but also to actively caring for the already existing community of data activists by supporting practitioners' work and connecting activists to each other. Puig de la Bellacasa has written that critical interventions “shouldn’t merely expose or produce conflict but should also foster caring relations.”27 At a more personal level, this involves elevating emotion and embodiment as tools to foster such caring relations. Angelika Strohmeyer wrote that her academic training had not prepared her for how: "the boundaries between research partner, colleague, and friend started to blur. I started to care for and love those with whom I collaborate."28 In the Data Against Feminicide project we have embraced the love and friendship that we hold for each other, and we have supported each other in numerous ways inside and outside the collaboration – traveling together, co-writing together, venting together, snarking together, crying together, reflecting together29.

Second, more about money. Decisions about funding structure all downstream relationships and products. A key way to challenge power is to think upstream about funding – to re-engineer flows of social and financial capital through an institution or a research project, before it begins. This is part of what Asad describes as "redistributing resources"30. To get Data Against Feminicide off the ground, we did not seek highly structured grants from foundations or federal agencies. We used operational funds provided by ILDA to underwrite our community events and courses, and funded the majority of the research and tool development with my start-up funds from joining MIT in 2020. This was a way to maximize flexibility and autonomy as the three co-leaders developed our relationships with each other and as we developed relationships with data activists31. Trust takes time, and requires pivoting and reflection, especially when navigating differences of language, culture, geography, ethnicity and race32. Funders don't often see the value in such open-ended exploration. The risk of seeking large grants, especially at the incipient phase of a project before you are working in community, is that the seeker sets up the goals, timeline, and outcomes which may in fact not be desirable to the community. These lead to urgency, a key feature of white supremacy culture and a key way to damage relationships with a community before they are well-formed33. Moreover, we recognized that data activists' labor is gendered, racialized, always under-resourced, almost never paid, and that participation is work. Thus, we compensated all interviewees and co-design partners for their time34.

Finally, at no point did Data Against Feminicide request to see, seek to use, or aim to centralize activist data. Why do I state this so explicitly? First of all, grassroots efforts like MundoSur are already underway to harmonize activist data (see Chapter 5) in sensitive and participatory ways. And second, Silvana, Helena and I were aware of the extractivist tendency in academia; the all-too-common pattern whereby researchers, as de Castro Leal and colleagues put it, go to communities and "simply ask for data without giving much in return"35. This has led to deep and very warranted distrust of academia by marginalized communities, evidenced by Linda Tuhiwai Smith's (Ngāti Awa and Ngāti Porou, Māori) affirmation that the word research "is probably one of the dirtiest words in the Indigenous world’s vocabulary."36 We did not enter the community with the (arrogant!) idea that we could take activists' data and synthesize their work better than they could. We entered with admiration for their labor and commitment (and in the case of Helena, lived experience of doing the work herself), and an offer to explore together how to support and sustain it. This represents a way of making labor visible – seeing and appreciating the tremendous collective efforts that result in feminicide counterdata and offering a vehicle to create social and technical infrastructure around them37. Our aim was to manifest a design-as-service orientation rather than a design-as-hero orientation.

Together these three setup decisions – collaborating with activists, keeping funding flexible while still paying participants, and deliberately not requesting activist data – paved the way for the co-design process that followed.

The co-design process and products

Designing digital tools with data activists began in parallel with our interviews with them in 2020. In fact, one of the main reasons we started interviewing grassroots activists was to understand their informatic needs and what kind of tools might serve those needs. As I outlined in Chapter 2, the in-depth interviews aimed to understand the workflow, data collection process, and conceptual categories through which activists identified and documented feminicide and fatal gender-related violence, as well as their reflections on lessons learned from their monitoring work. We asked activists about their informatic challenges and also about their ideas for tools that could help mitigate these challenges38. From these interviews, we developed the process model that I have used throughout this book which describes the workflow stages of a feminicide counterdata science project (see figure 2.4 or Table 7.1 below).

These interviews also surfaced numerous informatic challenges that activists face in producing data about feminicide. Two of these challenges emerged again and again, and so we tried to directly address them in the co-design process. First, as I detailed in Chapter 4, all groups have to reckon with missing data, which is inevitable due to the negligence, inaction and bias of the state and media, and which makes the researching stage of counterdata work very challenging. In the face of such missing information, many activists use news media articles to source cases which they may then triangulate with other sources. The hurdles at this stage are more acute for activists monitoring violence against Black women, Indigenous women, rural areas, and/or LGBTQ+ people. As described in Chapter 1 and Chapter 4, across countries and contexts, the media systematically underreports the killings of racialized and marginalized people and the state disproportionately misclassifies or neglects to investigate these deaths. Thus, learning about new cases and acquiring information about them is especially difficult. This challenge is clearly structural, and thus cannot be resolved by a tool, but it seemed clear to us that digital tools might at least help to surface information that activists are seeking. We soon started talking about some sort of case detection system that could scan news or social media, especially hyperlocal media, to detect cases of feminicide; and/or notification systems that could alert activists to new cases.

Second, all of the groups and individuals that we interviewed face significant resource constraints in terms of time, money, mental health burden and emotional labor. The majority were volunteer-led efforts with no funding source, though some were supported by small grants or crowdfunding. Across all projects, activists noted how counterdata work was time-intensive and also emotionally challenging because of the continuous exposure to violence. Tools that help groups anticipate and plan for some of these challenges might therefore be useful in the resolving stage of work. Activists have various strategies for navigating such burdens, which Helena and our team have written about in an extended blog post on "Feminicide data, emotional labor and self-care."39 From these challenges, we can extrapolate some design implications. For example, any new or adapted digital tool should be free, easily maintainable, collaborative and easy for newcomers to learn, in order to accommodate activists' volunteer and sometimes ad-hoc labor. Another approach for tools in the researching stage could be case detection systems that seek to reduce the number of non-relevant violent results that counterdata activists are exposed to and that they need to filter manually. This could reduce overall time spent seeking cases as well as some of the emotional burden of the work. At the same time, activists do not want to fully eliminate this labor – they see the emotional labor of caring for murdered people’s lives, stories, and families as an essential part of public witnessing and memory justice.

| | Resolving → | Researching → | Recording → | Refusing & Using Data |
| Activist Activities | Starting a monitoring effort | Seeking & finding cases & related info | Information extraction & classification | Where data go, who uses them |
| Design Example | Tools to map out feasibility of counterdata effort | Tools to detect relevant cases, e.g. DAF Email Alerts System | Tools to record and categorize cases, e.g. DAF Highlighter | |

Table 7.1 Process model describing the stages of a feminicide counterdata science project as well as tools that could be useful at different stages. For example, the DAF Email Alerts tool helps activists detect cases of feminicide during the stage of Researching.

The more we talked with activists, the more ideas surfaced. It became clear that design could be useful in many ways – that there were likely tools that would aid with all four stages of counterdata work (see Table 7.1). It seemed only fitting to workshop these design possibilities directly with activists. We began a participatory design process in June 2020 with two partners. Because the project was on a limited budget, we decided to draw from the activist expertise already on the leadership team: Helena Suárez Val of Feminicidio Uruguay, whose experience comes out of Latin American conversations about feminicide; and then to invite the first North American activist that we interviewed, Dawn Wilcox of Women Count USA. We ran six co-design sessions with Dawn, Helena, myself and a number of student researchers between June 2020 and Feb 2021 – roughly one session a month. Since this was the height of the COVID-19 pandemic, and we were all in different locations, these sessions took place on Zoom and there were usually around five to seven of us on the call. Our sessions were scheduled for an hour but often went longer, as we got to brainstorming, reacting to wireframes, and proposing hand-sketched changes (figures 7.1a & b). Design ideas fell into five categories: detecting and recording cases, enabling collaboration, storing sources, archiving databases, and data analysis and visualization (figure 7.1c).

Figure 7.1 (a & b) Screenshots from our co-design sessions which took place on Zoom and often involved reviewing hand-drawn sketches and generating new ideas
Figure 7.1 (c) The final Miro board cataloging and categorizing ideas generated during six months of co-design work with data activists. Courtesy of the author.

From our six months of brainstorming sessions we emerged with a list of more than fifty potential ideas, which ranged from the relatively simple (a bar chart with photos) to the more technically challenging (data repository where activists share the data they produce and get recognized for their efforts) to the non-technical (activist events to share knowledge around managing volunteers). From these ideas, the activists and our team mapped out which to carry forward into development based on two questions: according to our interviews and our partners' knowledge of other data activists' work, how widespread was the need that this tool met? Additionally, how technically feasible was this tool to develop? We decided to move forward with two tools that help activists detect and record feminicides – i.e. that address the researching and recording phases of a counterdata science project.

Our team developed and built the first version of the tools in early 2021 and then piloted them later in the year40. For the pilot, we wanted to test both a Spanish and English language version of the tools, so we recruited seven groups from Argentina, Uruguay and the United States (see Appendix 2) from those that we had previously interviewed. Groups participating in the pilot filled out a weekly survey and participated in two 2-hour focus groups over a period of two months.

The Data Against Feminicide Highlighter

Figure 7.2 Screenshot of the Data Against Feminicide Highlighter, a browser extension for the Chrome browser. (a) Shows what news articles look like when they are highlighted to facilitate activist scanning and (b) shows the interface of the Highlighter. Courtesy of the author.

The Data Against Feminicide Highlighter is an extension for the Chrome browser, which aids in the recording stage of a counterdata science project. This is the stage when activists have found relevant information and need to extract it into structured spreadsheets or databases. When an activist opens a news article about feminicide or fatal gender-related violence, the extension auto-highlights names, places, dates and numbers with different highlighting colors (figure 7.2a). Data activists can also put in custom words to highlight, such as "gun", "husband" or "boyfriend" (figure 7.2b). The Highlighter has a link (“Open Database”) which activists can customize to open their own database or spreadsheet for easy copy-pasting between the browser and the spreadsheet.
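As a rough illustration of the highlighting step, here is a minimal sketch in Python rather than in the extension's own code. It covers only numbers and custom keywords; names, places and dates would need some form of entity recognition that is not shown. The function name `highlight` and the `[[TAG:...]]` marker format are hypothetical, chosen only for this example.

```python
import re


def highlight(text, custom_words, number_tag="NUM", word_tag="KEY"):
    """Wrap numbers and custom keywords in [[TAG:...]] markers.

    Hypothetical helper: a real browser extension would wrap matches in
    colored HTML elements instead of plain-text markers.
    """
    parts = []
    if custom_words:
        # Longest keywords first so longer phrases win over shorter ones.
        ordered = sorted(custom_words, key=len, reverse=True)
        parts.append(r"\b(?P<word>" + "|".join(re.escape(w) for w in ordered) + r")\b")
    parts.append(r"\b(?P<num>\d+)\b")
    # One combined pattern and one pass, so inserted markers are never re-scanned.
    pattern = re.compile("|".join(parts), re.IGNORECASE)

    def mark(match):
        if match.groupdict().get("word"):
            return f"[[{word_tag}:{match.group('word')}]]"
        return f"[[{number_tag}:{match.group('num')}]]"

    return pattern.sub(mark, text)


example = highlight("Her husband, 42, owned a gun.", ["gun", "husband"])
```

Doing the substitution in a single pass keeps the sketch simple: the different highlight types cannot interfere with one another.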

The idea for the Highlighter emerged from an early participatory design session with our co-design partners where we were discussing how, for a given case, activists need to scan a huge amount of information in order to fill all of the fields for their database. Our interviews put the figure at anywhere from 3 to 50 news articles per case. Activists move back and forth, copying and pasting between the browser and their databases; the process is incredibly time-consuming and also emotionally intensive. The goal of the Highlighter is to reduce activists’ overall time spent scanning violent news articles by visually highlighting the key pieces of information they are seeking for their databases. During the pilot, groups also suggested many useful additions to the Highlighter that we have now implemented, including the ability to email an article to a colleague (to support collaboration) and a feature for highlighting specific words in specific colors.

The Data Against Feminicide Email Alerts System

Figure 7.3. Sample email alert delivered from the Data Against Feminicide Email Alerts System to Women Count USA.

The Data Against Feminicide Email Alerts System supports activists in detecting new cases of feminicide and fatal gender-related violence, and in following the development of existing cases. This is part of the researching stage of a counterdata science project, wherein activists are seeking information about relevant cases or observations for their database (see Table 7.1). Our system is designed to be similar to Google Alerts, but with a few significant tweaks. An activist sets up a project in a particular geography. Then, they input keywords for finding news media articles related to feminicide or gender-based killing, and pick the frequency with which they wish to receive email alerts. Many groups we interviewed had attempted to use Google Alerts to find cases, but most had stopped because the search results were too broad, the system returned cases from outside their geography of interest, and/or the system repeated a single case or article many times over, making it hard to distinguish between new and old information.

Given that so many activists had tried and stopped using Google Alerts, in early co-design sessions, we iterated on the idea of an improved Google Alerts system and discussed everything from full automation (the system would monitor the media, extract relevant information from articles and put all results into a database) to partial automation (the system surfaces alerts and the activist chooses which are relevant for their database). Helena pushed back on the idea of full automation and described the central importance of her emotional labor of witnessing and caring for the people she logged in her database. Because of this, as well as the fact that definitions of feminicide vary across cultures and contexts, we stayed with partial automation where it is ultimately up to the data activists to decide whether a case is relevant for their database or not. The overall goal of the system is to reduce activists’ time spent searching for new cases as well as to reduce the emotional burden of reading violent news articles which are not relevant for their databases.

The idea of the system is simple – it surfaces alerts for relevant cases of gender violence. Yet what's happening behind the scenes is a bit more complicated. The system draws news content from MediaCloud, an open-source platform for media analysis and also an academic research project I have participated in41. An organization using the Email Alerts system can customize a search query and set of place-based media sources to best suit their project needs. MediaCloud then retrieves matching articles from its continually updated database of global news stories, and these articles are run through a machine learning model we developed that predicts the probability that each article will be relevant to the organization (i.e., the article describes an instance of feminicide). Articles above a particular probability threshold (which defaults to 0.75) are sorted by the probability of feminicide and delivered in a daily email digest (figure 7.3) and can also be viewed in an online dashboard.
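The filter-and-sort step at the end of this pipeline can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the system's actual code: the `ScoredArticle` type, the `build_digest` function and the sample data are hypothetical, and treating the 0.75 boundary as inclusive is a guess.

```python
from dataclasses import dataclass


@dataclass
class ScoredArticle:
    """Hypothetical record: an article plus the model's relevance probability."""
    title: str
    url: str
    probability: float


DEFAULT_THRESHOLD = 0.75  # default threshold stated in the text


def build_digest(articles, threshold=DEFAULT_THRESHOLD):
    """Keep articles at or above the threshold, most probable first."""
    # Assumption: the boundary value itself is kept (>= rather than >).
    kept = [a for a in articles if a.probability >= threshold]
    return sorted(kept, key=lambda a: a.probability, reverse=True)


# Invented sample data, for illustration only.
sample = [
    ScoredArticle("C", "https://example.org/c", 0.80),
    ScoredArticle("B", "https://example.org/b", 0.40),
    ScoredArticle("A", "https://example.org/a", 0.91),
]
digest = build_digest(sample)
digest_titles = [a.title for a in digest]
```

Exposing the threshold as a parameter reflects the fact that the text describes 0.75 as a default rather than a fixed value.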

The first machine learning models we developed involved participatory annotation of training datasets – meaning, our team manually labeled several hundred news articles about feminicide in both English and Spanish42. This is because we needed a reliable set of news articles describing feminicide in order to teach the machine how to detect it via natural language. These data were used to train two language-specific logistic regression models to predict the probability of feminicide from the text of an article. The English and Spanish models achieved 84.8% and 81.6% accuracy, respectively. Further details about data collection, annotation, and model performance for this initial iteration can be found in our paper for the MD4SG community in 202043. Since then, we have developed models for different types of fatal gender-related violence (which I discuss further in the next section) as well as begun work on a Portuguese language model with Brazilian activists.
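To make the idea of predicting a probability from article text concrete, here is a toy logistic regression written with only the Python standard library. It is a sketch under assumptions: the text does not say how articles were turned into features, so the bag-of-words representation is a guess, and the four training sentences are invented and far smaller than the several hundred annotated articles described above.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase word tokens, including common Spanish accented letters.
    return re.findall(r"[a-záéíóúñü]+", text.lower())


def featurize(text, vocab):
    # Bag-of-words counts over a fixed vocabulary (assumed representation).
    counts = Counter(tokenize(text))
    return [float(counts[w]) for w in vocab]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(texts, labels, epochs=200, lr=0.1):
    """Fit weights by plain stochastic gradient descent on the log loss."""
    vocab = sorted({w for t in texts for w in tokenize(t)})
    rows = [featurize(t, vocab) for t in texts]
    weights = [0.0] * len(vocab)
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y  # gradient of the log loss with respect to the logit
            bias -= lr * err
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
    return vocab, weights, bias


def predict_proba(text, model):
    vocab, weights, bias = model
    x = featurize(text, vocab)
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Invented toy examples: 1 = relevant, 0 = not relevant.
toy_texts = [
    "woman killed by her husband at home",
    "man charged with murder of his girlfriend",
    "city council approves new budget",
    "local team wins championship game",
]
toy_labels = [1, 1, 0, 0]
model = train(toy_texts, toy_labels)
p_relevant = predict_proba("woman killed by husband", model)
p_irrelevant = predict_proba("council approves budget", model)
```

A model this small only memorizes its vocabulary. It is meant to show the shape of the computation and says nothing about the reported accuracy figures.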

Thus, some key differences between DAF Email Alerts and Google Alerts are that 1) DAF Email Alerts has more geographic precision because it relies on curated media sources organized into geographic collections by MediaCloud; 2) while Google doesn't reveal its sources and has clear omissions around local and regional news sources as well as blogs, DAF Email Alerts is more transparent: activists have the list of sources that are scanned for each project and can add new media sources to the scan; 3) DAF Email Alerts filters all news articles through a machine learning classifier that has been trained to predict whether the article describes a case of feminicide or not, which results in fewer false hits; 4) DAF Email Alerts groups articles that are related to the same case, making the results easier to scan. For example, in Figure 7.3, under the first headline "Judge to decide Wednesday whether James Prokopovitz can ever be released from prison in wife's homicide", you can see that there are four articles from different news outlets about an upcoming sentencing in one specific case.
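The chapter does not describe how related articles are grouped, so the following is only one plausible way to do it, not the system's method. The sketch greedily groups headlines by word overlap (Jaccard similarity). The stopword list, the 0.3 similarity cutoff and the sample headlines are all invented for illustration.

```python
import re

# Small, hypothetical stopword list for the example.
STOPWORDS = {"the", "a", "an", "to", "in", "of", "on", "for", "and", "is", "be", "can"}


def title_tokens(title):
    words = re.findall(r"[a-z0-9']+", title.lower())
    return {w for w in words if w not in STOPWORDS}


def jaccard(a, b):
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def group_by_case(titles, min_similarity=0.3):
    """Greedily attach each headline to the most similar existing group."""
    groups = []  # each group: {"tokens": set of words, "titles": list of str}
    for title in titles:
        tokens = title_tokens(title)
        best, best_score = None, min_similarity
        for group in groups:
            score = jaccard(tokens, group["tokens"])
            if score >= best_score:
                best, best_score = group, score
        if best is None:
            groups.append({"tokens": set(tokens), "titles": [title]})
        else:
            best["titles"].append(title)
            best["tokens"] |= tokens
    return [g["titles"] for g in groups]


# Invented headlines, for illustration only.
headlines = [
    "Sentencing set for Smith in wife's homicide",
    "Smith sentencing in wife's homicide delayed",
    "City council approves new budget",
]
grouped = group_by_case(headlines)
```

Word overlap is crude: it will miss related stories that are phrased very differently and may merge unrelated stories that share names, so a real deployment would likely need something more robust.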

The tensions of co-design

For the most part, these two tools received positive reviews during our two-month pilot. More than half of the groups in the pilot reported that the tools saved them time, helped them detect new cases, and made their work easier. Rosalind Page, who runs Black Femicide US, expressed that the best part about the Highlighter was that she didn't have to read the whole news article to see if it was relevant for her database. Members of Mumalá said that while not all articles delivered by the email alerts tool were relevant, the system was delivering several cases a week that they would not have otherwise found out about. They were also receiving alerts about feminicide attempts, which made those easier to detect and track. Five of the seven groups in the pilot continued using the two tools after the pilot ended, which is a good indicator of overall performance and utility. After a round of improvements following the pilot, we launched the tools in November 2021 for use by the Data Against Feminicide community. We continue to maintain them and to run workshops so that activist and civil society groups can learn how to use and adapt them for their workflows.

While these were relatively positive results, there were three tensions that surfaced in the design process, pilot and maintenance periods. These warrant some reflection because they point to larger structural concerns for other counterdata design projects and raise questions that the field will need to navigate as we build towards design for data activism as an area of study and practice in HCI44.

Tension #1: Modeling politically contested concepts

We have seen, again and again, the significant variation in how the concepts of femicide and feminicide are elaborated in laws, in the media and by civil society. This poses challenges for the DAF Email Alerts system. What constitutes a feminicide? How does the news media describe such an event in a given context? How can a machine learning model learn to detect that event using only the text of the article? In the Americas, there is significant language variation encompassing, at the very least, media in Spanish, Portuguese, French, English, Quechua and Guaraní. Beyond language, there are variations in legal definitions of feminicide at the country level (see Table 1.1). And beyond legal variations, there are significant differences in media ecologies at the country level; feminicide is reported on with different language in Peru than in Argentina, though both countries' dominant language is Spanish. For example, the Ahora que sí nos ven observatory recounted to us how, since the #NiUnaMenos uprising, femicide has become a hot public issue in Argentina; the media will often use the term as clickbait in a headline even when the event under consideration does not constitute a femicide45. In contrast, a study in Peru found that while feminicide is becoming more widely used in the media, it is typically used to describe episodes of interpersonal violence and not used to describe the larger structural phenomenon or public policy approaches to the issue46. In some places, such as the US and Canada, the terms are very rarely used in the media and terms such as "MMIW" and "domestic violence" have more media prevalence. Even within a given country, different groups – government agencies, journalists, nonprofits, activists – may define and count feminicide differently47.

This amount of variation poses a significant challenge for a technical system seeking to detect news articles describing cases of feminicide. Abebe et al. argue that one role for computing in social change is to act as "formalizer", a way to codify the definition and understanding of a social problem, as well as its measurement48. Yet because of the contextual variation, at multiple scales, across multiple stakeholders, such formalization is premature – it would constitute an attempt to bypass the social and political contestation taking place around feminicide and preemptively bake it into one particular model. Often, computer science, being a somewhat conservative field, is inclined to side with larger, credentialed, status quo institutions for definitions and formalizations. But doing this is at odds with the Data Against Feminicide project's design goals of challenging power, centering the expertise of grassroots feminist activists, and using computing for liberation. So how do we operationalize the data feminism principle of considering context, especially given that there is so much context to consider?

The way we navigated this in the Email Alerts tool is two-fold. First, we made the design decision to not try to fully automate case detection and information extraction. This was because, as mentioned earlier, Helena and other activists pushed back on the idea of full automation due to their commitment to witnessing and caring for the individual people in their database. And it was also because full automation was not technically feasible without having a much more formalized and rigid definition of feminicide. Instead, our system involves the activists and grassroots groups – human expertise and judgment – as the ultimate arbiter of whether a case is relevant for their database. Thus the system surfaces probable feminicide cases, and the humans monitoring in different contexts make the ultimate judgment about what counts. Such a system may still need different models for different languages or different types of violence, which will require additional labor of data annotation and software development, but it can then support wide variation in definitions and circumstances.

The second way that we considered context in the design of the Email Alerts system has to do with how we assembled and annotated training data sets. For our initial prototype, we collected and annotated two datasets of around 400 articles each: the first in English, in collaboration with Women Count USA, and the second in Spanish, in collaboration with Feminicidio Uruguay (Helena's project). Each article was annotated by three separate people (activists, myself, and/or student researchers) as to whether it described a feminicide and, if so, what type of feminicide. Where there were discrepancies, we held deliberation meetings to debate each case and determine the ultimate judgment. This participatory data annotation process is a way of embracing pluralism, supporting the integration of multiple knowers and positionalities into a machine learning model. This also mirrors activist processes of deliberation on cases that I described in Chapter 5, wherein activists often use WhatsApp channels to debate and make a collective determination as to whether a case should be included in their database. As I write, we are in the process of systematizing this participatory annotation approach and developing a Portuguese feminicide detection model in collaboration with activists who represent five different feminicide monitoring efforts across Brazil.

Finally, one future path we are exploring for considering context is custom tuning of feminicide classification models in the Email Alerts system, based on user-generated feedback. For example, if activists provide us with ongoing feedback on articles surfaced by the system as relevant or not relevant for their project, we could tune their project's machine learning model to incorporate those as training data and, over time, "learn" what is more and less relevant at the scale of an individual monitoring project. This means we may end up with as many machine learning models as we have projects, but that each model could be more accurate, precise and appropriate to its context. This approach, while promising, remains to be tested. All of these tactics I have discussed represent ways to use computing and machine learning to support multiplicity and contextual difference across stakeholders, media ecologies, countries, and languages, as opposed to the formalizing, reducing and centralizing functions that computing tends to play by default.

Tension #2: Co-liberation requires intersectional ML models

"It's not working for us," reported Michaela Madrid, a staff member at Sovereign Bodies Institute, "We are getting so many cases every day but we check every single one and none of them are Indigenous women." It was Spring 2021 and our team was conducting a pilot study for two tools that we had co-designed with activists and developed into functional prototypes. We were in a Zoom focus group where ten activists spanning four feminicide monitoring organizations based in the US were sharing their perspectives.

Among the four groups, we received dramatically different feedback about the Email Alert tool's performance. Women Count USA, which monitors all US femicides, reported that the results were overall very relevant and useful. Another organization, Black Femicide US, monitors femicides of Black women and reported mixed but still useful results, with around 4 out of every 10 articles the system sent being relevant. Both have continued using the system in their work. However, the system did not source relevant results for two organizations in particular, both of which monitor specific, racialized forms of feminicide: 1) Sovereign Bodies Institute (SBI), a group which tracks missing and murdered Indigenous women, girls, and two spirit people (MMIWG2) and 2) the African American Policy Forum (AAPF), which monitors police violence against Black women as part of the #SayHerName campaign.

Feedback from these two groups consistently showed a lack of relevant articles being returned by the system, despite modifying the search queries to add relevant terms. The groups’ frustration could be heard in comments in focus groups and weekly surveys—for example, an activist from SBI wrote, “The majority of articles are not relevant to our focus, which means I’m actually spending more time than usual trawling through potential additions because I’m reviewing so many more news articles than usual.” As we reached the conclusion of the pilot, it became apparent that groups dealing with all feminicides (i.e., all women killed in a specific region) derived far more utility from the Email Alerts tool than groups that monitored more intersectional forms of such violence.

Why is this? As described in prior chapters and Table 4.1, racialized groups face more missing data from the state and more underreporting from systemic bias in the news media (the converse of which is the "missing white woman syndrome" of extensive news coverage when the woman at risk is white49). Additionally, it is well-documented that the state systematically misclassifies Indigenous victims' race50. News articles often do not report the race or tribal identity of a killed person, and they regularly misgender trans people. Not surprisingly then, it is very difficult for a machine learning model to try to distinguish news articles based on language that is either (1) absent or (2) incorrect.

Navigating these biases and gaps in the information ecosystem surrounding racialized feminicide is part of a commitment to considering context, i.e. operating with the knowledge that "data are not neutral or objective. They are the products of unequal social relations, and this context is essential for conducting accurate, ethical analysis"51. With agreement from the two groups, we went back to the drawing board with the machine learning model, and undertook further iterative data collection, modeling and evaluation steps to deploy new classifiers to meet their needs.

Figure 7.4: Our data collection process involves iteratively collecting context-specific positives (e.g., by sourcing ground-truth articles from organizations’ existing databases) and context-specific negatives (e.g., by identifying and collecting types of negative examples close to the decision boundary, such as articles describing cases of Black men killed in police violence). Courtesy of Harini Suresh and co-authors.

This is work that was led by computer science PhD student Harini Suresh with significant contributions by Rajiv Movva, an undergraduate at the time, now a PhD student at Cornell. The two collaborators came up with the data annotation and training pipeline depicted in figure 7.4. Round 1 trains and tests a machine learning model for all feminicide cases – what we used in the pilot and what did not work for the two groups monitoring racialized feminicide. Round 2 supplements the training data with links to context-specific feminicide cases contributed from the organizations themselves, who typically store links to news articles about cases in their databases. When we tested the model developed from Round 2, it returned somewhat more relevant articles; for example, it returned more articles about police violence for AAPF. However, there were more false positives. For AAPF, we found that the list of returned articles was often dominated by police violence against Black men or cases where the police were investigating other violence, both of which are much more commonly reported in the media than Black women killed by the police. Thus, Round 3 augments that further by supplementing training data with context-specific negative cases such as these – training the model as to which cases are not relevant. Harini and Raj explored several ways of combining and chaining training data into the machine learning model and determined that a method they called "contextual hybrid" worked the best. In the case of AAPF, this meant first running a classifier to determine whether a news article described any kind of police violence, and then, as a second pass, training a feminicide classifier to recognize articles about police violence specifically against women52.
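The two-pass structure of the "contextual hybrid" approach can be sketched as a small chaining function. This is a reading of the description above, not the authors' implementation. In particular, gating the second classifier on a 0.5 cutoff from the first is an assumption, and the keyword-based stand-in classifiers exist only to make the example runnable.

```python
from typing import Callable, Iterable

Classifier = Callable[[str], float]  # maps article text to a probability


def contextual_hybrid(context_clf: Classifier,
                      feminicide_clf: Classifier,
                      context_threshold: float = 0.5) -> Classifier:
    """Chain two classifiers: context first, then feminicide within that context."""
    def score(text: str) -> float:
        # First pass: does the article describe the context at all
        # (for AAPF, any kind of police violence)?
        if context_clf(text) < context_threshold:
            return 0.0  # assumption: out-of-context articles are dropped outright
        # Second pass: within that context, how likely is it a relevant case?
        return feminicide_clf(text)
    return score


def keyword_clf(keywords: Iterable[str]) -> Classifier:
    """Hypothetical stand-in for a trained model: 1.0 if any keyword appears."""
    wanted = set(keywords)

    def clf(text: str) -> float:
        return 1.0 if wanted & set(text.lower().split()) else 0.0
    return clf


police_violence = keyword_clf({"police", "officer"})
against_women = keyword_clf({"woman", "women"})
hybrid = contextual_hybrid(police_violence, against_women)
```

With real models in place of the keyword stand-ins, the second classifier would be trained on examples that already sit within the first classifier's context, which is where the context-specific negative cases from Round 3 come in.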

Here, I cannot offer you a hero story that "we fixed it" and successfully navigated the deeply biased information ecosystems for racialized feminicide. The evaluation of this work is multi-stage and it is still unfolding. As a method for embracing pluralism, our team considers the final test for these models' effectiveness to be a participatory evaluation that we are initiating with the groups as of this chapter's writing. But before burdening groups with evaluating new models (because, again, participation is work), we are performing two stages of internal evaluation. The first involves internal cross-validation – our team will do quantitative evaluation of the model against a test data set. The second involves an internal monitoring phase where our team will deploy these models into test projects in the DAF Email Alerts system and assess whether they are returning relevant news articles in a real-world deployment context. So far, our internal tests with these contextual models have yielded far more relevant results than the one used in the initial pilot, but the final proof will be in the groups' own assessment of relevance for their work.

I want to highlight one final design decision we made in the course of navigating this tension. We did not try to infer the racial identity of the victim described in a news article, even though that is a central feature of interest for the groups we are working with. While we annotated AAPF’s data with “police violence” and “feminicide”, we did not include a race annotation even though their focus is specifically on Black women. While technologies exist to infer race (e.g., from names or photos or language), they are often empirically wrong and ethically fraught53. Race and gender are not essential properties of an individual body but rather are their own kind of classificatory technology operating at the structural level – hence why a person may be white when they are in El Salvador and Latinx/Hispanic when they are in the US54. The individual didn't change, the racial classification system changed. A person doesn't "have" a race; rather, a person is racialized. The data feminism principle of rethinking binaries and hierarchies requires us to challenge the gender binary, along with other systems of counting and classification that perpetuate oppression. In this case, challenging those systems meant refusing to use reductive and erroneous racial inference tech.

Tension #3: Supporting and maintaining tools in the real world

It was January 2022, and the Email Alerts system was in a stable state. Our team had fixed bugs, and added new features based on recommendations from the activists who participated in the pilot. We introduced the tools to the larger Data Against Feminicide community via a panel with the pilot participants as well as hands-on workshops to learn how to install the Highlighter, set up a monitoring project and configure their alerts55. There were around thirty monitoring projects in the system created by activists, nonprofits and artist collectives. And then the system went down.

Well, the whole system didn't go down, but a key piece of its infrastructure did: MediaCloud, our main source of news articles by geographic region. Without a database of news articles to query against, our Email Alerts tool just couldn't function. Activists, especially those that had participated in the pilot and who had come to rely on our tools, were emailing us asking when the service would come back online.

Rahul Bhargava, MediaCloud's principal investigator and a professor at Northeastern, is a collaborator and partner on the Data Against Feminicide project. He wrote to us that the database had exceeded its capacity and they were doing everything they could to get it back online. Unfortunately, the technical glitch coincided with staffing issues and an urgent server migration and MediaCloud wouldn't return to service for almost three months. During that time period, Rahul and Wonyoung So, the Data + Feminism Lab's technical lead, searched for other large-scale news sources that we could query against. Google's news APIs had query length limitations (and activists have long and complicated queries) and severe geographic limitations (no news sources for a number of Latin American countries so that was a showstopper). After a couple months of testing and searching, Rahul and Wonyoung found NewsCatcher, a start-up company with a friendly founder who gave us a low-cost monthly subscription56. We integrated NewsCatcher right as MediaCloud was coming back online and we now have some redundancy in news databases in case of future outages.
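The redundancy described here amounts to trying one news source and falling back to another when the first is unavailable. A minimal sketch of that pattern follows. The function and exception names are hypothetical, and the two fetchers are stubs that make no real API calls, since the actual client code for either service is not described in this chapter.

```python
class SourceUnavailable(Exception):
    """Raised by a fetcher when its news database cannot be reached."""


def fetch_with_fallback(query, sources):
    """Try each (name, fetch) pair in order; return the first successful result."""
    errors = []
    for name, fetch in sources:
        try:
            return name, fetch(query)
        except SourceUnavailable as exc:
            # Record the failure and move on to the next source.
            errors.append(f"{name}: {exc}")
    raise SourceUnavailable("all sources failed (" + "; ".join(errors) + ")")


# Stub fetchers standing in for real news databases (no network access).
def primary_stub(query):
    raise SourceUnavailable("database over capacity")


def secondary_stub(query):
    return [f"article matching {query!r}"]


used_source, results = fetch_with_fallback(
    "feminicide", [("primary", primary_stub), ("secondary", secondary_stub)]
)
```

Ordering the list of sources expresses a preference, and the collected error messages make it easier to tell users why a digest was delayed.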

Together with Helena, Silvana, and Rahul, I have reflected a lot on this outage and the ways in which it highlights some fundamental tensions in designing and deploying "real-world" tools from academia. First, novelty in design and research is incentivized in HCI as it is in academia more broadly. This leads to prototype proliferation in which there is funding for early-stage tools, and academics reap social and career benefits for doing that work with communities. Yet following such prototypes, there exists neither structural incentive, nor funding, nor accountability to ensure that relationships grow and deepen or that prototypes blossom into services (if mutually agreed upon that they are beneficial). This is what Castro de Leal and colleagues call "community fetishism" and what Bodker and Kyng call the "least-effort strategy", i.e. the best way to quickly publish research papers57. This constitutes a kind of nominal adoption of the data feminism principle embrace pluralism but only when it serves an individual's career or the larger enterprise of academia. It's the same old academic extraction but with a veneer of "participation-washing"58. Moreover, the end result is a proliferation of presentist thinking which "leads to too many processes and products with no utility and no impact."59

Second, when our team tried to move from academic prototype to real-world tool, we made a fundamental mistake of optimism which, in retrospect, seems absurdly basic. The first lesson you learn as a software developer is that technology always breaks. Probably 80% of software development is building infrastructure to anticipate and address the ways in which it could stop functioning. This is the point of unit tests, redundant data sources, service monitoring, and more. We launched assuming the best case scenario of technological infrastructure: the services we rely on will always be running, we would have the capacity to address any bugs or issues as they arose, and so on. In practice, this wasn't the case – the services went down and we didn't have a large professional team ready to address outages and respond to users. HCI offers the crucial notion of non-intervention – considering when technological intervention is not appropriate in the first place60. This work describes how well-meaning interventions may worsen the situations they intend to help. While one of our original design goals was to reduce activist time spent monitoring and exposure to violent content, our team may have been inadvertently working directly against that goal by asking activists to spend their precious time and labor navigating "high-tech" solutions that were unreliable. Yet it is only possible to recognize this contradiction if we expand the design space to include designing the infrastructure to deliver the service in the real world. Most HCI work remains in the prototype and evaluate-the-prototype stages of design61. How do we ensure that the design of an artifact also includes the design of its deployment, infrastructure and maintenance?

Finally, work on HCI to support grassroots social movements has highlighted fundamental tensions between building alternative technologies and deploying them at a scale that can "work" for movements. As Sucheta Ghoshal and colleagues write, "While creating politically committed technologies is a necessary step in the right direction, there are still practical challenges that remain in realities of technology-use. For example, as we will see, the most popular and 'effective' solutions that 'work' (with significant implications in technological power) in movement settings are centralized technologies made and marketed by corporations like Google or Facebook."62 While there have been brilliant critiques of scalability as an unquestioned value in computing, it would seem that no alternative approach could get off the ground without buying into such Silicon Valley logic, since who among us has the resources for ensuring uptime and infrastructural soundness other than the Facebooks and the Googles of the world?63

I don't have an answer. Currently, I am sitting with these tensions, talking with collaborators, colleagues and activists, and pursuing pathways for more sustainable, reliable delivery of feminist technology services to counterdata and human rights activists. One path we are exploring is a strategic partnership with a justice-oriented software development agency wherein my lab would be responsible for co-design of prototypes and participatory machine learning models for new contexts; their group would be responsible for robust architecture, upgrades, maintenance, and security; and then we would collaborate on user support and customer service. But as with most things, these are fundamentally questions of resources – all of this involves labor and material infrastructure. We are discussing resourcing such an endeavor through grants plus a revenue model that would make the service free for activists and small organizations, and return profits back to the community which participated in the design process. But it remains to be seen if this could work in practice.

For me, these questions of infrastructure are a fundamental challenge to designing and deploying alternative technologies with counter-hegemonic values and visions. They illustrate the tensions inherent to designing, deploying, and sustaining liberatory informatic visions, particularly when those visions are not profitable (and may even be anti-capitalist in nature). And yet, if we require perfection in uptime and ginormo scale then we hand over tech to Big Tech and venture capital. I refuse that outcome, and call on HCI as a community to challenge power by thinking together with grassroots social movements about novel ways to infrastructure data activism and counterdata science.

Conclusion

In this case study I have described how the Data Against Feminicide team co-designed two specific tools to support and sustain the work of grassroots data activists. From this experience, I see many more opportunities for HCI to design for data activism and counterdata science about feminicide and beyond. The descriptive model outlined in Table 7.1 may help us structure efforts to design for counterdata science across various stages of project activities. The tools we built fell into the researching and recording stages but our co-design process generated ideas across other workflow stages. For example, in the resolving stage, there could be useful tools to collaboratively map out the data and information ecosystem and help evaluate the feasibility and labor required of any counterdata effort. Activists we worked with generated many ideas for the refusing & using data stage which related to novel visualization and memorialization tools and designs for navigating data communication ethics.

This chapter has also detailed some of the ethical decisions that our team made guided by the principles of data feminism, which led me to make a case that resources and relationships should be considered some of the most important elements of design. Even with strong guidance, numerous tensions arose around our design interventions: how best to consider context across linguistic, geographic, and cultural differences; how to address shortcomings of our initial designs that did not consider intersectional differences; and how to infrastructure academic projects that aspire to go beyond parachuting prototypes into communities. I end on these tensions because I want us to be able to tell design stories that are not hero stories, that leave us with open questions, and that open a door to both critically and practically challenging power, across our technologies and our institutions and ourselves.

Comments
26
J. Nathan Matias:

This is SUCH a good point!

Lily Irani & Six Silberman reflect on this in their article on Critical Infrastructure as distinct from Critical Technology:

https://dl.acm.org/doi/fullHtml/10.1145/2627392

In my experience, maintenance and reliability for communities exist in conflict/tension with the educational and research missions of universities.

If it’s something that will require ongoing maintenance, CAT Lab doesn’t ask students to develop it. That requires us to fundraise for professional staff, which still supports the educational mission through relatively high indirect costs. But the software is then reliable, documented, maintained.

That’s a high price tag for projects that work with communities, especially when the university in question is not serving those communities through education. In my view, compared to this cross-subsidy of education, it’s often best for projects that the work not originate in universities at all, and not be HCI in the academic sense.

J. Nathan Matias:

See also Jill Dimond’s 2012 dissertation “Feminist HCI for real” https://search.proquest.com/openview/9c3f38016001522c449f9d27214f85af/1?pq-origsite=gscholar&cbl=18750

J. Nathan Matias:

This is one area where you can draw from a deeper tradition. Muller & Druin’s 2008 chapter cites work in HCI on participatory design going as far back as the 1970s in Scandinavia, which arose from organized labor. I think it’s helpful to acknowledge that it’s always been a part of HCI, thanks to the work of labor activists.

https://userpages.umbc.edu/~skane/classes/is760/fall2011/papers/Muller2008.pdf

James Scott-Brown:

In the UK, the Femicide Census acknowledges support from Freshfields Bruckhaus Deringer LLP and Deloitte LLP (i.e., a large legal firm, and a large accounting firm).

So a company might possibly be interested in funding work under the banner of Corporate Social Responsibility. However, making grassroots organizations financially dependent on large corporations is not ideal.

James Scott-Brown:

Why is “ginormo scale” required? A previous paragraph seemed to imply that “scalability” is required for “ensuring uptime”.

But I’m not really sure this is true: you can probably achieve “good enough” uptime with a failover Virtual Machine or physical server in a different datacentre. Of course this requires skills and time to set up, but it doesn’t require centralized proprietary services from large corporations. Hetzner or OVH are much cheaper than Amazon AWS or GCP.

James Scott-Brown:

Reading this, I am unclear whether there is a specific agency with which you are trying to form a partnership, or whether you are thinking about partnering with an as-yet-unidentified organisation.

James Scott-Brown:

As a counterpoint, see the work by Julian Oliver on self-hosted infrastructure for Extinction Rebellion. He had a talk at 36C3 in 2019: https://media.ccc.de/v/36c3-11008-server_infrastructure_for_global_rebellion

(and a later tweet listing some changes since then: https://twitter.com/julian0liver/status/1385832559599366145)

Of course, most grassroots organisations don’t have access to someone as skilled as him (though he does have a consultancy - https://nikau.io/).

James Scott-Brown:

These issues are not exclusive to academia, and also occur in some large technology companies. For example, it’s often said that the promotion process at Google requires people to have led on projects, which incentivises people to create new things; this leads to a proliferation of products, many of which are then abandoned or killed (especially if they don’t generate revenue, so there is no funding to keep them running indefinitely). Some of the killed projects are things that people had come to rely on (e.g., Google Reader).

James Scott-Brown:

To what degree do the two data sources overlap? Does/would using both provide greater coverage?

James Scott-Brown:

“set up a monitoring project in Email Alerts”

If I’m understanding correctly, the Highlighter and the Email Alerts are two different systems, but this sentence currently suggests monitoring projects are set up in the Highlighter.

James Scott-Brown:

This looks like a placeholder rather than the final footnote text.

Also consider mentioning attempts to infer gender in the paragraph, rather than just the footnote.

James Scott-Brown:

who is doing the work of constructing the test dataset?

Also, it might help some readers to explicitly state that the test data set used for validation contains only data that was not included in the dataset used for training.

James Scott-Brown:

The figure suggests that it *replaces* the positive examples in the training data, rather than “supplementing” them.

James Scott-Brown:

In the caption of this figure, should “killed in police violence” be “killed by police violence”?

James Scott-Brown:

“who was an undergraduate at the time and is now a PhD student at Cornell”

James Scott-Brown:

Can this be formatted as a numbered/bulleted list, rather than a paragraph that contains numbered clauses?

James Scott-Brown:

You could mention that this is an example of applying Named Entity Recognition.

James Scott-Brown:

if “mapped out” is changed to “decided” above, then this “decided” could be changed to “chose”.

James Scott-Brown:

“decided” ?

James Scott-Brown:

Six brainstorming sessions? “between June 2020 and Feb 2021” is more than 6 months

James Scott-Brown:

Some of the text on this figure is very small.

James Scott-Brown:

Why do 3 of the 4 column headings end with an arrow?

James Scott-Brown:

corroborate? augment with additional details from other sources?

James Scott-Brown:

Another reference placeholder.

James Scott-Brown:

“It was a sound process, that I am proud of, within a community with whom I continue to be in relation.” ?

James Scott-Brown:

I think the paragraph might flow slightly better without these two words.