The vitality of language lies in its ability to limn the actual, imagined and possible lives of its speakers, readers, writers.
—Toni Morrison, Nobel Lecture
Toni Morrison was awarded the Nobel Prize in Literature, in 1993, for “novels characterized by visionary force and poetic import.” In her Nobel Lecture, Morrison noted how language, whether spoken or written, can limn - or describe and detail - life. Limn is a distinctive verb; derived from the Latin illuminare, meaning to “make light” or “to illuminate,” the word has been used throughout literary history to generally describe - and, in some cases, to convey the literal illustration of - a manuscript. Can annotation limn, or help to describe? And if so, what does such annotation do? Let’s return to our annotated copy of Beloved.
Perhaps as a student you highlighted key passages of Beloved, jotted notes in the margins, or repeatedly scribbled stars and question marks, all to better illustrate your growing literary knowledge. Although your annotation of Beloved may be idiosyncratic, such activity extends a lineage of historic reading practices evident throughout the early modern period (approximately 1500-1800). Heidi Brayman Hackel’s study of that period’s book use among “less extraordinary readers” - and that would be everyday folks, like us - suggests people added various types of marks while reading their books.1
Marks of active reading, like underlining, indicated sustained engagement. Marks of ownership, like a signature, distinguished books as valued, physical objects. And marks of everyday recording, perhaps unrelated to the book’s content, added ancillary information. In Brayman Hackel’s assessment, with varied annotation practices “the book takes on a different role: as intellectual process, as valued object, and as available paper.”2 Perhaps your annotation of Beloved illustrated burgeoning intellectual acumen, or how much you valued the book, or perhaps it was just some paper to record the latest hallway gossip. It was George Steiner, the literary critic, who once suggested an intellectual was “quite simply, a human being who has a pencil in his or her hand when reading a book.”3 However you may have used that pencil while reading Morrison’s Beloved, your annotation provided information.
Annotation is an informative literary practice. Annotation, as with our example of Beloved, can signal different information to different readers. Notes of active reading may have better aided your future essay-writing and study, or have indicated to your teacher that you’re no dummy. Notes of ownership or recording may have also provided a stranger, who subsequently acquires this “used” copy from a bookstore, with insight about private noticing and musing. We know from those who have studied the history of books and reading practices that annotation was both ubiquitous and habitual by the 1500s, not long after the invention of the printing press and the growth of print culture. Such medieval annotation also included drolleries, or small decorative images, drawn in the margins; what information might we glean from killer rabbits?4
By the 1600s, “graffiti” among the early modern book routinely provided information by adding detail, indicating references (particularly to scripture and classical literature), emphasizing importance, translating or clarifying terminology, and organizing the text.5
As with annotation today, notes can provide information about the reader and the reader’s private or social reading practices, about the text and relevant scholarly or political contexts, or maybe about a flower once pressed between the pages of a book.
As we will discuss throughout this chapter, what kind of information annotation provides, the contexts in which this information is written and received, and the modes through which this information is interpreted collectively influence how annotation is authored, understood, and trusted. To more easily examine the forms of annotation that provide information - as well as to foreground terminology used throughout the book - it’s useful at this point to distinguish among a few types of notes that, in both historic and contemporary contexts, have been added to texts.
In her oft-cited Marginalia: Readers Writing in Books, Heather Jackson, a professor of English at the University of Toronto, presents an extensive study of book annotation from 1700 into the twentieth century. Jackson focuses exclusively upon marginalia, or discursive and responsive notes written by hand anywhere in a book (and not only in the book’s margin). While non-verbal markings, including evidence of readers’ attention and engagement via asterisks or characters, are not marginalia according to Jackson, we do consider this repertoire of signs and symbols an important form of annotation (particularly among digital texts and contexts). Among book marginalia, however, Jackson categorizes three “basic particles:” the gloss, the rubric, and the scholium. Glosses, rubrics, and scholia each provide different types of information.
Remarking upon the complex relationships among readers, text, and annotation, the scholar James Nohrnberg once suggested: “Everything we read is thus a gloss upon, or a translation of, some original improvement upon silence.”6 While that may be true, in our book we appreciate Jackson’s suggestion that a gloss serves the explicit purposes of translation and explanation. Glosses, in her assessment, were often added to assist book readers with foreign and obscure words. The utility of glosses is underscored by the historian Anthony Grafton, who observed that “the margins of manuscripts and early printed texts in theology, law, and medicine swarm with glosses which, like the historian’s footnote, enable the reader to work backwards from the finished argument to the texts it rests on.”7 Glosses are literal and instructive. And in their more expansive and curated form, glosses comprise what we know today as a glossary.
In addition to the gloss, book marginalia may also include a second form of annotation known as a rubric. The Latin rubrica translates as “red” and is the traditional color of rubrics found throughout medieval manuscript annotation. Jackson describes rubrication as the “scribal practice of writing or marking certain words in red,” with rubrics often corresponding to chapters, sections, or headings.8 The scholar of medieval literature Stephen Nichols suggests rubrics are an “extradiegetic intervention,” intentionally written in a color of ink different from the main text (hence red) so as to provide “metacritical perspective.”9 Rubricated manuscripts, like the thirteenth-century French manuscript in Figure 8 (also further proof that doodling in the margins is a time-honored phenomenon), demonstrate how such medieval annotation predated forms of book marginalia by centuries. Together, and in a more contemporary print context, a group of rubrics becomes a book’s index.
A third form of marginalia that provides information is the scholium. Often referred to using the plural scholia, this type of annotation introduces to the text a new note, according to Jackson, “from outside the work that some scholar (usually) has judged relevant to it.”10 Scholia can elucidate an idea, share a useful example, provide a historical reference, or either affirm or contradict the author. Among famous instances of scholia are those added to the oldest copy of Homer’s Iliad, known as the Venetus A. This 645-page parchment book, written by Byzantine Greek scribes in the tenth century, includes both the full text of the Iliad and two sets of marginal scholia based upon the scholarship of Aristarchus.11 Scholia, when gathered together as complementary to a source text, established the literary genre of commentary.12
As discussed, our interest in annotation extends beyond book marginalia and print culture. Accordingly, our survey of annotation forms that inform should also include more than glosses, rubrics, and scholia (and drolleries of killer rabbits). Recall that we introduced annotation as multimodal notes in our previous introduction. Annotation, even when written, may be added to texts that communicate in multiple complementary modes. The history of cinema and the development of the motion picture industry relied upon text that was filmed and then manually edited into a movie. Silent era films featured intertitles, or text that shared characters’ dialogue or explained narrative elements. One of the most famous intertitles in movie history is still featured in films produced today: “A long time ago in a galaxy far, far away….”
Other types of annotation added to movies and television include subtitles that provide translation from one language to another. Textual and graphical information - picture the always-crawling news ticker - appears via annotation in the lower third of many broadcasts. Furthermore, multimodal annotation is critical for information accessibility. Closed captioning is a form of annotation that provides needed visual information, including the ability to read along when watching a public television at the gym or bar. And Twitter now allows users to add text descriptions to a tweet’s image that can be read aloud by assistive screen reading technology.
Forms of annotation - like subtitles and closed captions - provide information. This is why we previously suggested annotation is an everyday activity. Among scholarly traditions, forms of marginalia provide intellectual lineage to substantiate claims and offer the academic receipts relevant to skeptics and stalwart researchers. And annotation provides information privately or for self-interest. Whether doodling atop assigned course materials or adjusting a recipe, annotation can elevate a text from generic to personally sacred. As Edgar Allan Poe wrote, in 1844, “In the marginalia, too, we talk only to ourselves; we therefore talk freshly - boldly - originally - with abandonnement - without conceit.”13
Another form of annotation that informs is the label. Labels provide various kinds of information. When you’re at the grocery store, a label informs you that the red stuff in that jar is strawberry jam, rather than cherry. If you’re training a computer algorithm using image samples, a label will inform the program “this is a cat,” and not a cheeseburger - despite some visual similarity and the former perhaps having the latter.
Labels are like a gloss, literal and instructive. As a note added to a text, labels provide useful and contextual information. Texting a friend with your iPhone? Apple’s Messages app allows you to label a message with symbols - like a heart or a thumbs up - to add information.
Writing computer code? The programming language Java includes a set of standard annotations, all of which begin with the @ symbol, to label code and provide additional information about how certain methods should be performed. You likely read labels as annotation added to graphs and other visualizations, as with the graph of global GDP featuring labeled historical events and geographic regions. These are just some examples of the plethora of labels appended to texts in everyday contexts.
When you text a friend, that message includes a time stamp, a confirmation that it was “delivered,” and - depending on whether your friend has the feature enabled - a confirmation that the recipient read the message. This addition of labeled information is increasingly automated. Similarly, when using social networks we often know when a message was sent, from what location, and even from what kind of device. This additional information is conveyed through words, numbers, and symbols - including emojis and GIFs - added as labels to the primary message. As discussed in our introduction, symbols like the “Like” buttons of Twitter and Facebook layer a label upon a text message. As everyday forms of annotation, these labels reflect the pervasiveness of notes attached to texts both automatically and also purposefully. This additional information imperceptibly shifts our relationship to texts and notes.
Let’s consider in more detail two important uses of annotation as labeling: machine learning and scientific research. Both cases underscore the ubiquity of annotation and the utility of labeling to make meaning and produce new knowledge.
Have you ridden in an autonomous vehicle? Maybe you only trust self-driving cars as far as you could throw one. Whatever the case, the inevitable growth in autonomous transportation - among ride-hailing services, long-haul trucking, and autopilot features in luxury cars - will be aided, in part, by annotation. How so?
Consider all the various road signs - and the myriad colors, shapes, lights, and electronic messages of those signs - that influence your ability to safely commute from home to work and back again. Then add to that all the other vehicles, unanticipated obstacles, cyclists, and pedestrians on the go. How do artificial intelligence (AI) systems that pilot an autonomous vehicle distinguish among these variables to make intelligent, safe, and accurate decisions? The AI needs to learn from lots and lots of data.14 And whether with AI that powers autonomous transportation, customer service bots, precision agriculture, product recommendations, or targeted advertising, those data need to be labeled.
In simplest terms, annotation in the area of machine learning is the process of labeling data. Data like text, images, or audio are routinely gathered from cameras, sensors, and social media. And these data are often unstructured. In order to train AI systems to make sense of and also make informed decisions based upon these data, it’s necessary for data to be identified. That’s where annotation plays a role. And so, too, do many people whose jobs are primarily dedicated to labeling data. The manual labeling of data - that is, annotation by people - remains a preferred method, though it is time-intensive, not error-proof, and may be biased.15 Because automated annotation - annotation by AI - is much faster, much cheaper, and ever-more accurate, there is vigorous debate about the merits of manual versus automated data labeling. In some instances, people check samples of automatically labeled data to ensure AI accuracy, or people rely upon automated annotation suggestions to improve their labeling.16
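To make the idea of checking a sample of automatically labeled data more concrete, consider a minimal sketch in Python. Everything here is hypothetical and for illustration only: the function name `agreement_rate`, the image identifiers, and the labels are our own assumptions and are not drawn from any particular annotation company or tool. The sketch simply compares labels assigned by a person with labels assigned by an AI system and reports how often the two agree.

```python
def agreement_rate(human_labels, machine_labels):
    """Return the share of items labeled by both sources on which the labels match."""
    # Only compare items that both the person and the AI system labeled.
    shared = [item for item in human_labels if item in machine_labels]
    if not shared:
        raise ValueError("no items were labeled by both sources")
    matches = sum(1 for item in shared if human_labels[item] == machine_labels[item])
    return matches / len(shared)


# Hypothetical sample: four images labeled by a person and by an AI system.
human = {
    "img_001": "stop sign",
    "img_002": "cyclist",
    "img_003": "pedestrian",
    "img_004": "cat",
}
machine = {
    "img_001": "stop sign",
    "img_002": "cyclist",
    "img_003": "cyclist",  # the automated label disagrees with the person here
    "img_004": "cat",
}

rate = agreement_rate(human, machine)
print(f"Agreement on sampled items: {rate:.0%}")
```

A low agreement rate would not, on its own, say whether the person or the AI system was mistaken; it would only flag which items deserve a second look.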
There are now numerous data annotation companies specializing in all manner of services, technologies, and processes for labeling data. And business is booming, both in the United States and globally.17 Companies like Infolks and iMerit employ thousands of workers across India who manually label images using techniques like bounding box and contour annotation. The labeled images will train autonomous vehicle algorithms to better identify road obstacles and conditions.18 And techniques for semantically tagging data go beyond mere labeling to add new information so that AI can learn to establish links, filter, and make inferences or predictions. Billions of images, billions of text segments, and billions of social media data are now annotated each year to power the intelligent algorithms that many of us take for granted as we shop, work, commute, watch recommended movies, and listen to suggested playlists.
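What might a bounding box annotation look like as data? The following is a minimal sketch in Python, offered under stated assumptions rather than as any company's actual format: the class name `BoundingBox`, its field names, and the pixel coordinates are all hypothetical. The sketch shows that such a label is, in essence, a short note (the label) attached to a rectangular region of an image (the coordinates).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """A rectangular label drawn around one object in an image (pixel coordinates)."""
    label: str
    x_min: int
    y_min: int
    x_max: int
    y_max: int

    def __post_init__(self):
        # Reject boxes with no width or height.
        if self.x_min >= self.x_max or self.y_min >= self.y_max:
            raise ValueError("box must have positive width and height")

    def contains(self, x, y):
        """Return True if the pixel (x, y) falls inside this box."""
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


# Hypothetical annotations for a single street-scene image.
annotations = [
    BoundingBox("stop sign", 40, 10, 90, 60),
    BoundingBox("cyclist", 120, 80, 200, 220),
]

# Which labels apply to the pixel at (150, 100)?
labels_at_point = [box.label for box in annotations if box.contains(150, 100)]
print(labels_at_point)
```

Contour annotation would replace the four corner coordinates with a list of points tracing an object's outline, but the pairing of region and label would remain the same.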
Just as labeling has helped advance business innovation, so too has data annotation fueled scientific research and discovery.
One of the most famous examples of data labeling is the Human Genome Project, the effort that successfully identified and mapped all the genes comprising human DNA. To do so, scientists used genome annotation processes that added information - such as the name, location, and function of genes - to labeled and sequenced genetic material.19 Techniques for the manual and automatic labeling of genetic information continue to improve the accuracy, speed, and scale of such annotation, as with RNA and proteins.20 Similar processes have established databases of annotated genomes for vertebrates like mice and chimpanzees,21 as well as for worms and Drosophilidae, the most commonly studied laboratory organism.22
As labeling methods improve, some scientists have advocated “democratizing genome annotation” through the use of open source technology.23 Genomic annotation - essentially the identification and labeling of genetic data - has contributed new scientific understanding of evolution and disease, guided improvements to medicine, and helped develop bioinformatics. Yet at the same time, critique about the role of automated annotation for AI suggests increased consideration of data ethics and the unintended consequences of human-machine automation24 - a concern we’ll revisit in Chapter 7.
For biologists, or those who have studied biology, another use of annotation-as-labeling can be found in the phylogenetic tree. Ever since Darwin published On the Origin of Species in 1859, the idea of an evolutionary “tree” - and the popular conception of the tree of life - has motivated visual depictions of the relationship among different species. A phylogenetic tree is, simply, a branching diagram that displays the phylogeny, or the evolutionary relationships, among individuals or groups of organisms based upon their similarities and differences. Today, phylogenetic trees are a common tool in biological study and technologies assist scientists in creating interactive and annotated trees. One popular web-based technology for the annotation of phylogenetic trees is Interactive Tree of Life, a free tool that supports both manual and automated annotation, customizable visualization and labeling, and the ability to link annotations to related datasets.25
Recently, automated annotation of mammography records was found to quickly identify meaningful data about patient needs.26 Although the technology needs to improve before use, one day doctors might use these annotated clinical records to make more informed decisions about patient care. It is very encouraging that annotation as labeling plays a small role in potentially helping doctors to better identify and treat breast cancer.
To continue exploring how forms of annotation inform, the remainder of this chapter focuses on promising possibilities for democratizing annotation and information using digital tools and practices. But before we do so, it’s necessary that we recognize how the relationship between annotation and information is neither arbitrary nor incidental. Seldom are chemistry formulas annotated within Beloved, nor literary analysis inside a grandmother’s cookbook. The 6,106 annotations comprising “Your Biking Wisdom in 10 Words,” a series of interactive maps curated by The New York Times and annotated with advice and opinion, is - presumably - most useful for cyclists in a given city who desire information about a relaxing ride or a safe commute.27 Ideally, annotation provides information that is useful, relevant, and timely.
Whether with manuscript annotation from the Elizabethan era or human-assisted annotation of big data, annotation is dependent on context. Let’s revisit the early modern period to better appreciate how information needs context. Now considered required reading for English majors, John Dryden’s 1681 poem Absalom and Achitophel is satire that relies upon readers’ knowledge of seventeenth-century English politics, like the Popish Plot, as well as the biblical story of Absalom and King David. Knowing that readers in his day would understand and appreciate the biting and satirical poem, Dryden offered an introductory explanation about the intent of his genre-specific writing: “The true end of satire is the amendment of vices by correction.”
Were you to read Absalom and Achitophel three-and-a-half centuries after publication, understanding the poem’s political and biblical context, as well as its satirical nature, might require additional information. Today, readers of the poem often turn to footnotes, explanatory essays, and even primers in biblical literature before coming to terms with the poem’s content and meaning. Whether for scholars or students cramming for an English exam, such annotation is particularly useful in adding information about the poem’s characters as Absalom and Achitophel is a veiled allegory about King Charles II. Readers, today, benefit from annotation providing contextual information about Absalom and Achitophel; and so, too, did Dryden’s contemporaries require annotation. Printed versions of the poem often featured an annotated “key,” as in Figure 12, handwritten by the manuscript owner associating biblical characters with corresponding members of English royalty.
The “keys” added to Absalom and Achitophel demonstrate how annotation can assist with, in the words of English professor William Slights, “transporting information across the borders of the printed page.”28 Both readers of the past and present require contextual information - provided by annotation - to more fully comprehend Absalom and Achitophel. And so, too, does this information help contextualize the poem’s purpose in a given period of time. When Dryden wrote the poem, it functioned as palace intrigue; today, the poem is regarded as quintessential satire and lauded for the use of heroic couplets.
Absalom and Achitophel is a useful example of how annotation provides information, as well as how that information needs context so readers can make sense of both text and note. As a poem with layered meaning and complex wordplay, Dryden’s work required - and, to this day, still requires - annotation for readers to make sense of overlapping historical, political, biblical, and aesthetic contexts. And with Absalom and Achitophel regarded as a valued contribution to English literature, contemporary readers rely upon annotation as contextual information to understand both the poem’s content and its scholarly significance in the literary landscape. The utility and relevance of annotation providing information implies that annotation is written, read, and informative within particular contexts - and so, too, particular communities.
When forms of annotation inform, and when annotation of a text provides information within and about a given context, the value and meaning of annotation may be pertinent to the interests of a group. We’ve mentioned various groups in this chapter, including medieval scribes, data annotators, and scientists. The activities of these groups are broadly concerned with the consumption, production, and dissemination of knowledge. In this respect, we might refer to a certain group - and whether among religious, professional, or scholarly settings, and often across networks and geography - as a knowledge community. And one important social practice among knowledge communities is the use of annotation to provide information.
In this final section of Chapter 2 we survey two cases of annotation providing information among knowledge communities. The first concerns a particular annotation technology primarily associated in name and function with a specific group, whereas the second describes how an emerging digital annotation infrastructure - and an exemplar technology - has assisted multiple communities to meet their diverse needs.
First, think of your favorite song. Maybe it’s a golden oldie from yesteryear, a hip-hop classic, or a recent earworm. No matter the case, there’s a very good chance that other people are fascinated by the song and its lyrics. The company Genius - formerly Rap Genius - maintains one of the most robust and lively communities for music fans to gather, discuss, and debate information about popular music.29 Genius uses annotation authored by their users, called “scholars,” to provide information about songs and increase the community’s “collective music IQ.” With over 25 million songs and two million contributors and editors, Genius is filled with interpretations, clarifications, and confirmations about song lyrics from all music genres.30
Genius annotation provides all manner of information, from highlighting historical facts within a song, to personal interpretation, to disagreement over the intended meaning of a lyric. A distinctive feature of Genius is “verified” annotation authored by artists and performers who provide their expert perspective and background knowledge. For example, the second studio album from Pulitzer Prize-winner Kendrick Lamar begins with the track “Sherane a.k.a Master Splinter’s Daughter.” On Genius, Lamar has contributed a number of verified annotations that provide information about the meaning of his song, as with the lyric: “The parade music we made had us all wearin’ shades now.” Lamar’s annotation informs his audience:
It is just basically talking about just the vibe and the atmosphere, more the party vibe. When I say ‘parade’ I am talking about the actual party and everybody having a good time, and making this feel like we have grown. We are feeling like we are adults, but really we don’t know shit. We are just faking it and living carefree.31
Genius is the twenty-first century version of the Great American Songbook, transformed into an interest-driven social network whose hive mind collaboratively annotates lyrics to inform and make meaning of songs and pop culture.
Whereas Genius refers to a specific technology and community whose annotation provides information, there is a long history of other digital and, now, web-based approaches to annotation that serve similar purposes for diverse knowledge communities.32 Web annotation is a means of annotating anything anywhere online, extending a layer of information and interactivity atop the entire web. According to the World Wide Web Consortium (W3C), the standards body of the web, web annotations “can be linked, shared between services, tracked back to their origins, searched and discovered, and stored wherever the author wishes; the vision is for a decentralized and open annotation infrastructure.”33 Thanks to the W3C, this standards-based and interoperable approach to annotation,34 referred to as open web annotation, draws from a history of open-source software development, follows principles of the open web like accessibility and transparency, and advances digital annotation that provides information - and interaction - to any user of the web. And the technology that perhaps best exemplifies open web annotation is Hypothesis.35
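For readers curious about what a standards-based annotation looks like under the hood, here is a minimal sketch in Python that builds a single annotation in the general shape of the W3C Web Annotation Data Model and serializes it as JSON. The keys shown (`@context`, `type`, `body`, `target`, and a `TextQuoteSelector`) follow that published model as we understand it; the identifiers, the `example.org` addresses, and the note itself are hypothetical placeholders. This sketch is not a description of how Hypothesis or any other service stores its data.

```python
import json

# A single annotation: a note (the body) attached to a quoted passage (the target).
annotation = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "id": "https://example.org/annotations/1",  # hypothetical identifier
    "type": "Annotation",
    "body": {
        "type": "TextualBody",
        "value": "To limn: to describe or detail, from the Latin illuminare.",
        "format": "text/plain",
    },
    "target": {
        "source": "https://example.org/nobel-lecture",  # hypothetical page address
        "selector": {
            # Anchors the note to an exact quotation within the page.
            "type": "TextQuoteSelector",
            "exact": "limn",
            "prefix": "its ability to ",
            "suffix": " the actual, imagined",
        },
    },
}

# Serialize the annotation so it could be shared between services.
serialized = json.dumps(annotation, indent=2)
print(serialized)
```

Because the note and the address of its target travel together in one portable record, an annotation of this kind can, in principle, be stored apart from the page it describes.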
Projects and research featuring Hypothesis open web annotation will appear throughout this book, as when we discuss journalism and fact-checking in Chapter 3, scholarly dialogues and peer review in Chapter 4, and various educational efforts in our chapter about learning (and full disclosure: we have collaborated with Hypothesis on various programs and research efforts). As an initial example of how Hypothesis provides information to knowledge communities, let’s consider how open web annotation can make research more accessible and transparent.
Annotation for Transparent Inquiry (ATI) is a collaboration among the Qualitative Data Repository (QDR) at Syracuse University, Hypothesis, and Cambridge University Press.36 ATI has been described as a “new approach to achieving transparency in qualitative research in the health and social sciences.”37 When conducting a study, researchers may collect all types of qualitative data to support their scholarly investigations and interpretations, such as text from archival documents, audio, video, or images. Some data may then be included as illustrative examples or aggregate summaries in an article, yet seldom can a colleague or curious reader trace a published assertion back to its supporting evidence. While maintaining ethical and legal protections, ATI uses Hypothesis to openly connect data with literature, thereby providing a transparent layer of publicly accessible information. Articles associated with ATI feature a public Hypothesis annotation layer curated by QDR. Individual annotations include citation information, extensive analytic notes, and a link to data housed within QDR databases. ATI, powered by Hypothesis, demonstrates how knowledge communities in the health and social sciences can make research accessible and transparent using open web annotation to provide information.
Whether with a rubricated thirteenth-century French manuscript or an openly annotated research article, annotation propels the flow of information across texts and contexts. Notes enable readers to be more informed when engaging with the ideas and meanings of a text. Digital technologies, in particular, have created new opportunities for public, openly networked, and collaborative annotation, helping to democratize who can add a note to a text, how people access annotation, and how this information is shared across networks. Yet the participatory qualities of annotation providing information do not lack complexity or concern, especially when - as we discuss next - annotation blurs the line between information and comment.