Methodologies
Our Methodology
Story Selection
We chose the stories for this project by examining the available Project Gutenberg texts and selecting the collection that seemed most reasonable to mark up: a group of short (and therefore manageable) stories already collected into a single book. We then split the stories among ourselves and marked them up according to the methodology described below.
Markup Methods
To gauge the perceived agency of characters in our selected stories, we examined two metrics: the number of words each character said and the number of actions each character performed.
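As a rough illustration, once a story is marked up, both metrics can be tallied with a short script. This is a minimal sketch, not our actual pipeline. It assumes quotes appear as <q speaker="..."> elements and the actor of each action as <char name="..." gender="...">, counts words by splitting on whitespace, and runs on a hypothetical sample with made-up characters (Mary, John), made-up gender values, and a made-up <story> wrapper element.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample text: <q speaker> and <char name gender> follow the
# markup described on this page; the <story>/<p> wrappers, the names, and
# the "f"/"m" gender values are assumptions made for illustration.
SAMPLE = """<story>
  <p><q speaker="Mary">Where are you going?</q> asked Mary.</p>
  <p><char name="John" gender="m">He</char> ran down the road.
     <char name="Mary" gender="f">She</char> followed him.</p>
  <p><q speaker="John">I am late!</q></p>
</story>"""


def agency_metrics(xml_text):
    """Return (words spoken per speaker, actions performed per character)."""
    root = ET.fromstring(xml_text)
    words = Counter()
    actions = Counter()
    for q in root.iter("q"):
        # itertext() collects all text inside the quote, including nested tags
        text = "".join(q.itertext())
        # Assumption: a "word" is any whitespace-separated token
        words[q.get("speaker", "unknown")] += len(text.split())
    for ch in root.iter("char"):
        # Each <char> tag marks the actor of one action
        actions[ch.get("name", "unknown")] += 1
    return words, actions


if __name__ == "__main__":
    words, actions = agency_metrics(SAMPLE)
    print(dict(words))    # {'Mary': 4, 'John': 3}
    print(dict(actions))  # {'John': 1, 'Mary': 1}
```

Splitting on whitespace treats "going?" as one word, which is usually what a per-character word count wants.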
After adding the requisite structural tags (chapters, paragraphs, and the like), we marked up our texts by:
- Using regular expressions to locate quotes (pairs of single or double quotes with text between them) and replacing the " or ' marks with <q> tags
- Adding @speaker attributes to each quote with the name of the character speaking
- Manually reading the text and locating actions
  - For more information about how we determined what qualified as an action, visit this issue on our GitHub.
- Locating the name or pronoun of the character performing each action and wrapping it in a <char> tag with @name and @gender attributes for that character
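The quote-tagging and speaker steps above could be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the exact expressions we used: the pattern (a pair of double or single quotes with at least one character between them) is an assumption, the function names tag_quotes and add_speakers are hypothetical, and the speaker names are taken to have been identified by hand beforehand. A naive single-quote pattern like this one will also match apostrophes in contractions such as don't, so its output still needs checking.

```python
import re

# Assumed pattern: a double-quote pair or a single-quote pair enclosing
# non-empty text that contains no quote mark of the same kind.
QUOTE_RE = re.compile(r'"([^"]+)"|\'([^\']+)\'')


def tag_quotes(text):
    """Replace each quoted passage (and its quote marks) with <q>...</q>."""
    def repl(m):
        # Exactly one of the two groups matched
        inner = m.group(1) if m.group(1) is not None else m.group(2)
        return f"<q>{inner}</q>"
    return QUOTE_RE.sub(repl, text)


def add_speakers(text, speakers):
    """Give the n-th bare <q> tag the n-th name from `speakers`.

    Hypothetical helper: the names are assumed to have been decided manually.
    """
    parts = text.split("<q>")
    if len(parts) - 1 != len(speakers):
        raise ValueError("need exactly one speaker name per <q> tag")
    out = [parts[0]]
    for name, rest in zip(speakers, parts[1:]):
        out.append(f'<q speaker="{name}">{rest}')
    return "".join(out)


if __name__ == "__main__":
    tagged = tag_quotes('Mary said, "Come here." John replied, \'Not yet.\'')
    print(tagged)
    # Mary said, <q>Come here.</q> John replied, <q>Not yet.</q>
    print(add_speakers(tagged, ["Mary", "John"]))
```

Keeping the speaker step separate from the regular expression reflects the fact that a pattern can find quotes, but deciding who is speaking requires reading the text.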
Although our original research question called for additional markup denoting each character's alignment, we decided to forgo that step and focus on the speech and action markup described above.