Text Analysis as a Digital Tool

One of this week’s readings was “The Remaking of Reading: Data Mining and the Digital Humanities” by Matthew Kirschenbaum. In it, he discusses applications of data mining in the digital humanities, specifically how there are different types of reading and how digital tools can help with what is known as “distant reading”. The article describes distant reading as a way of finding out what is in a book without reading the whole thing, or even any of it. This is something that I, and I’m sure most other people, do when faced with readings for classes or when finding sources for papers. Skimming the body of a text for its important parts, reading the abstract before an article, using an in-text search to find certain keywords, and even checking what other sources say about the one I’m trying to read are all things I have done with class readings. As a history student, I think this type of “predatory reading” (as another of my professors has put it) is especially valuable because of all the primary sources we are required to read and analyze. For most of these primary sources, what is written within them is not what needs to be studied; the question is what impact those documents had, and a great way to find that out is to look at what other people have written about them. I’m sure this method of “reading” is a form of laziness, but when faced with long, dull passages, it is a great way to efficiently understand the purpose behind a reading or whether it can be used in your paper.

So far, I’ve given only a basic description of text analysis, but the article covers things that go well beyond the little picture I have painted above. It points out that there are more books than anybody could possibly read, and that trying to read all of them closely (as opposed to distantly) is not the right approach. Digitizing books and other sources is part of text analysis and data mining projects that aim to benefit scholars and non-scholars alike. Services like Google Books and Project Gutenberg make textual information widely accessible and have come in handy countless times in my history studies. Once books are digitized, they can be run through various projects and processes that categorize, search, and organize the information inside them, so that scholars and others can quickly find what they need. This is a perfect example of a digital tool improving the study of the humanities for students and scholars.

The article describes a few projects in development that collect and display data from textual sources in different ways, but all of them reflect some aspect of the “distant reading” mentioned above. The information gained from these digital projects and tools can be applied to many avenues of analysis, such as the traits of certain writers, or how often certain keywords or phrases were used in books over a period of time. Information like this could be used to study whole societies through which topics were popular, and what people were writing about, in a given period. Google Books Ngram Viewer is one example of these tools. You type in a keyword, and it charts how frequently that term appears over time in the books Google Books has access to. This is an interesting tool because you can see how popular a certain term was at different times. For example, I searched for “War” in English-language sources, and the chart shows spikes in frequency around the First and Second World Wars. Unsurprisingly, this suggests that war made up a larger share of what people were writing in those periods, but with the Ngram Viewer you could also compare English sources with German or French sources to see whether the same keyword was covered differently in different languages and parts of the world. This type of analysis would be nearly impossible without digital tools in the study of the humanities.
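To make the idea of “keyword frequency over time” more concrete, here is a minimal Python sketch of the kind of calculation behind a chart like the Ngram Viewer’s: for each year, count how often a word appears and divide by the total number of words from that year. This is not Google’s actual implementation. The tiny corpus, the simple tokenizer, and the function names are all hypothetical, made up purely for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    # Hypothetical, deliberately simple tokenizer: lowercase the text and keep
    # runs of unaccented letters (a-z), so punctuation and numbers are dropped.
    return re.findall(r"[a-z]+", text.lower())

def relative_frequency(corpus, term):
    """Return {year: share of that year's words equal to term}."""
    term = term.lower()
    totals = Counter()  # total number of words per year
    hits = Counter()    # occurrences of the term per year
    for year, text in corpus:
        tokens = tokenize(text)
        totals[year] += len(tokens)
        hits[year] += tokens.count(term)
    # Divide hits by total words so years with more text are not inflated.
    return {year: hits[year] / totals[year] for year in sorted(totals) if totals[year]}

# A made-up toy corpus of (year, text) pairs, standing in for digitized books.
corpus = [
    (1910, "The harvest was good and the town held a fair."),
    (1916, "News of the war reached the village; the war changed everything."),
    (1916, "Letters from the front described the war in detail."),
    (1925, "The economy grew and new houses were built."),
]

for year, freq in relative_frequency(corpus, "war").items():
    print(year, f"{freq:.3f}")
```

Running it prints one line per year: 1916 comes out at 0.150 (3 of its 20 words are “war”) and the other two years at 0.000. Dividing by the total number of words, instead of just counting hits, matters because some years have far more books than others, and a raw count would rise simply because more was published. Running the same function over separate English, German, or French collections would be a rough version of the cross-language comparison described above, although this toy tokenizer would need to handle accented letters first.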

