Project Update: Text Analysis

I’ve been working on my project, making sure that everything is running properly and that everything I wanted to include is included. For one of the sections on my website I am using the text analysis tools we learned about in class to compare the work of Geoffrey Chaucer to that of other contemporary English authors. As part of my research I’ve found that some of the other popular English authors from the late 14th and early 15th centuries were William Langland, John Gower, and Julian of Norwich. I’ve found versions of each of these authors’ texts as well as some of Chaucer’s other work. I think it is interesting that, with quite a few authors writing in English around the same time, Chaucer is the one most associated with the growth of vernacular literature in England. Maybe comparing these authors with these text analysis tools can give part of the answer.

In order to input these texts into Voyant Tools for the text analysis I have to take each of them and put them into Notepad so I can get rid of all of the extra material (like editors’ notes, or anything that isn’t the original author’s words). This process can be quite tedious. Most of these texts have hundreds or thousands of lines, and combing through each of them to get rid of any parts where the editor or translator provides definitions or context takes quite a while. After doing about four or five of these, I am beginning to think that preparing these texts for Voyant Tools is probably the most time-consuming and least exciting part of the project so far.
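
Part of this cleanup could probably be automated. The block below is only a rough sketch of the idea, not what I actually did: it assumes that the editor's notes sit inside square brackets and that each verse line ends with a line number, which will not be true of every edition. The function name `strip_editorial` and the sample text are made up for illustration.

```python
import re

def strip_editorial(text):
    """Remove bracketed notes and trailing line numbers from a text.

    Assumptions (hypothetical, edition-dependent): editorial material is
    enclosed in [square brackets], and line numbers appear at the end of
    a line after some whitespace.
    """
    # Drop anything inside square brackets, e.g. glosses and editor's notes.
    no_brackets = re.sub(r"\[[^\]]*\]", "", text)

    kept = []
    for line in no_brackets.splitlines():
        # Drop a trailing line number such as "    12".
        line = re.sub(r"\s+\d+\s*$", "", line)
        # Collapse the gaps left behind by removed notes.
        line = re.sub(r"[ \t]{2,}", " ", line).strip()
        if line:  # skip lines that are now empty
            kept.append(line)
    return "\n".join(kept)

sample = (
    "Whan that Aprille [April] with his shoures soote    1\n"
    "[Editor's note: opening of the General Prologue]\n"
    "The droghte of March hath perced to the roote    2"
)
print(strip_editorial(sample))
```

Anything the patterns miss would still have to be removed by hand, so the cleaned file should be skimmed before it goes into Voyant Tools.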

Review for Phoebe’s Project Demonstration

On Monday Phoebe presented her project in class and it seemed like it was coming along very well. I was impressed by how good it looked and how far along it was in its development. She developed a game that is meant to educate the player on the history of crime and punishment in England through a series of decisions and clickable options. The player takes the role of a person in the court trying to put forth evidence that will bring the right verdict on the accused. Choosing the right option will continue the game, whereas a wrong answer will force you to go back and choose a different answer. Between these decisions there are information panels that provide the historical information explaining why the right answer is actually right. In her presentation Phoebe mentioned that she was deciding between leaving these screens up until you click them and having them be timed. When I was playing the game I really liked the click-to-advance method because it gave me as much time as I needed to read, so I am glad that she eventually decided on that method. These information screens are a great way to bring out the educational aspect of this project. Without them the game would end up being a memorization game, and the player would not learn anything from the experience. The look of the game, as well as the music and sound effects, is impressive and makes the gameplay experience more enjoyable, and it is evident how much time was put into perfecting the look and feel of the game.

The only issue that I had while playing the game was that after I had completed both of the sections I could not access the third section that was unlockable. I remember in her presentation that Phoebe said that she was still working on some aspects of the game, so I am assuming that this is the reason why this issue occurred.

Overall I was impressed with Phoebe’s project and found the game interesting. I have taken the Crime and Punishment class that she mentioned as an inspiration for this project and can see how the information has carried over from that class into this assignment. Phoebe’s project is a good example of a digital humanities project because it combines the digital and the humanities aspect very well. One of the keys in the digital humanities is how to apply digital techniques in teaching and I think that this project achieves that. With more and more teaching being done online or on a computer it is safe to assume that educational games may become one of the more popular ways of teaching students in the near future.

Stephanie’s Project Presentation Review

I really liked Stephanie’s idea for her project. It’s based on one of her old essays so she already has a good understanding of the topic. Her main idea is to create a website where she will inform people about the Psychiatric Survivor Movement and educate them about what it stands for and what it means. I think that this is a great idea for a topic and is something that should be talked about and explored because I had never heard of this movement before, and it is such a serious topic that needs more awareness. Stephanie also mentioned how she would be linking visitors out to some current movements that are working to raise awareness for mental illnesses such as the Bell Let’s Talk campaign. This is a great idea because hopefully when people are reading the information on her website they will be interested in doing what they can to help, and providing links to places where they can is a good way to bring even more awareness to these causes.

In her presentation Stephanie also mentioned that there is a website that has digitized copies of the “Phoenix Rising” journal that were written in the 1960s-1980s. These journals were written by former psychiatric patients and give a unique perspective on how these patients were treated. She plans to display these journals on her website and to use them to map out the growth of the Psychiatric Survivor Movement across Canada and the United States. I think that this is a good way to show how the movement grew as more and more people came forward with their stories, which are outlined in these journals. Mapping out the spread of this movement using some of the tools discussed in class would be a very interesting way to portray this information. She also mentioned her plans to record audio readings of ex-patient testimonies of the abuses they faced in these mental institutions. I think this is a good idea for two main reasons: first, it gives an extra bit of accessibility for her audience, and second, I think these people’s stories would resonate more with the audience if they are heard aloud rather than read off of a screen.

Stephanie’s topic reminded me of a class I took last year about the history of criminal punishment, in which we spent one week looking at mental institutions and the conditions that the inmates were forced to live in. We watched a film called “Titicut Follies”, a documentary about the conditions at a mental institution in the 1960s that was pretty hard to watch. I’m not sure if that film would be helpful for her topic, but it may be something worth looking at. In all, I liked the idea for her project and her plans for incorporating digital humanities tools, and am excited to see how it all turns out.

My Project Plan

For my project I have decided to create an online exhibit/website based around Geoffrey Chaucer and his contributions to the development of English vernacular literature, with particular attention to the Canterbury Tales. This is a topic that I know a bit about from an essay I wrote last year, and in doing the research for that essay I found the topic to be surprisingly interesting. It combines several different areas of historical study including the history of languages, the history of the British Isles, and unsurprisingly the history of the Catholic Church (which most aspects of medieval history centre around). The main goal of this project is to create a place where people could go to get a better understanding of Chaucer’s contribution to English vernacular literature and also to learn about the Canterbury Tales. Before I had done any research on this topic I had only a vague understanding of Chaucer, and in doing preliminary research I discovered that he is considered by some to be the “father of English literature” due to his writing of the Canterbury Tales and its role in establishing literature in the English vernacular. Seeing as nothing is ever that clear-cut, diving deeper into the topic shows that while Chaucer was an influential figure in developing English literature, there were many other contributing factors that helped to cement the Canterbury Tales as one of the most influential written works in the English language. These contributing factors (for example the movement to translate the Bible into English before Chaucer’s writings) will be the basis of one of the sections for learning about the Canterbury Tales.

I am imagining something like Sparknotes (Here is the Sparknotes for the Canterbury Tales) where there are different sections for the different topics I would want to cover, such as links to the actual writing, plot summaries, historical context, lesson plans focusing on the writings, other resources people could check out if they were so inclined, etc. Seeing as this would be very similar to a Sparknotes site or anything like it, I would be providing additional analysis (using some of the Digital Humanities tools we have experimented with in class) and information about the Canterbury Tales. I plan on including things like the historical significance of Chaucer’s works, a text analysis of his writing, maps plotting out important areas mentioned in the writing, and even modern-day examples of how influential Chaucer’s writings still are. When it is completed, this project should be a place where somebody would go if they had to do research on Chaucer or the Canterbury Tales, or if they were a teacher looking for a way to teach this topic to their class, or even if they were just interested in Chaucer or English literature in general and looking for any information they can find. I think that creating this project within a Digital Humanities framework is beneficial because it allows more people to access it, and makes the information presented about the topic easier to digest than if it were simply written in a textbook.

There are some projects that are similar to the one I am imagining, and in planning for my own project, I have looked at a few of them for inspiration (and to make sure I am not just copying other people’s work). As mentioned above, Sparknotes and other sites here, here, here, and here are examples of similar projects that focus on teaching about the Canterbury Tales and give similar resources like study questions and lesson plans (some of which you have to pay for) as well as other resources to check out. In looking for these similar projects, I have yet to find one that does exactly what I want to do, which is to provide a deeper analysis of the text than just a study guide. I want to create a place where the text itself is secondary to the historical significance and analysis.

Some of the tools that I plan on using are website design, text analysis, and interactive mapping, among others. I decided to create a website for this project because of the flexibility and simplicity it allows for the user. If I were to create a blog for this project and included all that I want to, the finished product would not be ideal. On a blog it would be difficult and tedious to find the different pages or articles that hold the information. If I were to access this project as a student looking to research Chaucer I would not want to go through different blog posts or archives just to find the piece of information I need, and would probably close that webpage and look for another resource. Creating a website would also allow me to be more flexible in how I present my information through different sections and pages. I also think that a website would be more visually appealing than any other format.

The text analysis is something that interests me because it’s not something that you see very much. In class we have done several text analysis experiments with the Google Ngram Viewer, word clouds, and type-token ratios that I think would be interesting to include in my project. Using some of these text-analysis tools would be a nice little feature to give extra information on Chaucer or the Canterbury Tales. For instance, using the Ngram Viewer to chart the usage of “Chaucer” and “Canterbury Tales” in written sources over time would be a cool way to see how time has remembered these two key aspects of English literature. The type-token ratio (the number of distinct words in a text divided by its total number of words) could be used to compare Chaucer’s writing with other prominent authors of English literature, which would also allow for a view of how literature in the English language has changed over time. The only issue with any information gathered through any of these text analyses is that it would rely on assumptions from either myself or the user of the site about what they mean. For example, if the type-token ratio showed that Charles Dickens used more distinct words in some of his works than Chaucer did in his, what would that mean? Does it signify that Chaucer is a less proficient writer than Dickens, and does that even matter? These are a few of the questions that I will have to answer when creating these text analyses in my project.

The idea of using maps as a tool is another topic we have looked at in class and for this project, I think it would be a great fit. The Canterbury Tales is about a group of people from different areas of medieval society telling stories to each other as they make a pilgrimage to Canterbury Cathedral. Creating or using a map that outlines the route these characters would have taken on this pilgrimage, and pointing out important areas along it, would be a very interesting take on the study of the Canterbury Tales, putting it into the context of history and geography.
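
To make the ratio comparison described above (usually called a type-token ratio) more concrete, here is a minimal sketch of how it could be calculated. This is my own simplified version, not the tool we used in class: the function names and the `sample_size` parameter are hypothetical, and the tokenizer is deliberately crude (lowercase letters and apostrophes only).

```python
import re

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def type_token_ratio(text, sample_size=None):
    """Return distinct words (types) divided by total words (tokens).

    sample_size (hypothetical parameter): if given, only the first
    sample_size tokens are used, so that texts of different lengths
    can be compared on equal-sized samples.
    """
    tokens = tokenize(text)
    if sample_size is not None:
        tokens = tokens[:sample_size]
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

# "the" appears twice, so there are 5 distinct words out of 6 total.
print(round(type_token_ratio("the cat sat on the mat"), 3))
```

The ratio tends to fall as a text gets longer, because common words keep repeating, so comparing a whole Dickens novel with a single tale would be misleading. Taking the same number of words from each text is one simple way to reduce that problem, although spelling variation in Middle English would still inflate Chaucer's count of distinct words.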

The new skills I will have to develop and use for this project are essentially the skills mentioned in the previous section. I am not the most technologically knowledgeable person, so most of these Digital Humanities tools are new to me and will take some time to get comfortable with. Website creation and design is probably the most challenging thing about this project since it is the most technical aspect. I will probably be using Wix or Weebly to build my website, like most people in class, because I have no interest in creating the site on my own using HTML. In high school I took a computer programming class and soon realized that coding was not my thing. I don’t remember anything from that class, least of all any HTML.

Obviously in creating this project I will have to conduct some research in order to be able to create an educational tool. Since this topic is something that I have already written a paper on, it should be easier and less time-consuming to find all of the relevant information that I need to create this project. One of the parameters for this write-up asks how I am going to either create new data or re-purpose old data. My plan so far would see me taking information from other sources (and crediting those of course) to provide the bulk of the educational information. I still have the bibliography from my own essay and other sources I had collected and consulted. Some of these sources are scholarly journal articles such as Andrew Cole’s “Chaucer’s English Lesson”, which talks about how Chaucer came to use English as a literary language (key to the purpose of my project), or books like Lynne Arner’s “Chaucer, Gower, and the Vernacular Rising: Poetry and the Problem of the Populace After 1381”, which outlines many of the contributing factors to the rise of English vernacular literature and would help to provide historical context to the Canterbury Tales. I would also be creating my own new data through the text analysis I have mentioned above, and with this new data I could try to put Chaucer and the Canterbury Tales in the context of English literature as a whole.

In summary, my project idea is to create a web-resource where people could go and quickly learn about Chaucer and the Canterbury Tales, and why it has been remembered in history due to its part in helping to develop English as a literary language. Ideally this project, when completed, will be something that I wish I could have had when I was writing my essay about this topic a year ago.

40 Maps That Explain the World: Weekly Reading

This week one of our required readings was a piece from the Washington Post called “40 Maps that explain the world”. As the title suggests it contains 40 different maps that each display a breakdown of a different question or topic across the different countries of the world.

The first of the 40 maps (and one of the most interesting to me) is a political map of the world from around 200 A.D. This map shows the extent of the Roman Empire at about its height, which is a pretty common thing to see when studying history or even looking at historical maps and documents. What is somewhat rare in this map is that it shows many other empires and civilizations that existed at the same time. Obviously we know that there were other civilizations at this time, but seeing them all on one map really puts the time period and Roman history into context. I had never thought about the fact that when Rome was at its greatest extent the Maya and Teotihuacan civilizations existed in Mexico. Another thing to note is that the empires around modern-day China and South Asia, which I know next to nothing about, appear to be almost on par with Rome in terms of land mass. Maybe it is the way that we are taught history, but it is always interesting to find things that are usually taught in a vacuum (in this case Roman and ancient Mexican civilizations) that occurred around the same time. This map also suggests that our study of history may be extremely biased towards our own past (i.e. Rome and Roman civilization) and almost completely ignores (unless you are in a specialized class…) other histories, such as the Asian or African histories shown in this map.

Another interesting map shows the majority religion for each country, and to me the interesting part of this map is how linear it is. From North to South America, in southern Africa, and from Portugal to the very eastern end of Russia, Christianity is the majority religion, while from west Africa to Kazakhstan in an uninterrupted line the majority religion is Islam. This is very interesting because of how ordered everything seems on this map. For the most part there are no one-off countries or areas with a different religion; they are mostly grouped together. Finding out why and how this spread came to be could be a pretty interesting study.

Most of the other maps ask a question and show how each country would answer it, and the maps usually point out the obvious or stereotypical answer. For example one of the maps is about the best and worst places to be born which shows that North America and Europe are among the best places, and places like Eastern Europe/Russia, Africa and South Asia are among the worst. These results are pretty similar to other studies about income equality and social benefits so there is no real surprise to these findings. Another map shows where people are more or less emotional, and the results are what you may expect. According to this map, people in North America and parts of South America are most likely to report significant positive or negative emotions on a daily basis, whereas people in Eastern Europe and Russia are less likely to report these emotions.

Each of these 40 maps has an interesting story to tell, and every one of them could be analyzed to give a different outlook on the world or on different countries. Many of them raise the question of how or why the pattern shown on the map came to be, and that is probably where experts or historians could come in and find that out.

Text Analysis Experiment: Evaluating word usage with the Google Ngram Viewer

Last week we looked at some different methods of text analysis and I wrote about the Google Ngram Viewer, a tool that scans through Google Books’ collection for all the instances of a word and graphs the percentage of that word’s usage over time. This week I am looking at the Ngram viewer as a way of conducting a text analysis experiment.

I thought it would be interesting to see how the usage of different sports in writing would change over the years, and to see what events may have caused them. For this experiment, I decided to focus on just hockey (I had to remember to search “ice hockey” because I didn’t want any results about field hockey or whatever other kind of hockey there is) and soccer (also searching football for foreign usage) because those are the sports that I know the most about and had an idea of big events that could skew the usage into large peaks or valleys. I searched these keywords in different language collections as well because different parts of the world would have had different usages of these words at different times.

What I found was pretty interesting, and for the most part what I expected. I knew that big events like world cups or Olympics would impact the usage of these sports in writing, but others I had to look up and do some research to even guess why a certain sport was being written about more than usual. Obviously this is not the best way to do this experiment seeing as the Ngram viewer doesn’t have a way to look at where a book was written or published, just the language it was written in, and any conclusions I came to were just me assuming something based on an event that occurred during a year where usage was notable.

Looking at ice hockey in English-language sources (assuming that the British English collection encompasses Canadian sources) a few things stand out. As a topic in English writing, ice hockey didn’t really exist before the creation of the NHL in 1917, and as expected, after the NHL expanded from 6 teams to 12 in 1967 there was a sharp rise in the usage of ice hockey as a term in writing. The largest growth in the usage of the term occurs between 1995 and 2002. I’m assuming that the 1994-95 NHL lockout had some effect on this, as well as the introduction of women’s hockey and NHL players into the Olympics in 1998. The peak comes in 2002, where I can only assume that every author in Canada wrote a book about Canada winning the gold medal at the Salt Lake City Olympics and breaking a 50-year drought. There is a plateau around 2005 which is probably related to the cancelled NHL season. In American English sources there are some other interesting assumptions to be made. For example, there is a big growth in the usage of the term ice hockey starting in the late 1980s, which coincides with the Wayne Gretzky trade to Los Angeles that many people believe was a spark for the popularity of hockey in the United States, so maybe this Ngram Viewer experiment can give a bit of evidence for that claim.

For soccer (football) I realized that looking at English sources, especially American ones, wouldn’t be that interesting, so I looked at other languages like French, German, and Spanish to see if there were more interesting stories. In French sources, there is a pretty significant dip in the usage of “football” in 2002, which makes sense considering that when I did some research about soccer in France, 2002 was an embarrassing year where they crashed out of the World Cup in the group stage without winning a game, despite coming into the tournament as the defending champions. Looking at German sources for the usage of “football” and “soccer”, the most obvious change occurs between 2004 and 2006, where usage grows sharply but then drops off right away after 2006. I am pretty confident in suggesting that this outcome is due to the fact that Germany was hosting the World Cup in 2006, so in the years leading up to it there was tons of literature written about the event and afterwards there was nothing left to write about. The graph showing that the usage of both “football” and “soccer” declined at the exact same time and in the same proportion is further evidence for this theory. Germany not winning might have also played a role in the sudden decrease. The stats for the usage of “football” and “soccer” in the Spanish language are the most interesting because the reasons for the results are not entirely obvious. Obviously the Spanish language is used in many countries so there is no one reason, but a few assumptions could be made. There is a huge peak in the usage of “football” around 1930 and then a large drop-off through the Second World War. The first World Cup was held in Uruguay in 1930, which could be an explanation for the large peak, but the dramatic decrease in usage has no single clear explanation. One of the reasons could be the Second World War, or that Spain was undergoing a civil war in the late 1930s, followed by the installation of a fascist government that may have had an impact on the usage of football in Spanish literature.

Overall this experiment in text analysis using the Google Ngram Viewer provides some very interesting information, but it is much too basic to base any actual historical analysis on. The Ngram Viewer is great for getting a simple look at the usage of a term in writing over time, but that simplicity is not enough to develop a stable thesis off of. It doesn’t show data from every single book ever written, just the ones that Google has access to and there is no way to specify where you want your results to come from. For this experiment I was assuming a lot of things due to the limited features of the Ngram Viewer, like what countries were providing the sources for each search. For hockey I was assuming that British English sources were mostly Canadian, and for soccer it was hard to even guess what countries were providing the majority of French or Spanish results. The search function for books using the keywords is also very basic and not very helpful in this type of experiment which meant I was relying on my own research outside of the Ngram Viewer to attempt to figure out why the usage of the keywords would have gone up or down as they did.

Text Analysis as a Digital Tool

One of the readings for this week was called “The Remaking of Reading: Data Mining and the Digital Humanities” by Matthew Kirschenbaum. In this article he talks about the applications of data mining in the realm of digital humanities and specifically about how there are different types of reading and how digital tools can help with what he calls “distant reading”. The article says that distant reading is a way of finding out what is in a book without actually reading the whole thing, or even any of it. This is something that I and I’m sure most other people do when faced with reading things for classes or finding sources for papers. Trying to find the important parts of a reading by skimming over the body of the text, looking at the abstract before an article, using an in-text search to find certain keywords, or even searching other sources to see what they say about the source I’m trying to read are all things I have done when faced with readings in classes. As a history student, I think that this type of “predatory reading” (as another of my professors has put it) is especially valuable because of all of the primary sources that we are required to read and analyze. For most of these primary sources what is written within them is not what needs to be studied; it is the impact that those documents had that is being questioned and a great way to find that out is to look around at what other people are writing about this source. I’m sure that this method of “reading” is a form of laziness, but when faced with really dull and long passages to read it is a great way to efficiently understand the purpose behind that reading or whether or not it can be used in your paper.

So far what I’ve talked about is a basic description of what text analysis is, but the article mentions things that go way beyond the little picture I have painted above. The article states that there are more books than anybody could possibly read and that attempting to read these books closely (as opposed to distantly) is not the right approach. The digitization of books and other sources is part of text analysis and data mining projects that are attempting to benefit scholars and non-scholars alike. Services like Google Books or Project Gutenberg are making textual information widely accessible, and have come in handy countless times in my history studies. Once books are digitized they are then subjected to various projects and processes that categorize, search through, and organize the information written inside them so that scholars and others can quickly find what they need within these texts. This is a perfect example of a digital tool making the study of any of the humanities better for a student or scholar.

The article talks about a few projects being developed that have various capabilities in collecting and displaying data from text resources, but they all display some aspect of the “distant reading” mentioned above. The information gained from these digital projects and tools can be applied to many avenues of analysis, such as the traits of certain writers, or even how many times certain keywords or phrases were used in books over a period of time. Information like this could be used to study societies as a whole through what topics were popular at a given time, and what people were writing about in certain time periods. The Google Books Ngram Viewer is an example of one of these tools. You can type in a keyword and it will show you the frequency of that term being used in the sources that Google Books has access to. This is an interesting tool because you can see how popular a certain term was at certain times. For example I searched “War” as a topic within English sources and you can see that a spike in frequency occurs around the time of the First and Second World Wars. Obviously this means that there were more people writing about wars in these times, but using the Ngram Viewer you could also compare English sources with German or French sources to see if there is a difference in how each keyword was covered in different languages and in different parts of the world. This type of analysis would be nearly impossible without the implementation of digital tools into the study of humanities.
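
The basic idea behind a keyword frequency chart like this can be sketched in a few lines. This is not how Google actually builds the Ngram Viewer; it is a toy illustration that assumes you already have some text grouped by year, and the function name `relative_frequency` and the two-sentence "corpus" are invented for the example.

```python
import re
from collections import Counter

def relative_frequency(texts_by_year, keyword):
    """Return {year: share of all words in that year equal to keyword}.

    texts_by_year is a hypothetical dict mapping a year to the combined
    text of everything written in that year.
    """
    keyword = keyword.lower()
    result = {}
    for year, text in sorted(texts_by_year.items()):
        tokens = re.findall(r"[a-z]+", text.lower())
        counts = Counter(tokens)
        # Dividing by the total keeps years with more text comparable.
        result[year] = counts[keyword] / len(tokens) if tokens else 0.0
    return result

# Invented sample data, only to show the shape of the result.
corpus = {
    1910: "peace and trade and peace",
    1916: "war news and war loans and war",
}
for year, share in relative_frequency(corpus, "War").items():
    print(year, round(share, 3))
```

Using a share of all words rather than a raw count matters because far more books survive from some years than others, so a raw count would mostly show how much was published.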
