Art by Timothy Bradstreet
Text Analysis
Over the past few seasons we've made some changes to the format of the show. New for this season is the addition of analytical approaches to the text, using a statistics software package called R and a front end called RStudio. Essentially, we will generate three items for each story to aid in our discussions: a word cloud that shows the most frequently used words in the story, a list of the ten most frequent words, and a sentiment analysis score, which provides a numerical rating for the "tone" of the story. Our approach involves taking a .txt file of each story, removing common stop words (e.g. the, is, at, which, on, etc.), removing punctuation, converting the words to lower case, and generating the word cloud from the resulting matrix of words. From there, we compare the word matrix to dictionaries of positive and negative words. Here is a look at our R script for those who might be curious, and here are links to the positive and negative word databases we'll be using.
For a more detailed explanation, we discuss these methods at about the 1:12 mark of this episode.
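For readers who want a feel for the steps described above, here is a minimal sketch in base R. It is not our actual script: the sample text, the short stop word list, and the function name `word_frequencies` are all made up for illustration, and a real stop word list would be much longer.

```r
# Hypothetical stand-in for the contents of a story's .txt file.
# With a real file you could use something like:
#   story_text <- paste(readLines("story.txt"), collapse = " ")
story_text <- "The man knew the men were at the gate. The man came back, and the men came back."

# A tiny illustrative stop word list (an assumption, not the list we use).
stop_words <- c("the", "is", "at", "which", "on", "and", "were", "a", "of")

word_frequencies <- function(text, stop_words) {
  text  <- tolower(text)                          # convert to lower case
  text  <- gsub("[[:punct:]]", " ", text)         # replace punctuation with spaces
  words <- strsplit(text, "[[:space:]]+")[[1]]    # split on runs of whitespace
  words <- words[nzchar(words)]                   # drop any empty strings
  words <- words[!(words %in% stop_words)]        # remove stop words
  sort(table(words), decreasing = TRUE)           # count each word, most frequent first
}

freq <- word_frequencies(story_text, stop_words)
print(head(freq, 10))                             # the ten most frequent words
```

A frequency table like `freq` is the kind of input a word cloud plotting function expects, for example `wordcloud()` from the wordcloud package, which takes a vector of words and a matching vector of counts.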
Word Cloud and Sentiment Score
Within the text of "The Daughter of Erlik Khan", we found a total of 557 positive words and 1,110 negative words using the methods outlined above. Subtracting the negative count from the positive count gives us a sentiment score of -553 for this story. That means the tone of the story is generally negative, which is easy to explain given the amount of combat and bloodshed that takes place!
It will be interesting to see how other stories in this season score in comparison to "The Daughter of Erlik Khan"!
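The scoring itself is simple arithmetic. The sketch below assumes the score is the number of positive matches minus the number of negative matches, which agrees with the figures above (557 - 1,110 = -553). The word lists and the function name `sentiment_score` are toy examples for illustration, not the dictionaries we use.

```r
# Count how many words appear in each dictionary and take the difference.
sentiment_score <- function(words, positive_words, negative_words) {
  n_positive <- sum(words %in% positive_words)
  n_negative <- sum(words %in% negative_words)
  c(positive = n_positive, negative = n_negative, score = n_positive - n_negative)
}

# Hypothetical toy data, just to show the mechanics.
words          <- c("brave", "blood", "death", "loyal", "kill", "gordon")
positive_words <- c("brave", "loyal")
negative_words <- c("blood", "death", "kill")

result <- sentiment_score(words, positive_words, negative_words)
print(result)

# The same arithmetic with the counts reported for this story.
story_score <- 557 - 1110
print(story_score)
```

Words that appear in neither dictionary, like "gordon" in the toy data, simply don't count toward the score.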
The top eleven most frequently used words (a tie for tenth place brings the list to eleven) are:
Word | Freq.
gordon | 257
man | 93
men | 93
like | 89
one | 75
yogok | 75
back | 67
ormond | 64
will | 51
came | 49
knew | 49
We removed the word "gordon" from the word cloud in order to get a better look at the language Howard used. (Click here for a higher resolution copy of the image.) "gordon" is an outlier at 257 occurrences, versus 93 occurrences for the next most frequent word, "man". As you can see from the word cloud, it's a very masculine story! Hopefully over time, by comparing these word clouds, we can glean some interesting insights into Howard's writing style, demonstrate recurrent themes, and make comparisons between REH and the authors who influenced him and whom he influenced!
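Dropping an outlier like this before plotting takes one line in R. Here is a small sketch using the counts from the table above; the variable names are just for illustration.

```r
# Word counts taken from the frequency table for this story.
freq <- c(gordon = 257, man = 93, men = 93, like = 89, one = 75, yogok = 75,
          back = 67, ormond = 64, will = 51, came = 49, knew = 49)

# Keep everything except the outlier before drawing the cloud.
cloud_freq <- freq[names(freq) != "gordon"]

# How far ahead of the runner-up is "gordon"? (257 / 93, a bit under 3x)
ratio <- freq[["gordon"]] / max(cloud_freq)
print(round(ratio, 1))
```

Without this step, a word that appears nearly three times as often as anything else would dominate the cloud and shrink every other word.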
What do you think of this new text analysis approach? What jumps out at you from the word cloud? Let us know!
One Things
Jon: Paradise Sky, by Joe R. Lansdale.
Josh: Daredevil Season 2 on Netflix
Luke: Downbelow Station, by CJ Cherryh
Finally - check out Rogues in the House, a podcast focusing on heroic fantasy and sword & sorcery in gaming, comics, and pop culture. Tell them The Cromcast sent you!
Other topics discussed in this episode include (but are not limited to): Dodgeball, Ben Stiller, honor culture, Very Bad Wizards, Cassius Clay, Steranko!, Bourbon & Statistics, Disney as the Entertainment Singularity, Afghanistan, yurts, Indiana Jones, and mutton.
Next time
The Soul of a Regiment by Talbot Mundy!
Questions? Comments? Curses?
Email us! (thecromcast at gmail dot com)
You know you want to follow us on Twitter!
Did you know that we're on Facebook?
We're posting photos on the Instagrams!
Subscribe to our feed on FeedBurner!
Or, check us out on iTunes!
We're also on Stitcher Radio and Google Play!
Finally.... Call us! (859) 429-CROM!
Legal Mumbo-Jumbo
Our episode is freely available on archive.org and is licensed under Creative Commons: By Attribution 3.0. http://creativecommons.org/licenses/by/3.0/ Themes by Kevin MacLeod (incompetech.com) Licensed under Creative Commons: By Attribution 3.0. Pop music includes "Respect" by the Notorious B.I.G. All music was obtained legally; we hope our discussion of this content makes you want to go out and purchase the work!