Notes on Corpus Analysis
As I remember it, we talked about more than just age range: the idea was to search for key phrases that might help determine age range and also other traits, such as whether the writer was in the military. Here are some options for doing that.
Scripts to accomplish phrase searching
- With Python - https://corpuslinguisticmethods.wordpress.com/2014/02/14/query-a-text-corpus-with-python/
- With R - http://stackoverflow.com/questions/8996513/r-text-mining-counting-the-number-of-times-a-specific-word-appears-in-a-corpus
- With PHP - http://stackoverflow.com/questions/3950622/how-to-search-text-using-php-if-text-contains-world
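To give a feel for what a phrase-searching script might look like, here is a rough Python sketch. It is not taken from any of the links above; the phrase lists, the category names, and the `count_phrases` function are all made up for illustration, and the real phrases would have to come from our own reading of the corpus.

```python
import re
from collections import Counter

# Hypothetical phrase lists, for illustration only.
PHRASES = {
    "military": ["deployed", "basic training", "my unit"],
    "age": ["my grandkids", "when i retire", "in high school"],
}


def count_phrases(text, phrases=PHRASES):
    """Count case-insensitive, whole-word matches of each phrase, grouped by category."""
    counts = {}
    for category, items in phrases.items():
        hits = Counter()
        for phrase in items:
            # \b keeps "deployed" from matching inside "redeployed".
            pattern = r"\b" + re.escape(phrase) + r"\b"
            n = len(re.findall(pattern, text, flags=re.IGNORECASE))
            if n:
                hits[phrase] = n
        counts[category] = hits
    return counts


if __name__ == "__main__":
    sample = "I was deployed twice after basic training. Deployed again in 2010."
    for category, hits in count_phrases(sample).items():
        print(category, dict(hits))
```

Raw counts like these would only be a starting signal; a phrase such as "deployed" can turn up in non-military writing too, so we would still want to look at the hits in context.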
Concordance Tools
- AntConc - http://www.laurenceanthony.net/software/antconc/
- TAPoR - http://www.tapor.ca/
- MALLET (more for topic modelling, but can be tweaked to do this kind of thing too) - http://mallet.cs.umass.edu/topics.php
A good first step might be to play with concordancing and see how far that gets us; if we need more sophisticated work, we could look at Python. I don't have much experience with R or PHP, but either could be another option.
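If we do end up in Python, a bare-bones keyword-in-context (KWIC) listing, which is roughly what the concordance tools show, could be sketched as below. This is only an approximation under my own assumptions: the `concordance` function and its `width` parameter are invented for this note, and it works on characters rather than tokens, which the dedicated tools handle more carefully.

```python
import re


def concordance(text, keyword, width=30):
    """Return one line per match, with `width` characters of context on each side."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    # Collapse newlines and repeated spaces so context windows read cleanly.
    flat = " ".join(text.split())
    lines = []
    for m in pattern.finditer(flat):
        left = flat[max(0, m.start() - width):m.start()]
        right = flat[m.end():m.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines


if __name__ == "__main__":
    sample = "I served in the army.\nThe Army taught me a lot."
    for line in concordance(sample, "army", width=10):
        print(line)
```

Scanning the lined-up hits by eye should tell us fairly quickly whether a phrase is a useful signal or mostly noise, and that is the same judgement we would be making in a concordance tool.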