The goal of this class is to wrap functions from '''nltk''' and other libraries so that exploratory data analysis of English and French texts is easier to do within a Jupyter notebook. Some functionality that I would like to see:
- The ability to take a text, and in a single line create a word cloud.
- Easily create data displays of substring frequency in the text.
- Show distribution of paragraph and sentence length.
- Easily show the word frequencies in the text.
- Determine the grade level of various texts.
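As a rough illustration of two of the items above (word frequencies and sentence-length distribution), here is a minimal sketch using only the Python standard library. The helper names `word_frequencies` and `sentence_lengths` are hypothetical and are not part of `digitalhumanities`; a simple regular expression stands in for the '''nltk''' tokenizers so the sketch runs without any downloaded language data.

```python
import re
from collections import Counter

# Hypothetical helpers for illustration only; not the digitalhumanities API.
# A word is a run of letters, optionally joined by apostrophes or hyphens,
# so French forms such as "Aujourd'hui" and "peut-être" stay in one piece.
WORD_RE = re.compile(r"[^\W\d_]+(?:['’-][^\W\d_]+)*")


def word_frequencies(text, top=None):
    """Return (word, count) pairs, most frequent first, ignoring case."""
    words = [w.lower() for w in WORD_RE.findall(text)]
    return Counter(words).most_common(top)


def sentence_lengths(text):
    """Return the number of words in each sentence of the text."""
    # Naive split: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [len(WORD_RE.findall(s)) for s in sentences]


sample = "Aujourd'hui, maman est morte. Ou peut-être hier, je ne sais pas."
print(sentence_lengths(sample))
print(word_frequencies("The cat and the hat.", top=1))
```

The lists returned here could be passed to a plotting library to draw the histograms or bar charts described above. The naive sentence split will mis-handle abbreviations, which is one reason a wrapper around the '''nltk''' sentence tokenizers would be preferable in practice.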
Below are some examples of how the functions can be used. The goal is for these functions to be easy to use in a Jupyter notebook for quick analysis.
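import digitalhumanities

# Load a French text as a corpus
etrangfr = digitalhumanities.Corpus("texts/etrangerfr.txt", 'french')
# Draw a word cloud in a single line
etrangfr.wordcloud()
# Show the occurrences of the substring "?" in the same corpus
etrangfr.show_occurences("?")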
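
How `show_occurences("?")` works internally is not described here. As a sketch under that caveat, the data behind a substring-frequency display could be computed as follows. The function names `occurrence_positions` and `occurrences_per_segment` are hypothetical, and splitting the text into equal segments is an assumption about what a useful display would show.

```python
def occurrence_positions(text, substring):
    """Return the start index of every non-overlapping match of substring."""
    if not substring:
        raise ValueError("substring must not be empty")
    positions = []
    start = text.find(substring)
    while start != -1:
        positions.append(start)
        # Resume the search just past the match that was found.
        start = text.find(substring, start + len(substring))
    return positions


def occurrences_per_segment(text, substring, segments=10):
    """Count matches in each of `segments` equal-width slices of the text."""
    counts = [0] * segments
    length = max(len(text), 1)  # avoid dividing by zero on empty text
    for pos in occurrence_positions(text, substring):
        counts[min(pos * segments // length, segments - 1)] += 1
    return counts


text = "Why? Who? No."
print(occurrence_positions(text, "?"))
print(occurrences_per_segment(text, "?", segments=2))
```

Plotting the per-segment counts as a bar chart would show where in a text a substring such as "?" clusters, for example whether questions are concentrated in particular chapters.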