Text-Analysis

This is not a module for large-scale use, but rather a set of scripts that explain popular methods in text analysis, including Web Scraping, Preprocessing, Skip Gram (word2vec), and Topic Modelling.

1. Web Scraping

How can I download text data from a website programmatically using Python? How do I store the data in a CSV file for later use?

Web_Scraping.py: explains how to download movie quotes and store the data neatly in a table using the Pandas Python module.
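The parse-then-store flow can be sketched with the standard library alone. This is a minimal illustration, not the script's actual code: the HTML fragment, the `quote`/`text`/`movie` class names, and the `QuoteParser` and `quotes_to_csv` helpers are all hypothetical, and `csv` stands in for Pandas so the example has no dependencies. A real scraper would first fetch the page over HTTP.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical page fragment; a real script would download this HTML instead.
SAMPLE_HTML = """
<div class="quote"><span class="text">Here's looking at you, kid.</span><span class="movie">Casablanca</span></div>
<div class="quote"><span class="text">May the Force be with you.</span><span class="movie">Star Wars</span></div>
"""


class QuoteParser(HTMLParser):
    """Collect one row per <div>, reading <span class="text"> and <span class="movie">."""

    def __init__(self):
        super().__init__()
        self.rows = []       # finished rows, one dict per quote
        self._field = None   # name of the span we are currently inside, if any
        self._current = {}   # row being assembled

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "span" and cls in ("text", "movie"):
            self._field = cls

    def handle_data(self, data):
        if self._field:
            self._current[self._field] = self._current.get(self._field, "") + data

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "div" and self._current:
            self.rows.append(self._current)
            self._current = {}


def quotes_to_csv(html):
    """Parse the HTML and return (rows, CSV text with a header line)."""
    parser = QuoteParser()
    parser.feed(html)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["text", "movie"])
    writer.writeheader()
    writer.writerows(parser.rows)
    return parser.rows, buffer.getvalue()


rows, csv_text = quotes_to_csv(SAMPLE_HTML)
print(csv_text)
```

Writing to an `io.StringIO` buffer keeps the sketch side-effect free; passing a file opened with `newline=""` to `csv.DictWriter` would store the same table on disk.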

2. Preprocessing

How are documents and words represented in Python? How can I clean text in Python by removing unnecessary words and adjusting for infrequent words?

Text_Preprocessing.py: explains common ways of representing text data in Python with one-hot encoded vectors, cleaning data by lowercasing and removing stopwords, and weighting words with TF-IDF.
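A small sketch of these three ideas, assuming whitespace tokenization, a tiny hand-picked stopword list, and the plain `tf * log(N / df)` weighting (one of several TF-IDF variants). The function names and sample documents are illustrative and not taken from the script.

```python
import math
from collections import Counter

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"the", "a", "is", "on", "and"}


def tokenize(doc):
    """Lowercase, split on whitespace, and drop stopwords (punctuation is not handled)."""
    return [w for w in doc.lower().split() if w not in STOPWORDS]


def one_hot(word, vocab):
    """Return a vector with a single 1 at the word's position in the vocabulary."""
    return [1 if word == v else 0 for v in vocab]


def tf_idf(docs):
    """Return the sorted vocabulary and one {word: weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({w for doc in tokenized for w in doc})
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each word.
    df = Counter(w for doc in tokenized for w in set(doc))
    weights = []
    for doc in tokenized:
        counts = Counter(doc)
        weights.append(
            {w: (counts[w] / len(doc)) * math.log(n_docs / df[w]) for w in counts}
        )
    return vocab, weights


docs = ["The cat sat on the mat", "The dog sat on the log", "A cat and a dog"]
vocab, weights = tf_idf(docs)
print(vocab)
print(one_hot("dog", vocab))
print(weights[0])
```

In the first document, "mat" appears in only one of the three documents while "cat" appears in two, so "mat" receives the larger weight. This is the sense in which TF-IDF adjusts for how common a word is across the corpus.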

3. EM-Algorithm

How can I discover topics of documents? That is, how can I calculate how much one article is about sports, another about business, and so on?

EM_Algorithm.py: explains how to estimate a distribution using the EM-Algorithm. This is a precursor to the topic modelling example.
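To illustrate the alternation between the E-step and the M-step, here is a sketch on a standard toy problem: estimating the biases of two coins when it is unknown which coin produced each batch of tosses. The problem setup, the data, and the `em_two_coins` name are assumptions chosen for illustration; the script may estimate a different distribution.

```python
def em_two_coins(heads, flips, theta=(0.6, 0.5), iterations=50):
    """Estimate two coin biases from batches whose coin identity is hidden.

    heads[i] is the number of heads observed in batch i of `flips` tosses.
    Both coins are assumed equally likely to be picked for a batch.
    """
    theta_a, theta_b = theta
    for _ in range(iterations):
        # E-step: split each batch's heads and tails between the two coins
        # in proportion to how likely each coin is to have produced the batch.
        heads_a = tails_a = heads_b = tails_b = 0.0
        for h in heads:
            t = flips - h
            like_a = theta_a ** h * (1 - theta_a) ** t
            like_b = theta_b ** h * (1 - theta_b) ** t
            resp_a = like_a / (like_a + like_b)
            resp_b = 1.0 - resp_a
            heads_a += resp_a * h
            tails_a += resp_a * t
            heads_b += resp_b * h
            tails_b += resp_b * t
        # M-step: re-estimate each bias from its expected counts.
        theta_a = heads_a / (heads_a + tails_a)
        theta_b = heads_b / (heads_b + tails_b)
    return theta_a, theta_b


# Five batches of ten tosses each (illustrative data).
theta_a, theta_b = em_two_coins(heads=[5, 9, 8, 4, 7], flips=10)
print(theta_a, theta_b)
```

The same pattern carries over to topic modelling, where the hidden variable is the topic behind each word rather than the coin behind each batch.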

4. Gibbs Sampling

How can I discover topics of documents? That is, how can I calculate how much one article is about sports, another about business, and so on?

Gibbs_Sampling.py: explains how Gibbs sampling works in the context of topic modelling.
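One common way to apply Gibbs sampling to topic modelling is a collapsed sampler for Latent Dirichlet Allocation, sketched below. This is an assumption about the setting rather than a copy of the script: the `gibbs_lda` name, the hyperparameters `alpha` and `beta`, and the toy corpus are all illustrative.

```python
import random


def gibbs_lda(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA on a list of tokenized documents.

    Returns per-document topic proportions and per-topic word counts.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    vocab_size = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]                 # tokens per (doc, topic)
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(n_topics)]  # tokens per (topic, word)
    topic_total = [0] * n_topics                               # tokens per topic
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic conditioned on all other assignments.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    # Smoothed topic proportions for each document.
    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for counts, doc in zip(doc_topic, docs)
    ]
    return theta, topic_word


docs = [
    "goal team match score team".split(),
    "market stock price market trade".split(),
    "team score goal match".split(),
    "stock trade price market".split(),
]
theta, topic_word = gibbs_lda(docs, n_topics=2)
for row in theta:
    print([round(p, 2) for p in row])
```

Each row of `theta` answers the question above: it gives the estimated share of each topic in one document. The topic labels themselves are arbitrary, so which index corresponds to "sports" or "business" can change with the seed.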

5. Skip Gram

How can I find which words in my documents are related to each other syntactically and semantically? How does a basic neural network work?

Skip_Gram.py: explains how the Skip Gram model from Mikolov et al. works (with gradient descent and no negative sampling).
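A minimal sketch of Skip Gram with a full softmax output layer and plain stochastic gradient descent, that is, without negative sampling. It uses pure-Python lists to stay dependency-free; the `train_skip_gram` name, the hyperparameters, and the toy corpus are illustrative assumptions rather than the script's own choices.

```python
import math
import random


def train_skip_gram(tokens, dim=5, window=1, lr=0.1, epochs=200, seed=0):
    """Train Skip Gram embeddings with a full softmax; returns vocab, vectors, losses."""
    rng = random.Random(seed)
    vocab = sorted(set(tokens))
    index = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    # Input (center-word) and output (context-word) embedding matrices.
    w_in = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(n_words)]
    w_out = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(n_words)]
    # (center, context) training pairs from a symmetric window.
    pairs = [
        (index[tokens[i]], index[tokens[j]])
        for i in range(len(tokens))
        for j in range(max(0, i - window), min(len(tokens), i + window + 1))
        if j != i
    ]
    losses = []
    for _ in range(epochs):
        total = 0.0
        for center, context in pairs:
            h = w_in[center]  # hidden layer is the center word's embedding
            scores = [sum(a * b for a, b in zip(h, w_out[k])) for k in range(n_words)]
            top = max(scores)  # subtract the max for numerical stability
            exps = [math.exp(s - top) for s in scores]
            norm = sum(exps)
            probs = [e / norm for e in exps]
            total -= math.log(probs[context])
            # Gradient of the cross-entropy loss w.r.t. the scores.
            errors = [p - (1.0 if k == context else 0.0) for k, p in enumerate(probs)]
            # Compute the hidden-layer gradient before w_out is modified.
            grad_h = [
                sum(errors[k] * w_out[k][d] for k in range(n_words)) for d in range(dim)
            ]
            for k in range(n_words):
                for d in range(dim):
                    w_out[k][d] -= lr * errors[k] * h[d]
            for d in range(dim):
                h[d] -= lr * grad_h[d]  # updates w_in[center] in place
        losses.append(total / len(pairs))
    return vocab, w_in, losses


tokens = "the cat sat on the mat the dog sat on the log".split()
vocab, vectors, losses = train_skip_gram(tokens)
print(round(losses[0], 3), round(losses[-1], 3))
```

The full softmax touches every output vector on every update, which is affordable for a toy vocabulary but is the cost that negative sampling is designed to avoid on real corpora.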

6. Long Short Term Memory

How can I develop a language model with memory? How does backpropagation/gradient descent through time work?

LSTM_Tutorial.py: explains the backpropagation of an LSTM model. Extends the code from Nicolas Jimenez to train a language model with memory.
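Before looking at the backward pass, it can help to see the forward computation that the gradients flow through. The sketch below is a scalar LSTM cell (one input, one hidden unit) with hand-picked weights; it shows only the forward step, not backpropagation through time, and the parameter names and values are assumptions for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, p):
    """One forward step of a scalar LSTM cell with parameters in dict p."""
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate value
    c = f * c_prev + i * g   # cell state: keep part of the old memory, add new
    h = o * math.tanh(c)     # hidden state exposed to the next layer or step
    return h, c


def run_sequence(xs, p):
    """Feed a sequence through the cell, carrying h and c from step to step."""
    h = c = 0.0
    outputs = []
    for x in xs:
        h, c = lstm_step(x, h, c, p)
        outputs.append(h)
    return outputs


# Hypothetical parameters: every input and recurrent weight 0.5, every bias 0.
params = {f"{kind}{gate}": 0.5 for kind in "wu" for gate in "fiog"}
params.update({f"b{gate}": 0.0 for gate in "fiog"})

out_a = run_sequence([1.0, 0.5], params)
out_b = run_sequence([-1.0, 0.5], params)
print(out_a[-1], out_b[-1])
```

Both sequences end with the same input, yet the final outputs differ because the cell state carries information from the first step. Backpropagation through time sends gradients backward along that same chain of `h` and `c` values.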
