Advanced text classification with SpaCy

In this project, I worked on implementing a fully functioning text classifier using SpaCy.

You can access the Google Colab notebook here

Dataset

Here I have used a dataset of Amazon fine food reviews.

This dataset consists of reviews of fine foods from Amazon. The data span a period of more than 10 years, including all ~500,000 reviews up to October 2012. Reviews include product and user information, ratings, and a plain text review. It also includes reviews from all other Amazon categories.

Reviews.csv: 568,454 food reviews Amazon users left up to October 2012

Data includes:

Reviews from Oct 1999 - Oct 2012
568,454 reviews
256,059 users
74,258 products
260 users with > 50 reviews
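
The counts above can be recomputed from Reviews.csv. The following is a minimal sketch using only the Python standard library, not the code from the notebook. It assumes the file has `UserId` and `ProductId` columns, and the function names `summarize_reviews` and `summarize_csv` are hypothetical.

```python
import csv
from collections import Counter


def summarize_reviews(rows, heavy_user_threshold=50):
    """Count reviews, distinct users, distinct products, and heavy reviewers.

    `rows` is any iterable of dicts with "UserId" and "ProductId" keys,
    for example a csv.DictReader over Reviews.csv (column names assumed).
    """
    reviews_per_user = Counter()
    products = set()
    total = 0
    for row in rows:
        total += 1
        reviews_per_user[row["UserId"]] += 1
        products.add(row["ProductId"])
    # A "heavy" user has strictly more reviews than the threshold.
    heavy = sum(1 for n in reviews_per_user.values() if n > heavy_user_threshold)
    return {
        "reviews": total,
        "users": len(reviews_per_user),
        "products": len(products),
        "heavy_users": heavy,
    }


def summarize_csv(path):
    """Stream a CSV file from disk through summarize_reviews."""
    with open(path, newline="", encoding="utf-8") as handle:
        return summarize_reviews(csv.DictReader(handle))
```

If the local file matches the release described here, `summarize_csv("Reviews.csv")` should give figures close to the ones listed above.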

Sense2vec

The sense2vec word embedding model can work better than word2vec for ambiguous words, since it uses contextual information such as part-of-speech tags to keep the different senses of a word apart.

The idea behind sense2vec is super simple. If the problem is that duck as in waterfowl and duck as in crouch are different concepts, the straightforward solution is to just have two entries, duck_N and duck_V. Trask et al. (2015) published a nice set of experiments showing that the idea worked well.

It assigns part-of-speech tags such as verb, noun, and adjective to words, which are in turn used to tell apart the different senses of the same word.

Please book [VERB] my ticket.
Read the book [NOUN].
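
To make the tagging idea concrete, here is a minimal sketch of how a word and its part-of-speech tag can be joined into a single vocabulary entry. The `word|TAG` key format follows the convention used by the sense2vec library. The helper `make_sense_key` is hypothetical, and the tags below are written by hand for illustration; in practice they would come from a tagger such as spaCy's.

```python
def make_sense_key(word, pos):
    """Join a word and its POS tag into one entry, e.g. "book|VERB".

    Spaces are replaced with underscores so multi-word phrases form one key.
    """
    return f"{word.replace(' ', '_')}|{pos}"


# Hand-tagged tokens for the two example sentences (tags assumed, not predicted).
tagged_sentences = [
    [("Please", "INTJ"), ("book", "VERB"), ("my", "PRON"), ("ticket", "NOUN")],
    [("Read", "VERB"), ("the", "DET"), ("book", "NOUN")],
]

sense_keys = [
    [make_sense_key(word, pos) for word, pos in sentence]
    for sentence in tagged_sentences
]

# The same surface form "book" now maps to two distinct vocabulary entries.
book_senses = sorted(
    {key for sentence in sense_keys for key in sentence if key.startswith("book|")}
)
print(book_senses)  # ['book|NOUN', 'book|VERB']
```

Because each sense has its own key, each can receive its own vector during training, which is what separates this approach from a plain word2vec vocabulary.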

Here I have used the Reddit vectors for the sense2vec model. These are vectors trained on all Reddit comments from 2015.

Access the folder here

Technology stack and concepts

SpaCy
sklearn
nltk
pandas
tokenization
part-of-speech tagging
dependency parsing
lemmatization
named entity recognition

Visualizing Data

explacy - explaining how parsing is done
displaCy - visualizing named entities

Word vectors and similarity

sense2vec - using contextual information for building word embeddings
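
As a rough illustration of similarity over sense-keyed embeddings, the sketch below computes cosine similarity between toy vectors. The vector values are invented for the example and are not taken from the Reddit vectors; `cosine_similarity` and `most_similar` are hypothetical helpers, not the sense2vec API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors with invented values, keyed by sense.
toy_vectors = {
    "book|NOUN": [0.9, 0.1, 0.0],
    "novel|NOUN": [0.8, 0.2, 0.1],
    "book|VERB": [0.0, 0.2, 0.9],
    "reserve|VERB": [0.1, 0.1, 0.8],
}


def most_similar(key, vectors):
    """Return the other key whose vector is closest to `key` by cosine similarity."""
    return max(
        (other for other in vectors if other != key),
        key=lambda other: cosine_similarity(vectors[key], vectors[other]),
    )
```

With these toy values, `most_similar("book|NOUN", toy_vectors)` picks `novel|NOUN`, while `most_similar("book|VERB", toy_vectors)` picks `reserve|VERB`, which mirrors how separate sense entries can end up with different neighbours.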

License

MIT
