Word Representation, Text Classification and Machine Translation

Word Embeddings with Word2Vec

The following steps are carried out here:

  • Preprocessing the training corpus: special characters, empty strings, digits and stopwords are removed from the sentences, and words are lowercased. Sentences with fewer than 3 words, as well as empty sentences, are removed.
  • Creating the corpus vocabulary and preparing the dataset (word2idx maps "word" → idx and idx2word maps idx → "word").
  • Building the skip-gram neural network architecture (see the sketch after this list)
    • Initialize and transform the input and context words.
    • Merge the inputs and pass the merged representation to an output layer with a sigmoid activation function.
    • Initialize and then compile the model.
  • Training the model for 5 epochs
  • Getting the word embeddings and visualizing them using t-SNE
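
A minimal sketch of this skip-gram architecture, assuming Keras; vocab_size and embed_size are placeholder values, not the settings used in the notebook:

```python
from tensorflow.keras.layers import Input, Embedding, Dot, Reshape, Dense
from tensorflow.keras.models import Model

vocab_size = 10000   # assumed vocabulary size (placeholder)
embed_size = 100     # assumed embedding dimensionality (placeholder)

# One input each for the target word and the context word (integer indices).
word_in = Input(shape=(1,), name="target_word")
context_in = Input(shape=(1,), name="context_word")

# Initialize and transform both words into dense vectors via embedding layers.
word_emb = Embedding(vocab_size, embed_size, name="word_embedding")(word_in)
context_emb = Embedding(vocab_size, embed_size, name="context_embedding")(context_in)

# Merge the two vectors (dot product) and pass the result to a single sigmoid
# output node: 1 for an observed (word, context) pair, 0 for a negative sample.
merged = Dot(axes=2)([word_emb, context_emb])
merged = Reshape((1,))(merged)
output = Dense(1, activation="sigmoid")(merged)

model = Model(inputs=[word_in, context_in], outputs=output)
model.compile(loss="binary_crossentropy", optimizer="adam")
```

Positive and negative (word, context) training pairs can be generated, for example, with keras.preprocessing.sequence.skipgrams before fitting the model.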

Using LSTMs for Text Classification

The following steps are carried out here:

  • Readying the inputs for the LSTM: since the inputs have varying lengths, a fixed length is obtained by padding and truncating (see the sketch after this list).
  • Building the model
    • Setting up the expected inputs and adding an embedding layer.
    • Adding an LSTM layer and finally a fully connected (Dense) layer.
    • The model has a 'binary_crossentropy' loss function and an 'adam' optimizer.
  • Training the model with batch size 1000.
  • The model is evaluated on the test data and the word embeddings are extracted (for visualization purposes).
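
A minimal sketch of this pipeline, assuming Keras; the hyperparameters and the randomly generated dummy data are illustrative, and only the batch size of 1000 is taken from the list above:

```python
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

vocab_size, embed_size, max_len = 10000, 100, 200   # assumed hyperparameters

# Dummy variable-length integer sequences standing in for the tokenised texts.
raw_seqs = [list(np.random.randint(1, vocab_size, size=np.random.randint(20, 400)))
            for _ in range(64)]
labels = np.random.randint(0, 2, size=64)

# Pad short sequences and truncate long ones so every input has length max_len.
x = pad_sequences(raw_seqs, maxlen=max_len, padding="post", truncating="post")

model = Sequential([
    Embedding(vocab_size, embed_size),   # learned word embeddings
    LSTM(64),                            # sequence encoder
    Dense(1, activation="sigmoid"),      # binary prediction
])
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
model.fit(x, labels, batch_size=1000, epochs=2)   # batch size 1000 as above

# The word embeddings can be pulled out of the first layer for visualization.
embeddings = model.layers[0].get_weights()[0]     # shape: (vocab_size, embed_size)
```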

Comparing Classification Models

  • Model 1: Neural averaging network using one-hot vectors:
    • The inputs are fed in as one-hot vectors. The second layer computes the average of all word vectors in a sentence, ignoring padding.
    • The averaged vector is piped through a fully-connected layer, which is connected to a single output node with a sigmoid activation function, so the final value is a float between 0 and 1.
  • Model 2: Neural averaging network using embedding layer (see the sketch after this list):
    • Instead of one-hot vectors, an embedding layer is used: the vocabulary is integer-encoded and the layer looks up the embedding vector for each word index.
  • Model 3: Using pre-trained word embeddings (GloVe):
    • Neural bag of words using pre-trained word embeddings
      • Fine-tuning the pre-trained embeddings
    • LSTM with pre-trained word embeddings
  • Model 4: Adding extra dense layers to the neural averaging network model:
    • Adding one and then two extra dense layers
  • Model 5: CNN for Text Classification
    • Building a basic CNN model and then adding an extra convolutional layer to it.
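
For illustration, a minimal sketch of the embedding-based averaging network (Model 2), with comments indicating where the GloVe initialisation (Model 3) and the extra dense layers (Model 4) would slot in; vocab_size, the layer sizes and the GloVe file name are assumptions, not values from the notebooks:

```python
import numpy as np
from tensorflow.keras.initializers import Constant
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, GlobalAveragePooling1D, Dense

vocab_size, embed_size = 10000, 100   # assumed hyperparameters
word2idx = {}                         # placeholder: built from the corpus vocabulary

# Random initialisation; for Model 3 this matrix would instead be filled from
# a GloVe file (e.g. "glove.6B.100d.txt") using word2idx to place each vector.
embedding_matrix = np.random.uniform(-0.05, 0.05, (vocab_size, embed_size))

model = Sequential([
    # mask_zero=True lets the averaging layer ignore padded positions (index 0).
    Embedding(vocab_size, embed_size, mask_zero=True,
              embeddings_initializer=Constant(embedding_matrix),
              trainable=True),         # True = fine-tune the embeddings, False = freeze them
    GlobalAveragePooling1D(),          # average of the word vectors in the sentence
    Dense(32, activation="relu"),      # fully-connected layer (add more for Model 4)
    Dense(1, activation="sigmoid"),    # single output node, value between 0 and 1
])
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
```

For Model 5, the averaging layer would typically be replaced by one or more Conv1D layers followed by pooling.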

Aspect-Based Sentiment Analysis

Given a review and an aspect, the sentiment conveyed towards that aspect is classified on a three-point scale: POSITIVE, NEUTRAL and NEGATIVE. Using the word_index and the tokenizer function (text_to_word_sequence), the review text and the aspect words are converted to word tokens and then to integer sequences separately.
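
A minimal sketch of this tokenisation step, assuming the Keras Tokenizer and text_to_word_sequence utilities; the example review and aspect are made up:

```python
from tensorflow.keras.preprocessing.text import Tokenizer, text_to_word_sequence

reviews = ["The battery life is great but the screen is dim"]   # illustrative example
aspects = ["battery"]

# Build the word_index over the texts (in the notebook it covers the whole corpus).
tokenizer = Tokenizer()
tokenizer.fit_on_texts(reviews + aspects)
word_index = tokenizer.word_index

def encode(text):
    # Split the text into word tokens, then map each token to its integer index.
    return [word_index[w] for w in text_to_word_sequence(text) if w in word_index]

review_ids = [encode(r) for r in reviews]   # review text encoded separately
aspect_ids = [encode(a) for a in aspects]   # aspect words encoded separately
```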

  • Neural bag of words without pre-trained word embeddings
  • CNN without pre-trained word embeddings
  • Using pre-trained word embeddings:
    • Neural bag of words using pre-trained word embeddings
    • CNN with pre-trained word embeddings
  • Models with multiple inputs (see the sketch after this list):
    • Neural bag-of-words model with multiple inputs
    • CNN model with multiple inputs
  • Another LSTM model
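
The multiple-input variants combine a review branch and an aspect branch. Here is a minimal sketch of such a neural bag-of-words model built with the Keras functional API; all sizes are illustrative and the shared embedding layer is an assumption:

```python
from tensorflow.keras.layers import (Input, Embedding, GlobalAveragePooling1D,
                                     Concatenate, Dense)
from tensorflow.keras.models import Model

vocab_size, embed_size = 10000, 100     # assumed hyperparameters
review_len, aspect_len = 80, 5          # assumed fixed lengths after padding

review_in = Input(shape=(review_len,), name="review")
aspect_in = Input(shape=(aspect_len,), name="aspect")

# Separate embedding-and-averaging branches for the review and the aspect.
embed = Embedding(vocab_size, embed_size, mask_zero=True)
review_vec = GlobalAveragePooling1D()(embed(review_in))
aspect_vec = GlobalAveragePooling1D()(embed(aspect_in))

# Merge the two representations and classify on the three-point scale.
merged = Concatenate()([review_vec, aspect_vec])
hidden = Dense(64, activation="relu")(merged)
output = Dense(3, activation="softmax")(hidden)   # POSITIVE / NEUTRAL / NEGATIVE

model = Model(inputs=[review_in, aspect_in], outputs=output)
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
```

The CNN variant would swap the averaging layers for Conv1D-plus-pooling branches before concatenation.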

Neural Machine Translation

A deep neural network using sequence-to-sequence (seq2seq) modelling, with and without attention.
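
A minimal sketch of the attention-free encoder-decoder setup, assuming Keras LSTM layers and teacher forcing during training; vocabulary sizes and dimensions are illustrative:

```python
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense
from tensorflow.keras.models import Model

src_vocab, tgt_vocab, embed_size, units = 8000, 8000, 128, 256   # assumed sizes

# Encoder: embed the source sentence and keep the final LSTM states.
enc_in = Input(shape=(None,), name="source")
enc_emb = Embedding(src_vocab, embed_size, mask_zero=True)(enc_in)
_, state_h, state_c = LSTM(units, return_state=True)(enc_emb)

# Decoder: conditioned on the encoder states, predicts the target sentence one
# token at a time (the shifted target sequence is fed in for teacher forcing).
dec_in = Input(shape=(None,), name="target_in")
dec_emb = Embedding(tgt_vocab, embed_size, mask_zero=True)(dec_in)
dec_out = LSTM(units, return_sequences=True)(dec_emb, initial_state=[state_h, state_c])
probs = Dense(tgt_vocab, activation="softmax")(dec_out)

model = Model([enc_in, dec_in], probs)
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
```

The attention variant additionally keeps the encoder's per-timestep outputs and lets the decoder attend over them (for example with a Keras Attention layer) before the output softmax.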