Sentitou: Hotel Review Sentiment Analysis

Sentitou is a machine learning project designed to analyze hotel reviews. The project utilizes the Multinomial Naive Bayes classifier, a probabilistic model, to classify reviews as positive or negative and to perform aspect-based sentiment analysis across several categories.

Classifier Details

The Multinomial Naive Bayes classifier scores each candidate class by combining the class's prior probability with the likelihood of the words observed in the text, then returns the class with the highest resulting probability. In this project, the classifier is used for:

Positive/Negative classification: Determines whether a review is overall positive or negative.
Aspect-based classification: Analyzes specific aspects such as Food, Staff, Price, Pool/Sea, and Room.
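
The decision rule described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `predict`, the toy parameters, and the choice to sum log-probabilities (a common way to avoid numeric underflow) are all assumptions.

```python
import math

def predict(tokens, priors, likelihoods, vocabulary):
    """Return the class with the highest log-posterior score."""
    scores = {}
    for label, prior in priors.items():
        score = math.log(prior)
        for token in tokens:
            if token in vocabulary:  # ignore words never seen in training
                score += math.log(likelihoods[label][token])
        scores[label] = score
    return max(scores, key=scores.get)

# Toy, made-up parameters purely for illustration.
priors = {"positive": 0.5, "negative": 0.5}
likelihoods = {
    "positive": {"great": 0.6, "dirty": 0.1, "room": 0.3},
    "negative": {"great": 0.1, "dirty": 0.6, "room": 0.3},
}
vocabulary = {"great", "dirty", "room"}

label = predict(["great", "room"], priors, likelihoods, vocabulary)
```

The same function would serve the aspect-based classifier if it were given priors and likelihoods keyed by aspect name instead of by sentiment.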

Training Data

Positive/Negative Classifier: Trained on 10,000 hotel reviews.
Aspect-Based Classifier: Trained on 200 hotel reviews categorized into five aspects:
- Food
- Staff
- Price
- Pool/Sea
- Room

Data Preprocessing

The training data undergoes the following preprocessing steps:

Remove non-ASCII characters: Handled using regular expressions.
Convert to lowercase: Standardizes all text to lowercase.
Remove stopwords: Filters out common words that don't contribute to sentiment analysis.
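
A minimal sketch of these three steps is shown below. The stopword set and the regex-based tokenization are assumptions made for illustration; the project's actual stopword list and tokenizer are not described here.

```python
import re

# Small illustrative stopword set (an assumption, not the project's list).
STOPWORDS = {"the", "a", "an", "is", "was", "and", "to", "of", "in", "it"}

def clean_text(text):
    """Strip non-ASCII characters, lowercase, and drop stopwords."""
    text = re.sub(r"[^\x00-\x7F]+", " ", text)  # remove non-ASCII characters
    text = text.lower()                         # standardize case
    tokens = re.findall(r"[a-z']+", text)       # assumed tokenization rule
    return [t for t in tokens if t not in STOPWORDS]

tokens = clean_text("The room was GREAT ☺ and the pool is clean!")
```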

Training Process

The training process includes:

Data Shuffling: The data is shuffled and then split, with 75% used for training and 25% reserved for testing. This ratio applies to both the Positive/Negative and Aspect-Based classifiers.
Model Training: The model calculates \( p(c) \) (prior probability) and \( p(d|c) \) (likelihood of document \( d \) given class \( c \)) for each class. It then forms a vocabulary list based on the training data.
Output: The training output is saved in three JSON files:
- Vocabulary
- Prior probability
- Likelihood

These JSON files are used by the Test_classifier function and the Predict class to analyze new data.
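
The training steps above might look roughly like the sketch below. Several details are assumptions: the function names, the JSON file names, and the use of add-one (Laplace) smoothing for the likelihoods, which this README does not confirm.

```python
import json
import random
import tempfile
from collections import Counter, defaultdict
from pathlib import Path

def split_data(samples, train_ratio=0.75, seed=0):
    """Shuffle a copy of the samples and split it 75/25."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

def train(train_set):
    """Estimate priors and likelihoods from (tokens, label) pairs."""
    class_counts = Counter(label for _, label in train_set)
    word_counts = defaultdict(Counter)
    for tokens, label in train_set:
        word_counts[label].update(tokens)
    vocabulary = sorted({t for tokens, _ in train_set for t in tokens})
    priors = {c: n / len(train_set) for c, n in class_counts.items()}
    likelihoods = {}
    for c in class_counts:
        # Add-one smoothing (an assumption) keeps unseen words above zero.
        total = sum(word_counts[c].values()) + len(vocabulary)
        likelihoods[c] = {w: (word_counts[c][w] + 1) / total for w in vocabulary}
    return vocabulary, priors, likelihoods

def save_model(out_dir, vocabulary, priors, likelihoods):
    """Write the three outputs as JSON files (file names are hypothetical)."""
    out = Path(out_dir)
    parts = [("vocabulary", vocabulary), ("prior", priors), ("likelihood", likelihoods)]
    for name, obj in parts:
        (out / f"{name}.json").write_text(json.dumps(obj))

# Tiny made-up dataset, for illustration only.
samples = [
    (["great", "room"], "positive"),
    (["friendly", "staff"], "positive"),
    (["dirty", "room"], "negative"),
    (["rude", "staff"], "negative"),
]
train_set, test_set = split_data(samples)
vocabulary, priors, likelihoods = train(train_set)
with tempfile.TemporaryDirectory() as tmp:
    save_model(tmp, vocabulary, priors, likelihoods)
    saved = sorted(p.name for p in Path(tmp).iterdir())
```

Storing the parameters as plain JSON means a prediction step only needs to load three dictionaries, without retraining.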

Performance

The model's accuracy was measured over 10 test runs, with the model reset between runs:

Positive/Negative Classifier: Achieved an average accuracy of 95%.
Aspect-Based Classifier: Achieved an average accuracy of 92%.
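
Averaging accuracy over repeated runs could be done along the lines of this sketch. The helper names and the toy run are hypothetical; in practice each run would presumably reshuffle the data, retrain, and evaluate on the held-out 25%.

```python
from statistics import mean

def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

def average_accuracy(evaluate_once, runs=10):
    """Call evaluate_once(seed) once per run and average the results."""
    return mean(evaluate_once(seed) for seed in range(runs))

def toy_run(seed):
    # Stand-in for "reshuffle, retrain, test"; the labels here are made up.
    actual = ["positive", "negative", "positive", "negative"]
    predicted = ["positive", "negative", "positive", "positive"]
    return accuracy(predicted, actual)

avg = average_accuracy(toy_run, runs=10)
```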

Analyzer Class

The Analyzer class is responsible for:

Data Extraction: Fetches review data from the database. Only .xlsx files are processed.
Sentence Splitting: Divides reviews into sentences for detailed analysis.
Data Cleaning: Applies the same cleaning steps as during training.
Sentiment and Aspect Classification: Uses the Naive Bayes algorithm to determine the sentiment and categorize each sentence.
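
The per-sentence flow can be illustrated with the sketch below. The punctuation-based splitting rule and the two stand-in classifier functions are assumptions; the real Analyzer would use the trained Naive Bayes models instead.

```python
import re

def split_sentences(review):
    """Split on whitespace that follows '.', '!' or '?' (assumed rule)."""
    parts = re.split(r"(?<=[.!?])\s+", review.strip())
    return [p for p in parts if p]

def analyze_review(review, classify_sentiment, classify_aspect):
    """Classify every sentence of a review for sentiment and aspect."""
    results = []
    for sentence in split_sentences(review):
        results.append({
            "sentence": sentence,
            "sentiment": classify_sentiment(sentence),
            "aspect": classify_aspect(sentence),
        })
    return results

# Hypothetical stand-ins for the trained classifiers.
def toy_sentiment(sentence):
    return "negative" if "rude" in sentence.lower() else "positive"

def toy_aspect(sentence):
    return "Staff" if "staff" in sentence.lower() else "Food"

results = analyze_review("The food was great. Staff were rude!", toy_sentiment, toy_aspect)
```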

Rating Calculation

Ratings are dynamically calculated based on the frequency and sentiment of aspects mentioned in a review:

Multiple Mentions: If the same aspect is mentioned multiple times, it has a stronger influence on the rating.
Conflicting Sentiments: If an aspect is mentioned with both positive and negative sentiments, they neutralize each other.
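
One simple way to get both behaviors is a signed count per aspect, as in this sketch. The +1/-1 weighting and the function name are assumptions; the exact formula the project uses to turn these counts into a rating is not stated here.

```python
from collections import defaultdict

def aspect_scores(sentence_results):
    """Sum +1 per positive and -1 per negative mention of each aspect."""
    scores = defaultdict(int)
    for aspect, sentiment in sentence_results:
        scores[aspect] += 1 if sentiment == "positive" else -1
    return dict(scores)

# Made-up classification results for a single review.
results = [
    ("Food", "positive"),
    ("Food", "positive"),   # repeated mention strengthens the score
    ("Staff", "positive"),
    ("Staff", "negative"),  # conflicting mentions cancel out
    ("Room", "negative"),
]
scores = aspect_scores(results)
```

With this input, Food ends at 2, Staff at 0, and Room at -1.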

The Analyzer generates data for:

Word clouds
Donut charts
The number of reviews mentioning each aspect
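
The data behind these outputs could be produced with simple counting, as sketched here. The function names and input shapes are hypothetical, and rendering the charts themselves is left out.

```python
from collections import Counter

def aspect_review_counts(reviews_aspects):
    """Count how many reviews mention each aspect at least once."""
    counts = Counter()
    for aspects in reviews_aspects:
        counts.update(set(aspects))  # each aspect counts once per review
    return counts

def word_frequencies(token_lists):
    """Word counts across all cleaned sentences, e.g. for a word cloud."""
    return Counter(t for tokens in token_lists for t in tokens)

# Made-up example: aspects detected in three reviews.
reviews_aspects = [["Food", "Food", "Staff"], ["Room"], ["Food", "Room"]]
mention_counts = aspect_review_counts(reviews_aspects)
frequencies = word_frequencies([["great", "room"], ["great", "food"]])
```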

Project Structure

The project is developed using Object-Oriented Programming (OOP) principles. The main components include:

Main Window: The primary user interface.
Frames: Two classes for managing frames within the window.
Analyzer: The core class responsible for processing and analyzing the reviews.

User Interface

Figure 1: User Interface

Process Flowcharts

The following flowcharts illustrate the training and analysis processes:

Figure 2: Training Flowchart | Figure 3: Analyzing Flowchart

Data Collection

A custom bot was developed to crawl the data used to train the model. The source code can be found below, but please note that TripAdvisor updates its platform frequently, which may affect the bot's functionality:

Selenium-TripAdvisorReviewScraper: GitHub Repository

Conclusion

Sentitou analyzes hotel reviews for both overall sentiment and specific aspects of the customer experience. In the project's tests, its Naive Bayes classifiers reached average accuracies of 95% and 92%, making it a useful tool for understanding guest feedback in the tourism industry.
