Assignment for the course Business Intelligence and Big Data Analytics in the 4th year winter semester:
Use SQL Server (database and analysis services) or MySQL + Pentaho:
- A large data set will be found, cleaned, and loaded into a data warehouse. Then a data cube and various metrics should be created. [40%]
- A visualization tool (Tableau or Power BI) will be used to create various instances of data visualization. [20%]
- The warehouse data will be used for some mining operations, such as classification, association rules, clustering, etc. Use the methods and models of a commercial system or an open-source tool. Implement at least two models. [40%]
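The warehouse and cube steps in the first requirement can be sketched roughly as follows. This is only an illustration: it uses Python's built-in `sqlite3` instead of SQL Server or MySQL, and the table names (`dim_area`, `dim_date`, `fact_crime`), columns, and rows are hypothetical, not taken from the project.

```python
import sqlite3


def build_warehouse(conn):
    """Create a tiny star schema (hypothetical names) and load made-up rows."""
    conn.executescript("""
        CREATE TABLE dim_area (area_id INTEGER PRIMARY KEY, area_name TEXT);
        CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
        CREATE TABLE fact_crime (
            crime_id   INTEGER PRIMARY KEY,
            area_id    INTEGER REFERENCES dim_area(area_id),
            date_id    INTEGER REFERENCES dim_date(date_id),
            crime_type TEXT
        );
    """)
    conn.executemany("INSERT INTO dim_area VALUES (?, ?)",
                     [(1, "Central"), (2, "Hollywood")])
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                     [(1, 2010, 1), (2, 2010, 2), (3, 2011, 1)])
    conn.executemany("INSERT INTO fact_crime VALUES (?, ?, ?, ?)", [
        (1, 1, 1, "BURGLARY"),
        (2, 1, 2, "THEFT"),
        (3, 2, 1, "THEFT"),
        (4, 2, 3, "BURGLARY"),
        (5, 1, 3, "THEFT"),
    ])


def crimes_by_area_and_year(conn):
    """One slice of the cube: the crime-count metric by area and year."""
    return conn.execute("""
        SELECT a.area_name, d.year, COUNT(*) AS n_crimes
        FROM fact_crime f
        JOIN dim_area a ON a.area_id = f.area_id
        JOIN dim_date d ON d.date_id = f.date_id
        GROUP BY a.area_name, d.year
        ORDER BY a.area_name, d.year
    """).fetchall()


conn = sqlite3.connect(":memory:")
build_warehouse(conn)
cube_slice = crimes_by_area_and_year(conn)
for row in cube_slice:
    print(row)
```

A real cube in Analysis Services or Pentaho would precompute aggregates like this across many dimension combinations; the `GROUP BY` above shows a single one of them.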
The analysis concerns crimes in Los Angeles from 2010 to 2019. All the information about the dataset can be found here. In detail:
- LA CRIMES PRESENTATION.pdf is the presentation of the analysis, application and conclusions. It is the quickest way to review the analysis.
- Cleaning data.ipynb is the notebook that executes the ETL process (mostly data extraction and cleaning).
- Data Visualizations.ipynb contains some examples of complex visualizations of the crime data in Python. For all the visualizations, check out the book, in the section VISUALIZATIONS.
- Clustering.ipynb describes three different clustering analyses, their visualizations, and the conclusions.
- Machine Learning.ipynb covers training decision trees and their extensions to predict the crime type of individual incidents from the remaining information.
- LA Crimes Book Greek.pdf is the full book of the detailed analysis, written in Greek.
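To give a feel for the kind of analysis in Clustering.ipynb, here is a minimal k-means sketch in plain Python. The algorithms the notebook actually uses are not stated here, so k-means is an assumption chosen as a common example, and the coordinates below are made-up points, not rows from the dataset.

```python
import math


def kmeans(points, k, iterations=20):
    """Basic k-means; the first k points serve as deterministic initial centroids."""
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids


# Hypothetical (latitude, longitude) pairs forming two loose groups.
points = [
    (34.05, -118.25), (34.20, -118.45),
    (34.06, -118.24), (34.21, -118.46),
    (34.04, -118.26), (34.19, -118.44),
]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

In practice a library implementation (for example scikit-learn's `KMeans`) would be the usual choice; the hand-written version only makes the assignment and update steps visible.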
First, follow the instructions and run Cleaning data.ipynb, then execute the other notebooks.