Statistical Data Analysis project for Università degli Studi di Salerno. Academic year 2019/2020.
See the documentation to understand the goals of the project.
To use the ColabSDA2020.ipynb Python notebook in Google Colab:
- Download it and save a copy in your drive, or
- Click this link to open it in Google Colab
If you run into problems or need further information, refer to the official Google Colab documentation on using Colab with GitHub.
- To use the R scripts correctly, run packages.R and models.R first
- preprocessing.R merges the datasets into preprocessed_complete.csv
- There is no need to run preprocessing.R; you can import the final dataset directly
- linear_regression.R fits a multiple linear regression for all the models
- resampling_methods.R applies the Validation Set Approach and K-Fold Cross-Validation to estimate the test MSE, and the Bootstrap to estimate the standard errors of the coefficients
- fit_approach_linear.R applies subset selection methods to the linear model
- fit_approach_poly2.R applies subset selection methods to the polynomial model of degree 2
- fit_approach_poly3.R applies subset selection methods to the polynomial model of degree 3
- fit_approach_poly4.R applies subset selection methods to the polynomial model of degree 4
- regularization_linear.R applies Ridge and LASSO regularization to the linear model
- regularization_poly2.R applies Ridge and LASSO regularization to the polynomial model of degree 2
- regularization_poly3.R applies Ridge and LASSO regularization to the polynomial model of degree 3
- regularization_poly4.R applies Ridge and LASSO regularization to the polynomial model of degree 4
- pcr_pls_linear.R applies PCR and PLS methods to the linear model
- pcr_pls_poly2.R applies PCR and PLS methods to the polynomial model of degree 2
- pcr_pls_poly3.R applies PCR and PLS methods to the polynomial model of degree 3
- pcr_pls_poly4.R applies PCR and PLS methods to the polynomial model of degree 4
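
As a rough illustration of what resampling_methods.R estimates, here is a minimal sketch in Python (the notebook's language, standard library only). It is not the code from the R scripts: the function names (`fit_line`, `k_fold_mse`, `bootstrap_slope_se`), the single-predictor model, and the synthetic data are all assumptions made for the example. It shows K-Fold Cross-Validation as an estimate of the test MSE and the Bootstrap as an estimate of the standard error of a coefficient.

```python
import random
from statistics import mean, stdev


def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def mse(model, xs, ys):
    """Mean squared error of a fitted (b0, b1) model on the given points."""
    b0, b1 = model
    return mean((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


def k_fold_mse(xs, ys, k=5, seed=0):
    """Average held-out MSE over k folds (an estimate of the test MSE)."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds of similar size
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        scores.append(mse(model,
                          [xs[i] for i in held_out],
                          [ys[i] for i in held_out]))
    return mean(scores)


def bootstrap_slope_se(xs, ys, n_boot=500, seed=0):
    """Standard deviation of the slope across bootstrap resamples."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    for _ in range(n_boot):
        # Resample n observations with replacement, then refit.
        sample = [rng.randrange(n) for _ in range(n)]
        _, b1 = fit_line([xs[i] for i in sample], [ys[i] for i in sample])
        slopes.append(b1)
    return stdev(slopes)


# Synthetic data (an assumption for this sketch): y = 2 + 3x + Gaussian noise.
rng = random.Random(42)
xs = [i / 10 for i in range(100)]
ys = [2.0 + 3.0 * x + rng.gauss(0, 1) for x in xs]

cv_mse = k_fold_mse(xs, ys, k=5)
slope_se = bootstrap_slope_se(xs, ys)
print(f"5-fold CV estimate of test MSE: {cv_mse:.3f}")
print(f"Bootstrap SE of the slope:      {slope_se:.4f}")
```

With noise of unit variance, the cross-validated MSE should land near 1, and the bootstrap standard error should be close to the usual least-squares formula for the slope. The Validation Set Approach is the special case where the data is split once into a training part and a held-out part instead of k folds.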