GitHub - nicofretti/HAR: A solution of Human Activity Recognition with Smartphones challenge hosted on Kaggle

Human Activity Recognition with Smartphones (HAR)

This project contains my solution to the HAR problem hosted on Kaggle. The model reaches an accuracy of about 0.95.

Accuracy: 0.953

Analysis

The dataset has a total of 561 features, and it is divided into two sets:

Train: 7352 samples
Test: 2947 samples

The dataset is well-formed, and the activities are fairly evenly distributed among the samples.
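The distribution of the activities can be checked with a few lines of NumPy. This is only a minimal sketch: the helper name class_distribution and the toy labels are assumptions for illustration and are not part of the repository.

```python
import numpy as np

def class_distribution(labels):
    """Return {label: fraction of samples} for a 1-D sequence of labels."""
    values, counts = np.unique(np.asarray(labels), return_counts=True)
    total = counts.sum()
    return {str(v): float(c / total) for v, c in zip(values, counts)}

# Toy labels for illustration only; these are NOT the real dataset counts.
toy_labels = ["WALKING", "SITTING", "SITTING", "STANDING"]
toy_distribution = class_distribution(toy_labels)
print(toy_distribution)
```

Fractions that are all close to 1/C (with C classes) would indicate a balanced dataset.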

With 561 features the input is high-dimensional, so the first step is to reduce the number of features.

Dimensionality Reduction

I used PCA to reduce the number of features. The file tools.py contains the function PCA, which computes the principal components and returns the projection matrix used to transform the data:

import numpy as np
import tools  # local module from this repository

pca_proj = tools.PCA(x_train, n_eigenvectors)
pca_data = np.matmul(x_train, pca_proj.T)
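The body of tools.PCA is not shown here, so the following is only a sketch of what a function with that call shape might do, not the repository's implementation. It assumes the projection matrix has one eigenvector per row, which is what multiplying by pca_proj.T implies. The name pca_projection and the toy data are hypothetical.

```python
import numpy as np

def pca_projection(x, n_eigenvectors):
    """Return a (n_eigenvectors, n_features) matrix whose rows are the
    leading eigenvectors of the covariance matrix of x."""
    x = np.asarray(x, dtype=float)
    # np.cov centers the data internally; rowvar=False means columns are features.
    cov = np.cov(x, rowvar=False)
    # eigh is meant for symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Keep the indices of the largest eigenvalues, largest first.
    order = np.argsort(eigvals)[::-1][:n_eigenvectors]
    return eigvecs[:, order].T

# Toy data for illustration only (200 samples, 10 features).
rng = np.random.default_rng(0)
toy = rng.normal(size=(200, 10))
proj = pca_projection(toy, 3)
reduced = np.matmul(toy, proj.T)
print(proj.shape, reduced.shape)  # (3, 10) (200, 3)
```

Note that, like the snippet above, this projects the raw data without subtracting the mean; centering before projecting only shifts the result by a constant offset.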

The variable n_eigenvectors is the number of eigenvectors to keep. Looking at the plot, about 154 eigenvectors are needed to cover 99% of the variance.
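The same number can be read off numerically from the cumulative explained variance instead of the plot. This is a sketch under the assumption that variance coverage means the cumulative sum of the covariance eigenvalues divided by their total; the function name n_eigenvectors_for_variance and the toy data are hypothetical.

```python
import numpy as np

def n_eigenvectors_for_variance(x, coverage=0.99):
    """Smallest number of leading eigenvectors whose eigenvalues sum to
    at least `coverage` of the total variance of x."""
    x = np.asarray(x, dtype=float)
    cov = np.cov(x, rowvar=False)
    # eigvalsh returns ascending eigenvalues; reverse to get largest first.
    eigvals = np.linalg.eigvalsh(cov)[::-1]
    # Guard against tiny negative values caused by round-off.
    eigvals = np.clip(eigvals, 0.0, None)
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    # First index where the cumulative ratio reaches the requested coverage.
    count = int(np.searchsorted(cumulative, coverage)) + 1
    return min(count, len(eigvals))

# Toy data for illustration only: one feature carries almost all the variance.
rng = np.random.default_rng(0)
toy = rng.normal(size=(500, 4)) * np.array([10.0, 0.01, 0.01, 0.01])
toy_count = n_eigenvectors_for_variance(toy, 0.99)
print(toy_count)
```

On the real training set, a call such as n_eigenvectors_for_variance(x_train, 0.99) would be expected to land near the value read from the plot.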

After applying PCA, the data are transformed into a new space where the activities are visibly separated.

After PCA, I applied LDA to reduce the number of features further, down to C-1, where C is the number of classes (here 6 classes, so 5 features):

lda_proj = tools.LDA(pca_data, y_train, n_classes=6)
lda_data = np.matmul(pca_data, lda_proj.T)
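As with PCA, the body of tools.LDA is not shown, so this is only a sketch of a classic Fisher LDA with the same call shape, not the repository's code. It assumes the returned matrix has one discriminant direction per row. The name lda_projection and the toy data are hypothetical.

```python
import numpy as np

def lda_projection(x, y, n_classes):
    """Return a (n_classes - 1, n_features) matrix whose rows are the
    leading eigenvectors of pinv(S_within) @ S_between."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    n_features = x.shape[1]
    overall_mean = x.mean(axis=0)
    s_within = np.zeros((n_features, n_features))
    s_between = np.zeros((n_features, n_features))
    for label in np.unique(y):
        members = x[y == label]
        class_mean = members.mean(axis=0)
        # Scatter of the samples around their own class mean.
        diff = members - class_mean
        s_within += diff.T @ diff
        # Scatter of the class mean around the overall mean, weighted by class size.
        shift = (class_mean - overall_mean)[:, None]
        s_between += len(members) * (shift @ shift.T)
    # The product is not symmetric, so use the general eigen-solver;
    # pinv keeps this usable when S_within is singular.
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(s_within) @ s_between)
    order = np.argsort(eigvals.real)[::-1][: n_classes - 1]
    return eigvecs[:, order].real.T

# Toy data for illustration only: 3 well-separated classes in 4 dimensions.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0, 0, 0], [5.0, 0, 0, 0], [0.0, 5, 0, 0]])
toy_x = np.vstack([rng.normal(loc=c, size=(50, 4)) for c in centers])
toy_y = np.repeat([0, 1, 2], 50)
toy_proj = lda_projection(toy_x, toy_y, n_classes=3)
toy_lda = np.matmul(toy_x, toy_proj.T)
print(toy_proj.shape, toy_lda.shape)  # (2, 4) (150, 2)
```

At most C-1 directions are useful because the between-class scatter matrix is built from C class means and therefore has rank at most C-1.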

This way the data are transformed into a new space where the activities look better separated.

Classification

For classification I used the sklearn library. I chose the KNeighborsClassifier algorithm because the plot shows some blobs where the classes are not well separated:

from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=20)
knn.fit(lda_data, y_train)
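To score the classifier on the test set, the test samples have to pass through the same PCA and LDA projections that were computed on the training set. The sketch below shows one way to do that. The helpers project and fit_and_score are hypothetical names, and the toy data with identity projections stands in for the real matrices.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def project(x, pca_proj, lda_proj):
    """Apply the PCA projection and then the LDA projection (rows = directions)."""
    return np.matmul(np.matmul(x, pca_proj.T), lda_proj.T)

def fit_and_score(train_data, y_train, test_data, y_test, n_neighbors=20):
    """Fit k-NN on the projected training data and return test accuracy."""
    knn = KNeighborsClassifier(n_neighbors=n_neighbors)
    knn.fit(train_data, y_train)
    return knn.score(test_data, y_test)

# Toy data for illustration only: two well-separated blobs in 2 dimensions.
rng = np.random.default_rng(0)
toy_train = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(10, 1, (100, 2))])
toy_y_train = np.repeat([0, 1], 100)
toy_test = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(10, 1, (30, 2))])
toy_y_test = np.repeat([0, 1], 30)

# Identity matrices stand in for the real pca_proj and lda_proj.
identity = np.eye(2)
toy_accuracy = fit_and_score(
    project(toy_train, identity, identity), toy_y_train,
    project(toy_test, identity, identity), toy_y_test,
)
print(toy_accuracy)
```

The important point is that the projections are never refitted on the test data; they are only applied to it.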

Conclusion

The number of features has been reduced from 561 to only 5, and the accuracy of the model is 0.95. The confusion matrix shows that the model confuses the classes SITTING and STANDING. This is expected, because in the plot the two classes are still not well separated.
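The pair of classes that is confused most often can also be extracted from the confusion matrix programmatically. This is a sketch only: the helper names and the toy predictions below are made up for illustration and are not the model's actual results.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = np.zeros((len(labels), len(labels)), dtype=int)
    for truth, predicted in zip(y_true, y_pred):
        matrix[index[truth], index[predicted]] += 1
    return matrix

def most_confused_pair(matrix, labels):
    """Return the two labels with the most mutual misclassifications."""
    # Add both directions of confusion, then ignore correct predictions.
    mutual = matrix + matrix.T
    np.fill_diagonal(mutual, 0)
    i, j = np.unravel_index(np.argmax(mutual), mutual.shape)
    return labels[i], labels[j]

# Toy predictions for illustration only; NOT the real model output.
labels = ["SITTING", "STANDING", "WALKING"]
y_true = ["SITTING", "SITTING", "STANDING", "STANDING", "WALKING", "WALKING"]
y_pred = ["SITTING", "STANDING", "SITTING", "STANDING", "WALKING", "WALKING"]
toy_matrix = confusion_matrix(y_true, y_pred, labels)
toy_pair = most_confused_pair(toy_matrix, labels)
print(toy_matrix)
print(toy_pair)
```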

