Data Science and Machine Learning projects
Description:
Creation of a relevance classifier for articles.
"Article_relevance_classification" directory contains the .ipynb file and data links (Google Drive)
Stack: Python, Jupyter Notebook, NumPy, Pandas, Scikit-learn, Matplotlib, nltk, pymorphy2, wordcloud
Description:
Uses the BERTopic framework to build a topic model for news articles.
"BERTopic_news_clf" directory contains the source files and links
Stack: Python, Jupyter Notebook, BERTopic, Pandas, NumPy, Matplotlib, SentenceTransformers, Scikit-learn
Description:
The model identifies the digit in an input image using a softmax output activation function. The CNN reaches about 99% accuracy.
"CNN_MNIST_classifier" directory contains the .ipynb file
MNIST is imported from tensorflow.keras.datasets
Stack: Python, Jupyter Notebook, NumPy, Matplotlib, TensorFlow, Keras
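As a rough illustration of the softmax output step described above, the sketch below turns ten raw class scores into probabilities and picks the most likely digit. It uses only the standard library rather than Keras; the function names and the `logits` values are hypothetical and are not taken from the notebook.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_digit(logits):
    """Return the index (digit 0-9) with the highest probability."""
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__)

# Hypothetical scores for digits 0..9 (illustrative values only).
logits = [0.1, 0.3, 0.2, 4.0, 0.0, 0.5, 0.1, 1.2, 0.4, 0.2]
probs = softmax(logits)
digit = predict_digit(logits)
```

In a Keras model the same effect would typically come from a final dense layer with ten units and a softmax activation; the sketch only shows the arithmetic behind it.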
Description:
The CNN is designed as a fire-detection solution based on the analysis of thermograms captured from flight altitude.
The model classifies each thermogram as 0 (no fire) or 1 (fire). The classification accuracy reaches about 75%.
"CNN_thermogram_classifier" directory contains the .ipynb file and link to the dataset
Stack: Python, Jupyter Notebook, NumPy, Matplotlib, TensorFlow, Keras
Description:
Implementation of semantic search using cosine distance, GigaChatEmbeddings,
and the Weaviate vector database.
"Embeddings_and_Similarity" directory contains source files, configs, and a news dataset of 1000 documents
Stack: Python, Pandas, LangChain, GigaChat SDK (GigaChain), Weaviate
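To make the cosine-distance ranking concrete, here is a minimal sketch in plain Python. It assumes the embeddings have already been computed (in the project they would come from GigaChatEmbeddings and be stored in Weaviate); the three-dimensional vectors, document ids, and function names below are hypothetical placeholders.

```python
import math

def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def search(query_vec, doc_vecs, top_k=2):
    """Return the ids of the top_k documents closest to the query."""
    ranked = sorted(
        doc_vecs.items(),
        key=lambda item: cosine_distance(query_vec, item[1]),
    )
    return [doc_id for doc_id, _ in ranked[:top_k]]

# Toy embeddings (assumed values; real embeddings have many more dimensions).
docs = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
}
results = search([1.0, 0.05, 0.0], docs)
```

A vector database performs this kind of nearest-neighbour ranking with an index instead of the full sort used here, which only makes sense for small collections.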
Description:
A model that predicts whether a Titanic passenger survives, based on passenger class, sex, age, number of siblings, number of children, and ticket price.
The solution uses the logistic regression implementation provided by the Scikit-learn library. The accuracy reaches up to about 80%.
"Logistic_regression_Titanic_classifier" directory contains the .ipynb file and the dataset
Stack: Python, Jupyter Notebook, NumPy, Pandas, Scikit-learn
Description:
Implementation of the MapReduce algorithm to compute average title lengths
for specified news sources in an open news dataset.
"Map-Reduce" directory contains the source files and data links
Stack: Python, Pandas
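The map, shuffle, and reduce steps for the average title length can be sketched as follows. This is an in-memory illustration under stated assumptions, not the project's code: the record layout `(source, title)`, the source names, and the choice to measure length in characters are all hypothetical.

```python
from collections import defaultdict

def map_phase(records, sources):
    """Emit (source, (title_length, 1)) for each record from a wanted source."""
    for source, title in records:
        if source in sources:
            yield source, (len(title), 1)

def shuffle(pairs):
    """Group the mapped values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Sum lengths and counts per source, then divide to get the average."""
    result = {}
    for source, values in groups.items():
        total = sum(length for length, _ in values)
        count = sum(n for _, n in values)
        result[source] = total / count
    return result

# Toy records (assumed data): (news source, article title).
records = [
    ("bbc", "Hello"),
    ("bbc", "Hi there"),
    ("cnn", "News"),
    ("other", "Skip me"),
]
averages = reduce_phase(shuffle(map_phase(records, {"bbc", "cnn"})))
```

Emitting `(length, count)` pairs rather than raw lengths keeps the reduce step associative, so partial sums from different workers could be combined before the final division.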