DS-ML_projects

Data Science and Machine Learning projects

Article relevance classification

Description:
Creation of a relevance classifier for articles.

"Article_relevance_classification" directory contains the .ipynb file and data links (Google Drive)

Stack: Python, Jupyter Notebook, NumPy, Pandas, Scikit-learn, Matplotlib, nltk, pymorphy2, wordcloud

BERTopic model for news topic modeling

Description:
An implementation of the BERTopic framework to build a topic model for news articles.

"BERTopic_news_clf" directory contains the source files and links

Stack: Python, Jupyter Notebook, BERTopic, Pandas, NumPy, Matplotlib, SentenceTransformers, Scikit-learn

CNN MNIST classifier

Description:
The model identifies the digit in an input image using a softmax activation function on the output layer. The CNN reaches an accuracy of about 99%.

"CNN_MNIST_classifier" directory contains the .ipynb file
MNIST is imported from tensorflow.keras.datasets

Stack: Python, Jupyter Notebook, NumPy, Matplotlib, TensorFlow, Keras
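
To illustrate how a softmax output is turned into a digit prediction, here is a minimal standard-library sketch. It is not the notebook's Keras code: the `softmax` helper and the `logits` values are hypothetical and chosen only for illustration.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for the ten digit classes 0-9 (not real model output).
logits = [0.1, 0.3, 4.0, 0.2, 0.0, 0.5, 0.1, 1.2, 0.4, 0.3]
probs = softmax(logits)

# The predicted digit is the index of the largest probability.
predicted_digit = max(range(len(probs)), key=probs.__getitem__)
print(predicted_digit)
```

In a Keras model the same computation is performed by a final dense layer with ten units and `activation="softmax"`, followed by an argmax over the output vector.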

CNN thermogram classifier

Description:
The CNN is designed as a fire-detection solution based on the analysis of thermograms captured at flight altitude. The model classifies each thermogram as 0 (no fire) or 1 (fire). The classification accuracy is about 75%.

"CNN_thermogram_classifier" directory contains the .ipynb file and link to the dataset

Stack: Python, Jupyter Notebook, NumPy, Matplotlib, TensorFlow, Keras

Embeddings and Similarity

Description:
An implementation of semantic search using cosine distance, GigaChatEmbeddings, and the Weaviate vector database.

"Embeddings_and_Similarity" directory contains source files, configs and news dataset with 1000 docs

Stack: Python, Pandas, LangChain, GigaChat SDK (GigaChain), Weaviate
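
As a rough illustration of ranking documents by cosine distance, the sketch below uses only the standard library. The three-dimensional vectors, the document ids, and the `cosine_distance` and `search` helpers are all hypothetical. In the project itself the embeddings come from GigaChatEmbeddings and the nearest-neighbour lookup is handled by Weaviate.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def search(query_vec, doc_vecs, top_k=2):
    """Return the ids of the top_k documents closest to the query vector."""
    ranked = sorted(
        doc_vecs.items(),
        key=lambda item: cosine_distance(query_vec, item[1]),
    )
    return [doc_id for doc_id, _ in ranked[:top_k]]


# Toy embeddings (assumed values, far smaller than real embedding vectors).
docs = {
    "sports": [0.9, 0.1, 0.0],
    "finance": [0.1, 0.9, 0.2],
    "weather": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]

results = search(query, docs)
print(results)
```

A vector database performs the same ranking, but with an approximate index so that it does not have to compare the query against every stored vector.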

Logistic regression Titanic classifier

Description:
A model that predicts whether a Titanic passenger will survive based on their passenger class, sex, age, number of siblings, number of children, and ticket price. The solution uses the logistic regression implementation provided by the Scikit-learn library. The accuracy reaches up to 80%.

"Logistic_regression_Titanic_classifier" directory contains the .ipynb file and the dataset

Stack: Python, Jupyter Notebook, NumPy, Pandas, Scikit-learn
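
The general shape of such a classifier can be sketched with Scikit-learn as follows. This is an assumption-laden example, not the notebook's code: the data is synthetic, the labelling rule is made up so that there is a signal to learn, and the resulting accuracy says nothing about the real dataset.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Titanic table (NOT the real dataset).
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([
    rng.integers(1, 4, n),    # passenger class: 1, 2 or 3
    rng.integers(0, 2, n),    # sex, encoded as 0 = male, 1 = female
    rng.uniform(1, 70, n),    # age in years
    rng.integers(0, 4, n),    # number of siblings
    rng.integers(0, 3, n),    # number of children
    rng.uniform(5, 250, n),   # ticket price
])

# Made-up labelling rule, used only so the example has something to learn.
y = ((X[:, 1] == 1) | (X[:, 0] == 1)).astype(int)

# Hold out a quarter of the rows to measure accuracy on unseen passengers.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

With the real data, the categorical `sex` column would need the same kind of numeric encoding, and missing ages would have to be filled or dropped before fitting.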

MapReduce algorithm implementation

Description:
An implementation of the MapReduce algorithm that computes the average title length for specified news sources, based on an open news dataset.

"Map-Reduce" directory contains the source files and data links

Stack: Python, Pandas
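
The map, shuffle, and reduce steps for this task could look roughly like the sketch below. The record layout (`source` and `title` keys), the source names, and the choice to measure title length in characters are assumptions made for illustration. The project's own source files may differ.

```python
from collections import defaultdict


def map_phase(records, sources):
    """Emit (source, title length) pairs for the requested sources only."""
    for record in records:
        if record["source"] in sources:
            yield record["source"], len(record["title"])


def shuffle_phase(pairs):
    """Group the emitted values by key, as a MapReduce framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Collapse each group of title lengths into its average."""
    return {key: sum(values) / len(values) for key, values in groups.items()}


# Hypothetical records standing in for rows of the news dataset.
records = [
    {"source": "source_a", "title": "Markets rise"},
    {"source": "source_a", "title": "Rain expected"},
    {"source": "source_b", "title": "Local team wins"},
    {"source": "source_c", "title": "Not requested"},
]

pairs = map_phase(records, {"source_a", "source_b"})
averages = reduce_phase(shuffle_phase(pairs))
print(averages)
```

Keeping the three phases as separate functions mirrors how the work would be split across workers, even though everything here runs in a single process.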

Angon-pro/DS-ML_projects
