Machine Learning Applications
01b_ml_applications.ipynb (last update: 2021-08-22)
Our First Machine Learning Model
01_intro_to_ml.ipynb (last update: 2021-08-22)
Optimization Algorithms in Machine Learning and Beyond
02_optimization_algorithms.ipynb (last update: 2021-03-23)
Regression
03_regression.ipynb (last update: 2021-04-06)
Hands-On Classification
not yet prepared (last update: xx-xx-xx)
Multiclass Classification
multiclass_classification.ipynb (last update: 2021-02-09): We discuss how to generalize a classification problem to a multiclass classification problem. First, we show how to extend a logistic regression model to a multinomial logistic regression model. Then, using the Iris dataset, we show how to generalize the sklearn classification algorithms to multiclass problems. After a look at multiclass performance metrics, such as the multiclass confusion matrix, we discuss the so-called meta-estimators available in *sklearn.multiclass*, which help to improve the accuracy and runtime performance of the classifiers.
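As a minimal sketch (assuming scikit-learn is installed, not the notebook's exact code), one such meta-estimator wrapping a binary classifier for the Iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.metrics import confusion_matrix

# Load the three-class Iris dataset and split it into train and test sets
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Wrap a (binary) logistic regression in a one-vs-rest meta-estimator
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)

# Multiclass confusion matrix on the test set
print(confusion_matrix(y_test, clf.predict(X_test)))
```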
Hands-On Clustering
02_clustering.ipynb (last update: 2021-04-26): We analyze clustering algorithms both from a practical and a theoretical perspective. We go into the details of different clustering approaches, such as k-means clustering, Gaussian mixture models, DBSCAN and hierarchical clustering. To gain insight into the theoretical aspects of clustering, we discuss the concept of similarity measures and define metrics to measure the quality of clustering methods. Finally, we evaluate our techniques on a clustering use case.
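As an illustration (a sketch, not the notebook's code), k-means on synthetic data with the silhouette score as a quality metric:

```python
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic data with three well-separated clusters
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

# Fit k-means and evaluate the clustering quality via the silhouette score
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)
print(f"silhouette score: {silhouette_score(X, labels):.3f}")
```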
Hands-On Clustering - Part II
02b_clustering.ipynb (last update: 2021-05-04)
Additional: Maximum Likelihood and Expectation-Maximization Algorithm
02c_MLE_and_EM_algorithm.ipynb (last update: 2021-10-19)
Hands-On Support Vector Machines
04_support_vector_machines.ipynb (last update: 2021-06-08)
Decision Trees and Random Forests
05_decision_trees_and_random_forests.ipynb (last update: 2021-06-22)
Boosting Methods
09_boosting_methods.ipynb (last update: 2021-07-06): We deepen our understanding of tree-based ensembles beyond random forests, namely how boosted trees work. After discussing an analytical example, we turn to scikit-learn's implementation of boosted trees. We also discuss recent algorithms such as XGBoost, LightGBM and CatBoost.
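For orientation, a minimal gradient-boosting example with scikit-learn (a sketch; the XGBoost/LightGBM/CatBoost interfaces follow the same fit/predict pattern):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Sequentially fit shallow trees, each one correcting its predecessors' errors
gbc = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1, max_depth=3)
gbc.fit(X_train, y_train)
print(f"test accuracy: {gbc.score(X_test, y_test):.3f}")
```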
Theory and Concepts
Q_A_genetic_algorithms_theory.ipynb (last update: 2021-07-20): Based on *Haupt & Haupt, Practical Genetic Algorithms (2004)*, we discuss how to approach genetic algorithms (GAs) for both binary and continuous problems. We cover how to encode the variables, generate the initial population, perform natural selection, and apply mating/crossover and mutation strategies until convergence is reached.
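A compact sketch of that loop (plain NumPy, not taken from the notebook), maximizing a toy fitness function with a binary encoding:

```python
import numpy as np

rng = np.random.default_rng(0)
n_bits, pop_size, n_gen, p_mut = 16, 30, 50, 0.02

def fitness(pop):
    # Toy objective: maximize the number of ones in each chromosome
    return pop.sum(axis=1)

# Initial population of random binary chromosomes
pop = rng.integers(0, 2, size=(pop_size, n_bits))

for _ in range(n_gen):
    # Selection: keep the fitter half of the population as parents
    parents = pop[np.argsort(fitness(pop))[-pop_size // 2:]]
    # Crossover: single-point mating of randomly chosen parent pairs
    children = []
    for _ in range(pop_size - len(parents)):
        a, b = parents[rng.integers(len(parents), size=2)]
        cut = rng.integers(1, n_bits)
        children.append(np.concatenate([a[:cut], b[cut:]]))
    pop = np.vstack([parents, children])
    # Mutation: flip each bit with a small probability
    pop ^= (rng.random(pop.shape) < p_mut).astype(pop.dtype)

print("best fitness:", fitness(pop).max())
```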
Applications
Q_A_genetic_algorithms_applications.ipynb (last update: 2021-07-20): The knapsack problem and the traveling salesman problem.
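For the knapsack case, the GA encoding is one bit per item, with a fitness that penalizes infeasible selections; a sketch on a hypothetical instance (not the notebook's data):

```python
import numpy as np

# Hypothetical knapsack instance: item values, weights, and a capacity limit
values = np.array([10, 13, 18, 31, 7, 15])
weights = np.array([2, 3, 4, 6, 2, 3])
capacity = 10

def knapsack_fitness(chromosome):
    """Total value of the selected items; zero if the weight limit is exceeded."""
    chromosome = np.asarray(chromosome)
    if weights @ chromosome > capacity:
        return 0
    return int(values @ chromosome)

# Example: select items 0, 2 and 5 -> weight 9 <= 10, value 43
print(knapsack_fitness([1, 0, 1, 0, 0, 1]))
```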
Performance Metrics
performance_measures.ipynb (last update: 2020-12-22): We discuss how to evaluate the performance of a machine-learning algorithm, both for supervised and unsupervised tasks. The notebook explores the individual performance measures provided by the *sklearn.metrics* module.
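A few of those functions in action (a sketch on a toy set of predictions):

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

# Toy ground truth and predictions for a binary classification task
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("F1 score :", f1_score(y_true, y_pred))
print("confusion matrix:\n", confusion_matrix(y_true, y_pred))
```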
Recommendation Systems
recommendation_systems.ipynb (last update: 2021-01-05): We discuss the basic principles of how to implement recommendation systems. On the MovieLens dataset, we build a first, simple user-based collaborative-filtering movie recommendation system.
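The core idea of user-based collaborative filtering in a few lines (a sketch on a tiny hypothetical rating matrix, not the MovieLens pipeline from the notebook):

```python
import numpy as np

# Hypothetical user-item rating matrix (rows: users, columns: movies, 0 = unrated)
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 2, 5],
    [0, 1, 5, 4],
], dtype=float)

def cosine_sim(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

# Similarity of every user to user 0
sims = np.array([cosine_sim(R[0], R[u]) for u in range(len(R))])

# Predict user 0's rating for movie 2 as a similarity-weighted average
# over the users who actually rated it
rated = R[:, 2] > 0
pred = (sims[rated] @ R[rated, 2]) / (sims[rated].sum() + 1e-9)
print(f"predicted rating of user 0 for movie 2: {pred:.2f}")
```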
Machine Learning and Parallel Computing
multiclass_classification.ipynb (last update: 2021-02-23): Using a simple example, we show how easy it is to parallelize a for loop in Python (see main.py and main_multi.py). We then turn to parallelizable tasks in machine learning, the difference between data and model parallelism, GPU usage and cloud computing.
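The for-loop parallelization mentioned above boils down to something like the following (a sketch with the standard-library multiprocessing module, analogous to main.py / main_multi.py but not those exact files):

```python
from multiprocessing import Pool

def expensive_task(n):
    """Stand-in for an expensive, independent computation."""
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    inputs = [1_000_000] * 8

    # Serial version: one iteration after another
    serial = [expensive_task(n) for n in inputs]

    # Parallel version: distribute the loop iterations over worker processes
    with Pool(processes=4) as pool:
        parallel = pool.map(expensive_task, inputs)

    assert serial == parallel
```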
Open Questions
open_questions.ipynb (last update: 2021-08-10): Open questions on machine learning, with which you can test your knowledge and understanding.