{:.no_toc}
* TOC
{:toc}

scikit-learn is a machine learning toolkit for data analysis.
Questions to David Rotermund
Install it via pip:

```shell
pip install scikit-learn
```

- Simple and efficient tools for predictive data analysis
- Accessible to everybody, and reusable in various contexts
- Built on NumPy, SciPy, and matplotlib

I will keep it short; for the most relevant modules, a minimal usage example follows the table.

The full API reference is available in the [scikit-learn documentation](https://scikit-learn.org/stable/modules/classes.html).

## sklearn.calibration: Probability Calibration

| Class / Function | Description |
| --- | --- |
calibration.CalibratedClassifierCV([...]) | Probability calibration with isotonic regression or logistic regression. |
calibration.calibration_curve(y_true, y_prob, *) | Compute true and predicted probabilities for a calibration curve. |
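
To make the calibration entries concrete, a minimal sketch (synthetic data via `make_classification`; wrapping a `LinearSVC` and using `cv=5` are my choices, not prescribed by the table):

```python
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=500, random_state=0)

# LinearSVC has no predict_proba; the wrapper adds calibrated probabilities.
clf = CalibratedClassifierCV(LinearSVC(), cv=5)
clf.fit(X, y)
prob = clf.predict_proba(X)[:, 1]

# Data for a reliability diagram: fraction of positives per probability bin.
prob_true, prob_pred = calibration_curve(y, prob, n_bins=10)
```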

## sklearn.cluster: Clustering

| Class / Function | Description |
| --- | --- |
cluster.AffinityPropagation(*[, damping, ...]) | Perform Affinity Propagation Clustering of data. |
cluster.AgglomerativeClustering([...]) | Agglomerative Clustering. |
cluster.Birch(*[, threshold, ...]) | Implements the BIRCH clustering algorithm. |
cluster.DBSCAN([eps, min_samples, metric, ...]) | Perform DBSCAN clustering from vector array or distance matrix. |
cluster.HDBSCAN([min_cluster_size, ...]) | Cluster data using hierarchical density-based clustering. |
cluster.FeatureAgglomeration([n_clusters, ...]) | Agglomerate features. |
cluster.KMeans([n_clusters, init, n_init, ...]) | K-Means clustering. |
cluster.BisectingKMeans([n_clusters, init, ...]) | Bisecting K-Means clustering. |
cluster.MiniBatchKMeans([n_clusters, init, ...]) | Mini-Batch K-Means clustering. |
cluster.MeanShift(*[, bandwidth, seeds, ...]) | Mean shift clustering using a flat kernel. |
cluster.OPTICS(*[, min_samples, max_eps, ...]) | Estimate clustering structure from vector array. |
cluster.SpectralClustering([n_clusters, ...]) | Apply clustering to a projection of the normalized Laplacian. |
cluster.SpectralBiclustering([n_clusters, ...]) | Spectral biclustering (Kluger, 2003). |
cluster.SpectralCoclustering([n_clusters, ...]) | Spectral Co-Clustering algorithm (Dhillon, 2001). |
cluster.affinity_propagation(S, *[, ...]) | Perform Affinity Propagation Clustering of data. |
cluster.cluster_optics_dbscan(*, ...) | Perform DBSCAN extraction for an arbitrary epsilon. |
cluster.cluster_optics_xi(*, reachability, ...) | Automatically extract clusters according to the Xi-steep method. |
cluster.compute_optics_graph(X, *, ...) | Compute the OPTICS reachability graph. |
cluster.dbscan(X[, eps, min_samples, ...]) | Perform DBSCAN clustering from vector array or distance matrix. |
cluster.estimate_bandwidth(X, *[, quantile, ...]) | Estimate the bandwidth to use with the mean-shift algorithm. |
cluster.k_means(X, n_clusters, *[, ...]) | Perform K-means clustering algorithm. |
cluster.kmeans_plusplus(X, n_clusters, *[, ...]) | Init n_clusters seeds according to k-means++. |
cluster.mean_shift(X, *[, bandwidth, seeds, ...]) | Perform mean shift clustering of data using a flat kernel. |
cluster.spectral_clustering(affinity, *[, ...]) | Apply clustering to a projection of the normalized Laplacian. |
cluster.ward_tree(X, *[, connectivity, ...]) | Ward clustering based on a Feature matrix. |
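
A minimal k-means sketch (the blob data and `n_clusters=3` are made up for the demo; the same fit/predict pattern applies to the other clustering classes):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)    # one cluster index per sample
print(kmeans.cluster_centers_)    # (3, 2) array of centroids
```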

## sklearn.compose: Composite Estimators

| Class / Function | Description |
| --- | --- |
compose.ColumnTransformer(transformers, *[, ...]) | Applies transformers to columns of an array or pandas DataFrame. |
compose.TransformedTargetRegressor([...]) | Meta-estimator to regress on a transformed target. |
compose.make_column_transformer(*transformers) | Construct a ColumnTransformer from the given transformers. |
compose.make_column_selector([pattern, ...]) | Create a callable to select columns to be used with ColumnTransformer. |
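
A small ColumnTransformer sketch (the toy DataFrame and column names are mine): numeric columns get scaled, categorical columns get one-hot encoded, in one transformer:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.DataFrame({"age": [20, 35, 50], "city": ["HB", "HH", "HB"]})

ct = ColumnTransformer([
    ("num", StandardScaler(), ["age"]),   # scale the numeric column
    ("cat", OneHotEncoder(), ["city"]),   # one-hot encode the categorical column
])
X = ct.fit_transform(df)  # combined feature matrix: scaled age + one-hot city
```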

## sklearn.covariance: Covariance Estimators

| Class / Function | Description |
| --- | --- |
covariance.EmpiricalCovariance(*[, ...]) | Maximum likelihood covariance estimator. |
covariance.EllipticEnvelope(*[, ...]) | An object for detecting outliers in a Gaussian distributed dataset. |
covariance.GraphicalLasso([alpha, mode, ...]) | Sparse inverse covariance estimation with an l1-penalized estimator. |
covariance.GraphicalLassoCV(*[, alphas, ...]) | Sparse inverse covariance w/ cross-validated choice of the l1 penalty. |
covariance.LedoitWolf(*[, store_precision, ...]) | LedoitWolf Estimator. |
covariance.MinCovDet(*[, store_precision, ...]) | Minimum Covariance Determinant (MCD): robust estimator of covariance. |
covariance.OAS(*[, store_precision, ...]) | Oracle Approximating Shrinkage Estimator. |
covariance.ShrunkCovariance(*[, ...]) | Covariance estimator with shrinkage. |
covariance.empirical_covariance(X, *[, ...]) | Compute the Maximum likelihood covariance estimator. |
covariance.graphical_lasso(emp_cov, alpha, *) | L1-penalized covariance estimator. |
covariance.ledoit_wolf(X, *[, ...]) | Estimate the shrunk Ledoit-Wolf covariance matrix. |
covariance.ledoit_wolf_shrinkage(X[, ...]) | Estimate the shrunk Ledoit-Wolf covariance matrix. |
covariance.oas(X, *[, assume_centered]) | Estimate covariance with the Oracle Approximating Shrinkage. |
covariance.shrunk_covariance(emp_cov[, ...]) | Calculate a covariance matrix shrunk on the diagonal. |
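
A sketch of the covariance estimators (random data; whether shrinkage helps depends on the sample-to-feature ratio):

```python
import numpy as np
from sklearn.covariance import EmpiricalCovariance, LedoitWolf

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))   # 50 samples, 5 features

emp = EmpiricalCovariance().fit(X)
lw = LedoitWolf().fit(X)
print(emp.covariance_.shape)   # (5, 5) maximum-likelihood estimate
print(lw.shrinkage_)           # shrinkage intensity chosen by Ledoit-Wolf
```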

## sklearn.cross_decomposition: Cross decomposition

| Class / Function | Description |
| --- | --- |
cross_decomposition.CCA([n_components, ...]) | Canonical Correlation Analysis, also known as "Mode B" PLS. |
cross_decomposition.PLSCanonical([...]) | Partial Least Squares transformer and regressor. |
cross_decomposition.PLSRegression([...]) | PLS regression. |
cross_decomposition.PLSSVD([n_components, ...]) | Partial Least Square SVD. |
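
A minimal PLS regression sketch (random data; by construction y depends only on the first two features):

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=100)

pls = PLSRegression(n_components=2)
pls.fit(X, y)
print(pls.score(X, y))  # R^2 on the training data
```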

For the dataset loading utilities (`sklearn.datasets`), see the [scikit-learn documentation](https://scikit-learn.org/stable/datasets.html).

## sklearn.decomposition: Matrix Decomposition

| Class / Function | Description |
| --- | --- |
decomposition.DictionaryLearning([...]) | Dictionary learning. |
decomposition.FactorAnalysis([n_components, ...]) | Factor Analysis (FA). |
decomposition.FastICA([n_components, ...]) | FastICA: a fast algorithm for Independent Component Analysis. |
decomposition.IncrementalPCA([n_components, ...]) | Incremental principal components analysis (IPCA). |
decomposition.KernelPCA([n_components, ...]) | Kernel Principal component analysis (KPCA). |
decomposition.LatentDirichletAllocation([...]) | Latent Dirichlet Allocation with online variational Bayes algorithm. |
decomposition.MiniBatchDictionaryLearning([...]) | Mini-batch dictionary learning. |
decomposition.MiniBatchSparsePCA([...]) | Mini-batch Sparse Principal Components Analysis. |
decomposition.NMF([n_components, init, ...]) | Non-Negative Matrix Factorization (NMF). |
decomposition.MiniBatchNMF([n_components, ...]) | Mini-Batch Non-Negative Matrix Factorization (NMF). |
decomposition.PCA([n_components, copy, ...]) | Principal component analysis (PCA). |
decomposition.SparsePCA([n_components, ...]) | Sparse Principal Components Analysis (SparsePCA). |
decomposition.SparseCoder(dictionary, *[, ...]) | Sparse coding. |
decomposition.TruncatedSVD([n_components, ...]) | Dimensionality reduction using truncated SVD (aka LSA). |
decomposition.dict_learning(X, n_components, ...) | Solve a dictionary learning matrix factorization problem. |
decomposition.dict_learning_online(X[, ...]) | Solve a dictionary learning matrix factorization problem online. |
decomposition.fastica(X[, n_components, ...]) | Perform Fast Independent Component Analysis. |
decomposition.non_negative_factorization(X) | Compute Non-negative Matrix Factorization (NMF). |
decomposition.sparse_encode(X, dictionary, *) | Sparse coding. |
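
A PCA sketch on the iris data (`n_components=2` is my choice, e.g. for plotting):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)

pca = PCA(n_components=2)
X2 = pca.fit_transform(X)             # project the 4D data to 2D
print(pca.explained_variance_ratio_)  # variance captured per component
```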

## sklearn.discriminant_analysis: Discriminant Analysis

| Class / Function | Description |
| --- | --- |
discriminant_analysis.LinearDiscriminantAnalysis([...]) | Linear Discriminant Analysis. |
discriminant_analysis.QuadraticDiscriminantAnalysis(*) | Quadratic Discriminant Analysis. |

## sklearn.dummy: Dummy estimators

| Class / Function | Description |
| --- | --- |
dummy.DummyClassifier(*[, strategy, ...]) | DummyClassifier makes predictions that ignore the input features. |
dummy.DummyRegressor(*[, strategy, ...]) | Regressor that makes predictions using simple rules. |

## sklearn.ensemble: Ensemble Methods

| Class / Function | Description |
| --- | --- |
ensemble.AdaBoostClassifier([estimator, ...]) | An AdaBoost classifier. |
ensemble.AdaBoostRegressor([estimator, ...]) | An AdaBoost regressor. |
ensemble.BaggingClassifier([estimator, ...]) | A Bagging classifier. |
ensemble.BaggingRegressor([estimator, ...]) | A Bagging regressor. |
ensemble.ExtraTreesClassifier([...]) | An extra-trees classifier. |
ensemble.ExtraTreesRegressor([n_estimators, ...]) | An extra-trees regressor. |
ensemble.GradientBoostingClassifier(*[, ...]) | Gradient Boosting for classification. |
ensemble.GradientBoostingRegressor(*[, ...]) | Gradient Boosting for regression. |
ensemble.IsolationForest(*[, n_estimators, ...]) | Isolation Forest Algorithm. |
ensemble.RandomForestClassifier([...]) | A random forest classifier. |
ensemble.RandomForestRegressor([...]) | A random forest regressor. |
ensemble.RandomTreesEmbedding([...]) | An ensemble of totally random trees. |
ensemble.StackingClassifier(estimators[, ...]) | Stack of estimators with a final classifier. |
ensemble.StackingRegressor(estimators[, ...]) | Stack of estimators with a final regressor. |
ensemble.VotingClassifier(estimators, *[, ...]) | Soft Voting/Majority Rule classifier for unfitted estimators. |
ensemble.VotingRegressor(estimators, *[, ...]) | Prediction voting regressor for unfitted estimators. |
ensemble.HistGradientBoostingRegressor([...]) | Histogram-based Gradient Boosting Regression Tree. |
ensemble.HistGradientBoostingClassifier([...]) | Histogram-based Gradient Boosting Classification Tree. |
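
A random-forest sketch (synthetic data; 200 trees is an arbitrary but common choice):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(X_train, y_train)
print(rf.score(X_test, y_test))     # mean accuracy on held-out data
print(rf.feature_importances_[:5])  # impurity-based importances
```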

For the exception and warning classes (`sklearn.exceptions`), see the [scikit-learn documentation](https://scikit-learn.org/stable/modules/classes.html).

For the experimental features (`sklearn.experimental`), see the [scikit-learn documentation](https://scikit-learn.org/stable/modules/classes.html).

## sklearn.feature_extraction: Feature Extraction

| Class / Function | Description |
| --- | --- |
feature_extraction.DictVectorizer(*[, ...]) | Transforms lists of feature-value mappings to vectors. |
feature_extraction.FeatureHasher([...]) | Implements feature hashing, aka the hashing trick. |
feature_extraction.image.extract_patches_2d(...) | Reshape a 2D image into a collection of patches. |
feature_extraction.image.grid_to_graph(n_x, n_y) | Graph of the pixel-to-pixel connections. |
feature_extraction.image.img_to_graph(img, *) | Graph of the pixel-to-pixel gradient connections. |
feature_extraction.image.reconstruct_from_patches_2d(...) | Reconstruct the image from all of its patches. |
feature_extraction.image.PatchExtractor(*[, ...]) | Extracts patches from a collection of images. |
feature_extraction.text.CountVectorizer(*[, ...]) | Convert a collection of text documents to a matrix of token counts. |
feature_extraction.text.HashingVectorizer(*) | Convert a collection of text documents to a matrix of token occurrences. |
feature_extraction.text.TfidfTransformer(*) | Transform a count matrix to a normalized tf or tf-idf representation. |
feature_extraction.text.TfidfVectorizer(*[, ...]) | Convert a collection of raw documents to a matrix of TF-IDF features. |
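
A TF-IDF sketch on three made-up documents:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["the cat sat", "the dog sat", "cats and dogs"]

vec = TfidfVectorizer()
X = vec.fit_transform(docs)          # sparse (3, n_terms) TF-IDF matrix
print(vec.get_feature_names_out())   # the learned vocabulary
```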

## sklearn.feature_selection: Feature Selection

| Class / Function | Description |
| --- | --- |
feature_selection.GenericUnivariateSelect([...]) | Univariate feature selector with configurable strategy. |
feature_selection.SelectPercentile([...]) | Select features according to a percentile of the highest scores. |
feature_selection.SelectKBest([score_func, k]) | Select features according to the k highest scores. |
feature_selection.SelectFpr([score_func, alpha]) | Filter: Select the pvalues below alpha based on a FPR test. |
feature_selection.SelectFdr([score_func, alpha]) | Filter: Select the p-values for an estimated false discovery rate. |
feature_selection.SelectFromModel(estimator, *) | Meta-transformer for selecting features based on importance weights. |
feature_selection.SelectFwe([score_func, alpha]) | Filter: Select the p-values corresponding to Family-wise error rate. |
feature_selection.SequentialFeatureSelector(...) | Transformer that performs Sequential Feature Selection. |
feature_selection.RFE(estimator, *[, ...]) | Feature ranking with recursive feature elimination. |
feature_selection.RFECV(estimator, *[, ...]) | Recursive feature elimination with cross-validation to select features. |
feature_selection.VarianceThreshold([threshold]) | Feature selector that removes all low-variance features. |
feature_selection.chi2(X, y) | Compute chi-squared stats between each non-negative feature and class. |
feature_selection.f_classif(X, y) | Compute the ANOVA F-value for the provided sample. |
feature_selection.f_regression(X, y, *[, ...]) | Univariate linear regression tests returning F-statistic and p-values. |
feature_selection.r_regression(X, y, *[, ...]) | Compute Pearson's r for each features and the target. |
feature_selection.mutual_info_classif(X, y, *) | Estimate mutual information for a discrete target variable. |
feature_selection.mutual_info_regression(X, y, *) | Estimate mutual information for a continuous target variable. |
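
A univariate selection sketch (keeping the k=2 iris features with the highest ANOVA F-scores; k is my choice):

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

selector = SelectKBest(score_func=f_classif, k=2)
X_new = selector.fit_transform(X, y)  # (150, 2): only the 2 best features
print(selector.get_support())         # boolean mask of the kept features
```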

## sklearn.gaussian_process: Gaussian Processes

| Class / Function | Description |
| --- | --- |
gaussian_process.GaussianProcessClassifier([...]) | Gaussian process classification (GPC) based on Laplace approximation. |
gaussian_process.GaussianProcessRegressor([...]) | Gaussian process regression (GPR). |
gaussian_process.kernels.CompoundKernel(kernels) | Kernel which is composed of a set of other kernels. |
gaussian_process.kernels.ConstantKernel([...]) | Constant kernel. |
gaussian_process.kernels.DotProduct([...]) | Dot-Product kernel. |
gaussian_process.kernels.ExpSineSquared([...]) | Exp-Sine-Squared kernel (aka periodic kernel). |
gaussian_process.kernels.Exponentiation(...) | The Exponentiation kernel combines a base kernel k with a scalar parameter p via k_exp(X, Y) = k(X, Y) ** p. |
gaussian_process.kernels.Hyperparameter(...) | A kernel hyperparameter's specification in form of a namedtuple. |
gaussian_process.kernels.Kernel() | Base class for all kernels. |
gaussian_process.kernels.Matern([...]) | Matern kernel. |
gaussian_process.kernels.PairwiseKernel([...]) | Wrapper for kernels in sklearn.metrics.pairwise. |
gaussian_process.kernels.Product(k1, k2) | The Product kernel combines two kernels k1 and k2 via k_prod(X, Y) = k1(X, Y) * k2(X, Y). |
gaussian_process.kernels.RBF([length_scale, ...]) | Radial basis function kernel (aka squared-exponential kernel). |
gaussian_process.kernels.RationalQuadratic([...]) | Rational Quadratic kernel. |
gaussian_process.kernels.Sum(k1, k2) | The Sum kernel combines two kernels k1 and k2 via k_sum(X, Y) = k1(X, Y) + k2(X, Y). |
gaussian_process.kernels.WhiteKernel([...]) | White kernel. |
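
A GP regression sketch (noisy sine data; the RBF-plus-white-noise kernel is one reasonable default, not the only option):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = np.linspace(0, 10, 30).reshape(-1, 1)
y = np.sin(X).ravel() + 0.1 * rng.normal(size=30)

kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
gpr = GaussianProcessRegressor(kernel=kernel, random_state=0).fit(X, y)

y_mean, y_std = gpr.predict(X, return_std=True)  # predictive mean and std
```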

## sklearn.impute: Impute

| Class / Function | Description |
| --- | --- |
impute.SimpleImputer(*[, missing_values, ...]) | Univariate imputer for completing missing values with simple strategies. |
impute.IterativeImputer([estimator, ...]) | Multivariate imputer that estimates each feature from all the others. |
impute.MissingIndicator(*[, missing_values, ...]) | Binary indicators for missing values. |
impute.KNNImputer(*[, missing_values, ...]) | Imputation for completing missing values using k-Nearest Neighbors. |
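
A SimpleImputer sketch (toy matrix; `strategy="mean"` is one of the simple strategies mentioned above):

```python
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0, 2.0], [np.nan, 3.0], [7.0, np.nan]])

imp = SimpleImputer(strategy="mean")
print(imp.fit_transform(X))  # NaNs become the column means (4.0 and 2.5)
```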

## sklearn.inspection: Inspection

| Class / Function | Description |
| --- | --- |
inspection.partial_dependence(estimator, X, ...) | Partial dependence of features. |
inspection.permutation_importance(estimator, ...) | Permutation importance for feature evaluation. |
inspection.DecisionBoundaryDisplay(*, xx0, ...) | Decision boundary visualization. |
inspection.PartialDependenceDisplay(...[, ...]) | Partial Dependence Plot (PDP). |

## sklearn.isotonic: Isotonic regression

| Class / Function | Description |
| --- | --- |
isotonic.IsotonicRegression(*[, y_min, ...]) | Isotonic regression model. |
isotonic.check_increasing(x, y) | Determine whether y is monotonically correlated with x. |
isotonic.isotonic_regression(y, *[, ...]) | Solve the isotonic regression model. |

## sklearn.kernel_approximation: Kernel Approximation

| Class / Function | Description |
| --- | --- |
kernel_approximation.AdditiveChi2Sampler(*) | Approximate feature map for additive chi2 kernel. |
kernel_approximation.Nystroem([kernel, ...]) | Approximate a kernel map using a subset of the training data. |
kernel_approximation.PolynomialCountSketch(*) | Polynomial kernel approximation via Tensor Sketch. |
kernel_approximation.RBFSampler(*[, gamma, ...]) | Approximate a RBF kernel feature map using random Fourier features. |
kernel_approximation.SkewedChi2Sampler(*[, ...]) | Approximate feature map for "skewed chi-squared" kernel. |

## sklearn.kernel_ridge: Kernel Ridge Regression

| Class / Function | Description |
| --- | --- |
kernel_ridge.KernelRidge([alpha, kernel, ...]) | Kernel ridge regression. |

## sklearn.linear_model: Linear Models

| Class / Function | Description |
| --- | --- |
linear_model.LogisticRegression([penalty, ...]) | Logistic Regression (aka logit, MaxEnt) classifier. |
linear_model.LogisticRegressionCV(*[, Cs, ...]) | Logistic Regression CV (aka logit, MaxEnt) classifier. |
linear_model.PassiveAggressiveClassifier(*) | Passive Aggressive Classifier. |
linear_model.Perceptron(*[, penalty, alpha, ...]) | Linear perceptron classifier. |
linear_model.RidgeClassifier([alpha, ...]) | Classifier using Ridge regression. |
linear_model.RidgeClassifierCV([alphas, ...]) | Ridge classifier with built-in cross-validation. |
linear_model.SGDClassifier([loss, penalty, ...]) | Linear classifiers (SVM, logistic regression, etc.) with SGD training. |
linear_model.SGDOneClassSVM([nu, ...]) | Solves linear One-Class SVM using Stochastic Gradient Descent. |
linear_model.LinearRegression(*[, ...]) | Ordinary least squares Linear Regression. |
linear_model.Ridge([alpha, fit_intercept, ...]) | Linear least squares with l2 regularization. |
linear_model.RidgeCV([alphas, ...]) | Ridge regression with built-in cross-validation. |
linear_model.SGDRegressor([loss, penalty, ...]) | Linear model fitted by minimizing a regularized empirical loss with SGD. |
linear_model.ElasticNet([alpha, l1_ratio, ...]) | Linear regression with combined L1 and L2 priors as regularizer. |
linear_model.ElasticNetCV(*[, l1_ratio, ...]) | Elastic Net model with iterative fitting along a regularization path. |
linear_model.Lars(*[, fit_intercept, ...]) | Least Angle Regression model (a.k.a. LAR). |
linear_model.LarsCV(*[, fit_intercept, ...]) | Cross-validated Least Angle Regression model. |
linear_model.Lasso([alpha, fit_intercept, ...]) | Linear Model trained with L1 prior as regularizer (aka the Lasso). |
linear_model.LassoCV(*[, eps, n_alphas, ...]) | Lasso linear model with iterative fitting along a regularization path. |
linear_model.LassoLars([alpha, ...]) | Lasso model fit with Least Angle Regression (a.k.a. Lars). |
linear_model.LassoLarsCV(*[, fit_intercept, ...]) | Cross-validated Lasso, using the LARS algorithm. |
linear_model.LassoLarsIC([criterion, ...]) | Lasso model fit with Lars using BIC or AIC for model selection. |
linear_model.OrthogonalMatchingPursuit(*[, ...]) | Orthogonal Matching Pursuit model (OMP). |
linear_model.OrthogonalMatchingPursuitCV(*) | Cross-validated Orthogonal Matching Pursuit model (OMP). |
linear_model.ARDRegression(*[, max_iter, ...]) | Bayesian ARD regression. |
linear_model.BayesianRidge(*[, max_iter, ...]) | Bayesian ridge regression. |
linear_model.MultiTaskElasticNet([alpha, ...]) | Multi-task ElasticNet model trained with L1/L2 mixed-norm as regularizer. |
linear_model.MultiTaskElasticNetCV(*[, ...]) | Multi-task L1/L2 ElasticNet with built-in cross-validation. |
linear_model.MultiTaskLasso([alpha, ...]) | Multi-task Lasso model trained with L1/L2 mixed-norm as regularizer. |
linear_model.MultiTaskLassoCV(*[, eps, ...]) | Multi-task Lasso model trained with L1/L2 mixed-norm as regularizer. |
linear_model.HuberRegressor(*[, epsilon, ...]) | L2-regularized linear regression model that is robust to outliers. |
linear_model.QuantileRegressor(*[, ...]) | Linear regression model that predicts conditional quantiles. |
linear_model.RANSACRegressor([estimator, ...]) | RANSAC (RANdom SAmple Consensus) algorithm. |
linear_model.TheilSenRegressor(*[, ...]) | Theil-Sen Estimator: robust multivariate regression model. |
linear_model.PoissonRegressor(*[, alpha, ...]) | Generalized Linear Model with a Poisson distribution. |
linear_model.TweedieRegressor(*[, power, ...]) | Generalized Linear Model with a Tweedie distribution. |
linear_model.GammaRegressor(*[, alpha, ...]) | Generalized Linear Model with a Gamma distribution. |
linear_model.PassiveAggressiveRegressor(*[, ...]) | Passive Aggressive Regressor. |
linear_model.enet_path(X, y, *[, l1_ratio, ...]) | Compute elastic net path with coordinate descent. |
linear_model.lars_path(X, y[, Xy, Gram, ...]) | Compute Least Angle Regression or Lasso path using the LARS algorithm. |
linear_model.lars_path_gram(Xy, Gram, *, ...) | lars_path in the sufficient stats mode. |
linear_model.lasso_path(X, y, *[, eps, ...]) | Compute Lasso path with coordinate descent. |
linear_model.orthogonal_mp(X, y, *[, ...]) | Orthogonal Matching Pursuit (OMP). |
linear_model.orthogonal_mp_gram(Gram, Xy, *) | Gram Orthogonal Matching Pursuit (OMP). |
linear_model.ridge_regression(X, y, alpha, *) | Solve the ridge equation by the method of normal equations. |
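
To show the shared estimator interface, a sketch with ordinary least squares next to ridge (random data with known coefficients; `alpha=1.0` is arbitrary):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.0]) + 0.1 * rng.normal(size=100)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)
print(ols.coef_)    # close to [1.5, -2.0, 0.0]
print(ridge.coef_)  # slightly shrunk toward zero
```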

## sklearn.manifold: Manifold Learning

| Class / Function | Description |
| --- | --- |
manifold.Isomap(*[, n_neighbors, radius, ...]) | Isomap Embedding. |
manifold.LocallyLinearEmbedding(*[, ...]) | Locally Linear Embedding. |
manifold.MDS([n_components, metric, n_init, ...]) | Multidimensional scaling. |
manifold.SpectralEmbedding([n_components, ...]) | Spectral embedding for non-linear dimensionality reduction. |
manifold.TSNE([n_components, perplexity, ...]) | T-distributed Stochastic Neighbor Embedding. |
manifold.locally_linear_embedding(X, *, ...) | Perform a Locally Linear Embedding analysis on the data. |
manifold.smacof(dissimilarities, *[, ...]) | Compute multidimensional scaling using the SMACOF algorithm. |
manifold.spectral_embedding(adjacency, *[, ...]) | Project the sample on the first eigenvectors of the graph Laplacian. |
manifold.trustworthiness(X, X_embedded, *[, ...]) | Indicate to what extent the local structure is retained. |
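
A t-SNE sketch embedding the 64-dimensional digits into 2D (`perplexity=30` is the usual default, kept explicit here):

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)

X2 = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
print(X2.shape)  # (1797, 2); plot X2 colored by y to inspect the embedding
```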

## sklearn.metrics: Metrics

| Class / Function | Description |
| --- | --- |
metrics.check_scoring(estimator[, scoring, ...]) | Determine scorer from user options. |
metrics.get_scorer(scoring) | Get a scorer from string. |
metrics.get_scorer_names() | Get the names of all available scorers. |
metrics.make_scorer(score_func, *[, ...]) | Make a scorer from a performance metric or loss function. |
metrics.accuracy_score(y_true, y_pred, *[, ...]) | Accuracy classification score. |
metrics.auc(x, y) | Compute Area Under the Curve (AUC) using the trapezoidal rule. |
metrics.average_precision_score(y_true, ...) | Compute average precision (AP) from prediction scores. |
metrics.balanced_accuracy_score(y_true, ...) | Compute the balanced accuracy. |
metrics.brier_score_loss(y_true, y_prob, *) | Compute the Brier score loss. |
metrics.class_likelihood_ratios(y_true, ...) | Compute binary classification positive and negative likelihood ratios. |
metrics.classification_report(y_true, y_pred, *) | Build a text report showing the main classification metrics. |
metrics.cohen_kappa_score(y1, y2, *[, ...]) | Compute Cohen's kappa: a statistic that measures inter-annotator agreement. |
metrics.confusion_matrix(y_true, y_pred, *) | Compute confusion matrix to evaluate the accuracy of a classification. |
metrics.dcg_score(y_true, y_score, *[, k, ...]) | Compute Discounted Cumulative Gain. |
metrics.det_curve(y_true, y_score[, ...]) | Compute error rates for different probability thresholds. |
metrics.f1_score(y_true, y_pred, *[, ...]) | Compute the F1 score, also known as balanced F-score or F-measure. |
metrics.fbeta_score(y_true, y_pred, *, beta) | Compute the F-beta score. |
metrics.hamming_loss(y_true, y_pred, *[, ...]) | Compute the average Hamming loss. |
metrics.hinge_loss(y_true, pred_decision, *) | Average hinge loss (non-regularized). |
metrics.jaccard_score(y_true, y_pred, *[, ...]) | Jaccard similarity coefficient score. |
metrics.log_loss(y_true, y_pred, *[, eps, ...]) | Log loss, aka logistic loss or cross-entropy loss. |
metrics.matthews_corrcoef(y_true, y_pred, *) | Compute the Matthews correlation coefficient (MCC). |
metrics.multilabel_confusion_matrix(y_true, ...) | Compute a confusion matrix for each class or sample. |
metrics.ndcg_score(y_true, y_score, *[, k, ...]) | Compute Normalized Discounted Cumulative Gain. |
metrics.precision_recall_curve(y_true, ...) | Compute precision-recall pairs for different probability thresholds. |
metrics.precision_recall_fscore_support(...) | Compute precision, recall, F-measure and support for each class. |
metrics.precision_score(y_true, y_pred, *[, ...]) | Compute the precision. |
metrics.recall_score(y_true, y_pred, *[, ...]) | Compute the recall. |
metrics.roc_auc_score(y_true, y_score, *[, ...]) | Compute Area Under the Receiver Operating Characteristic Curve (ROC AUC) from prediction scores. |
metrics.roc_curve(y_true, y_score, *[, ...]) | Compute Receiver operating characteristic (ROC). |
metrics.top_k_accuracy_score(y_true, y_score, *) | Top-k Accuracy classification score. |
metrics.zero_one_loss(y_true, y_pred, *[, ...]) | Zero-one classification loss. |
metrics.explained_variance_score(y_true, ...) | Explained variance regression score function. |
metrics.max_error(y_true, y_pred) | The max_error metric calculates the maximum residual error. |
metrics.mean_absolute_error(y_true, y_pred, *) | Mean absolute error regression loss. |
metrics.mean_squared_error(y_true, y_pred, *) | Mean squared error regression loss. |
metrics.mean_squared_log_error(y_true, y_pred, *) | Mean squared logarithmic error regression loss. |
metrics.median_absolute_error(y_true, y_pred, *) | Median absolute error regression loss. |
metrics.mean_absolute_percentage_error(...) | Mean absolute percentage error (MAPE) regression loss. |
metrics.r2_score(y_true, y_pred, *[, ...]) | R^2 (coefficient of determination) regression score function. |
metrics.mean_poisson_deviance(y_true, y_pred, *) | Mean Poisson deviance regression loss. |
metrics.mean_gamma_deviance(y_true, y_pred, *) | Mean Gamma deviance regression loss. |
metrics.mean_tweedie_deviance(y_true, y_pred, *) | Mean Tweedie deviance regression loss. |
metrics.d2_tweedie_score(y_true, y_pred, *) | D^2 regression score function, fraction of Tweedie deviance explained. |
metrics.mean_pinball_loss(y_true, y_pred, *) | Pinball loss for quantile regression. |
metrics.d2_pinball_score(y_true, y_pred, *) | D^2 regression score function, fraction of pinball loss explained. |
metrics.d2_absolute_error_score(y_true, ...) | D^2 regression score function, fraction of absolute error explained. |
metrics.coverage_error(y_true, y_score, *[, ...]) | Coverage error measure. |
metrics.label_ranking_average_precision_score(...) | Compute ranking-based average precision. |
metrics.label_ranking_loss(y_true, y_score, *) | Compute Ranking loss measure. |
metrics.adjusted_mutual_info_score(...[, ...]) | Adjusted Mutual Information between two clusterings. |
metrics.adjusted_rand_score(labels_true, ...) | Rand index adjusted for chance. |
metrics.calinski_harabasz_score(X, labels) | Compute the Calinski and Harabasz score. |
metrics.davies_bouldin_score(X, labels) | Compute the Davies-Bouldin score. |
metrics.completeness_score(labels_true, ...) | Compute completeness metric of a cluster labeling given a ground truth. |
metrics.cluster.contingency_matrix(...[, ...]) | Build a contingency matrix describing the relationship between labels. |
metrics.cluster.pair_confusion_matrix(...) | Pair confusion matrix arising from two clusterings. |
metrics.fowlkes_mallows_score(labels_true, ...) | Measure the similarity of two clusterings of a set of points. |
metrics.homogeneity_completeness_v_measure(...) | Compute the homogeneity and completeness and V-Measure scores at once. |
metrics.homogeneity_score(labels_true, ...) | Homogeneity metric of a cluster labeling given a ground truth. |
metrics.mutual_info_score(labels_true, ...) | Mutual Information between two clusterings. |
metrics.normalized_mutual_info_score(...[, ...]) | Normalized Mutual Information between two clusterings. |
metrics.rand_score(labels_true, labels_pred) | Rand index. |
metrics.silhouette_score(X, labels, *[, ...]) | Compute the mean Silhouette Coefficient of all samples. |
metrics.silhouette_samples(X, labels, *[, ...]) | Compute the Silhouette Coefficient for each sample. |
metrics.v_measure_score(labels_true, ...[, beta]) | V-measure cluster labeling given a ground truth. |
metrics.consensus_score(a, b, *[, similarity]) | The similarity of two sets of biclusters. |
metrics.DistanceMetric | Uniform interface for fast distance metric functions. |
metrics.pairwise.additive_chi2_kernel(X[, Y]) | Compute the additive chi-squared kernel between observations in X and Y. |
metrics.pairwise.chi2_kernel(X[, Y, gamma]) | Compute the exponential chi-squared kernel between X and Y. |
metrics.pairwise.cosine_similarity(X[, Y, ...]) | Compute cosine similarity between samples in X and Y. |
metrics.pairwise.cosine_distances(X[, Y]) | Compute cosine distance between samples in X and Y. |
metrics.pairwise.distance_metrics() | Valid metrics for pairwise_distances. |
metrics.pairwise.euclidean_distances(X[, Y, ...]) | Compute the distance matrix between each pair from a vector array X and Y. |
metrics.pairwise.haversine_distances(X[, Y]) | Compute the Haversine distance between samples in X and Y. |
metrics.pairwise.kernel_metrics() | Valid metrics for pairwise_kernels. |
metrics.pairwise.laplacian_kernel(X[, Y, gamma]) | Compute the laplacian kernel between X and Y. |
metrics.pairwise.linear_kernel(X[, Y, ...]) | Compute the linear kernel between X and Y. |
metrics.pairwise.manhattan_distances(X[, Y, ...]) | Compute the L1 distances between the vectors in X and Y. |
metrics.pairwise.nan_euclidean_distances(X) | Calculate the euclidean distances in the presence of missing values. |
metrics.pairwise.pairwise_kernels(X[, Y, ...]) | Compute the kernel between arrays X and optional array Y. |
metrics.pairwise.polynomial_kernel(X[, Y, ...]) | Compute the polynomial kernel between X and Y. |
metrics.pairwise.rbf_kernel(X[, Y, gamma]) | Compute the rbf (gaussian) kernel between X and Y. |
metrics.pairwise.sigmoid_kernel(X[, Y, ...]) | Compute the sigmoid kernel between X and Y. |
metrics.pairwise.paired_euclidean_distances(X, Y) | Compute the paired euclidean distances between X and Y. |
metrics.pairwise.paired_manhattan_distances(X, Y) | Compute the paired L1 distances between X and Y. |
metrics.pairwise.paired_cosine_distances(X, Y) | Compute the paired cosine distances between X and Y. |
metrics.pairwise.paired_distances(X, Y, *[, ...]) | Compute the paired distances between X and Y. |
metrics.pairwise_distances(X[, Y, metric, ...]) | Compute the distance matrix from a vector array X and optional Y. |
metrics.pairwise_distances_argmin(X, Y, *[, ...]) | Compute minimum distances between one point and a set of points. |
metrics.pairwise_distances_argmin_min(X, Y, *) | Compute minimum distances between one point and a set of points. |
metrics.pairwise_distances_chunked(X[, Y, ...]) | Generate a distance matrix chunk by chunk with optional reduction. |
metrics.ConfusionMatrixDisplay(...[, ...]) | Confusion Matrix visualization. |
metrics.DetCurveDisplay(*, fpr, fnr[, ...]) | DET curve visualization. |
metrics.PrecisionRecallDisplay(precision, ...) | Precision Recall visualization. |
metrics.PredictionErrorDisplay(*, y_true, y_pred) | Visualization of the prediction error of a regression model. |
metrics.RocCurveDisplay(*, fpr, tpr[, ...]) | ROC Curve visualization. |
calibration.CalibrationDisplay(prob_true, ...) | Calibration curve (also known as reliability diagram) visualization. |
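
A sketch of the basic classification metrics on hand-made labels (values chosen so the numbers are easy to check):

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

y_true = [0, 1, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1]

print(accuracy_score(y_true, y_pred))    # 5 of 6 correct -> 0.833...
print(confusion_matrix(y_true, y_pred))  # rows: true class, columns: predicted
print(classification_report(y_true, y_pred))
```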

## sklearn.mixture: Gaussian Mixture Models

| Class / Function | Description |
| --- | --- |
mixture.BayesianGaussianMixture(*[, ...]) | Variational Bayesian estimation of a Gaussian mixture. |
mixture.GaussianMixture([n_components, ...]) | Gaussian Mixture. |

## sklearn.model_selection: Model Selection

| Class / Function | Description |
| --- | --- |
model_selection.GroupKFold([n_splits]) | K-fold iterator variant with non-overlapping groups. |
model_selection.GroupShuffleSplit([...]) | Shuffle-Group(s)-Out cross-validation iterator. |
model_selection.KFold([n_splits, shuffle, ...]) | K-Folds cross-validator. |
model_selection.LeaveOneGroupOut() | Leave One Group Out cross-validator. |
model_selection.LeavePGroupsOut(n_groups) | Leave P Group(s) Out cross-validator. |
model_selection.LeaveOneOut() | Leave-One-Out cross-validator. |
model_selection.LeavePOut(p) | Leave-P-Out cross-validator. |
model_selection.PredefinedSplit(test_fold) | Predefined split cross-validator. |
model_selection.RepeatedKFold(*[, n_splits, ...]) | Repeated K-Fold cross validator. |
model_selection.RepeatedStratifiedKFold(*[, ...]) | Repeated Stratified K-Fold cross validator. |
model_selection.ShuffleSplit([n_splits, ...]) | Random permutation cross-validator. |
model_selection.StratifiedKFold([n_splits, ...]) | Stratified K-Folds cross-validator. |
model_selection.StratifiedShuffleSplit([...]) | Stratified ShuffleSplit cross-validator. |
model_selection.StratifiedGroupKFold([...]) | Stratified K-Folds iterator variant with non-overlapping groups. |
model_selection.TimeSeriesSplit([n_splits, ...]) | Time Series cross-validator. |
model_selection.check_cv([cv, y, classifier]) | Input checker utility for building a cross-validator. |
model_selection.train_test_split(*arrays[, ...]) | Split arrays or matrices into random train and test subsets. |
model_selection.GridSearchCV(estimator, ...) | Exhaustive search over specified parameter values for an estimator. |
model_selection.HalvingGridSearchCV(...[, ...]) | Search over specified parameter values with successive halving. |
model_selection.ParameterGrid(param_grid) | Grid of parameters with a discrete number of values for each. |
model_selection.ParameterSampler(...[, ...]) | Generator on parameters sampled from given distributions. |
model_selection.RandomizedSearchCV(...[, ...]) | Randomized search on hyper parameters. |
model_selection.HalvingRandomSearchCV(...[, ...]) | Randomized search on hyper parameters. |
model_selection.cross_validate(estimator, X) | Evaluate metric(s) by cross-validation and also record fit/score times. |
model_selection.cross_val_predict(estimator, X) | Generate cross-validated estimates for each input data point. |
model_selection.cross_val_score(estimator, X) | Evaluate a score by cross-validation. |
model_selection.learning_curve(estimator, X, ...) | Learning curve. |
model_selection.permutation_test_score(...) | Evaluate the significance of a cross-validated score with permutations. |
model_selection.validation_curve(estimator, ...) | Validation curve. |
model_selection.LearningCurveDisplay(*, ...) | Learning Curve visualization. |
model_selection.ValidationCurveDisplay(*, ...) | Validation Curve visualization. |
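
A sketch combining train_test_split with a small grid search (the C grid and `cv=5` are arbitrary demo choices):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

search = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=5)
search.fit(X_train, y_train)         # cross-validation on the training part only
print(search.best_params_)           # best C found by 5-fold CV
print(search.score(X_test, y_test))  # accuracy on held-out data
```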

## sklearn.multiclass: Multiclass classification

| Class / Function | Description |
| --- | --- |
multiclass.OneVsRestClassifier(estimator, *) | One-vs-the-rest (OvR) multiclass strategy. |
multiclass.OneVsOneClassifier(estimator, *) | One-vs-one multiclass strategy. |
multiclass.OutputCodeClassifier(estimator, *) | (Error-Correcting) Output-Code multiclass strategy. |

## sklearn.multioutput: Multioutput regression and classification

| Class / Function | Description |
| --- | --- |
multioutput.ClassifierChain(base_estimator, *) | A multi-label model that arranges binary classifiers into a chain. |
multioutput.MultiOutputRegressor(estimator, *) | Multi target regression. |
multioutput.MultiOutputClassifier(estimator, *) | Multi target classification. |
multioutput.RegressorChain(base_estimator, *) | A multi-label model that arranges regressions into a chain. |

## sklearn.naive_bayes: Naive Bayes

| Class / Function | Description |
| --- | --- |
naive_bayes.BernoulliNB(*[, alpha, ...]) | Naive Bayes classifier for multivariate Bernoulli models. |
naive_bayes.CategoricalNB(*[, alpha, ...]) | Naive Bayes classifier for categorical features. |
naive_bayes.ComplementNB(*[, alpha, ...]) | The Complement Naive Bayes classifier described in Rennie et al. (2003). |
naive_bayes.GaussianNB(*[, priors, ...]) | Gaussian Naive Bayes (GaussianNB). |
naive_bayes.MultinomialNB(*[, alpha, ...]) | Naive Bayes classifier for multinomial models. |
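
A Gaussian naive Bayes sketch (iris again, purely for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)

gnb = GaussianNB().fit(X, y)
print(gnb.predict(X[:3]))        # predicted classes for the first samples
print(gnb.predict_proba(X[:3]))  # the corresponding class probabilities
```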

## sklearn.neighbors: Nearest Neighbors

| Class / Function | Description |
| --- | --- |
neighbors.BallTree(X[, leaf_size, metric]) | BallTree for fast generalized N-point problems. |
neighbors.KDTree(X[, leaf_size, metric]) | KDTree for fast generalized N-point problems. |
neighbors.KernelDensity(*[, bandwidth, ...]) | Kernel Density Estimation. |
neighbors.KNeighborsClassifier([...]) | Classifier implementing the k-nearest neighbors vote. |
neighbors.KNeighborsRegressor([n_neighbors, ...]) | Regression based on k-nearest neighbors. |
neighbors.KNeighborsTransformer(*[, mode, ...]) | Transform X into a (weighted) graph of k nearest neighbors. |
neighbors.LocalOutlierFactor([n_neighbors, ...]) | Unsupervised Outlier Detection using the Local Outlier Factor (LOF). |
neighbors.RadiusNeighborsClassifier([...]) | Classifier implementing a vote among neighbors within a given radius. |
neighbors.RadiusNeighborsRegressor([radius, ...]) | Regression based on neighbors within a fixed radius. |
neighbors.RadiusNeighborsTransformer(*[, ...]) | Transform X into a (weighted) graph of neighbors nearer than a radius. |
neighbors.NearestCentroid([metric, ...]) | Nearest centroid classifier. |
neighbors.NearestNeighbors(*[, n_neighbors, ...]) | Unsupervised learner for implementing neighbor searches. |
neighbors.NeighborhoodComponentsAnalysis([...]) | Neighborhood Components Analysis. |
neighbors.kneighbors_graph(X, n_neighbors, *) | Compute the (weighted) graph of k-Neighbors for points in X. |
neighbors.radius_neighbors_graph(X, radius, *) | Compute the (weighted) graph of Neighbors for points in X. |
neighbors.sort_graph_by_row_values(graph[, ...]) | Sort a sparse graph such that each row is stored with increasing values. |
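
A k-nearest-neighbors sketch (`n_neighbors=5` is the default, written out for clarity):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
print(knn.score(X_test, y_test))  # mean accuracy on held-out data
```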

## sklearn.pipeline: Pipeline

| Class / Function | Description |
| --- | --- |
pipeline.FeatureUnion(transformer_list, *[, ...]) | Concatenates results of multiple transformer objects. |
pipeline.Pipeline(steps, *[, memory, verbose]) | Pipeline of transforms with a final estimator. |
pipeline.make_pipeline(*steps[, memory, verbose]) | Construct a Pipeline from the given estimators. |
pipeline.make_union(*transformers[, n_jobs, ...]) | Construct a FeatureUnion from the given transformers. |
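
A pipeline sketch (scaler plus logistic regression; the point is that one fit call trains both steps, and the transform is applied automatically at predict time):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipe.fit(X, y)           # fits the scaler, then the classifier on scaled data
print(pipe.score(X, y))  # accuracy on the training data
```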

Background on preprocessing and normalization is covered in the [scikit-learn User Guide](https://scikit-learn.org/stable/modules/preprocessing.html).

## sklearn.preprocessing: Preprocessing and Normalization

| Class / Function | Description |
| --- | --- |
preprocessing.Binarizer(*[, threshold, copy]) | Binarize data (set feature values to 0 or 1) according to a threshold. |
preprocessing.FunctionTransformer([func, ...]) | Constructs a transformer from an arbitrary callable. |
preprocessing.KBinsDiscretizer([n_bins, ...]) | Bin continuous data into intervals. |
preprocessing.KernelCenterer() | Center an arbitrary kernel matrix. |
preprocessing.LabelBinarizer(*[, neg_label, ...]) | Binarize labels in a one-vs-all fashion. |
preprocessing.LabelEncoder() | Encode target labels with value between 0 and n_classes-1. |
preprocessing.MultiLabelBinarizer(*[, ...]) | Transform between iterable of iterables and a multilabel format. |
preprocessing.MaxAbsScaler(*[, copy]) | Scale each feature by its maximum absolute value. |
preprocessing.MinMaxScaler([feature_range, ...]) | Transform features by scaling each feature to a given range. |
preprocessing.Normalizer([norm, copy]) | Normalize samples individually to unit norm. |
preprocessing.OneHotEncoder(*[, categories, ...]) | Encode categorical features as a one-hot numeric array. |
preprocessing.OrdinalEncoder(*[, ...]) | Encode categorical features as an integer array. |
preprocessing.PolynomialFeatures([degree, ...]) | Generate polynomial and interaction features. |
preprocessing.PowerTransformer([method, ...]) | Apply a power transform featurewise to make data more Gaussian-like. |
preprocessing.QuantileTransformer(*[, ...]) | Transform features using quantiles information. |
preprocessing.RobustScaler(*[, ...]) | Scale features using statistics that are robust to outliers. |
preprocessing.SplineTransformer([n_knots, ...]) | Generate univariate B-spline bases for features. |
preprocessing.StandardScaler(*[, copy, ...]) | Standardize features by removing the mean and scaling to unit variance. |
preprocessing.TargetEncoder([categories, ...]) | Target Encoder for regression and classification targets. |
preprocessing.add_dummy_feature(X[, value]) | Augment dataset with an additional dummy feature. |
preprocessing.binarize(X, *[, threshold, copy]) | Boolean thresholding of array-like or scipy.sparse matrix. |
preprocessing.label_binarize(y, *, classes) | Binarize labels in a one-vs-all fashion. |
preprocessing.maxabs_scale(X, *[, axis, copy]) | Scale each feature to the [-1, 1] range without breaking the sparsity. |
preprocessing.minmax_scale(X[, ...]) | Transform features by scaling each feature to a given range. |
preprocessing.normalize(X[, norm, axis, ...]) | Scale input vectors individually to unit norm (vector length). |
preprocessing.quantile_transform(X, *[, ...]) | Transform features using quantiles information. |
preprocessing.robust_scale(X, *[, axis, ...]) | Standardize a dataset along any axis. |
preprocessing.scale(X, *[, axis, with_mean, ...]) | Standardize a dataset along any axis. |
preprocessing.power_transform(X[, method, ...]) | Parametric, monotonic transformation to make data more Gaussian-like. |
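
A StandardScaler sketch (toy matrix; note that in practice the scaler should be fit on training data only, e.g. inside a Pipeline):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])

scaler = StandardScaler()
X_std = scaler.fit_transform(X)
print(X_std.mean(axis=0))  # ~[0, 0]: zero mean per feature
print(X_std.std(axis=0))   # ~[1, 1]: unit variance per feature
```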

## sklearn.random_projection: Random projection

| Class / Function | Description |
| --- | --- |
random_projection.GaussianRandomProjection([...]) | Reduce dimensionality through Gaussian random projection. |
random_projection.SparseRandomProjection([...]) | Reduce dimensionality through sparse random projection. |
random_projection.johnson_lindenstrauss_min_dim(...) | Find a 'safe' number of components to randomly project to. |

## sklearn.semi_supervised: Semi-Supervised Learning

| Class / Function | Description |
| --- | --- |
semi_supervised.LabelPropagation([kernel, ...]) | Label Propagation classifier. |
semi_supervised.LabelSpreading([kernel, ...]) | LabelSpreading model for semi-supervised learning. |
semi_supervised.SelfTrainingClassifier(...) | Self-training classifier. |

## sklearn.svm: Support Vector Machines

| Class / Function | Description |
| --- | --- |
svm.LinearSVC([penalty, loss, dual, tol, C, ...]) | Linear Support Vector Classification. |
svm.LinearSVR(*[, epsilon, tol, C, loss, ...]) | Linear Support Vector Regression. |
svm.NuSVC(*[, nu, kernel, degree, gamma, ...]) | Nu-Support Vector Classification. |
svm.NuSVR(*[, nu, C, kernel, degree, gamma, ...]) | Nu Support Vector Regression. |
svm.OneClassSVM(*[, kernel, degree, gamma, ...]) | Unsupervised Outlier Detection. |
svm.SVC(*[, C, kernel, degree, gamma, ...]) | C-Support Vector Classification. |
svm.SVR(*[, kernel, degree, gamma, coef0, ...]) | Epsilon-Support Vector Regression. |
svm.l1_min_c(X, y, *[, loss, fit_intercept, ...]) | Return the lowest bound for C. |
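
An SVC sketch (RBF kernel; `C=1.0` and `gamma="scale"` are the defaults, spelled out here):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

svc = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_train, y_train)
print(svc.score(X_test, y_test))  # mean accuracy on held-out data
```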

## sklearn.tree: Decision Trees

| Class / Function | Description |
| --- | --- |
tree.DecisionTreeClassifier(*[, criterion, ...]) | A decision tree classifier. |
tree.DecisionTreeRegressor(*[, criterion, ...]) | A decision tree regressor. |
tree.ExtraTreeClassifier(*[, criterion, ...]) | An extremely randomized tree classifier. |
tree.ExtraTreeRegressor(*[, criterion, ...]) | An extremely randomized tree regressor. |
tree.export_graphviz(decision_tree[, ...]) | Export a decision tree in DOT format. |
tree.export_text(decision_tree, *[, ...]) | Build a text report showing the rules of a decision tree. |
tree.plot_tree(decision_tree, *[, ...]) | Plot a decision tree. |
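
A decision-tree sketch (`max_depth=2` keeps the printed rules short; export_text makes the fitted tree readable):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# Human-readable if/else rules of the fitted tree.
print(export_text(tree, feature_names=list(iris.feature_names)))
```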

For the utility functions (`sklearn.utils`), see the [scikit-learn documentation](https://scikit-learn.org/stable/modules/classes.html).