
Documentation automation rate analysis #2

Open

davidggphy opened this issue Aug 31, 2020 · 5 comments

@davidggphy

Could you provide some documentation or at least a link about the automation rate analysis plot in classification reports?

Thanks!

@magnusja

I might be able to shed a little light on this.

The plot illustrates how many samples can be "automated" at a given performance level. "Automated" means there is a threshold at which we consider the model output confident enough to make a decision (i.e. model output > threshold). So let's say you want 99% accuracy: the automation rate describes the fraction of samples on which the model is confident enough to achieve that performance goal (estimated on some sort of validation/test set). For the other samples you cannot guarantee this performance and therefore cannot make a decision automatically. This assumes that the model output (e.g. softmax) can be interpreted as a confidence score of the network.
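
To make that concrete, here is a rough sketch of the idea (not necessarily how metriculous implements it) for a single confidence threshold:

from typing import Tuple

import numpy as np


def automation_rate_and_accuracy(
    probabilities: np.ndarray,  # shape (num_samples, num_classes), e.g. softmax outputs
    ground_truth: np.ndarray,  # shape (num_samples,), integer class labels
    threshold: float,
) -> Tuple[float, float]:
    # A sample is "automated" if the model's top-class probability exceeds the threshold.
    confidence = probabilities.max(axis=1)
    predictions = probabilities.argmax(axis=1)
    automated = confidence > threshold
    automation_rate = automated.mean()
    # Performance is measured only on the automated samples.
    if automated.any():
        accuracy_on_automated = (predictions[automated] == ground_truth[automated]).mean()
    else:
        accuracy_on_automated = float("nan")
    return float(automation_rate), float(accuracy_on_automated)

Sweeping the threshold and plotting the accuracy on the automated samples against the automation rate gives a curve like the one in the report: for a 99% accuracy target, you read off the largest automation rate at which the curve still meets it.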

@marlonjan
Contributor

Hi, and sorry for the late reply. Thanks for helping out @magnusja, that's a great explanation.

I'm considering removing the automation rate plot from the plots that are displayed by default. It's fairly non-standard and not as self-explanatory as I would like it to be. For example, an include_experimental_features: bool = False parameter could be added to the compare_classifiers function, and then the automation rate analysis would not be displayed unless someone really wants to see it.
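
As a rough sketch of the idea (include_experimental_features is hypothetical and not part of the current API; this just reuses the filter_figures hook shown further down in this thread):

import metriculous


def compare_classifiers_non_experimental(
    *args,
    include_experimental_features: bool = False,  # hypothetical flag, not in metriculous
    **kwargs,
):
    # Unless experimental output is explicitly requested, hide the automation
    # rate analysis via the existing filter_figures hook.
    if not include_experimental_features:
        kwargs.setdefault(
            "filter_figures",
            lambda figure_title: "Automation Rate" not in figure_title,
        )
    return metriculous.compare_classifiers(*args, **kwargs)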

@jmrichardson

Hi @marlonjan ,

The include_experimental_features=False parameter doesn't work with compare_classifiers. Is there another parameter I can use to remove the Automation rate analysis?

@marlonjan
Contributor

Hi @jmrichardson,

Thanks for trying out the package and asking the question! The include_experimental_features parameter doesn't exist yet, but you can hide the automation rate analysis by filtering the plots: filter_figures=lambda figure_title: "Automation Rate" not in figure_title.

Here is an example (with the interesting bit at the very end):

import metriculous
import numpy as np


def normalize(array2d: np.ndarray) -> np.ndarray:
    return array2d / array2d.sum(axis=1, keepdims=True)


class_names = ["Cat", "Dog", "Pig"]
num_classes = len(class_names)
num_samples = 500

# Mock ground truth
ground_truth = np.random.choice(range(num_classes), size=num_samples, p=[0.5, 0.4, 0.1])

# Mock model predictions
perfect_model = np.eye(num_classes)[ground_truth]
noisy_model = normalize(
    perfect_model + 2 * np.random.random((num_samples, num_classes))
)
random_model = normalize(np.random.random((num_samples, num_classes)))

metriculous.compare_classifiers(
    ground_truth=ground_truth,
    model_predictions=[perfect_model, noisy_model, random_model],
    model_names=["Perfect Model", "Noisy Model", "Random Model"],
    class_names=class_names,
    one_vs_all_figures=True,
    # Filter out arbitrary figures by their name:
    filter_figures=lambda figure_title: "Automation Rate" not in figure_title,  # <--- YOUR FILTER
    # Sidenote: You can do the same for the metrics displayed in the table:
    filter_quantities=lambda name: "Accuracy" not in name,
).display()

As you've probably noticed, the documentation isn't great, so let me also share the list of parameters of the compare_classifiers function, some of which might be useful for further customizing the output:

def compare_classifiers(
    ground_truth: ClassificationGroundTruth,
    model_predictions: Sequence[ClassificationPrediction],
    model_names: Optional[Sequence[str]] = None,
    sample_weights: Optional[Sequence[float]] = None,
    class_names: Optional[Sequence[str]] = None,
    one_vs_all_quantities: bool = True,
    one_vs_all_figures: bool = False,
    top_n_accuracies: Sequence[int] = (),
    filter_quantities: Optional[Callable[[str], bool]] = None,
    filter_figures: Optional[Callable[[str], bool]] = None,
    primary_metric: Optional[str] = None,
    simulated_class_distribution: Optional[Sequence[float]] = None,
    class_label_rotation_x: str = "horizontal",
    class_label_rotation_y: str = "vertical",
) -> Comparison:

    return compare(
        evaluator=ClassificationEvaluator(
            ...
        ),
        ...
    )
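
For instance, a few of these can be combined along these lines (a rough, untested sketch reusing the mock data from the example above):

comparison = metriculous.compare_classifiers(
    ground_truth=ground_truth,
    model_predictions=[perfect_model, noisy_model],
    model_names=["Perfect Model", "Noisy Model"],
    class_names=class_names,
    one_vs_all_figures=True,
    top_n_accuracies=(2, 3),  # additionally report top-2 and top-3 accuracy
    primary_metric="Accuracy",  # emphasize one metric in the comparison
    class_label_rotation_x="vertical",  # rotate the class labels on the x-axis
    filter_figures=lambda figure_title: "Automation Rate" not in figure_title,
)
comparison.display()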

Let me know if you have any other questions.

@jmrichardson

Hi @marlonjan ,

Works perfectly! Thanks so much for the fast reply. The metriculous HTML output also works nicely inside datapane as a dp.HTML(html) block. Thanks again for making this available :)
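
For anyone wanting to do the same, the embedding looks roughly like this (a sketch; the save_html call and the datapane report API may differ between versions):

import datapane as dp
import metriculous

comparison = metriculous.compare_classifiers(
    ground_truth=ground_truth,
    model_predictions=[perfect_model, noisy_model, random_model],
    model_names=["Perfect Model", "Noisy Model", "Random Model"],
    class_names=class_names,
)

# Write the comparison to an HTML file, then wrap that HTML in a datapane block.
comparison.save_html("comparison.html")
with open("comparison.html") as f:
    html = f.read()

dp.Report(dp.HTML(html)).save(path="report.html")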
