Classification Metrics in Machine Learning
Classification problems are perhaps the most common type of machine learning problem, and there are myriad metrics that can be used to evaluate predictions for them. Five of the most widely used are:
1. Classification Accuracy
2. Logarithmic Loss
3. Area under ROC Curve
4. Confusion Matrix
5. Classification Report
Classification accuracy is the number of correct predictions made as a ratio of all predictions made. It is the most common evaluation metric for classification problems; it is also the most misused. It is really only suitable when there are an equal number of observations in each class and all predictions and prediction errors are equally important, which is often not the case.
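Accuracy can be computed with scikit-learn's accuracy_score function. A minimal sketch, using made-up labels purely for illustration:

```python
# Classification accuracy: correct predictions / total predictions.
from sklearn.metrics import accuracy_score

y_true = [0, 1, 1, 0, 1, 0]  # hypothetical true labels
y_pred = [0, 1, 0, 0, 1, 1]  # hypothetical model predictions

# 4 of the 6 predictions match the true labels.
acc = accuracy_score(y_true, y_pred)
print(acc)  # 0.666...
```

Note that with imbalanced classes a model predicting only the majority class can still score a high accuracy, which is exactly why the metric is so often misused.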
Logarithmic loss (log loss) is a performance metric for evaluating predictions expressed as probabilities of membership in a given class. The scalar probability between 0 and 1 can be seen as a measure of the algorithm's confidence in a prediction. Correct and incorrect predictions are rewarded or punished proportionally to the confidence of the prediction.
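The confidence-weighting can be seen with scikit-learn's log_loss function. In this sketch (the probabilities are invented for illustration), a confidently wrong prediction is penalised far more heavily than a mildly wrong one:

```python
# Log loss punishes wrong predictions in proportion to their confidence.
from sklearn.metrics import log_loss

# Both samples have true class 0; the prediction is the probability of class 1.
confident_wrong = log_loss([0], [0.9], labels=[0, 1])  # 90% sure of the wrong class
mild_wrong = log_loss([0], [0.6], labels=[0, 1])       # 60% sure of the wrong class

print(confident_wrong)  # ~2.30
print(mild_wrong)       # ~0.92
```

Smaller log loss is better, with 0 representing perfect probability estimates.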
Area under the ROC curve (AUC) is a performance metric for binary classification problems. The AUC represents a model's ability to discriminate between the positive and negative classes. An area of 1.0 represents a model that makes all predictions perfectly; an area of 0.5 represents a model no better than random.
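AUC can be computed from predicted scores with scikit-learn's roc_auc_score; the scores below are invented for illustration:

```python
# AUC measures how well predicted scores rank positives above negatives.
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1]              # hypothetical true binary labels
y_scores = [0.1, 0.4, 0.35, 0.8]   # hypothetical predicted scores

# 3 of the 4 (positive, negative) pairs are ranked correctly -> AUC = 0.75.
auc = roc_auc_score(y_true, y_scores)
print(auc)  # 0.75
```

Note that AUC is computed from scores or probabilities, not from hard class labels, so it is insensitive to any particular classification threshold.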
The confusion matrix is a handy presentation of the accuracy of a model with two or more classes. The table presents predictions on the x-axis and true outcomes on the y-axis; each cell counts the number of predictions made by the machine learning algorithm for that combination of predicted and actual class.
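A small sketch with scikit-learn's confusion_matrix, using made-up labels; in scikit-learn's convention, rows correspond to true classes and columns to predicted classes:

```python
# Confusion matrix: rows are true classes, columns are predicted classes.
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 1, 1]  # hypothetical true labels
y_pred = [0, 1, 1, 1, 0]  # hypothetical model predictions

cm = confusion_matrix(y_true, y_pred)
print(cm)
# [[1 1]     <- one true 0 predicted as 0, one true 0 predicted as 1
#  [1 2]]    <- one true 1 predicted as 0, two true 1s predicted as 1
```

The diagonal holds the correct predictions, so a good model concentrates its counts there.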
Scikit-learn provides a convenience report for classification problems to give you a quick idea of the accuracy of a model using a number of measures. The classification_report() function displays the precision, recall, F1-score and support for each class.
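A minimal sketch of the report, again on invented labels:

```python
# classification_report summarises precision, recall, F1-score and support per class.
from sklearn.metrics import classification_report

y_true = [0, 1, 1, 0, 1]  # hypothetical true labels
y_pred = [0, 1, 0, 0, 1]  # hypothetical model predictions

report = classification_report(y_true, y_pred)
print(report)
```

The support column shows how many true samples of each class were present, which helps put the per-class precision and recall figures in context.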