Classification Metrics in Machine Learning

Classification Metrics:

Classification is the most common type of machine learning problem, and a myriad of metrics can be used to evaluate predictions for these problems.


    1. Classification Accuracy
    2. Precision
    3. Area under ROC Curve (AUC-ROC)
    4. Confusion Matrix
    5. F1-Score
    6. Recall

Classification Accuracy:

Classification accuracy is the number of correct predictions made as a ratio of all predictions made. This is the most common evaluation metric for classification problems; it is also the most misused. It is really only suitable when there are an equal number of observations in each class and all predictions and prediction errors are equally important, which is often not the case.
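As a minimal sketch, accuracy can be computed directly from the definition above; the labels here are made up for illustration:

```python
# Toy binary labels: 1 = positive, 0 = negative (hypothetical data).
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Accuracy = correct predictions / all predictions.
correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)
print(accuracy)  # 6 of 8 correct -> 0.75
```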

Precision:

Precision evaluates the accuracy of the positive predictions made by the classifier. In simple terms, precision answers the question: “Of all the instances that the model predicted as positive, how many were actually positive?” Mathematically, it is defined as:

Precision = True Positive (TP) / [True Positive (TP) + False Positive (FP)]
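A short sketch of this formula on made-up labels, counting TP and FP by hand:

```python
# Hypothetical labels: 1 = positive, 0 = negative.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # 3 true positives
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # 1 false positive
precision = tp / (tp + fp)
print(precision)  # 3 / 4 = 0.75
```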

Area under ROC Curve:

AUC-ROC stands for Area Under the Receiver Operating Characteristic Curve. The ROC curve is a graphical representation of classification model performance at different thresholds, and the AUC summarizes it as a single number. It is a performance metric for binary classification problems. The AUC represents a model’s ability to discriminate between positive and negative classes. An area of 1.0 represents a model that made all predictions perfectly. An area of 0.5 represents a model as good as random.
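One way to sketch the AUC without plotting anything uses an equivalent definition: the AUC equals the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one (counting ties as half). The scores and labels below are made up for illustration:

```python
# Hypothetical true labels and predicted scores.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.45, 0.8, 0.5, 0.3]

# AUC = P(score of random positive > score of random negative),
# with ties counted as 0.5.
pos = [s for s, t in zip(scores, y_true) if t == 1]
neg = [s for s, t in zip(scores, y_true) if t == 0]
wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
auc = wins / (len(pos) * len(neg))
print(auc)  # 8 of 9 positive/negative pairs ranked correctly -> 8/9
```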

Confusion Matrix:

The confusion matrix is a handy presentation of the accuracy of a model with two or more classes. The table presents predictions on the x-axis and actual outcomes on the y-axis; the cells of the table count the number of predictions made by a machine learning algorithm for each combination.
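A minimal sketch for the binary case, tallying each (actual, predicted) pair on made-up labels:

```python
from collections import Counter

# Hypothetical labels: 1 = positive, 0 = negative.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Count each (actual, predicted) pair to fill the matrix cells.
counts = Counter(zip(y_true, y_pred))
tp = counts[(1, 1)]
fn = counts[(1, 0)]
fp = counts[(0, 1)]
tn = counts[(0, 0)]

print("          pred=1  pred=0")
print(f"actual=1  {tp:6d}  {fn:6d}")
print(f"actual=0  {fp:6d}  {tn:6d}")
```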

F1-Score:

F1 score is the harmonic mean of precision and recall. It provides a single metric that balances the trade-off between precision and recall. It is especially useful when the class distribution is imbalanced. Mathematically, it is given by:

F1 Score = 2 x [(Precision x Recall)/ (Precision + Recall)]
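The formula above can be sketched directly from raw counts; the TP/FP/FN values here are hypothetical:

```python
# Hypothetical counts from a binary classifier's predictions.
tp, fp, fn = 3, 1, 1

precision = tp / (tp + fp)  # 0.75
recall = tp / (tp + fn)     # 0.75
f1 = 2 * (precision * recall) / (precision + recall)
print(f1)  # harmonic mean of 0.75 and 0.75 -> 0.75
```

Because the harmonic mean is pulled toward the smaller value, the F1 score stays low whenever either precision or recall is low, which is why it is preferred over a simple average for imbalanced data.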

Recall:

Recall is also known as sensitivity or true positive rate. It is the ratio of the number of true positive predictions to the total number of actual positive instances in the dataset. It measures the ability of a model to identify all relevant instances. Mathematically, recall is defined as:

Recall = True Positive (TP) / [True Positive (TP) + False Negative (FN)]
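A short sketch of this formula on made-up labels, counting TP and FN by hand:

```python
# Hypothetical labels: 1 = positive, 0 = negative.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # 3 true positives
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # 1 false negative
recall = tp / (tp + fn)
print(recall)  # 3 of 4 actual positives found -> 0.75
```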