I always forget the definitions of classification metrics, so I'd like to summarize them here once and for all.

First of all, there are 4 groups of classification outcomes — True Positive, True Negative, False Positive, and False Negative. Positive/Negative refers to the predicted value; True/False indicates whether that prediction was right or wrong. They can be depicted in a 2x2 confusion matrix.

Specifically, false outcomes mean...

- False Positive: Negative instances end up among the positive predictions. A high FP rate may mean the threshold is too loose, i.e., low precision.
- False Negative: The model fails to predict some positive instances correctly. A high FN rate may mean the threshold is too tight, i.e., low recall.
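The four outcomes above can be counted directly from predictions. A quick sketch with hypothetical labels (1 = positive, 0 = negative):

```python
# Hypothetical ground-truth labels and model predictions.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

# Count each of the four outcomes by comparing prediction to truth.
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # 3
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)  # 5
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # 1
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # 1

# 2x2 confusion matrix: rows = predicted, columns = actual.
confusion = [[tp, fp],
             [fn, tn]]
print(confusion)  # [[3, 1], [1, 5]]
```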

## Accuracy

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Accuracy is simply the ratio of correct predictions to all predictions. Accuracy doesn't work well for skewed datasets (i.e., when some classes are much more frequent than others).
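The skewed-dataset problem is easy to demonstrate with made-up numbers: a model that always predicts the majority class still scores high accuracy.

```python
# Hypothetical skewed dataset: 95 negatives, only 5 positives.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100  # a trivial model that always predicts negative

correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
accuracy = correct / len(y_true)
print(accuracy)  # 0.95 — looks great, yet the model never finds a positive
```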

## Precision

Precision = TP / (TP + FP)

Precision is the ratio of correct positive predictions to all positive predictions. High precision corresponds to a low false positive rate.
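As a sketch, using the hypothetical TP/FP counts from the confusion-matrix example above:

```python
def precision(tp: int, fp: int) -> float:
    # Of everything the model called positive, how much really was positive?
    return tp / (tp + fp)

print(precision(tp=3, fp=1))  # 0.75
```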

## Recall

Recall = TP / (TP + FN)

Recall is the ratio of correctly predicted positives to all actual positive instances. Recall is also known as "Sensitivity". High recall corresponds to a low false negative rate.
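Same sketch for recall, again with the hypothetical counts from the earlier example:

```python
def recall(tp: int, fn: int) -> float:
    # Of everything that really was positive, how much did the model catch?
    return tp / (tp + fn)

print(recall(tp=3, fn=1))  # 0.75
```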

## F1 score

F1 = 2 / (1/Precision + 1/Recall)

F1 score is the harmonic mean of precision and recall, which means it takes both FP and FN into account. F1 score is generally more useful than accuracy, especially when the dataset is skewed.
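A small sketch of the harmonic mean. Unlike the arithmetic mean, it drags the score down when precision and recall are imbalanced, so a model can't hide a terrible recall behind a perfect precision (or vice versa):

```python
def f1(precision: float, recall: float) -> float:
    # Harmonic mean of precision and recall.
    return 2 / (1 / precision + 1 / recall)

print(f1(0.75, 0.75))  # 0.75 — equal inputs, mean is the same value
print(f1(1.0, 0.5))    # ~0.667, lower than the arithmetic mean of 0.75
```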

Here are the references used in this post.