A confusion matrix is a simple way to visually present the accuracy of a classification algorithm. A confusion matrix can only be constructed for values which are already known, so this analysis uses your labelled evaluation set rather than your unlabelled test set¹.

Most explanations of confusion matrices use the smallest possible matrix (that of a binary classifier), but starting there makes larger grids harder to comprehend, so it's worth looking at the theory first and then working through a simple example.

|             | \(A_{pred}\) | \(B_{pred}\) | \(C_{pred}\) |
|-------------|--------------|--------------|--------------|
| \(A_{act}\) | \(T_p\)      | \(E_{AB}\)   | \(E_{AC}\)   |
| \(B_{act}\) | \(E_{BA}\)   | \(T_p\)      | \(E_{BC}\)   |
| \(C_{act}\) | \(E_{CA}\)   | \(E_{CB}\)   | \(T_p\)      |

This is a confusion matrix for a fairly simple three-class problem. It allows you to ascertain some basic facts about how well your classifier is doing.
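
If you want to see one of these built in practice, here is a minimal sketch using scikit-learn's `confusion_matrix` (the labels below are invented purely for illustration); as in the table above, rows are the actual classes and columns are the predicted ones:

```python
# A minimal sketch: building a three-class confusion matrix with
# scikit-learn. The labels are made up for illustration.
from sklearn.metrics import confusion_matrix

y_actual    = ["A", "A", "B", "B", "B", "C", "C", "A", "C"]
y_predicted = ["A", "B", "B", "B", "A", "C", "C", "A", "B"]

# Rows are the actual classes, columns the predicted classes,
# in the order given by `labels`.
matrix = confusion_matrix(y_actual, y_predicted, labels=["A", "B", "C"])
print(matrix)
# [[2 1 0]
#  [1 2 0]
#  [0 1 2]]
```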

How to read this:

  • each row tells you the total number of points which were actually in that class
  • each column tells you the total number of points your classifier assigned to that class

From this you can work out the following (a short code sketch after the list shows the same arithmetic):

  • The true positives ($T_p$): those points which were classified as being in a class and actually were. These are the values on the diagonal.
  • The false positives: those points which were classified as being in a class even though they were actually in a different one. This is the sum of all the values in the class's column except the $T_p$ entry. For class B this would be $E_{AB} + E_{CB}$.
  • The true negatives: those points which weren't in the class and weren't classified as being part of it. This is the sum of all the values not in the class's column or row. So for class C this is everything in the top-left $2 \times 2$ block: the $T_p$ entries for A and B plus $E_{AB}$ and $E_{BA}$.
  • The false negatives: all the points ascribed to a different class than the one they were actually in. This is the sum of all the error values in a class's row, so for class A the false negatives would be $E_{AB} + E_{AC}$.
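
The same bookkeeping is easy to express in code. Below is a minimal sketch (the helper name `per_class_counts` is mine, not a library function), assuming a square matrix where rows are actual classes and columns are predicted classes:

```python
import numpy as np

def per_class_counts(m, i):
    """Return (TP, FP, TN, FN) for the class at row/column i."""
    m = np.asarray(m)
    tp = m[i, i]                 # the diagonal entry for this class
    fp = m[:, i].sum() - tp      # rest of the class's column
    fn = m[i, :].sum() - tp      # rest of the class's row
    tn = m.sum() - tp - fp - fn  # everything outside the row and column
    return tp, fp, tn, fn
```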

Worked example: binary classifier

For a binary classifier tested on $n=175$ data points:

|             | \(A_{pred}\) | \(B_{pred}\) |
|-------------|--------------|--------------|
| \(A_{act}\) | 100          | 5            |
| \(B_{act}\) | 10           | 60           |

  • True positives:
    • A = 100
    • B = 60
  • False positives:
    • A = 10
    • B = 5
  • True negatives:
    • A = 60
    • B = 100
  • False negatives:
    • A = 5
    • B = 10
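
As a check, the per-class arithmetic described earlier reproduces all four lists when applied to this matrix (a self-contained sketch, again assuming rows are actual classes and columns are predicted):

```python
import numpy as np

m = np.array([[100, 5],
              [10, 60]])  # rows: actual A, B; columns: predicted A, B

for i, label in enumerate(["A", "B"]):
    tp = m[i, i]                 # diagonal entry
    fp = m[:, i].sum() - tp      # rest of the column
    fn = m[i, :].sum() - tp      # rest of the row
    tn = m.sum() - tp - fp - fn  # everything else
    print(label, "TP:", tp, "FP:", fp, "TN:", tn, "FN:", fn)
# A TP: 100 FP: 10 TN: 60 FN: 5
# B TP: 60 FP: 5 TN: 100 FN: 10
```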

Resources:

  1. How best to divide up your data is covered in Model Selection and Evaluation.