Confusion Matrix in Data Mining Explained

A confusion matrix in data mining is used to explain the Type I and Type II errors in your results. These errors are also referred to as false positives and false negatives, respectively.

[Figure: confusion matrix in data mining]

A false positive is when something is predicted to occur but does not occur. A false negative is when something is predicted to not occur, but it does occur.

The common notation is:

  • y for the actual values
  • ŷ (y-hat) for predicted values

A confusion matrix in data mining gives a quick overview of how a prediction model has performed. It is used, for example, to assess the accuracy of classification models such as Logistic Regression and K-Nearest Neighbors.
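To make the four outcomes concrete, here is a minimal Python sketch that tallies them from the actual values y and the predicted values ŷ (written `y_hat` in code). The function name `confusion_counts` and the labels are made up for illustration, and the sketch assumes binary labels where 1 means the event occurred.

```python
def confusion_counts(y, y_hat):
    """Tally the four cells of a binary confusion matrix.

    y and y_hat are equal-length sequences of 0/1 labels,
    where 1 means "the event occurred" (an assumption of this sketch).
    """
    if len(y) != len(y_hat):
        raise ValueError("y and y_hat must have the same length")
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for actual, predicted in zip(y, y_hat):
        if predicted == 1 and actual == 1:
            counts["TP"] += 1  # predicted to occur, and it did
        elif predicted == 0 and actual == 0:
            counts["TN"] += 1  # predicted not to occur, and it did not
        elif predicted == 1 and actual == 0:
            counts["FP"] += 1  # false positive (Type I error)
        else:
            counts["FN"] += 1  # false negative (Type II error)
    return counts


# Made-up labels for illustration only.
y = [1, 0, 1, 1, 0, 0, 1, 0]
y_hat = [1, 0, 0, 1, 1, 0, 1, 0]
print(confusion_counts(y, y_hat))
```

The two "correct" cells (TP and TN) sit on the diagonal of the matrix, and the two error cells (FP and FN) sit off it.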

Confusion Matrix Accuracy
A confusion matrix makes it easy to calculate the accuracy and error rates.

[Figure: accuracy rate]

In the example above, the prediction model correctly predicted 35 events that did not occur and 50 events that did occur. The test set in this example has 100 events, so 35 + 50 = 85 predictions were correct. From this, finding the accuracy or error rate is quite simple: the accuracy rate is 85/100 = 85%, and the error rate is the remaining 15%.
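That calculation can be sketched in a few lines of Python. The sketch reads the 35 as correctly predicted non-events (true negatives) and the 50 as correctly predicted events (true positives); the helper name `accuracy_and_error` is hypothetical, and how the remaining 15 errors split between false positives and false negatives is not needed here.

```python
def accuracy_and_error(correct, total):
    """Return (accuracy rate, error rate) as fractions of the test set."""
    if total <= 0:
        raise ValueError("total must be positive")
    accuracy = correct / total
    return accuracy, 1 - accuracy


true_negatives = 35   # correctly predicted not to occur
true_positives = 50   # correctly predicted to occur (assumed reading)
total_events = 100    # size of the test set

accuracy, error = accuracy_and_error(true_negatives + true_positives, total_events)
print(f"accuracy = {accuracy:.0%}, error rate = {error:.0%}")
```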

So, don’t let the name confuse you!
