Breaking Down Classification Evaluation Metrics
Accuracy, Precision, Recall, ROC Curve, True Positive, False Positive, True Negative, and False Negative
Classification is a data mining task whose ultimate goal is to accurately predict a categorical response variable. The setup usually requires a training set, which contains a set of attributes together with the target, and a prediction set, which contains data the algorithm has not seen before. The algorithm then analyses the input and generates the predicted output (see Table 1: Training and Prediction Set for Medical Database). “A classification task begins with a data set in which the class assignments are known. For example, a classification model that predicts credit risk could be developed based on observed data for many loan applicants over a period of time. In addition to the historical credit rating, the data might track employment history, home ownership or rental, years of residence, number and type of investments, and so on. Credit rating would be the target, the other attributes would be the predictors, and the data for each customer would constitute a case. Classifications are discrete and do not imply order.” (Kesavaraj & Sukumaran, 2013)
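To make the training set and prediction set concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the attribute names (age, resting heart rate), the class labels, the made-up values, and the choice of a k-nearest-neighbours vote as the classifier. It is not the data from Table 1 and not a method the quoted authors prescribe.

```python
from collections import Counter
from math import dist

# Hypothetical training set: each case pairs its attributes
# (age, resting heart rate) with a known target class.
# All values are made up for illustration.
training_set = [
    ((25, 62), "healthy"),
    ((31, 70), "healthy"),
    ((38, 66), "healthy"),
    ((47, 88), "at_risk"),
    ((52, 95), "at_risk"),
    ((60, 91), "at_risk"),
]

# Hypothetical prediction set: attributes only, the target is unknown.
prediction_set = [(29, 65), (55, 93)]


def predict(attributes, training_set, k=3):
    """Return the majority class among the k nearest training cases."""
    nearest = sorted(training_set, key=lambda case: dist(case[0], attributes))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Classify every unseen case using only what the training set provides.
predictions = [predict(x, training_set) for x in prediction_set]
print(predictions)  # ['healthy', 'at_risk']
```

The predicted labels are discrete categories with no implied order, which is what separates this task from predicting a numeric value.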
Often, we split the dataset into three sets: training, validation, and test. There’s no hard rule on how to…