You are given a data set on cancer detection. You’ve build a classification model and achieved an accuracy of 96%. Why shouldn’t you be happy with your model performance? What can you do about it?

Data Science Interview QuestionsCategory: Data ScienceYou are given a data set on cancer detection. You’ve build a classification model and achieved an accuracy of 96%. Why shouldn’t you be happy with your model performance? What can you do about it?
1 Answers
MockInterview Staff answered 5 years ago

Answer from Analytics Vidhya:
If you have worked on enough data sets, you should deduce that cancer detection results in imbalanced data. In an imbalanced data set, accuracy should not be used as a measure of performance because 96% (as given) might only be predicting majority class correctly, but our class of interest is minority class (4%) which is the people who actually got diagnosed with cancer. Hence, in order to evaluate model performance, we should use Sensitivity (True Positive Rate), Specificity (True Negative Rate), F measure to determine class wise performance of the classifier. If the minority class performance is found to to be poor, we can undertake the following steps:

  1. We can use undersampling, oversampling or SMOTE to make the data balanced.
  2. We can alter the prediction threshold value by doing probability caliberation and finding a optimal threshold using AUC-ROC curve.
  3. We can assign weight to classes such that the minority classes gets larger weight.
  4. We can also use anomaly detection.

Know more: Imbalanced Classification

Your Answer

10 + 13 =