Assessment of the Influence of Dependent Variable Distribution on Selected Goodness of Fit Measures Using the Example of Customer Churn Model

Authors

Abstract

Classification models make it possible to take optimal actions at every stage of the customer lifecycle. Both the model-building process and the assessment of a model's discriminatory power are affected by an unbalanced distribution of the dichotomous dependent variable. This article addresses the reliable assessment of goodness of fit. The first part reviews measures of predictive power; the second assesses how the distribution of the dependent variable affects selected goodness-of-fit measures. Several measures, including lift, accuracy (ACC), and the F-score, proved highly sensitive to this distribution. MCC and Cohen's kappa were also found to be sensitive to it. Sensitivity (SENS), specificity (SPEC), Youden's index, and measures based on the ROC curve showed no such sensitivity. These conclusions may help avoid misjudging the predictive power of models built for both research and business practice.
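The dependence described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes a hypothetical classifier whose sensitivity and specificity are held fixed (0.7 and 0.9, values chosen only for illustration) and recomputes the measures from the expected confusion matrix as the share of the positive class (for example, churners) changes. The function name `metrics_at_prevalence` and all of its parameters are assumptions introduced here.

```python
import math

def metrics_at_prevalence(sens, spec, prevalence):
    """Goodness-of-fit measures from an expected confusion matrix.

    sens, spec and prevalence are hypothetical inputs in (0, 1); cell
    values are expressed as proportions of all observations.
    """
    tp = sens * prevalence                 # true positives
    fn = (1 - sens) * prevalence           # false negatives
    tn = spec * (1 - prevalence)           # true negatives
    fp = (1 - spec) * (1 - prevalence)     # false positives

    acc = tp + tn
    precision = tp / (tp + fp)
    f1 = 2 * tp / (2 * tp + fp + fn)
    lift = precision / prevalence          # precision relative to the base rate
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    # Cohen's kappa: observed agreement corrected for chance agreement
    p_chance = (tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)
    kappa = (acc - p_chance) / (1 - p_chance)

    return {
        "SENS": tp / (tp + fn),
        "SPEC": tn / (tn + fp),
        "Youden": tp / (tp + fn) + tn / (tn + fp) - 1,
        "ACC": acc,
        "F1": f1,
        "lift": lift,
        "MCC": mcc,
        "kappa": kappa,
    }

# Same hypothetical classifier, balanced versus unbalanced dependent variable
for prevalence in (0.5, 0.05):
    row = metrics_at_prevalence(sens=0.7, spec=0.9, prevalence=prevalence)
    print(prevalence, {name: round(value, 3) for name, value in row.items()})
```

In this sketch SENS, SPEC and Youden's index are identical for both class shares, while ACC, F1, lift, MCC and kappa change, even though the classifier itself is the same.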

Published

2020-01-30

Issue

Section

Articles