Assessment of the Influence of Dependent Variable Distribution on Selected Goodness of Fit Measures Using the Example of Customer Churn Model
Abstract
Classification models enable optimal actions to be taken at every stage of the customer's lifecycle. One circumstance that affects both the model-building process and the assessment of a model's discriminatory power is an unbalanced distribution of the dichotomous dependent variable. The article focuses on the question of reliable assessment of the goodness of fit. The first part of the article reviews the measures of predictive power; the article then assesses the impact of the distribution of the dependent variable on selected measures of goodness of fit. Several measures, such as lift, accuracy (ACC), and F-score, proved highly sensitive to this distribution. MCC and Cohen's kappa were also found to be sensitive. Sensitivity (SENS), specificity (SPEC), Youden's index, and measures based on ROC curves showed no such sensitivity. These conclusions may help avoid misjudging the predictive power of models built for both learning and business practice.
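The contrast described in the abstract can be illustrated with a minimal sketch. It is not the article's procedure: the function name `metrics_at_prevalence`, the fixed sensitivity of 0.8 and specificity of 0.9, and the chosen churn rates are all hypothetical values picked for illustration. The sketch holds SENS and SPEC constant, builds the expected confusion matrix for a given share of positive cases, and recomputes the remaining measures. Lift is taken here as precision divided by the share of positives, which is one common definition and an assumption of this example.

```python
from math import sqrt


def metrics_at_prevalence(sens, spec, prevalence, n=100_000):
    """Expected fit measures for a classifier with fixed sensitivity and
    specificity, evaluated on n cases with the given share of positives.
    All inputs are illustrative assumptions, not values from the article."""
    pos = n * prevalence          # actual positives (e.g. churners)
    neg = n - pos                 # actual negatives
    tp = sens * pos               # true positives
    fn = pos - tp                 # false negatives
    tn = spec * neg               # true negatives
    fp = neg - tn                 # false positives

    acc = (tp + tn) / n
    precision = tp / (tp + fp)
    f1 = 2 * precision * sens / (precision + sens)
    lift = precision / prevalence  # assumed definition: precision / base rate
    mcc = (tp * tn - fp * fn) / sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    # Cohen's kappa: observed agreement corrected for chance agreement p_e
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    kappa = (acc - p_e) / (1 - p_e)
    youden = sens + spec - 1       # depends only on SENS and SPEC

    return {
        "acc": acc, "f1": f1, "lift": lift,
        "mcc": mcc, "kappa": kappa, "youden": youden,
    }


if __name__ == "__main__":
    # Same classifier quality, different shares of the positive class.
    for prevalence in (0.50, 0.20, 0.05):
        m = metrics_at_prevalence(sens=0.8, spec=0.9, prevalence=prevalence)
        row = "  ".join(f"{k}={v:.3f}" for k, v in m.items())
        print(f"prevalence={prevalence:.2f}  {row}")
```

In this setup Youden's index is identical across rows because it is a function of SENS and SPEC alone, while accuracy, F-score, lift, MCC, and kappa change with the class distribution even though the classifier itself is unchanged.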
Published
2020-01-30
Issue
Section
Articles
License
Copyright (c) 2020 Grzegorz Migut
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.