PSI: The Stability Index for Credit Scorecard Models

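Because a model is developed on samples from a specific period, whether it applies to populations outside the development sample can only be known through stability testing. The population stability index (PSI) measures the difference between the score distributions of a test sample and the model development sample, and it is the most common metric for assessing model stability. In effect, once scores are grouped into bands, PSI shows whether the population distribution has changed across different samples or across samples from different times, that is, whether the share of the total population falling in each score band has changed significantly. The formula is as follows: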

[Figure 1: the PSI formula]
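A minimal sketch of the idea described above: group raw scores into bands, take each band's share of the total, and sum the per-band terms. The function names, the `edges` cut points, and the `eps` floor that guards against empty bands are all assumptions made for illustration; the source does not say how empty bands should be handled.

```python
import math
from bisect import bisect_right

def band_shares(scores, edges):
    """Share of scores in each band; `edges` are sorted lower bounds of bands 2..n."""
    counts = [0] * (len(edges) + 1)
    for s in scores:
        # A score equal to a cut point is placed in the upper band.
        counts[bisect_right(edges, s)] += 1
    total = len(scores)
    return [c / total for c in counts]

def psi_from_scores(expected_scores, actual_scores, edges, eps=1e-6):
    """PSI = sum((actual - expected) * ln(actual / expected)) over score bands."""
    expected = band_shares(expected_scores, edges)
    actual = band_shares(actual_scores, edges)
    total = 0.0
    for a, e in zip(actual, expected):
        # Assumption: floor empty bands at eps so the logarithm stays defined.
        a, e = max(a, eps), max(e, eps)
        total += (a - e) * math.log(a / e)
    return total
```

Two identical samples give a PSI of 0, and the value grows as the band shares drift apart. Note that the result depends on the choice of bands, so the same cut points should be used for both samples.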

 

Practical applications of PSI:

1) Out-of-sample testing

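  Test model stability across different samples, for example the training set versus the test set. This also shows how well the model was trained; as I understand it, it reflects the model's variance.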

2) Out-of-time testing

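  The further the test reference date is from the modeling reference date, the more the risk characteristics of the test sample are likely to differ from those of the modeling sample, so the PSI value is usually higher. A high value here also suggests that the model may be too old and may need to be rebuilt on new samples.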
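An out-of-time check can be sketched as computing PSI for each later snapshot against the development sample. The band shares and snapshot labels below are made-up toy numbers chosen only to show the mechanics; they are not taken from any real portfolio.

```python
import math

def psi(actual, expected):
    """PSI over band shares given as fractions that are all greater than zero."""
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))

# Hypothetical band shares for the development sample (four score bands).
development = [0.25, 0.25, 0.25, 0.25]

# Hypothetical later snapshots, ordered by distance from the modeling date.
snapshots = {
    "3 months later": [0.25, 0.25, 0.25, 0.25],
    "6 months later": [0.20, 0.25, 0.25, 0.30],
    "12 months later": [0.10, 0.20, 0.30, 0.40],
}

psi_by_snapshot = {label: psi(shares, development) for label, shares in snapshots.items()}

for label, value in psi_by_snapshot.items():
    print(f"{label}: PSI = {value:.3f}")
```

In this toy data the PSI rises as the snapshot moves further from the development sample, which is the pattern the paragraph above describes.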

 

[Figure 2]

Population Stability Index (PSI) – Our Banking Case Continues

The following is a representation of the latest quarterly comparison your team has performed against the benchmark sample. Here ‘Actual %’ is the population distribution for the latest quarter and ‘Expected %’ is the population distribution for the validation sample (a.k.a. benchmark sample).

Comparing two populations visually is a good place to start. The current population seems to have shifted towards the right side of the graph. To a small extent, this is expected since scorecards often influence the through-the-door population as the market starts reacting to the approval strategies of the bank. However, the question we need to ask is whether this is a major shift in the population. Essentially, you are comparing two different distributions and could use any goodness-of-fit measure such as the chi-square test. However, the population stability index is an industry-accepted metric that comes with some convenient rules of thumb for this purpose. The population stability index (PSI) formula is displayed below (refer to ‘Credit Risk Scorecards’ by Naeem Siddiqi).

PSI=\sum ((Actual\ \%-Expected\ \%)\times ln(\frac{Actual\ \%}{Expected\ \%}))

Again, like the weight of evidence and the information value, PSI seems to have its roots in information theory. Let’s calculate the population stability index (PSI) for our population (we have already seen a histogram for this above).

Score band   Actual %   Expected %   Actual - Expected   ln(Actual/Expected)   Index
< 251        5%         8%           -3%                 -0.47                 0.014
251–290      6%         9%           -3%                 -0.41                 0.012
291–320      6%         10%          -4%                 -0.51                 0.020
321–350      8%         13%          -5%                 -0.49                 0.024
351–380      10%        12%          -2%                 -0.18                 0.004
381–410      12%        11%          1%                  0.09                  0.001
411–440      14%        10%          4%                  0.34                  0.013
441–470      14%        9%           5%                  0.44                  0.022
471–520      13%        9%           4%                  0.37                  0.015
> 520        9%         8%           1%                  0.12                  0.001
Population Stability Index (PSI) = 0.1269

The last column in the above table is what we care about. Let us consider the score band 251–290 and calculate the index value for this row.

Index=(6\ \% - 9\ \%)\times ln(\frac{6\ \%}{9\ \%})=0.012
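The table and the single-row calculation can be reproduced with a few lines of Python. This is only a sketch: the variable and function names are illustrative, and the percentages are copied from the table in band order.

```python
import math

# Percentages copied from the table, one entry per score band in table order.
actual_pct = [5, 6, 6, 8, 10, 12, 14, 14, 13, 9]
expected_pct = [8, 9, 10, 13, 12, 11, 10, 9, 9, 8]

def psi_terms(actual, expected):
    """Per-band index values: (actual - expected) * ln(actual / expected)."""
    return [(a - e) * math.log(a / e) for a, e in zip(actual, expected)]

# Convert percentages to fractions before applying the formula.
terms = psi_terms([p / 100 for p in actual_pct], [p / 100 for p in expected_pct])
psi = sum(terms)

print(round(terms[1], 3))  # index for band 251-290: 0.012
print(round(psi, 4))       # total PSI: 0.1269
```

Every term is non-negative because the difference and the logarithm always share the same sign, so bands that lose population and bands that gain population both add to the total.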

The final PSI value, 0.13 (0.1269 rounded), is the sum of all the values in the last column. Now the question is how to interpret this value. The rule of thumb for the PSI is displayed below.

PSI value           Inference                   Action
Less than 0.1       Insignificant change        No action required
0.1 – 0.25          Some minor change           Check other scorecard monitoring metrics
Greater than 0.25   Major shift in population   Need to delve deeper
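The rule of thumb can be written as a small lookup. The function name is hypothetical, and treating exactly 0.1 and exactly 0.25 as part of the middle bucket is an assumption based on how the ranges are written in the table.

```python
def interpret_psi(psi):
    """Map a PSI value to the (inference, action) pair from the rule-of-thumb table."""
    if psi < 0.1:
        return ("Insignificant change", "No action required")
    # Assumption: both endpoints 0.1 and 0.25 belong to the middle bucket.
    if psi <= 0.25:
        return ("Some minor change", "Check other scorecard monitoring metrics")
    return ("Major shift in population", "Need to delve deeper")

inference, action = interpret_psi(0.1269)
print(inference)  # Some minor change
print(action)     # Check other scorecard monitoring metrics
```

These thresholds are conventions rather than statistical tests, so a value close to a boundary is best read together with other monitoring metrics.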

The value of 0.13 falls in the second bucket, which indicates a minor shift in the population from the validation or benchmark sample. These are handy rules to have. However, one must ask how this population shift is going to make any difference to the scorecard. Actually, it may or may not make any difference. Each score band of a scorecard has an associated bad rate, or probability of customers not paying off their loans. For instance, score band 251–290 in our scorecard has a bad rate of 10%, meaning one customer out of every 10 in this score band won’t service his/her loan. The population stability index simply indicates changes in the population of loan applicants. This may or may not result in a deterioration of the scorecard's ability to predict risk. Nevertheless, the PSI indicates changes in the environment that need to be investigated further by analyzing changes in macroeconomic conditions and in the overall lending policies of the bank.

Reference: https://www.cnblogs.com/webRobot/p/9133507.html
