
Credit Scorecards (parts 1-7)

Toby, project collaboration QQ: 231469242

Credit Scorecards – Introduction (part 1 of 7)

http://ucanalytics.com/blogs/credit-scorecards-part-1/

Credit Scorecards in the Age of Credit Crisis

This incident took place at a friend’s party circa 2009, against the backdrop of the worst financial crisis the planet had seen in a long time. The average Joe on the street was aware of terms such as mortgage-backed securities (MBS), sub-prime lending and credit crisis – the reasons for his plight. Back at the party, I met an informed and compassionate elderly woman, and after a few minutes of chitchat the conversation turned to what I do for a living. At that point, I was working on a project to develop a credit scorecard for a leading mortgage lender in Mumbai. As I started explaining the details of my job, her expression changed from curiosity to angst and pain. Eventually, she interrupted and said: why would you do such a thing? Is this not the reason for all the mess? I was used to this reaction and had to correct her misconception.

Predictive Analytics: The lurking Danger – by Roopam

Credit or application scorecards can be excellent tools for both lender and borrower to work out the debt-servicing capability of the borrower. For lenders, scorecards can help assess the creditworthiness of the borrower and maintain a healthy portfolio, which will eventually influence the economy as a whole. For the borrower, they can provide valuable information, for example that 45% of people with her socio-economic background have struggled to keep up with their EMI commitments. This could help the borrower make a well-informed decision before getting into a debt trap. Blaming science for reckless human behavior is not new. I believe any rigorous science with practical applications is like a sharp German blade: a master chef prepares delicious meals with it, while an irresponsible user ends up with a deep and painful cut.

Scorecards and Predictive Analytics

In the following series, we will explore the practitioner’s approach to developing and maintaining a scorecard. At a very high level, credit scorecards have their roots in the classification problem in statistics & data mining. Classification problems offer an extremely broad methodology/thought-process with many business applications. A few applications of the classification problem are:

• Application or credit scorecards to assess repayment risk of the borrower

• Image analytics of MRI to identify if the cancer is benign or malignant
• Behavioral models to identify the most probable future action of the customer

• Identification of potential drug targets in the protein structure
• Fraud detection models

• Sentiment analysis of Tweets and Facebook posts
• Cross/up sell propensity models
• Campaign response models
• Insurance ratings

For that matter, there are subtle links between credit scorecards and the other models mentioned above. The details of these models could be drastically different, but the underlying idea for all of them is linked to the classification problem. In this series, I shall focus on credit or application scorecard methodology but will try to bring in other scorecards and models whenever possible.

Credit Scoring: Development Stages of Credit Scorecard – by Roopam

Flow of Subsequent Articles

The flow of subsequent articles in the series will be as follows:

1. Classification problem and sampling
2. Variable selection and coarse classing
3. Predictive Models
4. Logistic regression and scorecards
5. Model validation
6. Application and business process integration

Books for Credit Scorecards

I have compiled a list of books you may find useful while learning about analytical scorecards. The first four of these books have more or less the same flow, with Anderson’s book (#4) a little more detailed. However, you could choose any one of these four books without losing much. The last book (#5) is a collection of articles / papers by practitioners and academicians and is quite interesting.

1. Credit Risk Scorecards: Developing and Implementing Intelligent Credit Scoring – Naeem Siddiqi
2. Credit Scoring, Response Modeling, and Insurance Rating: A Practical Guide to Forecasting Consumer Behavior – Steven Finlay
3. Credit Scoring for Risk Managers: The Handbook for Lenders – Elizabeth Mays and Niall Lynas
4. The Credit Scoring Toolkit: Theory and Practice for Retail Credit Risk Management and Decision Automation – Raymond Anderson
5. Credit Risk Models – Elizabeth Mays

Sign-off Note

I look forward to sharing my views on predictive analytics and hearing back from you. See you soon with the second part of this series.

Credit Scorecards – Classification Problem (part 2 of 7)

http://ucanalytics.com/blogs/credit-scorecards-classification-problem-part-2/

Classification Problem in Statistics & Data Mining

I must say I was shocked when Amishi, a girl a little over three years old, announced that going forward she would only be friends with my wife and not me. Her reason for the breakup was that I am a boy and girls can only be friends with girls. She has learned this social norm from her friends at preschool. I still remember the way she modeled for me in her swimsuit and umbrella just a few months ago. She was aware of the boy-girl difference even then; it is just that she has learned this weird social norm now. The point here is that toddlers can distinguish genders without much effort. Nature has given us a built-in equation to classify gender at a mere glance with a high degree of precision. Imagine a similar mechanism to distinguish between good and bad borrowers. You are talking about every banker’s dream. However, evolution has trained us to mate, not to lend.

Predictive Analytics: Classification Problem – by Roopam

As I mentioned in the previous article, scorecards have their roots in the classification problem in statistics and data mining. The idea with most classification problems is to create a mathematical equation to distinguish the two classes of a dichotomous variable. Such variables can only take two values, such as:

• Male/ Female
• Good / Bad
• Yes / No
• God / Devil
• Happy / Sad
• Sales / No Sales

The list can go on until eternity. The reason most business problems try to model dichotomies is that dichotomies are easy for us humans to comprehend. We must appreciate that dichotomies are never absolute and have degrees attached to them. For example, I am 80% good and 20% bad – at least I would like to believe this. I shall keep Pareto’s 80-20 principle away from this, i.e. the idea that my 20% bad is responsible for 80% of my behavior.

Credit Scorecards Development – Problem Statement & Sampling (the definition of a bad customer is flexible)

In the case of credit scorecards, the problem statement is to distinguish analytically between good and bad borrowers. Hence, the first task is to define a good and a bad borrower. For most loan products, good and bad credit is defined in the following way:

1. Good loan: never missed an EMI payment, or missed one only once
2. Bad loan: ever missed 3 consecutive EMIs (i.e. 90 days past due)

Additionally, to tag someone as good or bad, you need to observe his or her behavior for a significant length of time. This length of time varies from product to product based on the tenor of the loan. For home loans, with a tenor of 20 years, 2-3 years is a reasonable observation period.
However, there is nothing sacrosanct about the above definition, and it can be modified at the discretion of the analyst. Roll-rate analysis and vintage analysis are the two analytical tools you may want to consider while constructing the definition.
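The good/bad definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the input is a hypothetical list of booleans (one per EMI in the observation window, `True` meaning the EMI was missed), and the "indeterminate" label for loans that fit neither definition is an assumption of this sketch, not something the definition above prescribes.

```python
from typing import List


def longest_missed_streak(missed: List[bool]) -> int:
    """Length of the longest run of consecutive missed EMIs."""
    longest = current = 0
    for m in missed:
        current = current + 1 if m else 0
        longest = max(longest, current)
    return longest


def tag_loan(missed: List[bool]) -> str:
    """Tag a loan from its EMI history over the observation window.

    missed[i] is True if the i-th EMI was missed (hypothetical input format).
    """
    if longest_missed_streak(missed) >= 3:
        return "bad"            # ever 3 consecutive misses (90 days past due)
    if sum(missed) <= 1:
        return "good"           # never missed, or missed only once
    return "indeterminate"      # assumption: neither definition applies


print(tag_loan([False] * 24))                 # clean history
print(tag_loan([True, True, True, False]))    # 3 misses in a row
print(tag_loan([True, False, True, False]))   # 2 isolated misses
```

Because the definition is at the analyst's discretion, the two thresholds (3 consecutive misses, at most 1 miss) are the natural things to turn into parameters.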

Sampling Strategy for Credit Scorecards

A few years ago, I ran a daylong workshop on statistical inference for a large German shipping & cargo company in Mumbai. During the Q&A session, the Vice President of operations asked a tricky question: what is a good sample size to achieve good precision? He was looking for a one-size-fits-all answer and I wish it were that simple. The sample size depends on the degree of similarity or homogeneity of the population in question. For example, what do you think is a good sample size to answer the following two questions?

1. What is the salinity of the Pacific Ocean?
2. Is there another planet with intelligent life in the Universe?

In terms of population size, the number of drops in the ocean and the number of planets in the Universe are similar. A couple of drops of water are enough to answer the first question, since the salinity of oceans is fairly constant. On the other hand, the second question is a black swan problem. You may need to visit every single planet to rule out the possibility of an intelligent form of life.

For credit scorecard development, the accepted rule of thumb for sample size is at least 1000 records of both good and bad loans. There is no reason why you cannot build a scorecard with a smaller sample size (say 500 records). However, the analyst needs to be cautious in doing so, because a higher degree of randomness creeps into a small data sample. It is also advisable to keep the sample window as short as possible, i.e. a financial quarter or two, during scorecard development. Further, the sample is divided into two pieces: usually 70% for the development sample and the remaining 30% for the validation sample. We discuss the development and validation samples in detail in the subsequent sections of this series.
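The 70/30 division into development and validation samples could be sketched as follows. This is a minimal illustration, not a prescribed procedure: splitting within each class (stratification) is an assumption added here so that both samples keep the same good/bad mix, and the labels (1000 of each class) are synthetic.

```python
import random
from typing import List, Tuple


def split_sample(labels: List[str], dev_share: float = 0.7,
                 seed: int = 42) -> Tuple[List[int], List[int]]:
    """Return (development, validation) record indices.

    The split is done separately within each class so that the good/bad
    proportion is the same in both samples (an assumption of this sketch).
    """
    rng = random.Random(seed)
    dev, val = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(len(idx) * dev_share)
        dev.extend(idx[:cut])
        val.extend(idx[cut:])
    return sorted(dev), sorted(val)


# Synthetic labels: 1000 good and 1000 bad loans
labels = ["good"] * 1000 + ["bad"] * 1000
dev_idx, val_idx = split_sample(labels)
print(len(dev_idx), len(val_idx))
```

Fixing the seed keeps the split reproducible, which matters when the validation results are revisited later.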

Credit Scorecard Development: Sampling Strategy – by Roopam

Sign-off Note

In the next article, we will discuss the important topics of variable selection and coarse classing for credit scorecards. See you soon.

Credit Scorecards – Variables Selection (part 3 of 7)

http://ucanalytics.com/blogs/credit-scorecards-variables-selection-part-3/

Variables Selection in Predictive Analytics

Predictive Analytics: Variables Selection – by Roopam

The following story goes back to the time when I just started my transition from physics to business. I met this investment banker* in his mid-thirties during a Friday night party. After gulping down a few pints of beer, his mood became a bit somber and he told me how he hates his job. However, he had a plan of working his ass off until he retires at 45. Then he will do everything that makes him happy. I was thoroughly confused, how could someone debar himself from an emotion – happiness – for so many years and rediscover it later? I was wondering about the recipe for happiness – raindrops on roses and whiskers on kittens. An individual’s happiness is a tricky thing; however, I shall attempt to tackle this issue in my later article on logistic regression. For now, let us try to explore how states measure the collective well-being of their people. I shall use this topic of population well-being to explore an interesting topic in analytical scorecard development: variables selection.

Variables Selection – Lessons from GDP & GNH

The most popular measure of national prosperity, unanimously projected by economists and TV channels, is Gross Domestic Product (GDP). The equation for measuring GDP as taught in macroeconomics 101 is:

GDP = private consumption + gross investment + government spending + (exports − imports)

Clearly, there are 5 factors/variables that govern GDP according to this equation. At first look, GDP as a measure of national well-being seemed incomplete to me. All the variables for GDP come from commerce. They are important but cannot be the only factors in a country’s well-being, more so in a highly diverse & complicated country like India.

Gross National Happiness Index – The Story of Bhutan Naresh

Variables Selection – by Roopam

Ok, so what else do we have? A lesser-known index is Gross National Happiness (GNH). The origins of GNH are in Bhutan. They measure their country’s progress through GNH. The term was coined and implemented by Jigme Singye Wangchuck. This name immediately takes me back to the early nineties live telecast of the SAARC summit by India’s national broadcaster Doordarshan (DD). The old-timer Hindi commentators were referring to a modest man in a bathrobe-like-attire as ‘Bhutan Naresh’ – King of Bhutan. At first glance, he did not fit well with the power horses of the south Asian region. Nevertheless, he seems to have devised a more holistic metric to measure his country’s well-being. GNH is a combination of the following broad categories:

1. Living standard & income
2. Health coverage
3. Psychological well-being
4. Time spent at work and relaxing
5. Good governance
6. Schooling & education
7. Cultural diversity
8. Community vitality
9. Environmentalism and conservation

There are 72 variables in total in GNH, each measured on a scale of 0 to 1, such as daily hours of sleep and trust in media; hmmm, not a bad start! You could do your own research on GNH and let me know what you feel about it. Actually, we can work out our own formula for a GNH-like metric. The idea is to select the right variables to build your model!

Variables Selection in Credit Scoring

In data mining and statistical model-building exercises such as credit scoring, the variable selection process is performed through statistical significance – a reasonably automated process in advanced software. However, the variables are still created and measured by humans. High-impact analyses in businesses are still driven by hunches. Human intelligence is not obsolete yet.

In one of the projects I did with a financial organization, the results of credit risk analysis and scoring led to a redesign of the application form. Application forms are a major source of data about the borrower. However, nobody wants to fill in a lengthy form, so a form of optimal length helps ensure that the borrower provides accurate information. The idea is to select the right variables and ensure accurate measurement.

There are several aspects regarding variables but I will mention just one of them here (coarse classing).

Coarse Classing in Credit Scoring

One of my favorite activities as a kid was going to a shoe store and getting my feet measured every summer before school started. The shoe shops had a strange, miniature, slide-like device to measure foot size. It was fun to see my feet grow from one size to another every year or two. The growth was quantized, i.e. you are size 2 or 3, never 2.5 or 2.7. This conversion of measurements such as 2.5 and 2.7 to 3 is called grouping, bucketing or classing. It is an integral part of creating scorecards, and you will find it in all the books I listed in the first part of this blog series.

I have been a part of several heated discussions on the relevance of coarse classing in scorecard development throughout my career. In most, if not all, academic articles you will rarely see coarse classing used as a technique during model development. Quite a few academicians & practitioners believe, for a good reason, that coarse classing results in a loss of information. However, in my opinion, coarse classing has the following advantages over using the raw measurement of a variable.

1. It reduces random noise that exists in raw variables – similar to averaging and yes, you lose some information here.
2. It handles extreme events – on two extremes of a variable – much better where you have thin data.
3. It handles the non-linear relationship between dependent and independent variable without a lot of effort of variable transformation from the analyst.
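Coarse classing can be illustrated with a small sketch that buckets a raw variable and computes the bad rate per bucket. Everything here is assumed for illustration: the variable (age), the cut points and the ten toy records are made up, and in practice the cut points would come from analysis of the development sample.

```python
from bisect import bisect_right
from collections import defaultdict


def coarse_class(value, cut_points):
    """Index of the bucket a raw value falls into.

    With cut_points [30, 45]: bucket 0 is < 30, bucket 1 is 30-44,
    bucket 2 is >= 45.
    """
    return bisect_right(cut_points, value)


def bad_rate_by_bucket(values, is_bad, cut_points):
    """Share of bad loans in each bucket of the coarse-classed variable."""
    totals = defaultdict(int)
    bads = defaultdict(int)
    for v, b in zip(values, is_bad):
        k = coarse_class(v, cut_points)
        totals[k] += 1
        bads[k] += int(b)
    return {k: bads[k] / totals[k] for k in sorted(totals)}


# Toy data (made up): borrower ages and a 1/0 bad-loan flag
ages = [22, 24, 27, 31, 34, 38, 45, 52, 58, 63]
is_bad = [1, 1, 0, 1, 0, 0, 0, 0, 0, 1]
cut_points = [30, 45]

rates = bad_rate_by_bucket(ages, is_bad, cut_points)
for bucket, rate in rates.items():
    print(bucket, round(rate, 2))
```

The per-bucket bad rate is far steadier than the raw 1/0 flags, which is the noise-reduction point made above; the price is that ages 31 and 44 are now treated identically.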

Sign-off Note

We are halfway through this series on ‘Analytical Scorecard Development’ and I am thoroughly enjoying writing it. I hope you, as a reader, are on the same page. Scorecard building is highly technical and I have tried to discuss some aspects with easy-to-understand examples. However, to manage the length of the articles, I am not able to get into the details. I must say that I love the details! So, if you have any queries, doubts, points of view or recommendations, please write back on the discussion board or to my email: [email protected]

Credit Scorecards – Advanced Analytics (part 4 of 7)

http://ucanalytics.com/blogs/credit-scorecards-advanced-analytics-part-4/

Modeling in Advanced Analytics

Advanced Analytics: Model Development – by Roopam

The room, full of Analysts, erupts with a loud round of laughter when a young business analyst narrates to us an incident from his recent trip back home. A distant aunt inquired about his new profession. His response – I am into modeling. She got all excited and asked – is it just on the ramp or will I see you on the television? Jokes apart, this left me wondering about the roots of the word modeling or model. What is a model?

A model is defined as a simplified representation of reality. A representation of reality, hmmm: a photograph is a representation of reality – a moment of reality captured on the reel – so does that make it a model? I think yes. Similarly, a newspaper report that covers an incident and turns it into breaking news is also a model – a descriptive model. Now, let us try to link models with analytics.

Data warehouse, Business Intelligence and Advanced Analytics

Analytics has received a massive boost from the emergence of information technology. We are living in the era of big data. The plethora of data collected at every stage of the business process has created a need to extract knowledge out of the information. This overall process has three aspects to it:

1. Data warehouse or data marts: transactional data is extracted, transformed and loaded (ETL) into a data model / schema for the purpose of analysis
2. Business Intelligence or dashboards: “as is” business reports
3. Predictive Analytics or Advanced Analytics: high-end statistical and data mining exercise

As the quantum of data is exponentially increasing, Hadoop and big data technologies are replacing the data warehouses. However, the thought process for business intelligence and predictive analytics – the focus of this article – will not change much. Let me try to distinguish between business intelligence and predictive Analytics using something I learned at a professional theater.

5Ws for business intelligence & predictive Analytics – Lessons from Theater

5 Ws for Data Warehouse, Business Intelligence, and Advanced Analytics – by Roopam

I joined a professional theater group a few years ago. To understand the nuances of acting we started with improv, or improvisational theater. This form of theater does not have a predefined script; the actors build the story while performing. Most people thought I was a good improv actor. However, the style of remembering dialogue while performing did not work very well for me, and hence it was the end of my theater gig. Still, I learned some good lessons from the whole experience. One of them was the five Ws of deciphering a character to build the drama.

1. What had happened?
2. When did it happen?
3. Where did it happen?
4. Who was part of this?
5. Why did it happen?

Clearly, the first four questions try to report an as-is version of reality – a descriptive model. This is exactly what business intelligence professionals try to achieve through fancy reporting platforms & software. The fifth question is the trickiest of the lot. It is the question that keeps scientists and inquisitive minds awake late at night.

Newton’s Legacy

An apple falls from a tree. How difficult is it to answer the first four questions? Most of us can answer them with the help of a clock and a map. Isaac Newton, however, answered the fifth question, and his answer was gravity. If he had stopped there, nobody would have remembered him close to four hundred years after his birth. He gave a mathematical model to explain this phenomenon:

F = G × (m_apple × m_earth) / r²

where F is the attractive force, G is the gravitational constant, and r is the distance between the centres of the apple and the earth.

Replace apple and earth with any other objects and you have the general equation for the model. Albert Einstein did shatter the Newtonian notion of gravity. However, this model still holds good for all practical purposes and is used extensively in rocket science.

Advanced analytics tries to help answer the fifth question, why did something happen, using predictive modeling. The combination of high-end statistical and data mining techniques with analysts’ business acumen produces models that help organizations make informed decisions. Remember, this is just the beginning and causality is still a fair distance away!

Credit Scoring Models

Credit scorecards are models that predict the probability of a borrower defaulting on his/her loan. The following is a simplified version of a credit score with three variables:

Credit Score = Age + Loan to Value Ratio (LTV) + Installment (EMI) to Income Ratio (IIR)

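The loan-to-value ratio (LTV) is the ratio of the loan amount to the value of the collateral. It is most common in secured lending such as property mortgages.

For example, suppose customer A takes a loan secured on a property valued at RMB 1,000,000 and the bank’s credit policy caps LTV at 70%. The bank can then lend customer A at most RMB 700,000.

LTV limits differ by type of collateral and by each bank’s own policy. They reflect the bank’s risk expectations for the collateral.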

A 28-year-old man with the LTV of 75 and the IIR of 60 will have the score of 10+50+5 =65 and hence is a high credit risk.
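The arithmetic of this example can be sketched as a bucket lookup. Only the three points used above (10 for age 28, 50 for LTV 75, 5 for IIR 60) come from the example; every bucket boundary and every other point value in the table below is hypothetical, chosen only so that the example reproduces.

```python
# Each variable maps to a list of (upper_bound, points): a value gets the
# points of the first bucket whose upper bound it is strictly below.
# All boundaries and most point values are hypothetical.
SCORECARD = {
    "age": [(30, 10), (45, 25), (float("inf"), 40)],
    "ltv": [(60, 80), (80, 50), (float("inf"), 20)],
    "iir": [(30, 40), (50, 20), (float("inf"), 5)],
}


def points(variable: str, value: float) -> int:
    """Score points for one variable, looked up by bucket."""
    for upper, pts in SCORECARD[variable]:
        if value < upper:
            return pts
    raise ValueError(f"no bucket for {variable}={value}")


def credit_score(age: float, ltv: float, iir: float) -> int:
    """Credit Score = Age points + LTV points + IIR points."""
    return points("age", age) + points("ltv", ltv) + points("iir", iir)


print(credit_score(age=28, ltv=75, iir=60))  # 10 + 50 + 5
```

In this sketch a lower total means higher risk, matching the reading of 65 as a high credit risk.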

Classification of good & bad loans using two variables – LTV & IIR – by Roopam

Now the question is, how did we arrive at the bucket-wise score points and the associated risk tables? By now, after going through the previous three articles of the series, you must have some idea of how we will go about it. We have a historical list of good / bad borrowers (article 2) that we want to distinguish using predictor variables (article 3). There are several statistical & data mining techniques that could help us achieve our objective, such as:

1. Decision tree
2. Neural Networks
3. Support Vector Machines
4. Probit Regression
5. Linear discriminant analysis
6. Logistic Regression

Logistic regression is the most commonly used technique for the purpose. We will explore more about logistic regression in the next article.

Sign-off Note

I must conclude this article by saying that the good analysts find a good mathematical model as beautiful as the model walking on the catwalk ramp.

Credit Scorecards – Logistic Regression (part 5 of 7)

http://ucanalytics.com/blogs/credit-scorecards-logistic-regression-part-5/

A Primer on Logistic Regression – Are you Happy?

Logistic regression for happiness- by Roopam

A few years ago, my wife and I took a couple of weeks’ vacation to England and Scotland. Just before boarding the British Airways plane, an air hostess informed us that we had been upgraded to business class. Jolly good! What a wonderful start to the vacation. Once we got onto the plane, we got another tempting offer for a further upgrade to first class. However, this time there was a catch: just one seat was available. Now that is a shame; of course, we could not take this offer. The business class seats were fabulous before the first class offer came – by the way, all free upgrades. This is the situation behavioral economists describe as relativity & anchoring – in plain English, comparison. Anchoring or comparison is at the root of pricing strategies in business and also of all human sorrow. Eventually, however, the vacation mood took over and we enjoyed business class thoroughly. Humans are phenomenally good at adjusting to a situation in the end and enjoying it as well. You will find some of the happiest faces on people in the most difficult situations. Here is a quote by Henry Miller: “I have no money, no resources, no hopes. I am the happiest man alive”. Human behavior is full of anomalies – full of puzzles. The following is an example to strengthen this thesis.

Source: flicker.com

Lennon, McCartney, Harrison, and Best are the members of the most famous band ever on the planet – the Beatles. Ok, I know you have spotted the error. By now you must have uttered the right names: John Lennon, Paul McCartney, George Harrison and Ringo Starr, not Pete Best. Actually, Ringo Starr was the replacement for Pete Best, the original regular drummer for the Beatles. Pete must have been devastated seeing his partners rise to glory while he was left behind. Wrong, search for him on Google – he is the happiest Beatle of all. Now that is counterintuitive; I guess we do not have a clue what makes us happy.

As promised in a previous article, in this article I will attempt to explore happiness using logistic regression – the technique extensively used in scorecard development.

Logistic Regression – An Experiment

I am a thorough empiricist – a proponent of fact-based management. Hence, let me design a quick and dirty experiment* to generate data to evaluate happiness. The idea is to identify the factors / variables that influence our overall happiness. Let me present a representative list of factors for a working adult living in a city:

Now, throw in some other factors to the above list, such as a random act of kindness or an unplanned visit to a friend. As you can see, the above list can easily be expanded (recall the article on variable selection, article 3). This is a representative list and you will have to create your own to figure out the factors that influence your level of happiness.

The second part of the experiment is to collect data. This is like maintaining a diary only this one will be in Microsoft Excel. Every night before sleeping, you could assess your day and fill up numbers in the Spreadsheet along with your overall level of happiness for the day (as shown in the figure below).

*I am calling this a quick and dirty experiment for the following reasons (1) It’s not a well thought out experiment but is created more to illustrate how logistic regression works (2) the observer and the observed are same in this experiment which might create a challenge for objective measurement.

After a couple of years of data collection, you will have enough observations to create a model – a logistic regression model in this case. We are trying to model the feeling of happiness (column B) with the other columns (C to I) in the above data set. If we plot B on the Y-axis and the additive combination of C to I (we’ll call it Z) on the X-axis, it will look something like the plot shown below.

The idea behind logistic regression is to optimize Z in such a way that we get the best possible distinction between happy and sad faces, as achieved in the plot above. This is a curve-fitting problem with the sigmoid function (the curve in violet) as the choice of function.
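To make the curve-fitting idea concrete, here is a minimal sketch in Python. It assumes a tiny made-up diary with two hypothetical factors (hours of sleep and hours of commute); the data, the factor names, the learning rate and the number of iterations are all illustrative assumptions, not values from the experiment above. The weights of Z are fitted by plain gradient descent on the log-loss, which is only one of several ways to fit a logistic regression.

```python
import math
import statistics


def sigmoid(z):
    """Map any real Z to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def standardize(rows):
    """Rescale each column to mean 0 and standard deviation 1."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    stdevs = [statistics.pstdev(col) for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stdevs)]
        for row in rows
    ]


def linear_score(weights, x):
    """Z: the intercept plus the weighted, additive combination of the factors."""
    return weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))


def fit_logistic(rows, labels, learning_rate=0.5, epochs=2000):
    """Fit intercept and weights by batch gradient descent on the log-loss."""
    weights = [0.0] * (len(rows[0]) + 1)  # weights[0] is the intercept
    for _ in range(epochs):
        gradient = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            error = sigmoid(linear_score(weights, x)) - y
            gradient[0] += error
            for j, xi in enumerate(x):
                gradient[j + 1] += error * xi
        weights = [
            w - learning_rate * g / len(rows) for w, g in zip(weights, gradient)
        ]
    return weights


# Hypothetical diary rows: (hours of sleep, hours of commute)
diary = [(8, 0.5), (7, 1.0), (7.5, 0.5), (6, 2.0),
         (5, 2.5), (5.5, 3.0), (8, 1.0), (4.5, 2.0)]
# Hypothetical outcome for each day: 1 = happy face, 0 = sad face
happy = [1, 1, 1, 0, 0, 0, 1, 0]

scaled_diary = standardize(diary)
weights = fit_logistic(scaled_diary, happy)

# Fitted probability of a happy day for the first and the sixth diary rows
p_first_day = sigmoid(linear_score(weights, scaled_diary[0]))
p_sixth_day = sigmoid(linear_score(weights, scaled_diary[5]))
print(f"P(happy) on day 1: {p_first_day:.2f}, on day 6: {p_sixth_day:.2f}")
```

The columns are standardized first only so that gradient descent behaves well on this toy data; in practice a statistical package would do the fitting for you.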

I would recommend using the dates of observations (column A) in our model as well; this might reveal an interesting influence of the seasons on our mood.


Applications in Banking and Finance

This is exactly what we do in case of analytical scorecards such as credit scorecards, behavioral scorecards, fraud scorecards or buying propensity models. Just replace happy and sad faces with …

• Good and Bad borrowers
• Fraud and genuine cases
• Buyers and non-buyers

… for the respective cases and you have the model. If you remember, in the previous article (4) I showed a simple credit scorecard model: Credit Score = Age + Loan to Value Ratio (LTV) + Instalment (EMI) to Income Ratio (IIR)

A straightforward transformation of the sigmoid function will help us arrive at the above equation of the line. This is the final link to arrive at the desired scorecard.
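The transformation in question is the log-odds, or logit: if p = 1 / (1 + e^(−Z)), then ln(p / (1 − p)) = Z, which is a straight line in the variables. A minimal sketch in Python follows; the intercept, the weights and the applicant are hypothetical numbers chosen for illustration and do not come from the series.

```python
import math


def sigmoid(z):
    """Probability of a good borrower for a given linear score Z."""
    return 1.0 / (1.0 + math.exp(-z))


def log_odds(p):
    """Logit: the natural log of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))


# Hypothetical coefficients, for illustration only
intercept = 1.2
weights = {"age": 0.03, "ltv": -2.0, "iir": -3.5}

# Hypothetical applicant: 35 years old, LTV of 80%, instalment at 40% of income
applicant = {"age": 35, "ltv": 0.8, "iir": 0.4}

# Z is the additive (linear) combination of the variables
z = intercept + sum(weights[name] * applicant[name] for name in weights)

p_good = sigmoid(z)             # the S-shaped curve gives a probability
recovered_z = log_odds(p_good)  # the logit takes us back to the line

print(f"Z = {z:.2f}, P(good) = {p_good:.3f}, log-odds = {recovered_z:.2f}")
```

Because the log-odds is linear, each variable contributes an additive term, which is what lets a scorecard be written as a simple sum.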

Variable Transformation in Credit Scorecards

The Swordsmith – by Roopam

I loved the movie Kill Bill, both parts. In the first part, I enjoyed the scene where Uma Thurman’s character goes to Japan to get a sword from Hattori Hanzō, the legendary swordsmith. After learning about her motive, he agrees to make his finest sword for her. Then Quentin Tarantino, the director of the movie, briefly shows the process of making the sword. Hattori Hanzō transforms a regular piece of iron into a fabulous sword – what a craftsman. This is fairly similar to how analysts transform the sigmoid function into the linear equation. The difference is that analysts use mathematical tools rather than hammers and are not as legendary as Hattori Hanzō.


Reject Inference

Reject inference is what distinguishes credit or application scorecards from all other classification models. For application scorecards, the development sample is biased because there is no performance data for rejected loans. Reject inference is a way to rectify this shortcoming and remove the bias from the sample. We will discuss reject inference in detail in a later article on YOU CANalytics.


Sign-off Note

Now that we have our scorecard ready, the next task is to validate its predictive power. This is precisely what we will do in the next article. See you soon.

Credit Scorecards – Model Validation (Part 6 of 7)

http://ucanalytics.com/blogs/credit-scorecards-model-validation-part-6/

There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.

– Albert Einstein

A Commentary on Curiosity

Advanced Analytics Professional: An Unbiased Observer – by Roopam

I think the best way to appreciate and enjoy the trivial is to travel. When I say trivial, it includes doorknobs, posters, letterboxes, graffiti and everything we never bother to turn our heads for in our own city. I experienced this last week while traveling with my wife across Florence and Tuscany. I think one’s level of awareness and curiosity goes up many-fold while traveling. In Florence, we stayed at a lovely bed-and-breakfast named Fiorenza. The breakfast was good and the people even better. There we met an amicable family from the UK with a one-year-old baby named Owen and his 7-year-old sister Kyra. Owen and Kyra were playing hide and seek while having their breakfast. Kyra hid behind the same chair repeatedly and jumped out to reveal herself to her younger brother. Owen was pleasantly surprised every time. All humans are born curious. However, they lose it as they grow older and get familiar with things. This phenomenon could be the reason why we never turn our heads for the trivial in our own city.


Curiosity and Data Science Career

Being curious and aware requires constant energy and effort. Perhaps humans have a natural tendency to slip into a low-energy state. This is particularly dangerous for analysts, since their job requires finding meaning in something that seems mundane to others. In my opinion, the biggest challenge for analytics is not the sophistication of statistical algorithms or the enhancement of computing power, but for its practitioners to stay curious and constantly ask questions. Zen Buddhists try to achieve cosmic awareness by living in the moment. If that is too difficult, I would recommend that you treat your job like a wonderful travel destination and be a good tourist – curious and aware.

Ok, so that was a bit of a detour from our original discussion on scorecards. However, there are a couple of reasons for telling you the above: primarily, to tell you why I was late in posting this part of the series. Secondly, I would like us to have a discussion on the importance and challenges of being curious at work and life in general. I already have a few examples in mind i.e. Louis Pasteur and Edward Lorenz but that is for later.

Now, let’s continue with the topic for this part i.e. model evaluation.


Model Validation & Evaluation

Model Evaluation & Validation: the test of the pudding is in the eating – by Roopam

When I was in high school, I joined a cricket academy during the summer vacation. Cricket is a game quite similar to baseball; I shall use baseball terminology in parentheses for everyone to understand. The design of the training camp was to train for about a month, followed by a full game with kids at the same skill level from another club. There was this tall and lean kid with us in the camp; he was the star bowler (pitcher) throughout the training sessions. He used to bowl (pitch) some of the best yorkers (curve balls). We were quite sure he would outperform everyone in the game. We asked him to open the bowling; his first ball went for a six (home run), followed by several more. Maybe it was a mix of match pressure, expectations, and the crowd, but his performance was an absolute disaster. Later the coach told us that what happened was not unusual and that he had seen this several times before. At higher levels, the game is played not on the ground but in the space between the ears. Clearly, he was referring to the players’ presence of mind and temperament.


Sampling Strategy for Model Validation

As the famous saying goes, the test of the pudding is in the eating. One could be a star on the training field but a complete flop in a match situation. The same is true for an analytical model. A model, after going through a round of training (Part 5 of the series), goes through several rounds of testing.

1. Out-of-sample test: remember article 2, where we divided our sample into the training and the test samples. The first level of testing happens on the holdout, or test, sample. The model needs to perform as well on the test sample as on the training sample. We will come back to this in the next section, where I discuss the performance measures and the ROC curve.

2. Out-of-time (OOT) sample test: since the model was built on a sample of the portfolio with reasonable vintage (refer to Part 2), the analyst would like to test its performance on a more recent portfolio. The number of bad borrowers (90+ DPD) in this out-of-time sample will certainly be lower, but the overall trend of the good/bad ratio against scores will still be a good indicator of model performance. Additionally, the analyst could relax the condition for bad loans and consider 30+ DPD as bad. Again, the overall trend should match the scorecard estimations.

3. On-field test: this is where the test of the pudding is. The analyst needs to be completely aware of any credit policy changes that the bank has gone through since the scorecard was developed and, more importantly, the impact those changes will have on the scorecard. Always remember that not every policy change will influence the scorecard – a good business understanding and a bit of common sense really help here. Regular monitoring, with calibration of the scorecard as needed, is a good way to keep it updated.
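The trend check described for the out-of-time sample can be sketched in a few lines of Python. The sketch assumes a hypothetical sample of (score, bad flag) pairs, hypothetical score bands of 100 points, and the convention that a higher score means a safer borrower; none of these values come from the series.

```python
def bad_rate_by_band(records, band_edges):
    """Share of bad borrowers in each score band [low, high)."""
    rates = []
    for low, high in zip(band_edges[:-1], band_edges[1:]):
        flags = [is_bad for score, is_bad in records if low <= score < high]
        rates.append(sum(flags) / len(flags) if flags else None)
    return rates


def is_non_increasing(rates):
    """True when the bad rate never rises as the score band goes up."""
    observed = [rate for rate in rates if rate is not None]
    return all(a >= b for a, b in zip(observed, observed[1:]))


# Hypothetical out-of-time sample: (score, 1 if the loan went bad else 0)
oot_sample = [
    (310, 1), (340, 1), (360, 0), (380, 1),
    (410, 1), (450, 0), (470, 0), (490, 1),
    (520, 0), (540, 1), (560, 0), (590, 0),
    (610, 0), (650, 0), (680, 0), (690, 0),
]
band_edges = [300, 400, 500, 600, 700]  # hypothetical score bands

rates = bad_rate_by_band(oot_sample, band_edges)
trend_holds = is_non_increasing(rates)

for low, rate in zip(band_edges, rates):
    print(f"scores {low}-{low + 99}: bad rate {rate:.0%}")
print("bad rate falls as the score rises:", trend_holds)
```

The same two functions can be run on the training sample, the holdout sample and the recent portfolio, so that the three trends can be compared side by side.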


Performance Tests for Model Validation

There are several ways to test the performance of a scorecard, such as the confusion matrix, the KS statistic, Gini, and the area under the ROC curve (AUROC). The KS statistic is a widely used metric in scorecard development. However, I personally prefer the AUROC to the others. I must add that the Gini is a variant of the AUROC (Gini = 2 × AUROC − 1). The reason for my liking of the AUROC could be my formal training in physics and engineering. I think it is a more holistic measure and lets the analyst visually analyze the model performance. I prefer graphs and visual statistics to raw numbers any day.
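For readers who want to see two of these measures in action, here is a minimal sketch of the confusion matrix and the KS statistic in Python. It assumes ten hypothetical scored loans, a hypothetical cutoff of 500, and the convention that a lower score means a riskier borrower, so that "positive" means "predicted bad".

```python
def confusion_matrix(scores, is_bad, cutoff):
    """Count outcomes when every score below the cutoff is predicted bad."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for score, bad in zip(scores, is_bad):
        predicted_bad = score < cutoff
        if predicted_bad and bad:
            counts["tp"] += 1  # bad loan correctly flagged
        elif predicted_bad and not bad:
            counts["fp"] += 1  # good loan wrongly flagged
        elif not predicted_bad and bad:
            counts["fn"] += 1  # bad loan missed
        else:
            counts["tn"] += 1  # good loan correctly passed
    return counts


def ks_statistic(scores, is_bad):
    """Largest gap between the cumulative shares of bads and goods,
    scanning from the lowest score to the highest."""
    total_bad = sum(is_bad)
    total_good = len(is_bad) - total_bad
    pairs = sorted(zip(scores, is_bad))
    cum_bad = cum_good = 0
    ks = 0.0
    i = 0
    while i < len(pairs):
        current_score = pairs[i][0]
        # Tied scores are added together before the gap is measured
        while i < len(pairs) and pairs[i][0] == current_score:
            if pairs[i][1]:
                cum_bad += 1
            else:
                cum_good += 1
            i += 1
        ks = max(ks, abs(cum_bad / total_bad - cum_good / total_good))
    return ks


# Hypothetical scored loans: 1 = bad borrower, 0 = good borrower
scores = [320, 410, 450, 480, 530, 560, 600, 640, 670, 700]
is_bad = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]

matrix = confusion_matrix(scores, is_bad, cutoff=500)
ks = ks_statistic(scores, is_bad)

print(matrix)
print(f"KS = {ks:.3f}")  # 2/3 for this toy data
```

The confusion matrix depends on the chosen cutoff, while the KS statistic looks across all possible cutoffs and reports the best separation.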


ROC Curve: for Credit Scorecard Model Validation and Evaluation – by Roopam

The adjacent graph shows an ROC curve. The two axes of the plot are the true positive rate and the false positive rate. As expected, the plot shows the level of prediction of the model. A perfect model will perfectly segregate good and bad cases. Hence, you will get 100% true positives right at the beginning (i.e. absolute lift), as shown by the green curve in the graph. However, like anything in life, perfection does not exist. As they say – if it is too good to be true, it probably is. At the other extreme is a worthless model, the curve marked in red. Anything close to or below the red curve is as good as tossing a coin, so why bother with the effort of building a model? Finally, a typical scorecard ROC will look like the blue curve. The AUROC for a usual credit-scoring model is between 70% and 85%; the higher, the better. However, for some fraud and insurance models, an AUROC slightly above 60% is acceptable. Again, analysts should be sure about the business benefits from the scorecard before settling on an acceptable AUROC. A simple cost-benefit analysis helps significantly before finalizing the model and reporting it to the top management.
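As a rough illustration of how such a curve and its area are computed, here is a minimal sketch in Python. It assumes ten hypothetical scored loans and the convention that a lower score means a riskier borrower; the area is approximated with the trapezoidal rule, and the Gini is derived from it as 2 × AUROC − 1.

```python
def roc_points(scores, is_bad):
    """(false positive rate, true positive rate) pairs obtained by
    flagging the lowest scores as bad first."""
    total_bad = sum(is_bad)
    total_good = len(is_bad) - total_bad
    pairs = sorted(zip(scores, is_bad))
    points = [(0.0, 0.0)]
    true_pos = false_pos = 0
    i = 0
    while i < len(pairs):
        current_score = pairs[i][0]
        # Raise the cutoff just above this score; tied scores cross it together
        while i < len(pairs) and pairs[i][0] == current_score:
            if pairs[i][1]:
                true_pos += 1
            else:
                false_pos += 1
            i += 1
        points.append((false_pos / total_good, true_pos / total_bad))
    return points


def area_under_curve(points):
    """Trapezoidal rule over consecutive points of the curve."""
    return sum(
        (x2 - x1) * (y1 + y2) / 2
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    )


# Hypothetical scored loans: 1 = bad borrower, 0 = good borrower
scores = [320, 410, 450, 480, 530, 560, 600, 640, 670, 700]
is_bad = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]

curve = roc_points(scores, is_bad)
auroc = area_under_curve(curve)
gini = 2 * auroc - 1

print(f"AUROC = {auroc:.3f} ({auroc:.1%}), Gini = {gini:.3f}")
```

A coin-toss model would trace the diagonal and give an AUROC of 0.5 (Gini of 0), while a perfect model would give 1.0; the toy data here is far cleaner than a real portfolio, so its AUROC should not be read as typical.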


Sign-off Note

I hope after reading this, you will pick up your camera and visit that unexplored nook at the corner of the street – and be ready for some wonderful surprises!


Credit Scorecards – Business Integration of Predictive Analytics (part 7 of 7)

http://ucanalytics.com/blogs/credit-scorecards-predictive-analytics-part-7/

Columbus – A lesson in Leadership

The leader – by Roopam

Christopher Columbus – I have adored this man for various reasons at various stages of my life. At seven, I adored him because his mistakes were applauded and became part of history – Columbus mistook Native Americans for Indians because he thought he had landed in Asia rather than the Americas. While my mistakes were circled with red ink and awarded zero, I felt that was unfair – Oh Columbus, you lucky bastard! At seventeen, I adored him because he was a rebel as he went against the popular belief about the planet and sailed in the opposite direction – Oh Columbus, you non-conformist! Now that I feel I know him a little better, I adore him for setting the direction others can follow. He was not the first one to reach the Americas from Europe, although he did not know this. There are references of others achieving this feat before him. However, he was the one who sensitized Europe towards America. At present, the predominant population in the Americas is of European origin. So many people must have followed directions set forth by Columbus – Oh Columbus, you leader!

15th August – today India, the largest democracy, is celebrating its Independence day. Let us take a moment to applaud the spirit of Columbus in all of us – the explorer, the free thinker – before venturing into the integration of predictive analytics with business processes. In addition, yes, Columbus will pave the way for us to understand this integration better.


Integrating Predictive Analytics with Business Processes

Let us accept it: probability and statistics, despite being logical and right, do not come naturally to us humans. Numerous books and theses have repeatedly demonstrated this fact*. The whole furor about the solution to the Monty Hall problem (read the article) is a testament to this thesis. Once we have accepted this fact, let us go back to Columbus. Like him, as an analyst, once you have explored the exotic land, you want to show it to others. You want them to appreciate it and make it their home.


*Read Daniel Kahneman’s Thinking, Fast and Slow

The ACE of Advanced Analytics Success

The sole purpose of analytics is business enhancement and growth. The intellectual exercise has to translate into tangible returns. This is by no means an easy task. Let me present the ACE of integration of predictive analytics with business processes. I am feeling like a great business Guru to coin this term ACE – I know you readers are generous to let me get away with some self-indulgence. In fact, could someone tell me how to get this copyrighted? ACE stands for Accessible, Communication-&-Education, and Ease-of-Use.

The ACE of Predictive Analytics project success – by Roopam

Accessible: All humans are involved creatures. We want to be part of the new. This is also true for that grumpy manager resisting change in the corner office. He wants to be part of the change. He is also ready for controlled experiments. Make him part of it. This has helped me significantly while implementing advanced analytics solutions. The onus is on the analyst to constantly work with the concerned parties. Their knowledge will only enhance the research. Yes, it is time-consuming and could also hurt your ego at times, but it is absolutely essential for the project’s success.

Communication-&-Education: All humans are curious creatures. We all love to get educated. Analytics projects are also about educating the decision maker about aspects of predictive analytics including pitfalls. As we have discussed, statistics is not an intuitive science. However, you will find most people more than willing to listen to you. It is about communicating your excitement for the field and the results. If they are not listening to you – try harder and be creative.

Ease-of-use: Most humans do not like unnecessary complications in their life, and analytics is no exception. This is where an astute use of information technology to integrate analytics with business processes is an absolute must. For example, the first versions of credit scorecards I saw were Excel-based standalone applications. Here, the credit underwriters were punching in the information about the borrower all over again. No wonder they hated it. It is not that difficult to integrate the underwriting application with the scorecard, so that no extra effort is required from the users. The job of an analytics professional is not over till he/she drives the application usage to generate business benefits.

Sign-off Remark

Wow! I am feeling good after completing this seven-part series (parts 1-7) on analytical scorecards; trust me, it was good fun to write. I know you are reading the articles because of all the positive feedback I am receiving. If you would like to write some articles for YOU CANalytics, please drop a mail to [email protected] or contact me. I’ll create an author’s account for you. I look forward to hearing back from you.

See you soon with a new topic on Analytics.


