The credit scorecard model is widely used in credit risk assessment and financial risk control. Its principle is to discretize the model variables and encode them with WOE (weight of evidence), and then fit a logistic regression, a generalized linear model for binary classification.
This paper provides a reference for the risk control of financial lending institutions, covering data preprocessing, feature variable selection, WOE coding and discretization of variables, development and evaluation of a logistic regression model, and creation of a credit scorecard and an automatic scoring system.
The data comes from Kaggle’s Give Me Some Credit competition; the cs-training.csv file contains 150,000 samples and 11 variables, as shown in the table below.
3.1. Read Data
import numpy as np
import pandas as pd
data=pd.read_csv('cs-training.csv')
data=data.iloc[:,1:]
data.head()
data.shape
(150000, 11)
data.describe()
data.info()
It can be seen from the results that MonthlyIncome has a large number of missing values, 29,731, while NumberOfDependents has relatively few, 3,924.
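These counts can be confirmed directly, for example:
# Count missing values per column
data.isnull().sum()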
4.1. Missing Value Processing
Missing data is very common in real-world problems, and many analysis methods cannot handle missing values directly. Therefore, the first step in developing a credit risk rating model is missing value processing.
Methods for missing value processing include the following:
(1) Directly delete samples with missing values.
(2) Fill in the missing values based on the similarity between the samples.
(3) Fill in the missing values based on the correlation between the variables.
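For reference, approaches (1) and (2) can each be written in a single line of pandas; the sketch below is only illustrative (the median fill of NumberOfDependents is an example, not the choice made in this paper):
# (1) Drop all samples that contain missing values
data_dropped = data.dropna()
# (2) Fill a variable's missing values with a statistic of similar samples, e.g. the median
data_filled = data.fillna({'NumberOfDependents': data['NumberOfDependents'].median()})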
The missing rate of MonthlyIncome is relatively high, so we fill in its missing values based on the correlation between variables, using the random forest method.
from sklearn.ensemble import RandomForestRegressor

def add_missing(df):
    # Put MonthlyIncome first, followed by the other feature columns
    process_df = df.iloc[:, [5, 0, 1, 2, 3, 4, 6, 7, 8, 9]]
    # Split into rows where MonthlyIncome is known and rows where it is missing
    known = process_df[process_df['MonthlyIncome'].notnull()].values
    unknown = process_df[process_df['MonthlyIncome'].isnull()].values
    Y = known[:, 0]
    X = known[:, 1:]
    rfr = RandomForestRegressor(random_state=0, n_estimators=200, max_depth=3, n_jobs=-1)
    rfr.fit(X, Y)
    predicted = rfr.predict(unknown[:, 1:])
    df.loc[df['MonthlyIncome'].isnull(), 'MonthlyIncome'] = predicted
    return df

data = add_missing(data)
The NumberOfDependents variable has only a few missing values, so the corresponding samples can be dropped directly without affecting the overall model. In addition, after missing values are handled, duplicate records should be removed.
data=data.dropna()
data=data.drop_duplicates()
4.2. Outlier Processing
Outliers are values that deviate significantly from the majority of the sampled data. For example, an age greater than 100 or less than 0 is generally considered an outlier. We usually use outlier detection to find outliers in the sample population. Outlier detection methods include univariate outlier detection, local outlier detection, and clustering-based outlier detection.
In this data set, univariate outlier detection with box plots is used to identify outliers. For the age variable, we regard values greater than 100 or less than or equal to 0 as abnormal. The box plots show that there are not many such samples, so we can delete them directly.
import matplotlib.pyplot as plt
%matplotlib inline
dataage=data[['age']]
dataage.boxplot()
data_box = data.iloc[:,[3]]   # NumberOfTime30-59DaysPastDueNotWorse
data_box.boxplot()
data_box = data.iloc[:,[7]]   # NumberOfTimes90DaysLate
data_box.boxplot()
data_box = data.iloc[:,[9]]   # NumberOfTime60-89DaysPastDueNotWorse
data_box.boxplot()
As can be seen from the above figures, the variables age, NumberOfTime30-59DaysPastDueNotWorse, NumberOfTimes90DaysLate, and NumberOfTime60-89DaysPastDueNotWorse all contain abnormal values. The three past-due variables share the abnormal values 96 and 98, so excluding those values from one of them removes the corresponding samples from the others as well.
In the original data set, a good customer is labeled 0 and a defaulting customer 1. To match the usual convention that a customer who fulfills the contract and pays interest is labeled 1, we negate the target variable.
data=data[data['age']>0]
data=data[data['NumberOfTimes90DaysLate']<90]
data['SeriousDlqin2yrs']=1-data['SeriousDlqin2yrs']
4.3. Data Segmentation
In order to verify the fitting effect of the model, we need to segment the data set into a training set and a test set.
from sklearn.model_selection import train_test_split
Y=data['SeriousDlqin2yrs']
X=data.iloc[:,1:]
X_train,X_test,Y_train,Y_test=train_test_split(X,Y,test_size=0.3,random_state=0)
train=pd.concat([Y_train,X_train],axis=1)
test=pd.concat([Y_test,X_test],axis=1)
train.to_csv('TrainData.csv',index=False)
test.to_csv('TestData.csv',index=False)
4.4. Exploratory Data Analysis
Before building a model, we typically perform exploratory data analysis (EDA) on the existing data. EDA refers to exploring the data (especially raw data obtained from surveys or observations) with as few prior assumptions as possible. Commonly used EDA tools include histograms, scatter plots, and box plots. For example, we analyze the features age and MonthlyIncome as follows.
import seaborn as sns
age=data['age']
sns.distplot(age)
The distribution of age is roughly normal, consistent with statistical analysis assumptions.
mi=data[data['MonthlyIncome']<50000]['MonthlyIncome']
sns.distplot(mi)
To make the figure more readable, we restrict the plot to monthly incomes below 50,000. The distribution of monthly income is also roughly normal, in line with the assumptions of the statistical analysis.
This kind of chart analysis can also be done for other variables.
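For example, the remaining numeric variables can be plotted in the same way with a short loop (a sketch reusing the seaborn call above; the column list is just a selection):
# Distribution plots for a few more feature columns
for col in ['DebtRatio', 'NumberOfOpenCreditLinesAndLoans', 'NumberOfDependents']:
    plt.figure()
    sns.distplot(data[col])
    plt.title(col)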
Feature variable selection (ranking) is very important for data analysis and machine learning practitioners. Good feature selection can improve model performance and help us understand the characteristics and underlying structure of the data, which plays an important role in further improving the model and algorithm. In this paper, we use the variable selection method typical of credit scoring models: WOE analysis, which determines whether an indicator is economically meaningful by binning the indicator and comparing the default probability across its bins.
First we discretize the variables (binning).
5.1. Variable Binning
In the development of credit scorecards, continuous variables are commonly discretized with equal-width binning, equal-depth (equal-frequency) binning, or optimal binning. We first attempt optimal binning of each continuous variable; when the distribution of a variable does not meet the requirements of optimal binning, it is binned with manually chosen segments instead.
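For comparison, equal-width and equal-depth binning are both one-liners in pandas; the sketch below uses the age variable purely as an illustration:
# Equal-width bins: every interval has the same length
equal_width = pd.cut(train['age'], 5)
# Equal-depth (equal-frequency) bins: every interval holds roughly the same number of samples
equal_depth = pd.qcut(train['age'], 5)
equal_depth.value_counts()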
The code for optimal binning is as follows:
import scipy.stats as stats

def mono_bin(Y, X, n):
    good = Y.sum()
    bad = Y.count() - good
    r = 0
    # Reduce the number of bins until the bucket means are monotonically related to the target
    while np.abs(r) < 1:
        d1 = pd.DataFrame({'X': X, 'Y': Y, 'Bucket': pd.qcut(X, n)})
        d2 = d1.groupby(['Bucket'])
        r, p = stats.spearmanr(d2['X'].mean(), d2['Y'].mean())
        n = n - 1
    print(r, n)
    d3 = pd.DataFrame()
    d3['min'] = d2['X'].min()
    d3['max'] = d2['X'].max()
    d3['sum'] = d2['Y'].sum()
    d3['total'] = d2['Y'].count()
    d3['rate'] = d2['Y'].mean()
    d3['goodattribute'] = d3['sum'] / good
    d3['badattribute'] = (d3['total'] - d3['sum']) / bad
    d3['woe'] = np.log(d3['goodattribute'] / d3['badattribute'])
    iv = ((d3['goodattribute'] - d3['badattribute']) * d3['woe']).sum()
    d4 = d3.sort_values(by='min')
    woe = list(d4['woe'].values)
    print(d4)
    print('-' * 30)
    # Build the cut points from the quantiles of the final binning
    cut = [float('-inf')]
    for i in range(1, n + 1):
        qua = X.quantile(i / (n + 1))
        cut.append(round(qua, 4))
    cut.append(float('inf'))
    return d4, iv, woe, cut
dfx1,ivx1,woex1,cutx1=mono_bin(train['SeriousDlqin2yrs'],train['RevolvingUtilizationOfUnsecuredLines'],n=10)
dfx2, ivx2,woex2,cutx2=mono_bin(train['SeriousDlqin2yrs'], train['age'], n=10)
dfx4, ivx4,woex4,cutx4 =mono_bin(train['SeriousDlqin2yrs'],train['DebtRatio'], n=20)
dfx5, ivx5,woex5,cutx5=mono_bin(train['SeriousDlqin2yrs'], train['MonthlyIncome'], n=10)
For variables that cannot be optimally binned, custom bins are defined as follows:
def self_bin(Y, X, cat):
    # Bin X with the hand-chosen cut points in cat, then compute WoE and IV per bin
    good = Y.sum()
    bad = Y.count() - good
    d1 = pd.DataFrame({'X': X, 'Y': Y, 'Bucket': pd.cut(X, cat)})
    d2 = d1.groupby(['Bucket'])
    d3 = pd.DataFrame()
    d3['min'] = d2['X'].min()
    d3['max'] = d2['X'].max()
    d3['sum'] = d2['Y'].sum()
    d3['total'] = d2['Y'].count()
    d3['rate'] = d2['Y'].mean()
    d3['goodattribute'] = d3['sum'] / good
    d3['badattribute'] = (d3['total'] - d3['sum']) / bad
    d3['woe'] = np.log(d3['goodattribute'] / d3['badattribute'])
    iv = ((d3['goodattribute'] - d3['badattribute']) * d3['woe']).sum()
    d4 = d3.sort_values(by='min')
    print(d4)
    print('-' * 40)
    woe = list(d4['woe'].values)
    return d4, iv, woe
ninf = float('-inf')
pinf = float('inf')
cutx3 = [ninf,0,1,3,5,pinf]
cutx6 = [ninf, 1, 2, 3, 5, pinf]
cutx7 = [ninf, 0, 1, 3, 5, pinf]
cutx8 = [ninf, 0,1,2, 3, pinf]
cutx9 = [ninf, 0, 1, 3, pinf]
cutx10 = [ninf, 0, 1, 2, 3, 5, pinf]
dfx3,ivx3,woex3=self_bin(train['SeriousDlqin2yrs'],train['NumberOfTime30-59DaysPastDueNotWorse'],cutx3)
dfx6, ivx6 ,woex6= self_bin(train['SeriousDlqin2yrs'], train['NumberOfOpenCreditLinesAndLoans'], cutx6)
dfx7, ivx7,woex7 = self_bin(train['SeriousDlqin2yrs'], train['NumberOfTimes90DaysLate'], cutx7)
dfx8, ivx8,woex8 = self_bin(train['SeriousDlqin2yrs'], train['NumberRealEstateLoansOrLines'], cutx8)
dfx9, ivx9,woex9 = self_bin(train['SeriousDlqin2yrs'], train['NumberOfTime60-89DaysPastDueNotWorse'], cutx9)
dfx10, ivx10,woex10 = self_bin(train['SeriousDlqin2yrs'], train['NumberOfDependents'], cutx10)
5.2. WoE
WOE stands for “Weight of Evidence”. WoE analysis consists of binning an indicator, calculating the WoE value of each bin, and observing how WoE varies with the indicator.
The mathematical definition of WoE is woe = ln(goodattribute / badattribute), where goodattribute is the number of good customers in a bin divided by the total number of good customers in the data set, and badattribute is the number of bad customers in a bin divided by the total number of bad customers in the data set.
After optimal binning, different features produce different numbers of bins, and each interval corresponds to one WoE value; for example, woex1 is the list of WoE values for feature x1.
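To make the definition concrete, the WoE of any binned variable (and its IV, defined in the next subsection) can also be computed directly from a crosstab; the sketch below uses the age variable and the 1 = good convention adopted earlier, and is not part of the main pipeline:
# WoE and IV computed from scratch for one binned variable
bucket = pd.qcut(train['age'], 5)
ct = pd.crosstab(bucket, train['SeriousDlqin2yrs'])
good_dist = ct[1] / ct[1].sum()   # share of good customers in each bin
bad_dist = ct[0] / ct[0].sum()    # share of bad customers in each bin
woe = np.log(good_dist / bad_dist)
iv = ((good_dist - bad_dist) * woe).sum()
print(woe)
print(iv)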
5.3. IV Filtering
The IV (Information Value) of a feature is a single number that summarizes its predictive power: IV = sum((goodattribute - badattribute) * woe). As a rule of thumb, an IV above 0.1 indicates some predictive ability, and an IV above 0.2 indicates strong predictive ability. An IV chart is generated with the following code:
import matplotlib.pyplot as plt
%matplotlib inline
ivall=pd.Series([ivx1,ivx2,ivx3,ivx4,ivx5,ivx6,ivx7,ivx8,ivx9,ivx10],index=['x1','x2','x3','x4','x5','x6','x7','x8','x9','x10'])
fig=plt.figure()
ax1=fig.add_subplot(111)
ivall.plot(kind='bar',ax=ax1)
plt.show()
As can be seen from the above figure, the DebtRatio, MonthlyIncome, NumberOfOpenCreditLinesAndLoans, NumberRealEstateLoansOrLines, and NumberOfDependents variables have significantly lower IV values and poor prediction capabilities, so they are deleted.
5.4. Variable Correlation Analysis
We check the correlations between variables on the cleaned data. Note that the correlation analysis here is only a preliminary check; the WOE (weight of evidence) of the model is the further basis for variable screening. We use the heatmap() function of the seaborn package in Python to draw the correlation matrix. The implementation code is as follows:
import seaborn as sns
corr=data.corr()
# x0 is the target SeriousDlqin2yrs; x1-x10 are the ten feature columns in order
xticks=['x0','x1','x2','x3','x4','x5','x6','x7','x8','x9','x10']
fig=plt.figure()
fig.set_size_inches(16,6)
ax1=fig.add_subplot(111)
sns.heatmap(corr,vmin=-1,vmax=1,cmap='hsv',annot=True,square=True,ax=ax1)
ax1.set_xticklabels(xticks,rotation=0)
plt.show()
As can be seen from the above figure:
(1) The correlations between the variables are small, so there is no obvious multicollinearity problem. If multicollinearity existed, that is, if two variables were highly correlated, dimensionality reduction or elimination would be required.
(2) The three features NumberOfTime30-59DaysPastDueNotWorse, NumberOfTimes90DaysLate, and NumberOfTime60-89DaysPastDueNotWorse are relatively strongly correlated with the dependent variable SeriousDlqin2yrs that we want to predict.
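If a stricter multicollinearity check is desired, variance inflation factors can be computed with statsmodels (a sketch; statsmodels is also used later for the regression itself):
# Variance inflation factors: values well above 10 would signal multicollinearity
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor
X_vif = sm.add_constant(data.iloc[:, 1:])
for i, col in enumerate(X_vif.columns):
    if col != 'const':
        print(col, variance_inflation_factor(X_vif.values, i))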
The Weight of Evidence (WOE) transformation can transform the Logistic regression model into a standard scorecard format. Before building the model, we need to convert the filtered variables to WoE values for credit scoring.
6.1. WoE Conversion
We have obtained the bins and WoE values for each variable; now we only need to replace each variable's raw values with the corresponding WoE values. The implementation code is as follows:
data = pd.read_csv('TrainData.csv')
from pandas import Series

def replace_woe(series, cut, woe):
    # Replace each raw value with the WoE of the bin it falls into
    woe_list = []
    i = 0
    while i < len(series):
        value = series[i]
        j = len(cut) - 2
        m = len(cut) - 2
        while j >= 0:
            if value >= cut[j]:
                j = -1
            else:
                j -= 1
                m -= 1
        woe_list.append(woe[m])
        i += 1
    return woe_list
data['RevolvingUtilizationOfUnsecuredLines'] = Series(replace_woe(data['RevolvingUtilizationOfUnsecuredLines'], cutx1, woex1))
data['age'] = Series(replace_woe(data['age'], cutx2, woex2))
data['NumberOfTime30-59DaysPastDueNotWorse'] = Series(replace_woe(data['NumberOfTime30-59DaysPastDueNotWorse'], cutx3, woex3))
data['DebtRatio'] = Series(replace_woe(data['DebtRatio'], cutx4, woex4))
data['MonthlyIncome'] = Series(replace_woe(data['MonthlyIncome'], cutx5, woex5))
data['NumberOfOpenCreditLinesAndLoans'] = Series(replace_woe(data['NumberOfOpenCreditLinesAndLoans'], cutx6, woex6))
data['NumberOfTimes90DaysLate'] = Series(replace_woe(data['NumberOfTimes90DaysLate'], cutx7, woex7))
data['NumberRealEstateLoansOrLines'] = Series(replace_woe(data['NumberRealEstateLoansOrLines'], cutx8, woex8))
data['NumberOfTime60-89DaysPastDueNotWorse'] = Series(replace_woe(data['NumberOfTime60-89DaysPastDueNotWorse'], cutx9, woex9))
data['NumberOfDependents'] = Series(replace_woe(data['NumberOfDependents'], cutx10, woex10))
test = pd.read_csv('TestData.csv')
test['RevolvingUtilizationOfUnsecuredLines'] = Series(replace_woe(test['RevolvingUtilizationOfUnsecuredLines'], cutx1, woex1))
test['age'] = Series(replace_woe(test['age'], cutx2, woex2))
test['NumberOfTime30-59DaysPastDueNotWorse'] = Series(replace_woe(test['NumberOfTime30-59DaysPastDueNotWorse'], cutx3, woex3))
test['DebtRatio'] = Series(replace_woe(test['DebtRatio'], cutx4, woex4))
test['MonthlyIncome'] = Series(replace_woe(test['MonthlyIncome'], cutx5, woex5))
test['NumberOfOpenCreditLinesAndLoans'] = Series(replace_woe(test['NumberOfOpenCreditLinesAndLoans'], cutx6, woex6))
test['NumberOfTimes90DaysLate'] = Series(replace_woe(test['NumberOfTimes90DaysLate'], cutx7, woex7))
test['NumberRealEstateLoansOrLines'] = Series(replace_woe(test['NumberRealEstateLoansOrLines'], cutx8, woex8))
test['NumberOfTime60-89DaysPastDueNotWorse'] = Series(replace_woe(test['NumberOfTime60-89DaysPastDueNotWorse'], cutx9, woex9))
test['NumberOfDependents'] = Series(replace_woe(test['NumberOfDependents'], cutx10, woex10))
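As a side note, an equivalent and more vectorized substitution can be obtained by letting pd.cut assign the bin index and indexing into the WoE list; this sketch should reproduce replace_woe for these bins, assuming no missing values remain:
# Vectorized WoE substitution: assign each value its bin index, then look up the WoE
def replace_woe_vec(series, cut, woe):
    bin_idx = pd.cut(series, bins=cut, right=False, labels=False)
    return pd.Series(np.asarray(woe)[np.asarray(bin_idx, dtype=int)], index=series.index)
# e.g. data['age'] = replace_woe_vec(data['age'], cutx2, woex2)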
6.2. Logistic Model Building
import statsmodels.api as sm
Y=data['SeriousDlqin2yrs']
# Drop the target and the five low-IV features removed in Section 5.3
X=data.drop(['SeriousDlqin2yrs','DebtRatio','MonthlyIncome', 'NumberOfOpenCreditLinesAndLoans','NumberRealEstateLoansOrLines','NumberOfDependents'],axis=1)
X1=sm.add_constant(X)
logit=sm.Logit(Y,X1)
result=logit.fit()
print(result.summary2())
Assuming a significance level of 0.01, it can be seen from the regression summary above that all of the logistic regression variables pass the significance test and meet the requirements.
6.3. Model Verification
To verify the predictive power of this model, we evaluate the fit of the model through the ROC curve and AUC.
from sklearn.metrics import roc_curve,auc
import matplotlib
matplotlib.rcParams['font.sans-serif'] = ['FangSong']
matplotlib.rcParams['axes.unicode_minus'] = False
Y_test=test['SeriousDlqin2yrs']
X_test=test.drop(['SeriousDlqin2yrs','DebtRatio','MonthlyIncome', 'NumberOfOpenCreditLinesAndLoans','NumberRealEstateLoansOrLines','NumberOfDependents'],axis=1)
X2=sm.add_constant(X_test)
resu=result.predict(X2)
fpr,tpr,threshold=roc_curve(Y_test,resu)
rocauc=auc(fpr,tpr)
plt.plot(fpr,tpr,'b',label='AUC=%0.2f'% rocauc)
plt.legend(loc='lower right')
plt.plot([0,1],[0,1],'r--')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.show()
As can be seen from the above figure, the AUC value is 0.85, indicating that the model has good predictive ability and a high accuracy. This shows that the five selected features are effective for forming this part of the credit scorecard.
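In addition to the AUC, the KS statistic is commonly reported when validating a scorecard; it can be read directly off the fpr and tpr arrays computed above (a minimal sketch):
# KS statistic: maximum gap between the cumulative distributions of good and bad customers
ks = (tpr - fpr).max()
print('KS = %0.2f' % ks)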
We now convert the logistic regression model into the form of a standard scorecard.
Before setting up a standard scorecard, we need to choose several scorecard parameters: the base score, the PDO (points to double the odds), and the good/bad odds ratio at the base score. We take a base score of 600, a PDO of 20 (the good/bad odds double for every 20 points), and a base good/bad odds ratio of 20.
Total personal score = base score + the scores of the individual parts.
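The scaling behind these parameters follows the usual scorecard formula score = offset + factor * ln(odds), with factor = PDO / ln(2) and offset = base score - factor * ln(base odds); the p and q computed in the code below are exactly this factor and offset. A small sketch under the 600/20/20 assumptions above:
import math
factor = 20 / math.log(2)             # PDO = 20
offset = 600 - factor * math.log(20)  # base score 600 at good:bad odds of 20:1
print(offset + factor * math.log(20)) # a customer at 20:1 odds scores the base 600
print(offset + factor * math.log(40)) # doubling the odds adds exactly PDO = 20 points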
def get_score(coe, woe, p):
    # Convert the WoE of each bin into scorecard points: coefficient * WoE * factor
    scores = []
    for w in woe:
        score = round(coe * w * p, 0)
        scores.append(score)
    return scores

def compute_score(series, cut, scores):
    # Look up the score of the bin each raw value falls into (same lookup logic as replace_woe)
    score_list = []
    i = 0
    while i < len(series):
        value = series[i]
        j = len(cut) - 2
        m = len(cut) - 2
        while j >= 0:
            if value >= cut[j]:
                j = -1
            else:
                j -= 1
                m -= 1
        score_list.append(scores[m])
        i += 1
    return score_list
# Intercept and coefficients (x1, x2, x3, x7, x9) taken from the fitted logistic model
coe = [9.738849, 0.638002, 0.505995, 1.032246, 1.790041, 1.131956]
import math
p = 20 / math.log(2)                       # factor: PDO / ln(2)
q = 600 - 20 * math.log(20) / math.log(2)  # offset: base score - factor * ln(base odds)
basescore = round(q + p * coe[0], 0)       # score contributed by the intercept
x1 = get_score(coe[1], woex1, p)
x2 = get_score(coe[2], woex2, p)
x3 = get_score(coe[3], woex3, p)
x7 = get_score(coe[4], woex7, p)
x9 = get_score(coe[5], woex9, p)
print(x1)
print(x2)
print(x3)
print(x7)
print(x9)
test1=pd.read_csv('TestData.csv')
test1['BaseScore']=Series(np.zeros(len(test1))+basescore)
test1['x1']=Series(compute_score(test1['RevolvingUtilizationOfUnsecuredLines'],cutx1,x1))
test1['x2'] = Series(compute_score(test1['age'], cutx2, x2))
test1['x3'] = Series(compute_score(test1['NumberOfTime30-59DaysPastDueNotWorse'], cutx3, x3))
test1['x7'] = Series(compute_score(test1['NumberOfTimes90DaysLate'], cutx7, x7))
test1['x9'] = Series(compute_score(test1['NumberOfTime60-89DaysPastDueNotWorse'], cutx9, x9))
test1['score']= test1['BaseScore']+test1['x1']+test1['x2']+test1['x3']+test1['x7']+test1['x9']
test1.to_csv('scoredata.csv')
test1.loc[:,['SeriousDlqin2yrs','BaseScore', 'x1', 'x2', 'x3', 'x7', 'x9', 'score']].head()
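As a quick sanity check of the scorecard, the score distributions of good and bad customers in the test set can be compared (a sketch reusing the seaborn calls from earlier; good customers should tend to score higher):
# Compare score distributions of good (1) and bad (0) customers
sns.distplot(test1[test1['SeriousDlqin2yrs'] == 1]['score'], label='good')
sns.distplot(test1[test1['SeriousDlqin2yrs'] == 0]['score'], label='bad')
plt.legend()
plt.show()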
This paper combines credit scorecard development with data preprocessing, variable selection, modeling analysis, and prediction on the Kaggle data, using the random forest algorithm to fill in missing values. The data is cleaned with the pandas package, visualized with the matplotlib and seaborn plotting packages, and modeled with logistic regression. Finally, a simple credit scoring system is built from the features validated by the model.