Getting Started with Data Mining in SQL Server

Introduction

In past chats, we have looked at a myriad of Business Intelligence techniques that one can use to turn data into information. In today’s get-together we are going to look at a technique that is dear to my heart and often overlooked: data mining with SQL Server, from soup to nuts.

Microsoft has come up with a fantastic set of data mining tools which are often underutilized by Business Intelligence folks, not because they are of poor quality but rather because not many folks know of their existence, or because people have never had the opportunity to use them.

Rest assured that you are NOW going to get a bird’s-eye view of the power of the mining algorithms in our ‘fireside’ chat today.

As I wish to describe the “getting started” process in detail, this article has been split into two parts. The first describes exactly this (getting started), whilst the second part will discuss turning the data into real information.

So ‘grab a pick and shovel’ and let us get to it!

Getting started

For today’s exercise, we start by having a quick look at our source data. It is a simple relational table within the SQLShackFinancial database that we have utilized in past exercises.

As a disclosure, I have changed the names and addresses of the true customers in the “production data” that we shall be utilizing. The replacement names and addresses come from the Microsoft Contoso database. Further, I have split the client data into two distinct tables: one containing customer numbers under 25000 and the other containing customer numbers greater than 25000. The reason for doing so will become clear as we progress.
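The split described above can be pictured with a few lines of Python. This is only an illustrative sketch: the field name `customer_number`, the sample records and the function name are assumptions, and since the text does not say where a customer numbered exactly 25000 would go, the sketch arbitrarily places it in the second set.

```python
# Hypothetical sketch: partition customer records at a customer-number threshold.
THRESHOLD = 25000  # boundary quoted in the article


def split_customers(customers, threshold=THRESHOLD):
    """Return (training_set, held_back_set) split on customer_number."""
    training = [c for c in customers if c["customer_number"] < threshold]
    # Assumption: a customer numbered exactly `threshold` lands in the held-back set.
    held_back = [c for c in customers if c["customer_number"] >= threshold]
    return training, held_back


# Invented sample records, not data from the SQLShackFinancial database.
sample = [
    {"customer_number": 11000, "name": "A"},
    {"customer_number": 24999, "name": "B"},
    {"customer_number": 25001, "name": "C"},
]
training, held_back = split_customers(sample)
print(len(training), len(held_back))  # 2 1
```

Keeping the second set away from the model until the very end is what later allows the predictions to be checked against customers the model has never seen.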

Having a quick look at the customer table (containing customer numbers less than 25000), we find the following data.

The screenshot above shows the residential addresses of people who have applied for financial loans from SQLShack Finance.

Moreover, the data shows criteria such as the number of cars that the applicant owns, his or her marital status and whether or not he or she owns a house. NOTE that I have not mentioned the person’s income or net worth. This will come into play going forward.

Creating our mining project

Now that we have had a quick look at our raw data, we open SQL Server Data Tools (henceforward referred to as SSDT) to begin our adventure into the “wonderful world of data mining”.

Opening SSDT, we select “New” from the “File” tab on the activity ribbon and select “Project” (see above).

We select the “Analysis Services Multidimensional and Data Mining” option. We give our new project a name and click “OK” to continue.

Having clicked “OK”, we find ourselves on our working surface.

Our first task is to establish a connection to our relational data. We do this by creating a new “Data Source” (see below).

We right-click on the “Data Sources” folder (see above and to the right) and select the “New Data Source” option.

The “New Data Source” wizard is brought up. We click “Next”.

We now find ourselves looking at connections that we have used in the past, and SSDT wishes to know which (if any) of these connections we wish to utilize. We choose our “SQLShackFinancial” connection.

We select “Next”.

We are asked for our credentials (see above) and click “Next”.

We are now asked to give a name to our connection (see above).

We click “Finish”.

Creating our Data Source View

Our next task is to create a Data Source View. This is different to what we have done in past exercises.

The data source view permits us to create relationships (from our relational data) which we wish to carry forward into the ‘analytic world’. One may think of a “Data Source View” as a staging area for our relational data prior to its importation into our cubes and mining models.

We right-click on the “Data Source Views” folder and select “New Data Source View”.

The “Data Source View” wizard is brought up (see below).

We click “Next” (see above).

We select the “Data Source” that we defined earlier (see above).

The “Name Matching” dialogue box is brought into view. As we shall be working with one table for this exercise, this screen has little impact. HOWEVER, if we were relating two or more tables, this is where we would tell the system to create the necessary logical relationships between them, to ensure that our tables are correctly joined.

In our case we merely select “Next” (see above).

We are now asked to select the table or tables that we wish to utilize.

For our current exercise, I select the “Customer” table (see above) and move the table to the “Included Objects” (see below).

We then click “Next”.

We are now asked to give our “Data Source View” a name (see above) and we then click “Finish” to complete this task.

We find ourselves back on our work surface. Note that the Customer entity is now showing in the center of the screenshot above, as is the name of the “Data Source View” (see upper right).

We now right-click on the “Mining Structures” folder and select “New Mining Structure” (see above).

The “Data Mining Wizard” now appears (see below).

We click “Next”.

For the “Select the Definition Method” screen we shall accept the default “From existing relational database or data warehouse” option (see below).

We then click “Next”.

The “Create the Data Mining Structure” screen is brought into view. The wizard asks us which mining technique we wish to use. In total for this exercise, we shall be creating four mining models. “Microsoft Decision Trees” is one of the four. That said, we shall leave the default setting “Microsoft Decision Trees” as is.

We ignore the warning shown in the message box as we shall create the necessary connectivity on the next few screens.

The reader will note that the system wishes to know which “Data Source View” we wish to utilize. We select the one that we created above. We then click “Next”.

The mining wizard now asks us to let it know where the source data resides. We select the “Customer” table (see above) and we click “Next”.

At this point, we need to understand that once the model is created we shall “process” the model. Processing the model achieves two important things. First, it “trains” the model: it shows the model what type of data we are utilizing and runs that data through the data mining algorithm that we have selected. Second, it tests the model: the process compares the actual results with the predicted results. The closer the actuals are to the predicted results, the more accurate the model that we selected. The reader should note that whilst Microsoft provides us with a number of mining algorithms (nine ship with Analysis Services), NOT ALL will provide a satisfactory solution and therefore a different model may need to be used. We shall see just this within a few minutes.

We now must specify the “training data” or, in simple terms, “introduce the Microsoft mining models to the customer raw data and see what the mining model detects”. In our case, it is the data from the “Customer” table. What we must do is provide the system with a primary key field. Further, we must tell the system which data fields / criteria will be the inputs to the mining model, so that it can see what correlation (if any) there is between these input fields (Does the client own a house? How many cars does the person own? Is he or she married?) and what we wish to ascertain from the “Predicted” field (Is the person a good credit risk?).

Setting the Primary Key

For the primary key we select the field “PK_Customer_Name” (see above).

Selecting the input parameters / fields

We select “Houseowner” and “Marital_Status” (see above) and “number of cars owned” (see above).

As the reader will see from the two screenshots above, we selected the following inputs:

Does the applicant own a house?
Is he or she married?
How many cars does he or she own?

NOTE that I have not included income and this was deliberate for our example.

Bernie Madoff’s income was large, however we KNOW that he would not be a good risk.

Lastly, included within the raw data was a field called Credit Class, which holds the KNOWN credit ratings for the clients concerned.

Selecting the PREDICTED field

Last but not least, we must select the field that we wish the mining model to predict. This field is the “Credit Class” as may be seen below:

To recap:

The primary key is “PK_Customer_Name”.
The inputs are:
1. Is the person married?
2. Is the person a house owner?
3. How many cars does the person own?
The field that we want the SQL Server data mining algorithm to predict is the credit “bucket” that the person should fall into, 0 being the best candidate and 4 being the worst possible candidate.
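To make the three roles concrete, here is a minimal Python sketch of how one customer row breaks down into a key, a set of inputs and a value to predict. It is not how Analysis Services stores a mining structure; the dictionary layout, the function name and the column names `Num_Cars_Owned` and `Credit_Class` are assumptions made for illustration, and the sample row is invented.

```python
# Hypothetical sketch of the column roles chosen in the wizard.
COLUMN_ROLES = {
    "PK_Customer_Name": "key",
    "Houseowner": "input",
    "Marital_Status": "input",
    "Num_Cars_Owned": "input",   # assumed column name
    "Credit_Class": "predict",   # assumed column name
}


def split_case(row, roles=COLUMN_ROLES):
    """Separate one customer row into (key, inputs, target)."""
    key = next(row[col] for col, role in roles.items() if role == "key")
    inputs = {col: row[col] for col, role in roles.items() if role == "input"}
    target = next(row[col] for col, role in roles.items() if role == "predict")
    return key, inputs, target


# Invented example row.
row = {
    "PK_Customer_Name": "Example Customer",
    "Houseowner": "Y",
    "Marital_Status": "N",
    "Num_Cars_Owned": 2,
    "Credit_Class": 1,
}
key, inputs, target = split_case(row)
print(key, inputs, target)
```

The key only identifies the case; the algorithm learns from the inputs and is scored on how well it reproduces the target.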

We now click “Next”.

Having clicked “Next”, we arrive at the “Specify Columns’ Content and Data Type” screen.

Credit class (the predicted field) is one of 0, 1, 2, 3 or 4. These are discrete values (see above).

The number of cars owned is also a discrete value. No person owns 1.2 cars.

House Owner is a Boolean (Y or N).

Marital (Married) status is also a Boolean value (Y or N).

We click “Next”.

SQL Server now wishes to know what percentage of the records within the customer table (RANDOMLY SELECTED BY THE MINING ALGORITHM) should be set aside to test just how closely the predicted values of “Credit Class” tie with the actual values of “Credit Class”. One normally accepts 30% as a good sample (of the population). As a reminder to the reader, the accounts within the data ALL have account numbers under 25000. We shall see why I have mentioned this again (in a few minutes).
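The idea behind the 30% setting can be sketched in Python as a random holdout split. This is a generic illustration rather than the sampling routine Analysis Services actually uses; the function name, the fixed seed and the stand-in case list are assumptions.

```python
import random


def holdout_split(cases, test_fraction=0.30, seed=42):
    """Randomly reserve `test_fraction` of the cases for testing.

    Returns (training_cases, test_cases). The seed is fixed only so that
    the illustration is repeatable.
    """
    rng = random.Random(seed)
    shuffled = list(cases)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Stand-in for 1000 customer records.
train_cases, test_cases = holdout_split(range(1000))
print(len(train_cases), len(test_cases))  # 700 300
```

The model is trained on the larger portion only, so the smaller portion gives a fair test of predictions against known credit classes.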

We then click “Next”.

The system wants us to give our mining structure a name. In this case, we choose “SQLShackMainMiningModel”. This is the “mommy”. “SQLShackMainMiningModel” will have four children, one being the Decision Tree model that we have just defined and three more which we shall create in a few moments. For the mining model name, we choose “DecisionTreeSQLShackModel”.

We now click “Finish”.

We are returned to our main working surface as may be seen above.

Creating the remaining three models

From the “Mining Structures” folder we double-click the “SQLShackMainMiningModel” that we just created.

The “Mining Structure” opens. In the upper left-hand side, we can see the fields for which we opted. They are shown under the Mining structure directory (see above).

Clicking on the “Mining Models” tab, we can see the first model that we just created.

What we now wish to do is to create the remaining three models that we discussed above.

The first of the three will be a Naïve Bayes model. This is commonly used in predictive analysis. The principles behind the Naïve Bayes model are beyond the scope of this article and the reader is referred to any good predictive analysis book.

We select the “Create a related mining model” option (see above with the pick and shovel).

The “New Mining Model” dialogue box is brought up to be completed (see above).

We give our model a name and select the algorithm type (see above).

In a similar manner, we shall create a “Clustering Model” and a “Neural Network”. The final results may be seen below:

We have now completed all the heavy work and are in a position to process our models.

Setting the properties of the Analysis Services database

We click on the “Project” tab on the main ribbon and select “SQLShackDataMining” properties (see above).

The “SQLShackDataMining Property Pages” are brought into view. Clicking on the “Deployment” tab, we select the server to which we wish to deploy our OLAP database and, in addition, give the database a name.

We then click “OK”.

Processing our models

We right-click on the “SQLShackMainMiningModel” and select “Process”.

We are told that our data is out of date and are asked whether we want to reprocess the models (see below).

We answer “Yes”.

We are then asked for our credentials (see above). Once completed, we select “OK”.

Once the build is complete, we are taken to the “Process Mining Structure” screen. We select the “Run” option found at the bottom of the screen (see below in the blue oval).

Processing occurs and the results are shown above.

Upon completion of processing, we click the “Close” button to leave the processing routine (see above). We now find ourselves back on our work surface.

Let the fun begin!!

Now that our models have been processed and tested (this occurred during the processing that we just performed), it is time to have a look at the results.

We click on the third tab, “Mining Model Viewer”.

Selecting our “Decision Tree” model as a starting point, we select zero as our background value. The astute reader will remember that zero is the best risk for our lending department. THE DARKER THE COLOUR OF A BOX, the more strongly it points in the direction that we should be following (according to the predicted results of the processing).

That said, we should be looking at folks who own no cars, are not married and do not own a house. You say weird!! Not entirely. It can indicate that the person has no debt. We all know what happens after getting married and having children to raise.

Clicking the “Dependency Network” tab, we see that the mining model has found that the credit class is dependent on Houseowner, Marital Status and Num Cars Owned.

By sliding the “more selective” slider found under the text “All Links” (see above) we are telling the model to go down to the grain of the wood “to see which one of the three is the most decisive” in determining the relationship between it and the credit class (see below).

We note that “Num Cars Owned” seems to play a major role. In other words, the mining model believes that there is a strong relationship between the credit class and the number of cars that the person either owns OR is currently financing. Now the “doubting Thomas” will ask why. Mainly because cars cost money. Most people finance the purchase of cars. Credit plays a big role in financing.
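One generic way to ask “which input is the most decisive?” is to rank the inputs by information gain with respect to the predicted field. The Python sketch below does exactly that on four invented rows. It is a textbook illustration, not the scoring that the Microsoft Decision Trees algorithm or the Dependency Network viewer uses internally, and the column names are assumptions.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Reduction in target entropy obtained by splitting rows on `attribute`."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


# Invented toy data: here the number of cars fully determines the credit class,
# while house ownership tells us nothing.
rows = [
    {"Num_Cars_Owned": 0, "Houseowner": "Y", "Credit_Class": 0},
    {"Num_Cars_Owned": 0, "Houseowner": "N", "Credit_Class": 0},
    {"Num_Cars_Owned": 2, "Houseowner": "Y", "Credit_Class": 4},
    {"Num_Cars_Owned": 2, "Houseowner": "N", "Credit_Class": 4},
]
gains = {a: information_gain(rows, a, "Credit_Class") for a in ("Num_Cars_Owned", "Houseowner")}
ranking = sorted(gains, key=gains.get, reverse=True)
print(ranking)  # ['Num_Cars_Owned', 'Houseowner']
```

An input with a high gain separates the credit classes cleanly, which is the intuition behind a link surviving when the slider is moved towards the strongest relationships.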

The remaining three algorithms

A full discussion of all four algorithms, how they work and what to look for to justify selecting any one of the four over the others is certainly in order. However, in the interests of brevity and of driving home the importance of data mining itself, we shall put this discussion off until a future get-together.

We shall, however, continue to see how the system has ranked these algorithms and which of the four the process recommends.

The Mining Accuracy Chart

Having now created our four mining models, we wish to ascertain which of the four has the best fit for our data and the highest probability of rendering the best possible information.

We click on the “Mining Accuracy Chart” tab.

Note that the accuracy chart has four tabs itself. The first of the tabs is the “Input Selection”. We also note that our four mining models are present on the screen (see above).

As SQLShack Financial makes most of its earnings from lending money, and as we all realize that they wish to lend funds only to clients that they believe are a good risk (i.e. a rating of 0), they set the “Predict Value” to zero for all four algorithms (see below).

When complete, our screen should look as follows (see below):

The astute reader will note that in the lower portion of the screen we are asked which dataset we wish to utilize. We accept the dataset “the mining model test cases” used by the system in the creation of our model. Later in this discussion, we shall utilize data that we have held back from the model to verify that the mining models hold for that data as well. That will be the proof of the pudding!

The Lift Chart

The lift chart (in my humble opinion) tells all. Its purpose is to show us which of the four models the system believes is the best fit for the data that we have.

I like to call the Lift Chart “the race to the top”. It informs us how much of the population should be sampled (i.e. have their real credit rating checked) to make the decision on credit risk that is most beneficial to SQLShack Financial. In simple terms, it is saying to us “This model requires checking x% of the population before you can be fairly certain that your model is accurate”. Keeping things extremely simple, the lower the required sampling amount, the more certain one can be that the model is accurate and is, in fact, one of the models that we should be utilizing.

That said, the line graph in pink (see above) is a line generated by the mining structure processing. It shows the best possible outcome. It is essentially telling us “with the best possible LUCK, we only need to check the true credit ratings of 22% of our applicants”. The light blue undulating lines represent the “Decision Tree” model and the “Neural Network” model, and they peak (reach one hundred percent on the Y-axis) at a population sampling of just over 50% (X-axis, see the graph above). This suggests that they are the most promising algorithms to use. The “Naïve Bayes” and “Clustering” algorithms peak closer to 100% on the X-axis and are therefore not as reliable as the “Decision Tree” and the “Neural Network” algorithms. The straight line in blue from (0,0) to (100,100) is the “sheer dumb luck” line. Enough said. More on accuracy in a few minutes. Please stay tuned.
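The curve behind a lift chart can be sketched in Python as a cumulative gains calculation: rank the cases by the model’s confidence that they belong to the target class, then track what share of the true targets has been found as more of the population is checked. The function name and the four scored cases are invented for illustration, and the sketch assumes at least one true target in the data; it is not the exact computation performed by SSDT.

```python
def cumulative_gains(scored_cases):
    """Build the points of a cumulative gains (lift) curve.

    scored_cases: list of (probability_of_target, is_target) pairs.
    Returns a list of (pct_population_checked, pct_targets_found) points.
    """
    ranked = sorted(scored_cases, key=lambda case: case[0], reverse=True)
    total_targets = sum(1 for _, is_target in ranked if is_target)
    points, found = [], 0
    for checked, (_, is_target) in enumerate(ranked, start=1):
        if is_target:
            found += 1
        points.append((100 * checked / len(ranked), 100 * found / total_targets))
    return points


# Invented example: two genuine class-0 customers, both ranked at the top.
cases = [(0.9, True), (0.8, True), (0.4, False), (0.2, False)]
curve = cumulative_gains(cases)
print(curve)  # [(25.0, 50.0), (50.0, 100.0), (75.0, 100.0), (100.0, 100.0)]
```

A model whose curve reaches 100% on the Y-axis early has ranked the good risks near the top, whereas random guessing would follow the diagonal from (0,0) to (100,100).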

The Classification Matrix

As proof of our assertions immediately above, we now have a quick look at the next tab, the “Classification Matrix”.

In this fantastic and informative tool, the model shows us how many instances the system found “where the ‘predicted’ was the SAME as the ‘actual’”. Note the first matrix, for the “Decision Tree”, and note the strong diagonal between the “Actuals” on the X-axis and the “Predicted” on the Y-axis. The same is reasonably true for the “Neural Network” model (see the bottom of the screenshot below).

The reader will note that the predicted vs. actuals for the remaining two models are randomly dispersed. The greater the entropy, the more doubtful the accuracy of the model (with regards to our data).
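A classification matrix is simply a count of (predicted, actual) pairs, and the “strong diagonal” is the share of pairs where the two agree. The Python sketch below shows the idea on five invented pairs; the function names are assumptions and the numbers are not taken from the models in this article.

```python
from collections import Counter


def classification_matrix(pairs):
    """Count how often each (predicted, actual) combination occurs."""
    return Counter(pairs)


def diagonal_share(matrix):
    """Fraction of cases on the diagonal, i.e. where predicted == actual."""
    total = sum(matrix.values())
    correct = sum(n for (predicted, actual), n in matrix.items() if predicted == actual)
    return correct / total


# Invented (predicted, actual) credit classes for five customers.
pairs = [(0, 0), (0, 0), (1, 1), (0, 1), (4, 4)]
matrix = classification_matrix(pairs)
print(matrix[(0, 0)], matrix[(0, 1)], diagonal_share(matrix))  # 2 1 0.8
```

A model whose counts pile up on the diagonal is agreeing with the known credit classes; counts scattered off the diagonal are the random dispersion described above.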

Utilizing the Mining Models and verifying their results

At this point, we supposedly have two relatively reliable models with which to work. What we now must do is to verify the predicted versus the actuals. As a reminder to the reader, the only way to ensure that our algorithms are yielding correct predictions is by comparing what “they say” with the “truth” from other credit agencies. The more that match, the more accurate the algorithm will be in predicting as more and more data is added to our systems.

We now click on the “Mining Model Prediction” tab.

We note that the “Decision Tree” model has been selected. Obviously, we could have selected one of the other models but, in the interest of brevity, we choose to look at the “Decision Tree”. We must now select the physical input table (with all its records) that we wish the model to act upon.

We click “Select Case Table” as shown above and select the “Customer” table (see below).

Note that the fields of the mining model are joined to the actual fields of the “Customer” table. What is now required is to remove the link from the credit class of the model to the actual credit class in the relational table, the field in the table being the known credit class. Simplistically, we want the model to predict the credit class, and we shall then see how many matches we obtain.

We right-click on the credit class link and delete it (see above).

Our screen now looks as follows (see above).

We are now in a position to create our first data mining query (DMX, or Data Mining Extensions) to “prove out” our model.

In the screen above we select the first field to be the “Credit Risk” from the data mining model.

We set its “Criteria / Argument” to 0 (see above).

We now add more fields from the “Customer” table to identify these folks!

We have now added a few more fields (from the source table) as may be seen above. Let us see what we have.

Under the words “Mining Structure” (upper left in the screenshot above) we click on the drop-down box. We select the “Query” option (see below).

The resulting DMX code from our design screen is brought into view.

This DMX code should not be confused with MDX code. Personally, I LOVE having access to the code: now that we have it, we can utilize it for reporting. More about this in the next part of this article.

Once again we select the drop-down box below the words “Mining Structure” and select “Result” (see below).

We now obtain our result set.

We note that the model has rendered 1330 rows that it believes would be a good credit risk.

Let us now throw a spanner into the works and add one more field to the mix. This field is the TRUE CREDIT RISK and we shall tell the system that we wish to see only those records whose “true credit risk” was a 0. In short, actual equals predicted.

Our design now looks as follows.

Running our query again, we now find that:

994 rows were returned that the algorithm predicted would be a 0 and that were, in fact, a credit class of 0. This represents 74.7% accuracy, which is surprisingly good.
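The 74.7% figure is the share of customers predicted as class 0 whose known class is also 0 (994 of 1330). The Python sketch below reproduces that arithmetic by filtering rows the way the two queries do. The row layout and function name are assumptions, and the rows themselves are invented to match only the two counts quoted above; the actual classes of the 336 misses are not given in the article.

```python
def confirmed_share(rows, target_class=0):
    """Return (rows predicted as target, rows also actually target, percentage)."""
    predicted = [r for r in rows if r["predicted"] == target_class]
    confirmed = [r for r in predicted if r["actual"] == target_class]
    return len(predicted), len(confirmed), 100 * len(confirmed) / len(predicted)


rows = (
    [{"predicted": 0, "actual": 0}] * 994    # predicted 0 and truly 0
    + [{"predicted": 0, "actual": 1}] * 336  # predicted 0 but not 0 (actual class invented)
    + [{"predicted": 1, "actual": 1}] * 50   # invented rows that the filter ignores
)
n_predicted, n_confirmed, pct = confirmed_share(rows)
print(n_predicted, n_confirmed, round(pct, 1))  # 1330 994 74.7
```

Strictly speaking this measures how trustworthy a “0” prediction is (often called precision for that class) rather than accuracy across all five credit classes.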

Fine and Dandy!!

Thus far we have performed our exercises with account numbers less than 25000.

Let us now add a few more accounts. These accounts will have account numbers over 25000. We now load these accounts and reprocess the results. We now set up our query as follows (see below).

The result set may be seen above. We retrieved 974 rows.

Changing this query slightly and once again telling the query to show only the rows where the predicted and actual credit classes are 0, we find…

924 rows are returned, which indicates a 95% accuracy. The only takeaway from this 95% is that we are still on track with our model: nothing more, nothing less.

Conclusions (if any)

It cannot be stressed enough that data mining is an iterative process. In real-life business scenarios, one would take into consideration more degrees of freedom.

Attributes such as

Salary
Length of time at current job
Age
References (on a numeric scale where 0 = bad and 5 = great)

Etc.

Further, time must be spent in refining the actual combinations of these parameters with the myriad of mining models in order to ensure we are utilizing the most effective model(s).

In short, it is a trial-and-error exercise. The more data that we have and the more degrees of freedom that we utilize, the closer we come to the ‘truth’ and ‘reality’.

In the second part of this article (to be published soon), we shall see how we may utilize the information emanating from the models in our day-to-day reporting activities.

Lastly, common sense and understanding are prerequisites to any successful data mining project. There are no correct answers, merely close approximations. This is the reality! Happy programming!

Translated from: https://www.sqlshack.com/getting-started-sql-server-data-mining/

2024年BCSP-X小学低年级组初赛测试题（模拟题解析）天秀信奥编程培训 #BCXP-X模拟题北京BCSP-X试题讲解专栏 BCXP-X 信息学奥赛 c++
一、单项选择（共15题，每题2分，共计30分，每题有且仅有一个正确选项）以下是题目和解析的完整格式:不可以作为c++中的变量名的是（）。A.I以下loveChinaB.I_loveChinaC.I_love_ChinaD.i_loveChina正确答案：A.I以下loveChina解析：在C++中，变量名命名需要遵循一定的规则。变量名可以由字母、数字和下划线组成，但是第一个字符不能是数字。此外，变
前端开发中的常见问题与疑惑：解析与应对策略 lina_mua javascript vue.js html 前端 es6
1.引言1.1前端开发的复杂性前端开发涉及HTML、CSS、JavaScript等多种技术，同时还需要考虑性能优化、跨浏览器兼容性、用户体验等问题。随着前端技术的快速发展，开发者面临的挑战也越来越多。1.2本文的目标本文旨在总结前端开发中常见的问题与疑惑，并提供相应的解决方案和应对策略，帮助开发者更好地应对挑战。2.HTML/CSS常见问题2.1布局问题：如何实现复杂的页面布局？问题描述：实现复杂
[NOIP2007 提高组] 矩阵取数游戏题解 ◥༺ʚ 无聊鸭本鸭 ɞ༻◤ 洛谷刷题(C/C++)矩阵算法深度优先线性代数图论开发语言
题目描述帅帅经常跟同学玩一个矩阵取数游戏：对于一个给定的n×mn×m的矩阵，矩阵中的每个元素ai,jai,j均为非负整数。游戏规则如下：每次取数时须从每行各取走一个元素，共nn个。经过mm次后取完矩阵内所有元素；每次取走的各个元素只能是该元素所在行的行首或行尾；每次取数都有一个得分值，为每行取数的得分之和，每行取数的得分=被取走的元素值×2i×2i，其中ii表示第ii次取数（从11开始编号）；游戏
Spring Cloud Alibaba Spring Cloud Spring Boot 版本对应关系马丁半只瞄 java spring spring boot spring cloud
版本不对应可能有以下报错：Failedtobindpropertiesundermybatis-plus.configuration.result-maps[0]NoClassDefFoundError:reactor/netty/http/server/WebsocketServerSpec$Builderreactor.netty.resources.ConnectionProvider.el
游戏开放经济系统的部分思考 ArimaMisaki 大数据人工智能
游戏内的经济系统设计确实与现实中的宏观经济调控有相似逻辑，而现实中的对抗“非法经济组织”（如黑市、洗钱集团、垄断企业）的策略，经过适当改造后可以迁移到游戏内对抗工作室。下文是具体对比与可借鉴方案:一、现实中的“工作室”类比与应对手段1.打击非法金融活动（类比游戏内黑市交易）现实手段：央行监控大额资金流动（如反洗钱系统）。对异常账户冻结调查（如频繁跨行转账、多账户资金归集）。游戏借鉴：交易链路追踪：
第14天：C++异常处理实战指南 - 构建安全的文件解析系统 JuicyActiveGilbert C++教程 c++安全开发语言
第14天：C++异常处理实战指南-构建安全的文件解析系统一、今日学习目标掌握C++异常处理的核心语法与流程️理解RAII在资源管理中的关键作用创建自定义文件解析异常体系实现安全的文件解析器原型二、C++异常处理核心机制1.异常处理基础语法#include#include#includevoidparseConfiguration(conststd::string&path){std::ifstre
Java微服务的注册中心Nacos 铁锤学代码微服务 java 微服务开发语言
文章目录Nacos的主要作用Nacos实现动态配置更新的技术Nacos实现CAPNacos实现CAP原理Nacos使用Distro和Raft分别干什么用？ZAB与Raft的区别Nacos的主要作用配置中心:可以将微服务中的一些配置信息放到Nacos进行统一管理，也可以通过Nacos实现动态配置管理。也可以将不同环境的配置放在不同的Namespace下的group下，实现动态选择配置发布部署。服务注
免费虚拟主机天道大帝 python django pygame virtualenv scrapy
天道论坛云服务免费虚拟主机https://www.pantd.com解锁高效开发：免费虚拟主机助力你的项目腾飞在当今数字化浪潮中，无论是初出茅庐的新手开发者渴望一展身手，还是经验丰富的编程大咖想要快速验证创意，一款优质的虚拟主机都至关重要。今天，就为大家揭开一款免费虚拟主机的神秘面纱，让你轻松开启线上项目之旅。一、便捷入门，零成本启航对于刚踏入编程世界的小白来说，资金往往是开启项目的一大阻碍。这款
ArrayList 源码分析 2401_85327573 java 开发语言
ArrayList简介ArrayList的底层是数组队列，相当于动态数组。与Java中的数组相比，它的容量能动态增长。在添加大量元素前，应用程序可以使用ensureCapacity操作来增加ArrayList实例的容量。这可以减少递增式再分配的数量。ArrayList继承于AbstractList，实现了List,RandomAccess,Cloneable,java.io.Serializabl
C++ 游戏开发入门安年CJ C++游戏 c++开发语言 c#游戏
一、为什么选择C++进行游戏开发C++在游戏开发领域具有独特的地位。它兼具高效性与对底层硬件的良好控制能力，这使得它非常适合开发对性能要求极高的游戏核心引擎部分。许多知名的大型游戏，如《使命召唤》系列、《虚幻竞技场》等，其底层架构都是基于C++构建的。C++能够直接操作内存，在处理复杂的游戏逻辑、大规模数据运算（如物理模拟、图形渲染中的大量计算）以及优化游戏性能方面有着卓越的表现。同时，丰富的类库
Exception:data did not match any variant of untagged enum PyPreTokenizerTypeWrapper at line 69 解决方案爱编程的喵喵 Python基础课程 python tokenizer PyPreTokenizer 解决方案
大家好，我是爱编程的喵喵。双985硕士毕业，现担任全栈工程师一职，热衷于将数据思维应用到工作与生活中。从事机器学习以及相关的前后端开发工作。曾在阿里云、科大讯飞、CCF等比赛获得多次Top名次。现为CSDN博客专家、人工智能领域优质创作者。喜欢通过博客创作的方式对所学的知识进行总结与归纳，不仅形成深入且独到的理解，而且能够帮助新手快速入门。本文主要介绍了Exception:datadidn
C++游戏开发系列教程之第二篇：面向对象编程与游戏架构设计放氮气的蜗牛深度博客游戏
大家好，欢迎回到C++游戏开发系列教程！在第一篇中，我们介绍了C++游戏开发的基本概念和如何搭建一个简单的游戏循环，为新手打开了C++游戏开发的大门。本篇博客将深入讲解面向对象编程（OOP）在游戏开发中的重要性，以及如何设计一个简单而有效的游戏架构。通过本篇文章，你将学到如何利用C++的类与继承构建游戏中的各个对象（如玩家、敌人等），并结合游戏循环实现一个基础的游戏状态管理系统。所有代码均附有详细
Spring Boot @Component注解介绍 CnLg.NJ Java spring boot 后端 java
@Component是Spring中的一个核心注解，用于声明一个类为Spring管理的组件（Bean）。它是一个通用的注解，可以用于任何层次的类（如服务层、控制器层、持久层等）。通过@Component注解，Spring会自动检测并注册该类为一个Bean，从而实现依赖注入和生命周期管理。1.@Component的作用@Component是一个元注解，它本身被@Configuration、@Serv
redis集群迅速搭建（个人学习和测试用） yinhezhanshen redis 学习 java
笔者使用ubuntu操作系统下载redis地址：Indexof/releases/，选择最新的版本下载。解压后进入目录，直接make就可以编译。编译成功后在src目录下会生成redis-server和redis-cli可执行文件。进入redis目录下的utils/create-cluster目录，执行./create-clusterstart,快速启动6个实例zy@zy-VirtualBox:~/
驱动开发系列39 - Linux Graphics 3D 绘制流程（二）- 设置渲染管线黑不溜秋的 GPU驱动专栏驱动开发
一：概述Intel的Iris驱动是Mesa中的Gallium驱动，主要用于IntelGen8+GPU（Broadwell及更新架构）。它负责与i915内核DRM驱动交互，并通过Vulkan（ANV）、OpenGL（IrisGallium）、或OpenCL（Clover）来提供3D加速。在Iris驱动中，GPUPipeline设置涉及多个部分，包括编译和上传着色器、设置渲染目标、绑定缓冲区、配置固定
神经网络中的Adam 化作星辰神经网络人工智能深度学习
Adam（AdaptiveMomentEstimation）是一种广泛使用的优化算法，结合了RMSprop和动量（Momentum）的优点。它通过计算梯度的一阶矩估计（mean）和二阶矩估计（uncenteredvariance），为每个参数提供自适应学习率。Adam由DiederikP.Kingma和JimmyBa在2014年的论文《Adam:AMethodforStochasticOptimi
神经网络中的Nesterov Momentum 化作星辰神经网络人工智能深度学习
NesterovAcceleratedGradient(NAG)，也称为NesterovMomentum，是一种改进版的动量优化算法，旨在加速梯度下降过程中的收敛速度，并提高对最优解的逼近效率。它由YuriiNesterov在1983年提出，是对传统动量方法的一种增强。###传统动量法回顾在传统的动量方法中，更新规则不仅考虑当前的梯度，还包含了之前所有梯度的方向和大小的累积（即“动量”），以帮助克
数据采集高并发的架构应用 3golden .net
问题的出发点：最近公司为了发展需要，要扩大对用户的信息采集，每个用户的采集量估计约2W。如果用户量增加的话，将会大量照成采集量成3W倍的增长，但是又要满足日常业务需要，特别是指令要及时得到响应的频率次数远大于预期。 &n
不停止 MySQL 服务增加从库的两种方式 brotherlamp linux linux视频 linux资料 linux教程 linux自学
现在生产环境MySQL数据库是一主一从，由于业务量访问不断增大，故再增加一台从库。前提是不能影响线上业务使用，也就是说不能重启MySQL服务，为了避免出现其他情况，选择在网站访问量低峰期时间段操作。一般在线增加从库有两种方式，一种是通过mysqldump备份主库，恢复到从库，mysqldump是逻辑备份，数据量大时，备份速度会很慢，锁表的时间也会很长。另一种是通过xtrabacku
Quartz——SimpleTrigger触发器 eksliang SimpleTrigger TriggerUtils quartz
转载请出自出处：http://eksliang.iteye.com/blog/2208166 一.概述 SimpleTrigger触发器，当且仅需触发一次或者以固定时间间隔周期触发执行；二.SimpleTrigger的构造函数 SimpleTrigger(String name, String group)：通过该构造函数指定Trigger所属组和名称； Simpl
Informatica应用（1） 18289753290 sql workflow lookup 组件 Informatica
1.如果要在workflow中调用shell脚本有一个command组件，在里面设置shell的路径；调度wf可以右键出现schedule，现在用的是HP的tidal调度wf的执行。 2.designer里面的router类似于SSIS中的broadcast（多播组件）;Reset_Workflow_Var：参数重置（比如说我这个参数初始是1在workflow跑得过程中变成了3我要在结束时还要
python 获取图片验证码中文字酷的飞上天空 python
根据现成的开源项目 http://code.google.com/p/pytesser/改写在window上用easy_install安装不上看了下源码发现代码很少于是就想自己改写一下添加支持网络图片的直接解析 #coding:utf-8 #import sys #reload(sys) #sys.s
AJAX 永夜-极光 Ajax
1.AJAX功能:动态更新页面,减少流量消耗,减轻服务器负担 2.代码结构: <html> <head> <script type="text/javascript"> function loadXMLDoc() { .... AJAX script goes here ...
创业OR读研随便小屋创业
现在研一，有种想创业的想法，不知道该不该去实施。因为对于的我情况这两者是矛盾的，可能就是鱼与熊掌不能兼得。研一的生活刚刚过去两个月，我们学校主要的是
需求做得好与坏直接关系着程序员生活质量 aijuans IT 生活
这个故事还得从去年换工作的事情说起，由于自己不太喜欢第一家公司的环境我选择了换一份工作。去年九月份我入职现在的这家公司，专门从事金融业内软件的开发。十一月份我们整个项目组前往北京做现场开发，从此苦逼的日子开始了。系统背景：五月份就有同事前往甲方了解需求一直到6月份，后续几个月也完
如何定义和区分高级软件开发工程师 aoyouzi
在软件开发领域，高级开发工程师通常是指那些编写代码超过 3 年的人。这些人可能会被放到领导的位置，但经常会产生非常糟糕的结果。Matt Briggs 是一名高级开发工程师兼 Scrum 管理员。他认为，单纯使用年限来划分开发人员存在问题，两个同样具有 10 年开发经验的开发人员可能大不相同。近日，他发表了一篇博文，根据开发者所能发挥的作用划分软件开发工程师的成长阶段。　　初
Servlet的请求与响应百合不是茶 servlet get提交 java处理post提交
Servlet是tomcat中的一个重要组成,也是负责客户端和服务端的中介 1,Http的请求方式(get ,post); 客户端的请求一般都会都是Servlet来接受的,在接收之前怎么来确定是那种方式提交的,以及如何反馈,Servlet中有相应的方法, http的get方式 servlet就是都doGet(
web.xml配置详解之listener bijian1013 java web.xml listener
一.定义 <listener> <listen-class>com.myapp.MyListener</listen-class> </listener> 二.作用该元素用来注册一个监听器类。可以收到事件什么时候发生以及用什么作为响
Web页面性能优化（yahoo技术） Bill_chen JavaScript Ajax Web css Yahoo
1.尽可能的减少HTTP请求数 content 2.使用CDN server 3.添加Expires头(或者 Cache-control) server 4.Gzip 组件 server 5.把CSS样式放在页面的上方。 css 6.将脚本放在底部(包括内联的) javascript 7.避免在CSS中使用Expressions css 8.将javascript和css独立成外部文
【MongoDB学习笔记八】MongoDB游标、分页查询、查询结果排序 bit1129 mongodb
游标游标，简单的说就是一个查询结果的指针。游标作为数据库的一个对象，使用它是包括声明打开循环抓去一定数目的文档直到结果集中的所有文档已经抓取完关闭游标游标的基本用法，类似于JDBC的ResultSet(hasNext判断是否抓去完,next移动游标到下一条文档)，在获取一个文档集时，可以提供一个类似JDBC的FetchSize
ORA-12514 TNS 监听程序当前无法识别连接描述符中请求服务的解决方法白糖_ ORA-12514
今天通过Oracle SQL*Plus连接远端服务器的时候提示“监听程序当前无法识别连接描述符中请求服务”，遂在网上找到了解决方案： ①打开Oracle服务器安装目录\NETWORK\ADMIN\listener.ora文件，你会看到如下信息： # listener.ora Network Configuration File: D:\database\Oracle\net
Eclipse 问题 A resource exists with a different case bozch eclipse
在使用Eclipse进行开发的时候，出现了如下的问题： Description Resource Path Location TypeThe project was not built due to "A resource exists with a different case: '/SeenTaoImp_zhV2/bin/seentao'.&
编程之美-小飞的电梯调度算法 bylijinnan 编程之美
public class AptElevator { /** * 编程之美小飞电梯调度算法 * 在繁忙的时间，每次电梯从一层往上走时，我们只允许电梯停在其中的某一层。 * 所有乘客都从一楼上电梯，到达某层楼后，电梯听下来，所有乘客再从这里爬楼梯到自己的目的层。 * 在一楼时，每个乘客选择自己的目的层，电梯则自动计算出应停的楼层。 * 问：电梯停在哪
SQL注入相关概念 chenbowen00 sql Web 安全
SQL Injection：就是通过把SQL命令插入到Web表单递交或输入域名或页面请求的查询字符串，最终达到欺骗服务器执行恶意的SQL命令。具体来说，它是利用现有应用程序，将（恶意）的SQL命令注入到后台数据库引擎执行的能力，它可以通过在Web表单中输入（恶意）SQL语句得到一个存在安全漏洞的网站上的数据库，而不是按照设计者意图去执行SQL语句。首先让我们了解什么时候可能发生SQ
[光与电]光子信号战防御原理 comsci 原理
无论是在战场上,还是在后方,敌人都有可能用光子信号对人体进行控制和攻击,那么采取什么样的防御方法,最简单,最有效呢? 我们这里有几个山寨的办法,可能有些作用,大家如果有兴趣可以去实验一下根据光
oracle 11g新特性:Pending Statistics daizj oracle dbms_stats
oracle 11g新特性:Pending Statistics 转从11g开始，表与索引的统计信息收集完毕后，可以选择收集的统信息立即发布，也可以选择使新收集的统计信息处于pending状态，待确定处于pending状态的统计信息是安全的，再使处于pending状态的统计信息发布，这样就会避免一些因为收集统计信息立即发布而导致SQL执行计划走错的灾难。在 11g 之前的版本中，D
快速理解RequireJs dengkane jquery requirejs
RequireJs已经流行很久了，我们在项目中也打算使用它。它提供了以下功能：声明不同js文件之间的依赖可以按需、并行、延时载入js库可以让我们的代码以模块化的方式组织初看起来并不复杂。在html中引入requirejs 在HTML中，添加这样的 <script> 标签： <script src="/path/to
C语言学习四流程控制if条件选择、for循环和强制类型转换 dcj3sjt126com c
# include <stdio.h> int main(void) { int i, j; scanf("%d %d", &i, &j); if (i > j) printf("i大于j\n"); else printf("i小于j\n"); retu
dictionary的使用要注意 dcj3sjt126com IO
NSDictionary *dict = [NSDictionary dictionaryWithObjectsAndKeys: user.user_id , @"id", user.username , @"username",
Android 中的资源访问(Resource) finally_m xml android String drawable color
简单的说，Android中的资源是指非代码部分。例如，在我们的Android程序中要使用一些图片来设置界面，要使用一些音频文件来设置铃声，要使用一些动画来显示特效，要使用一些字符串来显示提示信息。那么，这些图片、音频、动画和字符串等叫做Android中的资源文件。在Eclipse创建的工程中，我们可以看到res和assets两个文件夹，是用来保存资源文件的，在assets中保存的一般是原生
Spring使用Cache、整合Ehcache 234390216 spring cache ehcache @Cacheable
Spring使用Cache 从3.1开始，Spring引入了对Cache的支持。其使用方法和原理都类似于Spring对事务管理的支持。Spring Cache是作用在方法上的，其核心思想是这样的：当我们在调用一个缓存方法时会把该方法参数和返回结果作为一个键值对存放在缓存中，等到下次利用同样的
当druid遇上oracle blob(clob) jackyrong oracle
http://blog.csdn.net/renfufei/article/details/44887371 众所周知，Oracle有很多坑, 所以才有了去IOE。在使用Druid做数据库连接池后，其实偶尔也会碰到小坑，这就是使用开源项目所必须去填平的。【如果使用不开源的产品，那就不是坑，而是陷阱了，你都不知道怎么去填坑】用Druid连接池，通过JDBC往Oracle数据库的
easyui datagrid pagination获得分页页码、总页数等信息 ldzyz007
var grid = $('#datagrid'); var options = grid.datagrid('getPager').data("pagination").options; var curr = options.pageNumber; var total = options.total; var max =
浅析awk里的数组 nigelzeng 二维数组 array 数组 awk
awk绝对是文本处理中的神器，它本身也是一门编程语言，还有许多功能本人没有使用到。这篇文章就单单针对awk里的数组来进行讨论，如何利用数组来帮助完成文本分析。有这么一组数据： abcd,91#31#2012-12-31 11:24:00 case_a,136#19#2012-12-31 11:24:00 case_a,136#23#2012-12-31 1
搭建 CentOS 6 服务器(6) - TigerVNC rensanning centos
安装GNOME桌面环境 # yum groupinstall "X Window System" "Desktop" 安装TigerVNC # yum -y install tigervnc-server tigervnc 启动VNC服务 # /etc/init.d/vncserver restart # vncser
Spring 数据库连接整理 tomcat_oracle spring bean jdbc
1、数据库连接jdbc.properties配置详解　　jdbc.url=jdbc:hsqldb:hsql://localhost/xdb 　　jdbc.username=sa 　　jdbc.password= 　　jdbc.driver=不同的数据库厂商驱动，此处不一一列举　　接下来，详细配置代码如下：　　 Spring连接池
Dom4J解析使用xpath java.lang.NoClassDefFoundError: org/jaxen/JaxenException异常 xp9802
用Dom4J解析xml,以前没注意,今天使用dom4j包解析xml时在xpath使用处报错异常栈：java.lang.NoClassDefFoundError: org/jaxen/JaxenException异常导入包 jaxen-1.1-beta-6.jar 解决; &nb