cunzai1985

H2O - Quick Guide

H2O - Introduction

Have you ever been asked to develop a Machine Learning model on a huge database? Typically, the customer hands you the database and asks for certain predictions, such as who the potential buyers will be or whether fraudulent cases can be detected early. Your task is to develop a Machine Learning model that answers the customer's query. Developing a Machine Learning algorithm from scratch is not easy, and there is little reason to do so when several ready-to-use Machine Learning libraries are available.

These days, you would rather take a well-tested algorithm from one of these libraries and look at its performance. If the performance is not within acceptable limits, you either fine-tune the current algorithm or try an altogether different one.

Likewise, you may try multiple algorithms on the same dataset and then pick the one that best meets the customer's requirements. This is where H2O comes to your rescue. It is an open source Machine Learning framework with fully tested implementations of several widely accepted ML algorithms. You just pick an algorithm from its large repository and apply it to your dataset. It contains the most widely used statistical and ML algorithms.

To mention a few, it includes gradient boosted machines (GBM), generalized linear models (GLM) and deep learning. It also supports AutoML functionality that ranks the performance of different algorithms on your dataset, reducing the effort of finding the best performing model. H2O is used worldwide by more than 18000 organizations and interfaces well with R and Python. It is an in-memory platform, which helps it deliver high performance.

In this tutorial, you will first learn to install H2O on your machine with both the Python and R options. We will use it from the command line so that you understand how it works line by line. If you are a Python lover, you may use Jupyter or any other IDE of your choice for developing H2O applications. If you prefer R, you may use RStudio for development.

We will then work through an example to understand how to work with H2O. We will also learn how to change the algorithm in your program code and compare its performance with the earlier one. H2O also provides a web-based tool, called Flow, to test different algorithms on your dataset.

The tutorial will introduce you to the use of Flow. Alongside, we will discuss AutoML, which identifies the best performing algorithm on your dataset. Are you excited to learn H2O? Keep reading!

H2O - Installation

H2O can be configured and used with five different options, as listed below:

Install in Python
Install in R
Web-based Flow GUI
Hadoop
Anaconda Cloud

The following sections give the installation instructions for these options. You are likely to use only one of them.

Install in Python

To run H2O with Python, you first need several dependencies. Let us start by installing the minimum set of dependencies required to run H2O.

Installing Dependencies

To install a dependency, execute the following pip command:


$ pip install requests

Open your console window and type the above command to install the requests package. The following screenshot shows the execution of the above command on our Mac machine:

After installing requests, you need to install three more packages as shown below:


$ pip install tabulate
$ pip install "colorama >= 0.3.8"
$ pip install future

The most up-to-date list of dependencies is available on the H2O GitHub page. At the time of this writing, the following dependencies are listed on the page:


python
pip >= 9.0.1
setuptools
colorama >= 0.3.7
future >= 0.15.2
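
As a quick check before installing H2O itself, the dependencies named above can be looked up from Python. This is a minimal sketch using only the standard library (`importlib.metadata`, Python 3.8 or later); the `REQUIRED` list simply repeats the four packages installed with pip in this section and is not an official H2O manifest.

```python
from importlib import metadata

# Packages installed with pip earlier in this section (not an official list).
REQUIRED = ["requests", "tabulate", "colorama", "future"]

def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

for name, version in installed_versions(REQUIRED).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

Any package reported as NOT INSTALLED can then be installed with the corresponding pip command shown above.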

Removing Older Versions

After installing the above dependencies, remove any existing H2O installation by running the following command:


$ pip uninstall h2o

Installing the Latest Version

Now, let us install the latest version of H2O using the following command:


$ pip install -f http://h2o-release.s3.amazonaws.com/h2o/latest_stable_Py.html h2o

After successful installation, you should see the following message on the screen:


Installing collected packages: h2o
Successfully installed h2o-3.26.0.1

Testing the Installation

To test the installation, we will run one of the sample applications provided with the H2O installation. First, start the Python prompt by typing the following command:


$ python3

Once the Python interpreter starts, type the following statement at the Python prompt:


>>> import h2o

The above command imports the H2O package into your program. Next, initialize the H2O system using the following command:


>>> h2o.init()

Your screen shows the cluster information and should look like the following at this stage:

Now, you are ready to run the sample code. Type the following command at the Python prompt and execute it.


>>> h2o.demo("glm")

The demo consists of a Python notebook with a series of commands. After each command executes, its output is shown immediately on the screen and you are asked to press a key to continue with the next step. The partial screenshot on executing the last statement in the notebook is shown here:

At this stage your Python installation is complete and you are ready for your own experimentation.

Install in R

Installing H2O for R development is very similar to installing it for Python, except that you use the R prompt for the installation.

Starting R Console

Start the R console by clicking the R application icon on your machine. The console screen appears as shown in the following screenshot:

Your H2O installation is done at the above R prompt. If you prefer using RStudio, type the commands in the R console subwindow.

Removing Older Versions

To begin with, remove older versions using the following commands at the R prompt:


> if ("package:h2o" %in% search()) { detach("package:h2o", unload=TRUE) }
> if ("h2o" %in% rownames(installed.packages())) { remove.packages("h2o") }

Downloading Dependencies

Download the dependencies for H2O using the following code:


> pkgs <- c("RCurl","jsonlite")
for (pkg in pkgs) {
   if (! (pkg %in% rownames(installed.packages()))) { install.packages(pkg) }
}

Installing H2O

Install H2O by typing the following command at the R prompt:


> install.packages("h2o", type = "source", repos = (c("http://h2o-release.s3.amazonaws.com/h2o/latest_stable_R")))

The following screenshot shows the expected output:

There is another way of installing H2O in R.

Install in R from CRAN

To install H2O from CRAN, use the following command at the R prompt:


> install.packages("h2o")

You will be asked to select the mirror:


--- Please select a CRAN mirror for use in this session ---

A dialog box displaying the list of mirror sites is shown on your screen. Select the nearest location or the mirror of your choice.

Testing Installation

At the R prompt, type and run the following code:


> library(h2o)
> localH2O = h2o.init()
> demo(h2o.kmeans)

The output generated will be as shown in the following screenshot:

Your H2O installation in R is complete now.

Installing Web GUI Flow

To install the Flow GUI, download the installation file from the H2O site and unzip it in your preferred folder. Note the h2o.jar file in the installation. Run this file in a command window using the following command:


$ java -jar h2o.jar

After a while, the following appears in your console window:


07-24 16:06:37.304 192.168.1.18:54321 3294 main INFO: H2O started in 7725ms
07-24 16:06:37.304 192.168.1.18:54321 3294 main INFO:
07-24 16:06:37.305 192.168.1.18:54321 3294 main INFO: Open H2O Flow in your web browser: http://192.168.1.18:54321
07-24 16:06:37.305 192.168.1.18:54321 3294 main INFO:

To start Flow, open the URL printed in the log in your browser; on the machine that runs H2O, http://localhost:54321 also works. The following screen will appear:
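
If H2O is started from a script rather than by hand, the Flow address can be picked out of the startup log instead of being read off the console. The sketch below assumes the log line keeps the wording shown above ("Open H2O Flow in your web browser: ..."); the helper name `find_flow_url` is made up for this illustration and is not part of H2O.

```python
import re

# Matches the line H2O prints once Flow is ready (wording taken from the log above).
FLOW_URL_PATTERN = re.compile(r"Open H2O Flow in your web browser:\s*(https?://\S+)")

def find_flow_url(log_text):
    """Return the Flow URL announced in the startup log, or None if it is absent."""
    match = FLOW_URL_PATTERN.search(log_text)
    return match.group(1) if match else None

sample_log = (
    "07-24 16:06:37.304 192.168.1.18:54321 3294 main INFO: H2O started in 7725ms\n"
    "07-24 16:06:37.305 192.168.1.18:54321 3294 main INFO: "
    "Open H2O Flow in your web browser: http://192.168.1.18:54321\n"
)
print(find_flow_url(sample_log))
```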

At this stage, your Flow installation is complete.

Install on Hadoop / Anaconda Cloud

Unless you are a seasoned developer, you would not think of using H2O on Big Data. It is sufficient to say here that H2O models run efficiently on huge databases of several terabytes. If your data is on your Hadoop installation or in the Cloud, follow the steps given on the H2O site to install it for your respective environment.

Now that you have successfully installed and tested H2O on your machine, you are ready for real development. First, we will develop from a command prompt. In subsequent lessons, we will learn how to do model testing in H2O Flow.

Developing in Command Prompt

Let us now use H2O to classify plants from the well-known iris dataset, which is freely available for developing Machine Learning applications.

Start the Python interpreter by typing the following command in your shell window:


$ python3

This starts the Python interpreter. Import the h2o package using the following command:


>>> import h2o

We will use the Random Forest algorithm for classification. It is provided by the H2ORandomForestEstimator class in the h2o.estimators module, which we import as follows:


>>> from h2o.estimators import H2ORandomForestEstimator

We initialize the H2O environment by calling its init method.


>>> h2o.init()

On successful initialization, you should see the following message on the console along with the cluster information.


Checking whether there is an H2O instance running at http://localhost:54321 . connected.

Now, we will import the iris data using the import_file method in H2O.


>>> data = h2o.import_file('iris.csv')

The progress is displayed as shown in the following screenshot:

After the file is loaded into memory, you can verify it by displaying the first 10 rows of the loaded table with the head method:


>>> data.head()

You will see the following output in tabular format.

The table also displays the column names. We will use the first four columns as the features for our ML algorithm and the last column, class, as the predicted output. We specify this in the call to our ML algorithm by first creating the following two variables.


>>> features = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
>>> output = 'class'

Next, we split the data into training and testing sets by calling the split_frame method.


>>> train, test = data.split_frame(ratios = [0.8])

The data is split in roughly an 80:20 ratio: about 80% of the rows are used for training and the rest for testing. Because split_frame assigns rows to the splits at random, the proportions are approximate rather than exact.
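
To see why a ratio of 0.8 does not guarantee exactly 80% of the rows, here is a small plain-Python sketch of a probabilistic split. It is only an illustration of the idea and not H2O's implementation; the function name `probabilistic_split`, the seed value and the use of 150 integers as stand-ins for the iris rows are assumptions made for this example.

```python
import random

def probabilistic_split(rows, train_ratio=0.8, seed=42):
    """Send each row to train with probability train_ratio, otherwise to test."""
    rng = random.Random(seed)  # a fixed seed makes the split repeatable
    train, test = [], []
    for row in rows:
        (train if rng.random() < train_ratio else test).append(row)
    return train, test

rows = list(range(150))  # stand-in for the 150 rows of the iris dataset
train_rows, test_rows = probabilistic_split(rows)
print(len(train_rows), len(test_rows))
```

Every row lands in exactly one of the two parts, but the part sizes vary slightly from seed to seed. In H2O, passing a seed to split_frame similarly makes a split repeatable.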

Now, we create an instance of the built-in Random Forest estimator.


>>> model = H2ORandomForestEstimator(ntrees = 50, max_depth = 20, nfolds = 10)

In the above call, we set the number of trees to 50, the maximum tree depth to 20 and the number of cross-validation folds to 10. We now need to train the model, which we do by calling the train method as follows:


>>> model.train(x = features, y = output, training_frame = train)

The train method receives the features and the output that we created earlier as its first two parameters. The training dataset is set to train, which holds about 80% of our full dataset. During training, you will see the progress as shown here:

Now that the model is built, it is time to test it. We do this by calling the model_performance method on the trained model object.


>>> performance = model.model_performance(test_data=test)

In the above method call, we passed the test data as the parameter.

It is now time to see the output, which is the performance of our model. You do this by simply printing the performance object.


>>> print (performance)

This will give you the following output:

The output shows the Mean Squared Error (MSE), Root Mean Squared Error (RMSE), LogLoss and the Confusion Matrix.
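
To make these numbers less opaque, the sketch below computes MSE, RMSE and LogLoss by hand from the probability a classifier assigned to the true class of each test row. It assumes the per-row error is one minus that probability, a common convention for classifiers; H2O's exact computation may differ in detail, and the four probabilities are invented purely for the arithmetic.

```python
import math

def classification_metrics(true_class_probs):
    """Return (mse, rmse, logloss) from the probabilities given to the true class."""
    n = len(true_class_probs)
    mse = sum((1.0 - p) ** 2 for p in true_class_probs) / n
    rmse = math.sqrt(mse)
    # Clamp to avoid log(0) when the model gives the true class zero probability.
    logloss = -sum(math.log(max(p, 1e-15)) for p in true_class_probs) / n
    return mse, rmse, logloss

# Hypothetical probabilities assigned to the correct class for four test rows.
probs = [0.9, 0.8, 0.6, 1.0]
mse, rmse, logloss = classification_metrics(probs)
print(f"MSE={mse:.4f} RMSE={rmse:.4f} LogLoss={logloss:.4f}")
```

All three metrics shrink towards zero as the model puts more probability on the correct class, so lower values indicate a better model.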

Running in Jupyter

We have run the code from the command line and understood the purpose of each line. You may run the entire code in a Jupyter environment, either line by line or as a whole program. The complete listing is given here:


import h2o
from h2o.estimators import H2ORandomForestEstimator
h2o.init()
data = h2o.import_file('iris.csv')
features = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
output = 'class'
train, test = data.split_frame(ratios=[0.8])
model = H2ORandomForestEstimator(ntrees = 50, max_depth = 20, nfolds = 10)
model.train(x = features, y = output, training_frame = train)
performance = model.model_performance(test_data=test)
print (performance)

Run the code and observe the output. You can now appreciate how easy it is to apply and test a Random Forest algorithm on your dataset. The power of H2O goes far beyond this capability. What if you want to try another model on the same dataset to see whether you can get better performance? This is explained in the next section.

Applying a Different Algorithm

Now, we will apply a Gradient Boosting algorithm to the same dataset to see how it performs. In the above full listing, you need to make only two minor changes, to the import statement and to the model constructor, as shown in the code below:


import h2o 
from h2o.estimators import H2OGradientBoostingEstimator
h2o.init()
data = h2o.import_file('iris.csv')
features = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
output = 'class'
train, test = data.split_frame(ratios = [0.8]) 
model = H2OGradientBoostingEstimator(ntrees = 50, max_depth = 20, nfolds = 10)
model.train(x = features, y = output, training_frame = train)
performance = model.model_performance(test_data = test)
print (performance)

Run the code and you will get the following output:

Compare results such as MSE, RMSE and the Confusion Matrix with the previous output and decide which model to use for production deployment. In fact, you can apply several different algorithms and pick the one that best meets your purpose.
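
The comparison can also be done in code rather than by eye. The following sketch trains both estimators on the same split and picks the one with the lower test LogLoss. It assumes iris.csv is in the working directory with the column names used above; the function names `lowest_logloss` and `compare_on_iris`, the seed and the choice of LogLoss as the deciding metric are assumptions for illustration, not a recommendation from H2O.

```python
def lowest_logloss(scores):
    """Return the model name with the smallest value in a {name: logloss} dict."""
    return min(scores, key=scores.get)

def compare_on_iris(csv_path="iris.csv", seed=1):
    """Train Random Forest and GBM on the same split and compare test LogLoss."""
    # Imported here so the helper above can be used without h2o installed.
    import h2o
    from h2o.estimators import H2OGradientBoostingEstimator, H2ORandomForestEstimator

    h2o.init()
    data = h2o.import_file(csv_path)
    features = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
    output = 'class'
    train, test = data.split_frame(ratios=[0.8], seed=seed)

    candidates = {
        "random_forest": H2ORandomForestEstimator(ntrees=50, max_depth=20, nfolds=10, seed=seed),
        "gbm": H2OGradientBoostingEstimator(ntrees=50, max_depth=20, nfolds=10, seed=seed),
    }
    scores = {}
    for name, model in candidates.items():
        model.train(x=features, y=output, training_frame=train)
        scores[name] = model.model_performance(test_data=test).logloss()
    return lowest_logloss(scores), scores
```

Calling `compare_on_iris()` returns the name of the better model together with both scores. Using one split for both models keeps the comparison fair, which running the two listings separately does not guarantee because each run draws its own random split.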

H2O - Flow

In the last lesson, you learned to create H2O-based ML models using the command line interface. H2O Flow fulfils the same purpose, but with a web-based interface.

In the following lessons, I will show you how to start H2O Flow and run a sample application.

Starting H2O Flow

The H2O installation that you downloaded earlier contains the h2o.jar file. To start H2O Flow, first run this jar from the command prompt:


$ java -jar h2o.jar

When the jar runs successfully, you will get the following message on the console:


Open H2O Flow in your web browser: http://192.168.1.10:54321

Now, open the browser of your choice and enter the above URL. You will see the H2O web-based desktop as shown here:

This is basically a notebook similar to Colab or Jupyter. I will show you how to load and run a sample application in this notebook while explaining the various features of Flow. Click the view example Flows link on the above screen to see the list of provided examples.

I will describe the Airlines Delay Flow example from the samples.

H2O - Running Sample Application

Click the Airlines Delay Flow link in the list of samples as shown in the screenshot below:

After you confirm, the new notebook is loaded.

Clearing All Outputs

Before we explain the code statements in the notebook, let us clear all the outputs and then run the notebook step by step. To clear all outputs, select the following menu option:


Flow / Clear All Cell Contents

This is shown in the following screenshot:

Once all outputs are cleared, we will run each cell in the notebook individually and examine its output.

Running the First Cell

Click the first cell. A red flag appears on the left, indicating that the cell is selected, as shown in the screenshot below:

The contents of this cell are just a program comment written in Markdown (MD). The content describes what the loaded application does. To run the cell, click the Run icon as shown in the screenshot below:

You will not see any output underneath the cell, as there is no executable code in the current cell. The cursor now moves automatically to the next cell, which is ready to execute.

Importing Data

The next cell contains the following Flow statement:


importFiles ["https://s3.amazonaws.com/h2o-airlines-unpacked/allyears2k.csv"]

The statement imports the allyears2k.csv file from Amazon S3 into the system. When you run the cell, it imports the file and gives you the following output.

Setting Up Data Parser

Now, we need to parse the data and make it suitable for our ML algorithm. This is done using the following command:


setupParse paths: [ "https://s3.amazonaws.com/h2o-airlines-unpacked/allyears2k.csv" ]

Upon execution of the above statement, a setup configuration dialog appears. The dialog offers several settings for parsing the file, as shown in the screenshot below:

In this dialog, you can select the desired parser from the drop-down list and set other parameters such as the field separator.

Parsing Data

The next statement, which actually parses the data file using the above configuration, is a long one and is shown here:


parseFiles
paths: ["https://s3.amazonaws.com/h2o-airlines-unpacked/allyears2k.csv"]
destination_frame: "allyears2k.hex"
parse_type: "CSV"
separator: 44
number_columns: 31
single_quotes: false
column_names: ["Year","Month","DayofMonth","DayOfWeek","DepTime","CRSDepTime",
   "ArrTime","CRSArrTime","UniqueCarrier","FlightNum","TailNum",
   "ActualElapsedTime","CRSElapsedTime","AirTime","ArrDelay","DepDelay",
   "Origin","Dest","Distance","TaxiIn","TaxiOut","Cancelled","CancellationCode",
   "Diverted","CarrierDelay","WeatherDelay","NASDelay","SecurityDelay",
   "LateAircraftDelay","IsArrDelayed","IsDepDelayed"]
column_types: ["Enum","Enum","Enum","Enum","Numeric","Numeric","Numeric"
   ,"Numeric","Enum","Enum","Enum","Numeric","Numeric","Numeric","Numeric",
   "Numeric","Enum","Enum","Numeric","Numeric","Numeric","Enum","Enum",
   "Numeric","Numeric","Numeric","Numeric","Numeric","Numeric","Enum","Enum"]
delete_on_done: true
check_header: 1
chunk_size: 4194304

Observe that the parameters you set up in the configuration dialog are listed in the above code. Now, run this cell. After a while, the parsing completes and you will see the following output:
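
Some values in the parseFiles statement are easier to read once decoded. The short Python check below copies the column names, column types, separator and chunk size from the statement above: the separator 44 is the character code of a comma, the chunk size is 4 MiB, and the two lists pair up into 31 name and type entries, matching number_columns. Nothing here calls H2O; it only inspects the configuration values.

```python
column_names = [
    "Year", "Month", "DayofMonth", "DayOfWeek", "DepTime", "CRSDepTime",
    "ArrTime", "CRSArrTime", "UniqueCarrier", "FlightNum", "TailNum",
    "ActualElapsedTime", "CRSElapsedTime", "AirTime", "ArrDelay", "DepDelay",
    "Origin", "Dest", "Distance", "TaxiIn", "TaxiOut", "Cancelled", "CancellationCode",
    "Diverted", "CarrierDelay", "WeatherDelay", "NASDelay", "SecurityDelay",
    "LateAircraftDelay", "IsArrDelayed", "IsDepDelayed",
]
column_types = [
    "Enum", "Enum", "Enum", "Enum", "Numeric", "Numeric", "Numeric",
    "Numeric", "Enum", "Enum", "Enum", "Numeric", "Numeric", "Numeric", "Numeric",
    "Numeric", "Enum", "Enum", "Numeric", "Numeric", "Numeric", "Enum", "Enum",
    "Numeric", "Numeric", "Numeric", "Numeric", "Numeric", "Numeric", "Enum", "Enum",
]
separator_code = 44
chunk_size = 4194304

# Pair every column name with its declared type.
schema = dict(zip(column_names, column_types))

print("separator:", repr(chr(separator_code)))
print("chunk size:", chunk_size // (1024 * 1024), "MiB")
print("columns:", len(schema))
print("response column type:", schema["IsDepDelayed"])
```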

Examining Dataframe

The parsing generates a dataframe, which can be examined using the following statement:


getFrameSummary "allyears2k.hex"

Upon execution of the above statement, you will see the following output:

Now, your data is ready to be fed into a Machine Learning algorithm.

The next statement is a program comment that says we will be using the regression model and specifies the preset regularization and the lambda values.

Building the Model

Next comes the most important statement, which builds the model itself:


buildModel 'glm', {
   "model_id":"glm_model","training_frame":"allyears2k.hex",
   "ignored_columns":[
      "DayofMonth","DepTime","CRSDepTime","ArrTime","CRSArrTime","TailNum",
      "ActualElapsedTime","CRSElapsedTime","AirTime","ArrDelay","DepDelay",
      "TaxiIn","TaxiOut","Cancelled","CancellationCode","Diverted","CarrierDelay",
      "WeatherDelay","NASDelay","SecurityDelay","LateAircraftDelay","IsArrDelayed"],
   "ignore_const_cols":true,"response_column":"IsDepDelayed","family":"binomial",
   "solver":"IRLSM","alpha":[0.5],"lambda":[0.00001],"lambda_search":false,
   "standardize":true,"non_negative":false,"score_each_iteration":false,
   "max_iterations":-1,"link":"family_default","intercept":true,
   "objective_epsilon":0.00001,"beta_epsilon":0.0001,"gradient_epsilon":0.0001,
   "prior":-1,"max_active_predictors":-1
}

We use glm, the Generalized Linear Model algorithm, with the family set to binomial, as you can see in the above statement. In our case, the expected output is binary, which is why we use the binomial family. You may examine the other parameters yourself; for example, look at the alpha and lambda values mentioned earlier. Refer to the GLM documentation for an explanation of all the parameters.
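
The roles of alpha and lambda are easier to see with a little arithmetic. The sketch below assumes the usual elastic net form of the penalty, lambda * (alpha * L1 + (1 - alpha) / 2 * L2 squared), and the logistic link that the binomial family commonly uses; the coefficient values are hypothetical and the functions are illustrations, not H2O code.

```python
import math

def elastic_net_penalty(coefficients, alpha, lam):
    """Elastic net penalty: alpha mixes the L1 and L2 terms, lam scales the total."""
    l1 = sum(abs(b) for b in coefficients)
    l2_squared = sum(b * b for b in coefficients)
    return lam * (alpha * l1 + (1.0 - alpha) / 2.0 * l2_squared)

def predicted_probability(intercept, coefficients, features):
    """Logistic link: turn the linear predictor into a probability between 0 and 1."""
    linear = intercept + sum(b * x for b, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-linear))

beta = [1.0, -2.0]  # hypothetical coefficients, for the arithmetic only
print(elastic_net_penalty(beta, alpha=0.5, lam=0.00001))
print(predicted_probability(0.0, beta, [0.0, 0.0]))
```

With alpha set to 0.5, as in the statement above, the L1 and L2 terms are mixed evenly, and the very small lambda of 0.00001 means the model is only lightly regularized.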

Now, run this statement. Upon execution, the following output will be generated:

The execution time will of course be different on your machine. Now comes the most interesting part of this sample code.

Examining Output

We simply output the model that we have built using the following statement:


getModel "glm_model"

Note that glm_model is the model ID that we specified as the model_id parameter while building the model in the previous statement. This gives us a large report detailing the results under several headings. A partial output of the report is shown in the screenshot below:

As you can see in the output, it says that this is the result of running the Generalized Linear Modeling algorithm on your dataset.

Right above SCORING HISTORY, you see the MODEL PARAMETERS tag. Expand it and you will see the list of all parameters used while building the model, as shown in the screenshot below.

Likewise, each tag provides a detailed output of a specific type. Expand the various tags yourself to study the different kinds of output.

Building Another Model

Next, we will build a Deep Learning model on our dataframe. The next statement in the sample code is just a program comment. The statement after it is the actual model building command, shown here:


buildModel 'deeplearning', {
   "model_id":"deeplearning_model","training_frame":"allyears2k.hex",
   "ignored_columns":[
      "DepTime","CRSDepTime","ArrTime","CRSArrTime","FlightNum","TailNum",
      "ActualElapsedTime","CRSElapsedTime","AirTime","ArrDelay","DepDelay",
      "TaxiIn","TaxiOut","Cancelled","CancellationCode","Diverted",
      "CarrierDelay","WeatherDelay","NASDelay","SecurityDelay",
      "LateAircraftDelay","IsArrDelayed"],
   "ignore_const_cols":true,"response_column":"IsDepDelayed",
   "activation":"Rectifier","hidden":[200,200],"epochs":"100",
   "variable_importances":false,"balance_classes":false,
   "checkpoint":"","use_all_factor_levels":true,
   "train_samples_per_iteration":-2,"adaptive_rate":true,
   "input_dropout_ratio":0,"l1":0,"l2":0,"loss":"Automatic","score_interval":5,
   "score_training_samples":10000,"score_duty_cycle":0.1,"autoencoder":false,
   "overwrite_with_best_model":true,"target_ratio_comm_to_comp":0.02,
   "seed":6765686131094811000,"rho":0.99,"epsilon":1e-8,"max_w2":"Infinity",
   "initial_weight_distribution":"UniformAdaptive","classification_stop":0,
   "diagnostics":true,"fast_mode":true,"force_load_balance":true,
   "single_node_mode":false,"shuffle_training_data":false,"missing_values_handling":
   "MeanImputation","quiet_mode":false,"sparse":false,"col_major":false,
   "average_activation":0,"sparsity_beta":0,"max_categorical_features":2147483647,
   "reproducible":false,"export_weights_and_biases":false
}

As you can see in the above code, we specify deeplearning for building the model, with several parameters set to appropriate values as described in the documentation of the deeplearning model. When you run this statement, it takes longer than building the GLM model. You will see the following output when the model building completes, albeit with different timings.

Examining Deep Learning Model Output

The resulting model can be examined using the following statement, as in the earlier case.


getModel "deeplearning_model"

We will consider the ROC curve output, shown below, for quick reference.

As in the earlier case, expand the various tabs and study the different outputs.

Saving the Model

After studying the output of the different models, you decide to use one of them in your production environment. H2O allows you to save this model as a POJO (Plain Old Java Object).

Expand the last tag, PREVIEW POJO, in the output and you will see the Java code for your fine-tuned model. Use this in your production environment.

Next, we will learn about a very exciting feature of H2O: how to use AutoML to test and rank various algorithms based on their performance.

H2O - AutoML

To use AutoML, start a new Jupyter notebook and follow the steps shown below.

Importing AutoML

First import the h2o package and the H2OAutoML class into the project using the following two statements:


import h2o
from h2o.automl import H2OAutoML

Initialize H2O

Initialize h2o using the following statement:


h2o.init()

You should see the cluster information on the screen as shown in the screenshot below:

Loading Data

We will use the same iris.csv dataset that you used earlier in this tutorial. Load the data using the following statement:


data = h2o.import_file('iris.csv')

Preparing Dataset

We need to decide on the feature columns and the prediction column. We use the same features and prediction column as in our earlier case. Set the features and the output column using the following two statements:


features = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
output = 'class'

Split the data in an 80:20 ratio for training and testing:


train, test = data.split_frame(ratios=[0.8])

Applying AutoML

Now, we are all set to apply AutoML to our dataset. AutoML runs within the limits we set and gives us the optimized model. We set up AutoML using the following statement:


aml = H2OAutoML(max_models=30, max_runtime_secs=300, seed=1)

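The first parameter, max_models, specifies the maximum number of models that we want to train and compare.

The second parameter, max_runtime_secs, specifies the maximum time, in seconds, for which AutoML runs. When both limits are set, AutoML stops as soon as either one is reached.

The third parameter, seed, helps make the run reproducible.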
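The interaction between a model cap and a time cap can be modelled in a few lines of plain Python. In the sketch below, `run_with_budget`, `train_one`, and the fake clock are all hypothetical. They stand in for AutoML's internal scheduling, which is more involved.

```python
def run_with_budget(train_one, max_models, max_runtime_secs, clock):
    """Train models until either the model cap or the time cap is hit."""
    start = clock()
    trained = []
    while len(trained) < max_models and clock() - start < max_runtime_secs:
        trained.append(train_one(len(trained)))
    return trained

# Fake clock: each reading advances 20 "seconds", so time is the binding limit.
ticks = iter(range(0, 10_000, 20))
models = run_with_budget(
    train_one=lambda i: f"model_{i}",
    max_models=30,
    max_runtime_secs=300,
    clock=lambda: next(ticks),
)
print(len(models))  # fewer than 30, because the time budget runs out first
```

If each model trained quickly instead, the loop would end at 30 models well before the 300 seconds elapsed.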

We now call the train method on the AutoML object as shown here:


aml.train(x=features, y=output, training_frame=train)

We pass the features list that we created earlier as x, the name of the output column to be predicted as y, and the training dataset as training_frame.

Run the code. Training can take up to 5 minutes (we set max_runtime_secs to 300) before it finishes.

Printing the Leaderboard

When the AutoML processing completes, it creates a leaderboard that ranks all the models it has evaluated. To see the first 10 records of the leaderboard, use the following code:


lb = aml.leaderboard
lb.head()

Upon execution, the above code displays the top rows of the leaderboard.

In the run described here, a DeepLearning model topped the leaderboard.
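A leaderboard is essentially a table of models sorted by a metric. As a rough illustration, the sketch below ranks a few models by an error metric where lower is better. The model names and numbers are invented for the example and are not results from this tutorial.

```python
# Invented (model_id, error) pairs, for illustration only.
results = [
    ("GBM_1", 0.060),
    ("DeepLearning_1", 0.035),
    ("GLM_1", 0.080),
    ("DRF_1", 0.050),
]

# Sort ascending by error so that the best model comes first.
leaderboard = sorted(results, key=lambda item: item[1])

for rank, (model_id, error) in enumerate(leaderboard, start=1):
    print(rank, model_id, error)

# The first entry plays the role of the leader model.
leader = leaderboard[0][0]
```

In H2O, the top-ranked model of a finished run is available as `aml.leader`.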

Predicting on Test Data

Now that the models are ranked, you can apply the top-ranked model to your test data. To do so, run the following statement:


preds = aml.predict(test)

The processing continues for a while before the predictions are ready.
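Predictions alone do not show how well the model performs. One way to get metrics on the test set is to call model_performance on the leader model. The helper below, `evaluate_leader`, is a hypothetical wrapper. It assumes that `aml` is the trained H2OAutoML object and `test` is the held-out frame from the steps above.

```python
def evaluate_leader(aml, test):
    """Return (predictions, performance) of the top-ranked model on `test`."""
    leader = aml.leader                     # best model on the leaderboard
    preds = leader.predict(test)            # one prediction per test row
    perf = leader.model_performance(test)   # metrics computed on `test`
    return preds, perf

# Usage, once AutoML has finished training:
# preds, perf = evaluate_leader(aml, test)
# print(perf)
```

Keeping the test frame out of training, as done with the 80:20 split, is what makes these metrics a fair estimate.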

Printing Result

Print the predicted result using the following statement:


print(preds)

Upon execution of the above statement, the predictions for the test rows are printed.

Printing the Ranking for All

If you want to see the ranks of all the tested models, run the following statement:


lb.head(rows=lb.nrows)

Upon execution of the above statement, the full leaderboard is displayed.

Conclusion

H2O provides an easy-to-use, open source platform for applying different ML algorithms to a given dataset. It provides several statistical and ML algorithms, including deep learning. During testing, you can fine-tune the parameters of these algorithms, either from the command line or through the provided web-based interface called Flow. H2O also supports AutoML, which ranks several algorithms based on their performance. H2O also performs well on Big Data. This is definitely a boon for data scientists, who can apply different Machine Learning models to their dataset and pick the one that best meets their needs.

Translated from: https://www.tutorialspoint.com/h2o/h2o_quick_guide.htm
