Tensorflow Model Tracking with MLflow

Developing a machine learning model is an iterative process consisting of multiple steps, such as model selection, model training, hyperparameter tuning, and deploying the model into production. Tracking the model through these stages in an organized way helps surface issues such as small changes in data, code, or hyperparameters that affect overall model performance. But model tracking can be a non-trivial task that may get messy at times. MLflow is an open source platform to manage the ML lifecycle, including experimentation, reproducibility, deployment, and a central model registry.

Although Tensorflow has its own model tracking tool, TensorBoard, mlflow provides a simpler interface for tracking experiments, while also making it easier to push the trained model into production.

Here, I will demonstrate how mlflow can be used to track Tensorflow models using a remote tracking store. MLflow supports various tracking backend stores; I will be using a MySQL database to store the experiments and model artifacts.

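For reference, the tracking URI can take several forms; the credentials, hosts, and paths below are placeholders:

./mlruns                                      # local file store
sqlite:///mlflow.db                           # local SQLite database
mysql://user:password@host:3306/mlflow        # MySQL via SQLAlchemy
postgresql://user:password@host:5432/mlflow   # PostgreSQL via SQLAlchemy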

Setting Up MySQL server

First, we need to set up a MySQL server. I will be using a Docker container to run the server on my local machine. You can pull the mysql-server Docker image by running the following command:

shell> docker pull mysql/mysql-server:latest

Once the image is pulled, we can create a Docker container from it:

shell> docker run --name=mysql1 -p 3306:3306 -p 33060:33060 -d mysql/mysql-server:latest

We publish ports 3306 and 33060 from the container to the host so that we can access the database from outside the container. MySQL uses port 3306 by default. We can see the details of the running container with the docker ps command.

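Its output should include our mysql1 container; the exact ID and timing columns will differ on your machine (abbreviated, illustrative output):

shell> docker ps
CONTAINER ID   IMAGE                       ...   PORTS                                              NAMES
<container-id> mysql/mysql-server:latest   ...   0.0.0.0:3306->3306/tcp, 0.0.0.0:33060->33060/tcp   mysql1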

Now that our MySQL server is running, let's configure access to it from outside the container. First, let's set the password for the root user. To do that, we will need the automatically generated root password:

shell> docker logs mysql1 2>&1 | grep GENERATED
GENERATED ROOT PASSWORD: Axegh3kAJyDLaRuBemecis&EShOs

We will now run the mysql command from inside the container to access the MySQL shell as root:

shell> docker exec -it mysql1 mysql -uroot -p

MySQL will ask for a password; enter the generated password we retrieved above. After this, we can change the password for the root user. Replace the string 'password' below with the actual password that you want to set. To make access available to the root user from outside the container, we update the host value from 'localhost' to '%' and flush the privilege tables so the change takes effect immediately:

mysql> alter user 'root'@'localhost' identified by 'password';
mysql> update mysql.user set host = '%' where user = 'root';
mysql> flush privileges;

As one last step, we will create a database called mlflow, which mlflow will use to track the experiments and models:

mysql> create database mlflow;

That's it; we should now be able to access the MySQL server we just set up from outside the Docker container. You can test the connection using MySQL Workbench. I am not going to go over that in this post, but it is pretty straightforward.

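If you prefer the command line instead, a quick connectivity check from the host might look like this (assuming a mysql client is installed locally; enter the root password when prompted):

shell> mysql -h 127.0.0.1 -P 3306 -u root -p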

Writing Tensorflow training code

If you don't have them already, tensorflow and mlflow, along with tensorflow-datasets and the MySQL client library, can be installed using pip:

shell> pip3 install tensorflow
shell> pip3 install tensorflow-datasets
shell> pip3 install mlflow
shell> pip3 install mysqlclient

I will demonstrate model training with Tensorflow's Keras API by training a simple image classification model on the MNIST dataset.

Let’s get to the code.

We will start by importing the required Python modules:

import tensorflow as tf
import tensorflow_datasets as tfds
import mlflow

We use the set_tracking_uri() method to tell mlflow where to store the training logs. The URI can be either a local path such as '/path/to/local/store' or a SQLAlchemy database URI:

user = 'root'
pwd = 'password'
hostname = 'localhost'
port = 3306
database = 'mlflow'

uri = f'mysql://{user}:{pwd}@{hostname}:{port}/{database}'
mlflow.set_tracking_uri(uri)

By default, mlflow stores all runs under the 'Default' experiment. We can assign an experiment name using the set_experiment() method and create a run in that experiment with the start_run() method.

mlflow.set_experiment('MNIST')
mlflow.start_run(run_name='Run_1')

Let’s load the data for training and validating the model:

(ds_train, ds_test), ds_info = tfds.load(
    'mnist',
    split=['train', 'test'],
    shuffle_files=True,
    as_supervised=True,
    with_info=True,
)

# Normalize images: `uint8` -> `float32`
def normalize_img(image, label):
    return tf.cast(image, tf.float32) / 255., label

# Train dataset
ds_train = ds_train.map(
    normalize_img, num_parallel_calls=tf.data.experimental.AUTOTUNE)
ds_train = ds_train.cache()
ds_train = ds_train.shuffle(ds_info.splits['train'].num_examples)
ds_train = ds_train.batch(128)
ds_train = ds_train.prefetch(tf.data.experimental.AUTOTUNE)

# Test dataset
ds_test = ds_test.map(
    normalize_img, num_parallel_calls=tf.data.experimental.AUTOTUNE)
ds_test = ds_test.batch(128)
ds_test = ds_test.cache()
ds_test = ds_test.prefetch(tf.data.experimental.AUTOTUNE)

Now, let's define a neural network model using the Keras API:

# Define the layers
inputs = tf.keras.Input(shape=(28, 28, 1))
hidden = tf.keras.layers.Flatten()(inputs)
hidden2 = tf.keras.layers.Dense(128, activation='relu')(hidden)
outputs = tf.keras.layers.Dense(10, activation='softmax')(hidden2)

# Optimizer
opt = tf.keras.optimizers.Adam(learning_rate=0.002)

# Create a Model object
model = tf.keras.Model(inputs, outputs)

# Compile the model
model.compile(optimizer=opt,
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

MLflow comes with strong bindings for major deep learning frameworks, including, but not limited to, Tensorflow, PyTorch, Gluon, and XGBoost. These bindings provide an autolog feature, which logs model training to the mlflow run automatically. This makes it super convenient to log all the hyperparameters, metrics, and even the trained model itself during training.

import mlflow.tensorflow
mlflow.tensorflow.autolog(every_n_iter=2)

autolog takes one parameter, every_n_iter, the number of training epochs between consecutive logs of the training metrics. For example, if the value passed is 2, mlflow will log the training metrics (loss, accuracy, validation loss, etc.) every 2 epochs.

Now, we can just use the model.fit() method to train our deep learning model. Note that the datasets are already batched, so there is no need to pass a batch_size argument here:

model.fit(ds_train, epochs=100, validation_data=ds_test)

Once the model training is complete, we can end the run by calling:

mlflow.end_run()

And that's it for the training code. We can now proceed to tracking the model and its metrics in the MLflow UI.

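One aside before moving on: mlflow.start_run() can also be used as a context manager, which ends the run automatically even if training fails partway. A minimal sketch of the same flow:

# enable autologging once, then let the context manager scope the run
mlflow.tensorflow.autolog(every_n_iter=2)
with mlflow.start_run(run_name='Run_1'):
    model.fit(ds_train, epochs=100, validation_data=ds_test)
# the run ends automatically here; no explicit mlflow.end_run() needed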

Tracking in MLflow UI

The MLflow web UI can be started using the mlflow ui command. We will pass an additional parameter, --backend-store-uri, which is simply the URI of the database from which we want mlflow to load the experiments. We use the same MySQL database URI as earlier:

shell> mlflow ui --backend-store-uri 'mysql://root:password@localhost:3306/mlflow'

The mlflow UI can be accessed at: http://localhost:5000.

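If port 5000 is already taken on your machine, the UI can be served on a different port with the --port flag:

shell> mlflow ui --backend-store-uri 'mysql://root:password@localhost:3306/mlflow' --port 5001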

In the Experiments tab, you should be able to see our MNIST experiment. On clicking it, you can see all the runs for this experiment. MLflow gives a brief summary of each run on the experiment page; more detailed logs can be found on the individual run pages.

MLflow experiment tracking UI

MLflow also provides automated plot generation for different metrics, with many available customizations. Here, we visualize the model training loss.

Metric plotting

Finally, mlflow also provides a model registry. However, this feature only works when we use a remote tracking store (as we do with MySQL) instead of local tracking. To register a model, all one needs to do is go to the model artifacts on the run page, select the model to register, and click Register Model.

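Registration can also be done programmatically; here is a minimal sketch, where the run ID and the registered model name are placeholders, and we assume the autologged Keras model was stored under the default 'model' artifact path:

run_id = '0123456789abcdef'  # hypothetical run ID, copied from the run page
model_uri = f'runs:/{run_id}/model'  # 'model' is the assumed artifact path
result = mlflow.register_model(model_uri, 'mnist-classifier')
print(result.name, result.version)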

Models registered under the same name are automatically versioned.

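A specific version can then be loaded back by URI, for example (using the hypothetical 'mnist-classifier' name from the sketch above):

# load version 1 of the registered model as a generic pyfunc model
loaded = mlflow.pyfunc.load_model('models:/mnist-classifier/1')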

Conclusion

In this post, I have tried to cover the basics of how Tensorflow models can be tracked using mlflow. I believe mlflow is an excellent tool for end-to-end machine learning model lifecycle tracking. The model registry and deployment capabilities make mlflow a convenient bridge between model development and deployment.

Translated from: https://medium.com/analytics-vidhya/tensorflow-model-tracking-with-mlflow-e9de29c8e542
