Machine Learning with Jupyter on Kubernetes: MLflow
- MLflow tutorial, https://my.oschina.net/u/2306127/blog/1825690
- MLflow official documentation, https://www.mlflow.org/docs/latest/quickstart.html
- Quick install: pip install mlflow
# Download the code
!git clone https://github.com/databricks/mlflow
Cloning into 'mlflow'...
remote: Counting objects: 830, done.
remote: Compressing objects: 100% (22/22), done.
remote: Total 830 (delta 5), reused 11 (delta 4), pack-reused 804
Receiving objects: 100% (830/830), 3.04 MiB | 28.00 KiB/s, done.
Resolving deltas: 100% (339/339), done.
Checking out files: 100% (279/279), done.
%%!
export https_proxy=http://192.168.199.99:9999
echo $https_proxy
#pip install mlflow
['http://192.168.199.99:9999']
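One caveat with the cell above: `%%!` runs its commands in a separate shell, so the exported variable is gone once the cell finishes and later cells do not see it. A minimal sketch of an alternative, assuming the same proxy address as above, sets the variable in the kernel's own environment, which child processes started afterwards (including `!pip install`) inherit:

```python
import os
import subprocess
import sys

# Assumption: the same proxy address as in the cell above; replace with your own.
PROXY = "http://192.168.199.99:9999"

# Setting the variable on the kernel process makes it visible to later cells.
os.environ["https_proxy"] = PROXY

# Child processes inherit os.environ; confirm this from a fresh Python subprocess.
seen = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ.get('https_proxy', ''))"],
    stdout=subprocess.PIPE,
    universal_newlines=True,
    check=True,
).stdout.strip()
print(seen)
```

The subprocess check is only there to demonstrate the inheritance; in the notebook, the `os.environ` assignment alone is enough.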
!pip install mlflow
Collecting mlflow
  Downloading https://files.pythonhosted.org/packages/65/a0/082dcecdd76845ee8e97472741a5315e6dc697e2552935a73bdb6196d515/mlflow-0.2.1.tar.gz (4.3MB)
...
Successfully built mlflow databricks-cli uuid querystring-parser tabulate configparser itsdangerous
Installing collected packages: docutils, jmespath, botocore, colorama, s3transfer, awscli, click, tabulate, configparser, databricks-cli, uuid, smmap2, gitdb2, gitpython, gunicorn, itsdangerous, Werkzeug, Flask, protobuf, boto3, querystring-parser, mlflow
  Found existing installation: protobuf 3.5.2
    Uninstalling protobuf-3.5.2:
      Successfully uninstalled protobuf-3.5.2
Successfully installed Flask-1.0.2 Werkzeug-0.14.1 awscli-1.15.51 boto3-1.7.50 botocore-1.10.50 click-6.7 colorama-0.3.9 configparser-3.5.0 databricks-cli-0.7.2 docutils-0.14 gitdb2-2.0.3 gitpython-2.1.10 gunicorn-19.9.0 itsdangerous-0.24 jmespath-0.9.3 mlflow-0.2.1 protobuf-3.6.0 querystring-parser-1.2.3 s3transfer-0.1.13 smmap2-2.0.3 tabulate-0.8.2 uuid-1.30
!ls -l mlflow
total 100
-rw-r--r--  1 jovyan 4294967294  1460 Jul 4 03:44 CHANGELOG.rst
-rw-r--r--  1 jovyan 4294967294   305 Jul 4 03:44 conftest.py
-rw-r--r--  1 jovyan 4294967294  2586 Jul 4 03:44 CONTRIBUTING.rst
-rw-r--r--  1 jovyan 4294967294   126 Jul 4 03:44 dev-requirements.txt
-rw-r--r--  1 jovyan 4294967294   372 Jul 4 03:44 Dockerfile
drwxr-sr-x  4 jovyan 4294967294  4096 Jul 4 03:44 docs
drwxr-sr-x  5 jovyan 4294967294  4096 Jul 4 03:44 example
-rwxr-xr-x  1 jovyan 4294967294   882 Jul 4 03:44 generate-protos.sh
-rw-r--r--  1 jovyan 4294967294   815 Jul 4 03:44 ISSUE_TEMPLATE.md
-rw-r--r--  1 jovyan 4294967294 11382 Jul 4 03:44 LICENSE.txt
-rwxr-xr-x  1 jovyan 4294967294   138 Jul 4 03:44 lint.sh
drwxr-sr-x 12 jovyan 4294967294  4096 Jul 4 03:44 mlflow
-rw-r--r--  1 jovyan 4294967294 16956 Jul 4 03:44 pylintrc
-rw-r--r--  1 jovyan 4294967294  2257 Jul 4 03:44 README.rst
-rw-r--r--  1 jovyan 4294967294  1828 Jul 4 03:44 setup.py
-rwxr-xr-x  1 jovyan 4294967294   330 Jul 4 03:44 test-generate-protos.sh
drwxr-sr-x 13 jovyan 4294967294  4096 Jul 4 03:44 tests
-rw-r--r--  1 jovyan 4294967294   281 Jul 4 03:44 tox.ini
-rw-r--r--  1 jovyan 4294967294   147 Jul 4 03:44 tox-requirements.txt
# The data set used in this example is from http://archive.ics.uci.edu/ml/datasets/Wine+Quality
# P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis.
# Modeling wine preferences by data mining from physicochemical properties.
# In Decision Support Systems, Elsevier, 47(4):547-553, 2009.

import os
import warnings
import sys

import pandas as pd
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.linear_model import ElasticNet

import mlflow
import mlflow.sklearn


def eval_metrics(actual, pred):
    rmse = np.sqrt(mean_squared_error(actual, pred))
    mae = mean_absolute_error(actual, pred)
    r2 = r2_score(actual, pred)
    return rmse, mae, r2
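For reference, the three metrics returned by `eval_metrics` can be written out by hand. The sketch below uses only the standard library and a few made-up illustrative numbers (not taken from the wine data set); `eval_metrics_plain` is a hypothetical name, and the function is meant to show the definitions rather than replace the scikit-learn calls:

```python
import math


def eval_metrics_plain(actual, pred):
    """Compute RMSE, MAE and R^2 for two equal-length lists of numbers."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, pred)]

    # Root mean squared error: square root of the average squared error.
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)

    # Mean absolute error: average size of the error, ignoring sign.
    mae = sum(abs(e) for e in errors) / n

    # R^2: 1 minus residual sum of squares over total sum of squares.
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot

    return rmse, mae, r2


# Illustrative values only.
actual = [5.0, 6.0, 7.0, 6.0]
pred = [5.5, 6.0, 6.5, 6.0]
rmse, mae, r2 = eval_metrics_plain(actual, pred)
print(rmse, mae, r2)
```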
Prepare the data
warnings.filterwarnings("ignore")
np.random.seed(40)

# Read the wine-quality csv file. The path is relative to the directory that
# contains the cloned mlflow repository.
# (__file__ is not defined in a notebook, so the script-style path stays commented out.)
#wine_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "./mlflow/example/wine-quality.csv")
wine_path = "./mlflow/example/tutorial/wine-quality.csv"
data = pd.read_csv(wine_path)

# Split the data into training and test sets. (0.75, 0.25) split.
train, test = train_test_split(data)

# The predicted column is "quality" which is a scalar from [3, 9]
train_x = train.drop(["quality"], axis=1)
test_x = test.drop(["quality"], axis=1)
train_y = train[["quality"]]
test_y = test[["quality"]]

# In a notebook, sys.argv holds the kernel's launch arguments rather than
# hyperparameters, so float(sys.argv[1]) would fail; set the values directly.
alpha = 0.5
l1_ratio = 0.5
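In a notebook kernel, `sys.argv` usually holds the kernel's own launch arguments (for example `-f` and a connection-file path) rather than hyperparameters. If the same code should work both as a script and in a notebook, a tolerant parser is one option. The helper below is a hypothetical sketch, not part of MLflow or the original tutorial, and the `argv` lists are illustrative:

```python
def parse_hyperparams(argv, default_alpha=0.5, default_l1_ratio=0.5):
    """Return (alpha, l1_ratio) from argv, falling back to the defaults."""

    def to_float(index, default):
        # A missing argument (IndexError) or a non-numeric one (ValueError)
        # both fall back to the default value.
        try:
            return float(argv[index])
        except (IndexError, ValueError):
            return default

    return to_float(1, default_alpha), to_float(2, default_l1_ratio)


# Script-style invocation: both values are given on the command line.
print(parse_hyperparams(["train.py", "0.3", "0.7"]))

# Notebook-style argv (illustrative): the kernel's flags are not numbers.
print(parse_hyperparams(["ipykernel_launcher.py", "-f", "kernel.json"]))
```

In the notebook this would be called as `alpha, l1_ratio = parse_hyperparams(sys.argv)`.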
Fit the model, predict on the test set, evaluate the accuracy, and log the parameters and metrics.
with mlflow.start_run():
    lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
    lr.fit(train_x, train_y)

    predicted_qualities = lr.predict(test_x)

    (rmse, mae, r2) = eval_metrics(test_y, predicted_qualities)

    print("Elasticnet model (alpha=%f, l1_ratio=%f):" % (alpha, l1_ratio))
    print(" RMSE: %s" % rmse)
    print(" MAE: %s" % mae)
    print(" R2: %s" % r2)

    mlflow.log_param("alpha", alpha)
    mlflow.log_param("l1_ratio", l1_ratio)
    mlflow.log_metric("rmse", rmse)
    mlflow.log_metric("r2", r2)
    mlflow.log_metric("mae", mae)

    mlflow.sklearn.log_model(lr, "model")
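Each `with mlflow.start_run():` block records a single run, so comparing hyperparameters means repeating the block for several `alpha` and `l1_ratio` values. The sketch below shows only the looping structure: `sweep` and `placeholder_run` are hypothetical names, and `placeholder_run` returns a dummy number in place of the real training-and-logging code.

```python
from itertools import product


def sweep(run_fn, alphas, l1_ratios):
    """Call run_fn(alpha, l1_ratio) for every grid point.

    Returns a dict mapping (alpha, l1_ratio) to whatever run_fn returned.
    """
    return {(a, r): run_fn(a, r) for a, r in product(alphas, l1_ratios)}


def placeholder_run(alpha, l1_ratio):
    # Hypothetical stand-in: in the notebook this would wrap the
    # `with mlflow.start_run():` block and return the measured RMSE.
    return alpha + l1_ratio  # dummy value, not a real metric


results = sweep(placeholder_run, alphas=[0.1, 0.5, 1.0], l1_ratios=[0.2, 0.8])

# Pick the grid point with the lowest returned score.
best_params = min(results, key=results.get)
print(len(results), "runs; lowest dummy score at", best_params)
```

Replacing `placeholder_run` with a function that runs the training block would produce one recorded run per grid point.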
The code above still has some rough edges; MLflow itself needs further improvement here.