Remotely Scheduling TensorFlow Machine Learning Programs with Airflow

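TensorFlow machine learning programs take a long time to run, so they should be invoked asynchronously rather than synchronously. In Prism, the machine learning application framework we developed, the browser calls Airflow and Airflow in turn runs the TensorFlow program, which gives us remote invocation of TensorFlow machine learning programs. The program reads its input data from Zabbix and writes its results back to Zabbix.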

1. Runtime environment

  • Server operating system: Linux i-cbp9w1nr 4.4.0-116-generic #140-Ubuntu SMP Mon Feb 12 21:23:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
  • TensorFlow: v1.7.0
  • Airflow: v1.9.0
  • Airflow REST API Plugin: latest version

2. Writing the Airflow DAG

Enter the following commands to view the DAG file:

source ~/tensorflow/bin/activate
cd ~/airflow/dags
vi dag_tfts_ar_tep_zabbix_r4.py

The source code of dag_tfts_ar_tep_zabbix_r4.py is as follows:

# -*- coding: utf-8 -*-
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

from datetime import timedelta
import airflow
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.python_operator import PythonOperator
from pprint import pprint

dag = DAG("dag_tfts_ar_tep_zabbix_r4",
          default_args={"owner": "prism",
                        "start_date": airflow.utils.dates.days_ago(1)},
          schedule_interval='@once',
          dagrun_timeout=timedelta(minutes=4)
          )

my_templated_command = """
    echo "{{ts}}" >>/tmp/predix/testoutput.txt
    echo "dag_id: dag_tfts_ar_tep_zabbix_r4" >>/tmp/predix/testoutput.txt
    echo "task_id: task_tfts_ar_tep_zabbix_r4" >>/tmp/predix/testoutput.txt
    echo " 'cfg was passed in via Airflow CLI REST API (trigger_dag) with value {{ dag_run.conf.get(\'cfg\') }} " >>/tmp/predix/testoutput.txt
    echo " 'miff was passed in via BashOperator with value {{ params.miff }} " >>/tmp/predix/testoutput.txt
    /home/ubuntu/tfts_zabbix/tfts_ar_tep_zabbix.py --cfg="{{ dag_run.conf.get(\'cfg\') }}"
"""

run_this = BashOperator(
    task_id='task_tfts_ar_tep_zabbix_r4',
    bash_command=my_templated_command,
    params={"miff":"agg"},
    dag=dag)

The input parameters for the TensorFlow program are passed in by the following command line:

/home/ubuntu/tfts_zabbix/tfts_ar_tep_zabbix.py --cfg="{{ dag_run.conf.get(\'cfg\') }}"
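
To make the hand-off concrete: Jinja most likely renders dag_run.conf.get('cfg') with Python's str(), so the script receives a dict literal with single quotes rather than strict JSON. The sketch below shows one way to parse such a string. The helper name parse_cfg and the sample values are hypothetical, and ast.literal_eval is offered as an alternative to replacing quotes, not as what the original program does.

```python
import ast
import json


def parse_cfg(rendered):
    """Parse the string passed to --cfg into a dict (hypothetical helper)."""
    try:
        # Works when the caller already passed strict JSON (double quotes).
        return json.loads(rendered)
    except ValueError:
        # Assumed case: Jinja rendered a Python dict such as {'jobid': 'abc'}.
        # literal_eval only accepts Python literals and never executes code.
        return ast.literal_eval(rendered)


# Simulate what the rendered template would hand to the script.
rendered = str({"itemids": ["25582", "25583"], "train_steps": 2000, "jobid": "job-1"})
cfg = parse_cfg(rendered)
print(cfg["itemids"][0])
print(cfg["jobid"])
```

Unlike a blanket replacement of ' with ", this approach also survives values that themselves contain an apostrophe.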

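3. Writing the TensorFlow machine learning program

Enter the following commands to view the TensorFlow machine learning program and make it executable: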

cd /home/ubuntu/tfts_zabbix/
vi tfts_ar_tep_zabbix.py
chmod +x tfts_ar_tep_zabbix.py

This is a time series forecasting program. Its code is as follows:

#!/home/ubuntu/tensorflow/bin/python
# coding: utf-8

# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""An example of training and predicting with a TFTS estimator."""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import argparse
import sys
import os

import numpy as np
import tensorflow as tf

from pyzabbix import ZabbixAPI
from datetime import datetime
import time
import json
import re

# Enable debug logging for pyzabbix (sys is already imported above)
import logging

stream = logging.StreamHandler(sys.stdout)
stream.setLevel(logging.DEBUG) # DEBUG ERROR
log = logging.getLogger('pyzabbix')
log.addHandler(stream)
log.setLevel(logging.DEBUG) # DEBUG ERROR

    
try:
  import matplotlib  # pylint: disable=g-import-not-at-top
  # matplotlib.use("TkAgg")  # Need Tk for interactive plots.
  matplotlib.use("agg")
  from matplotlib import pyplot  # pylint: disable=g-import-not-at-top
  HAS_MATPLOTLIB = True
except ImportError:
  # Plotting requires matplotlib, but the unit test running this code may
  # execute in an environment without it (i.e. matplotlib is not a build
  # dependency). We'd still like to test the TensorFlow-dependent parts of this
  # example, namely train_and_predict.
  HAS_MATPLOTLIB = False

#%matplotlib inline

FLAGS = None

# get history data
def history_data(itemid):
  # The hostname at which the Zabbix web interface is available
  ZABBIX_SERVER = 'http://xx.xx.xx.xx/zabbix'
  zapi = ZabbixAPI(ZABBIX_SERVER)

  # Login to the Zabbix API
  zapi.login('xx', 'xx')

  # Create a time range
  time_till = time.mktime(datetime.now().timetuple())
  time_from = time_till - 60 * 60 * 8  # 8 hours
  
  # Query item's history (float) data
  history = zapi.history.get(itemids=[itemid],
                             time_from=time_from,
                             time_till=time_till,
                             output='extend',
                             limit='5000',
                             history=0,
                             )


  # If nothing was found, try getting it from history (integer) data
  if not len(history):
    history = zapi.history.get(itemids=[itemid],
                               time_from=time_from,
                               time_till=time_till,
                               output='extend',
                               limit='5000',
                               history=3,
                               )
  observed_clocks = []
  observed_values = []
  for point in history:
    observed_clocks.append(int(point['clock']))
    observed_values.append(float(point['value']))
  data = {
    tf.contrib.timeseries.TrainEvalFeatures.TIMES: np.arange(1, len(observed_values) + 1),
    tf.contrib.timeseries.TrainEvalFeatures.VALUES: np.array(observed_values),
    'observed_clocks': np.array(observed_clocks),
  }
  
  return data

# send data to zabbix server
def tfts_zabbix_sender(data):
    data_json_encoded = json.JSONEncoder().encode(data)
    #print('data (json encoded) to be sent to zabbix server:')
    #print(data_json_encoded)
    zabbix_sender = "zabbix_sender -z xx.xx.xx.xx  -s prismml -k ml.result -o " + "'" + data_json_encoded + "'"
    #print('send command:')
    #print(zabbix_sender)
    os.system(zabbix_sender)

def structural_ensemble_train_and_predict(data):
  # Cycle between 5 latent values over a period of 100. This leads to a very
  # smooth periodic component (and a small model), which is a good fit for our
  # example data. Modeling high-frequency periodic variations will require a
  # higher cycle_num_latent_values.
  structural = tf.contrib.timeseries.StructuralEnsembleRegressor(
      periodicities=100, num_features=1, cycle_num_latent_values=5)
  return train_and_predict(structural, data, training_steps=150)


def ar_train_and_predict(data):
  # An autoregressive model, with periodicity handled as a time-based
  # regression. Note that this requires windows of size 16 (input_window_size +
  # output_window_size) for training.
  ar = tf.contrib.timeseries.ARRegressor(
      periodicities=100, input_window_size=10, output_window_size=6,
      num_features=1,
      # Use the (default) normal likelihood loss to adaptively fit the
      # variance. SQUARED_LOSS overestimates variance when there are trends in
      # the series.
      loss=tf.contrib.timeseries.ARModel.NORMAL_LIKELIHOOD_LOSS)
  return train_and_predict(ar, data, training_steps=600)


def train_and_predict(estimator, data, training_steps):
  """A simple example of training and predicting."""
  # Read data in the default "time,value" Numpy format with no header
  reader = tf.contrib.timeseries.NumpyReader(data)
  # Set up windowing and batching for training
  train_input_fn = tf.contrib.timeseries.RandomWindowInputFn(
      reader, batch_size=16, window_size=16)
  # Fit model parameters to data
  estimator.train(input_fn=train_input_fn, steps=training_steps)
  # Evaluate on the full dataset sequentially, collecting in-sample predictions
  # for a qualitative evaluation. Note that this loads the whole dataset into
  # memory. For quantitative evaluation, use RandomWindowChunker.
  evaluation_input_fn = tf.contrib.timeseries.WholeDatasetInputFn(reader)
  evaluation = estimator.evaluate(input_fn=evaluation_input_fn, steps=1)
  # Predict starting after the evaluation
  (predictions,) = tuple(estimator.predict(
      input_fn=tf.contrib.timeseries.predict_continuation_input_fn(
          evaluation, steps=200)))
  times = evaluation["times"][0]
  observed = evaluation["observed"][0, :, 0]
  mean = np.squeeze(np.concatenate(
      [evaluation["mean"][0], predictions["mean"]], axis=0))
  variance = np.squeeze(np.concatenate(
      [evaluation["covariance"][0], predictions["covariance"]], axis=0))
  all_times = np.concatenate([times, predictions["times"]], axis=0)
  upper_limit = mean + np.sqrt(variance)
  lower_limit = mean - np.sqrt(variance)
  return times, observed, all_times, mean, upper_limit, lower_limit


def make_plot(name, training_times, observed, all_times, mean,
              upper_limit, lower_limit):
  """Plot a time series in a new figure."""
  pyplot.figure(figsize=(15, 5))
  pyplot.plot(training_times, observed, "b", label="training series")
  pyplot.plot(all_times, mean, "r", label="forecast")
  #pyplot.plot(all_times, upper_limit, "g", label="forecast upper bound")
  #pyplot.plot(all_times, lower_limit, "g", label="forecast lower bound")
  #pyplot.fill_between(all_times, lower_limit, upper_limit, color="grey", alpha="0.2")
  pyplot.axvline(training_times[-1], color="k", linestyle="--")
  pyplot.xlabel("time")
  pyplot.ylabel("observations")
  pyplot.legend(loc=0)
  pyplot.title(name)


def main(unused_argv):
  if not HAS_MATPLOTLIB:
    raise ImportError(
        "Please install matplotlib to generate a plot from this example.")
  # parse arguments
  print("unused_argv")
  print(unused_argv)
  parser = argparse.ArgumentParser()
  parser.add_argument(
      "--cfg",
      type=str,
      required=True,  # the code below cannot run without it
      help="Job configuration rendered by Airflow from dag_run.conf['cfg'].")
  known_args, unparsed = parser.parse_known_args()
  s_cfg = known_args.cfg
  print("s_cfg: {0}".format(s_cfg))
  s_cfg = re.sub('[\']', '"', s_cfg)
  print("s_cfg (replaced ' with \"): {0}".format(s_cfg))
  d_cfg = json.JSONDecoder().decode(s_cfg)
  print("d_cfg: {0}".format(d_cfg))
  itemid = d_cfg['itemids'][0]
  print("itemid: {0}".format(itemid))
  jobid = d_cfg['jobid']
  print("jobid: {0}".format(jobid))
  
  # get history data
  # 25583 -- CPU user time(prismtf -> CPU Performance )
  #itemid = "25583"
  data = history_data(itemid)
  print('data')
  print(data)
  #make_plot("Structural ensemble",
  #          *structural_ensemble_train_and_predict(FLAGS.input_filename))
  #make_plot("AR", *ar_train_and_predict(FLAGS.input_filename))
  #make_plot("AR", *ar_train_and_predict(data))
  #pyplot.show()
  #pyplot.savefig("/tmp/fig/tfts_ar_tep_zabbix.png")

  # send data to zabbix server
  (times, observed, all_times, mean, upper_limit, lower_limit) = ar_train_and_predict(data)
  #print('type of times: {0}'.format(type(times)))
  data_sended = {
    'jobid':jobid,
    'observed_clocks':data['observed_clocks'].tolist(),
    'times':times.tolist(),
    'observed':observed.tolist(),
    'all_times':all_times.tolist(),
    'mean':mean.tolist(),
    'upper_limit':upper_limit.tolist(),
    'lower_limit':lower_limit.tolist()
  }
  tfts_zabbix_sender(data_sended)



if __name__ == "__main__":
  parser = argparse.ArgumentParser()
  parser.add_argument(
      "--input_filename",
      type=str,
      required=False,
      help="Input csv file.")
  FLAGS, unparsed = parser.parse_known_args()
  # for test
  FLAGS.input_filename = './data/period_trend.csv'
  tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
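
The tfts_zabbix_sender function above builds one shell string and runs it with os.system, so a quote inside the JSON payload could break the command. As a minimal sketch of an alternative, the snippet below assembles the same zabbix_sender flags (-z, -s, -k, -o, all taken from the listing) as an argument list for subprocess. The helper names build_sender_argv and send_to_zabbix are hypothetical, and the server address is the same placeholder used in the listing.

```python
import json
import subprocess


def build_sender_argv(data, server="xx.xx.xx.xx", host="prismml", key="ml.result"):
    """Return the zabbix_sender command as an argument list (hypothetical helper)."""
    payload = json.dumps(data)
    return ["zabbix_sender", "-z", server, "-s", host, "-k", key, "-o", payload]


def send_to_zabbix(data):
    """Run zabbix_sender and return its exit code."""
    # Passing a list means no shell parses the payload, so quotes inside the
    # JSON cannot terminate the argument early.
    return subprocess.call(build_sender_argv(data))


# Only build and print the command here; calling send_to_zabbix() requires
# the zabbix_sender binary and a reachable Zabbix server.
argv = build_sender_argv({"jobid": "job-1", "mean": [0.5, 0.7]})
print(argv)
```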

4. Writing the browser-side trigger code

The browser-side code that triggers the run is written in SAP UI5:

		// Submit the component for execution
		onMLAppSubmitJob: function() {
			jQuery.sap.log.info("onMLAppSubmitJob called");
			this.getView().byId("idMLAppPollJobResultChartContainer").setVisible(false);
			this.jobid = null;
			this.jobid = this.uuidv4();
			var datasource = this._getDatasourceByServiceid(selectedServiceID);
			// datasource - {"itemids":["25582","25583"],"itemtips":["CPU system time(prismtf -> CPU Performance )","CPU user time(prismtf -> CPU Performance )"]} 
			if (datasource.itemids.length === 0) {
				MessageToast.show("Please set the data items in the \"Data Source - Training\" component first.");
				return;
			}
			var cnf = {"cfg":{
				"itemids":datasource.itemids, 
				"train_steps":2000,
				"jobid": this.jobid
			}};
			cnf = encodeURIComponent(JSON.stringify(cnf));
			// rest_api_plugin_version
			var airflowurl = "http://xx.xx.xx.xx:8080/admin/rest_api/api?api=rest_api_plugin_version";
			// trigger_dag: Triggers a Dag to Run
			airflowurl = "http://xx.xx.xx.xx:8080/admin/rest_api/api?api=trigger_dag";
			airflowurl += "&dag_id=dag_tfts_ar_tep_zabbix_r4";
			airflowurl += "&conf=" + cnf;
			var mReq = {
				"type" : "GET",
				// "contentType" : "application/json-rpc",
				"contentType" : "application/x-www-form-urlencoded; charset=UTF-8",
				// "url" : this.getOwnerComponent().getModel("user").getProperty("/url"),
				"url" : airflowurl,
				"dataType" : "json",
				"global" : false,
				"processData" : false,
				"data" : ''
			};
			jQuery.sap.log.info("Airflow->trigger_dag->Request", JSON.stringify(mReq));
			var oResult = jQuery.sap.sjax(mReq);
			jQuery.sap.log.info("Airflow->trigger_dag->Response", JSON.stringify(oResult));
			// MessageToast.show(JSON.stringify(oResult));
			MessageToast.show("The component run has been submitted, please wait.");
			return;
		},
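
For testing the DAG without the browser, the same trigger URL can be assembled in Python. This is a minimal sketch assuming Python 3. The URL path and query parameters mirror the UI5 code above, while the helper name build_trigger_url and the sample cfg values are hypothetical.

```python
import json
from urllib.parse import parse_qs, quote, urlsplit


def build_trigger_url(base_url, dag_id, cfg):
    """Build the REST API plugin URL that triggers dag_id with cfg as conf."""
    # Same nesting as the UI5 code: the DAG reads dag_run.conf.get('cfg').
    conf = quote(json.dumps({"cfg": cfg}), safe="")
    return "{0}/admin/rest_api/api?api=trigger_dag&dag_id={1}&conf={2}".format(
        base_url, dag_id, conf)


url = build_trigger_url(
    "http://xx.xx.xx.xx:8080",  # placeholder host, as in the article
    "dag_tfts_ar_tep_zabbix_r4",
    {"itemids": ["25583"], "train_steps": 2000, "jobid": "job-1"},
)
print(url)

# Decode the query string again to confirm that conf round-trips.
conf = json.loads(parse_qs(urlsplit(url).query)["conf"][0])
print(conf["cfg"]["jobid"])
```

With a real host in place of the placeholder, the URL could then be requested with urllib.request.urlopen(url) or pasted into curl.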
