kubeflow-9-Generating lightweight Python components

kubeflow-3-Pipeline components and how to build them

1 Lightweight Python components

Lightweight Python components do not require you to build a new container image for every code change. They are intended for fast iteration in a notebook environment.
Building a lightweight Python component
To build a component, define a stand-alone Python function and then call kfp.components.func_to_container_op(func) to convert it into a component that can be used in a pipeline.
The function has several requirements:

(1) The function should be stand-alone. It should not use any code declared outside of the function definition: any imports must be added inside the main function, and any helper functions must also be defined inside the main function.

(2) The function can only import packages that are available in the base image. If you need a package that is not available, try to find a container image that already includes it. As a workaround, you can use the subprocess module to run pip install for the required package; the my_divmod function below shows an example.

(3) If the function operates on numbers, its parameters need type hints. The supported types are int, float and bool; everything else is passed as a string.

(4) To build a component with multiple output values, use the typing.NamedTuple type hint syntax:

NamedTuple('MyFunctionOutputs', [('output_name_1', type), ('output_name_2', float)])
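Requirement (1) exists because, as far as I understand the KFP v1 SDK, func_to_container_op ships the function's own source code into the container, not the module around it. The sketch below simulates that in plain Python by executing a function's source text in an empty namespace, roughly the way a fresh container sees it. `run_isolated`, `GOOD_SRC`, `BAD_SRC` and `hypotenuse` are hypothetical names for illustration only, not part of KFP.

```python
# Hypothetical simulation of requirement (1): only the function's own source
# text reaches the container, so we exec it in an empty namespace.

GOOD_SRC = '''
def hypotenuse(a: float, b: float) -> float:
    import math  # import inside the function, so it travels with the source

    def square(x):  # helper defined inside the function
        return x * x

    return math.sqrt(square(a) + square(b))
'''

BAD_SRC = '''
def hypotenuse(a: float, b: float) -> float:
    # 'math' and 'square' are not defined anywhere in this source text
    return math.sqrt(square(a) + square(b))
'''


def run_isolated(source: str, func_name: str, *args):
    """Execute `source` in a fresh namespace, then call `func_name(*args)`."""
    namespace = {}
    exec(source, namespace)
    return namespace[func_name](*args)


if __name__ == '__main__':
    print(run_isolated(GOOD_SRC, 'hypotenuse', 3.0, 4.0))  # 5.0
    try:
        run_isolated(BAD_SRC, 'hypotenuse', 3.0, 4.0)
    except NameError as err:
        print('not stand-alone:', err)
```

The stand-alone version works in isolation; the other fails with NameError, which is the same kind of failure you would see in the component's log.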

2 Simple example

import kfp

# (1) Define the Python function
def add(a: float, b: float) -> float:
    '''Calculates sum of two arguments'''
    return a + b

# (2) Convert the function to a pipeline operation
add_op = kfp.components.func_to_container_op(add)

# (3) Define the pipeline
@kfp.dsl.pipeline(
    name='Calculation pipeline',
    description='A toy pipeline that performs arithmetic calculations.'
)
def cal_pipeline(a=7):
    # Passing a pipeline parameter and a constant value as operation arguments
    add_task = add_op(a, 4)  # Returns a dsl.ContainerOp class instance

# (4) Submit for execution
if __name__ == '__main__':
    kfp.Client().create_run_from_pipeline_func(cal_pipeline, arguments={})

3 A more advanced example

A slightly more advanced function that demonstrates how to use imports and helper functions, and how to produce multiple outputs.

import kfp
from typing import NamedTuple
# (1) Define the advanced function
# Demonstrates imports, helper functions and multiple outputs
def my_divmod(dividend: float, divisor:float) -> NamedTuple('MyDivmodOutput', [('quotient', float), ('remainder', float), ('mlpipeline_metrics', 'Metrics')]):
    '''Divides two numbers and calculates the quotient and remainder'''
    # (1-1) Pip installs inside a component function.
    # NOTE: Installs should come first, to avoid upgrading a package
    # after it has already been imported and cached by python
    import sys, subprocess
    subprocess.run([sys.executable, '-m', 'pip', 'install', 'numpy'])
    
    #(1-2)Imports inside a component function:
    import numpy as np

    #(1-3)This function demonstrates how to use nested functions inside a component function:
    def divmod_helper(dividend, divisor):
        return np.divmod(dividend, divisor)

    (quotient, remainder) = divmod_helper(dividend, divisor)

    import json
    

    # Exports two sample metrics:
    metrics = {
      'metrics': [{
          'name': 'quotient',
          'numberValue':  float(quotient),
        },{
          'name': 'remainder',
          'numberValue':  float(remainder),
        }]}

    from collections import namedtuple
    divmod_output = namedtuple('MyDivmodOutput', ['quotient', 'remainder', 'mlpipeline_metrics'])
    return divmod_output(quotient, remainder, json.dumps(metrics))

#(2)Convert the function to a pipeline operation
#You can specify an alternative base container image (the image needs to have Python 3.5+ installed).

divmod_op = kfp.components.func_to_container_op(my_divmod, base_image='tensorflow/tensorflow:1.13.2-py3')

#(3)Define the pipeline
#Pipeline function has to be decorated with the @dsl.pipeline decorator
@kfp.dsl.pipeline(
   name='Calculation pipeline',
   description='A toy pipeline that performs arithmetic calculations.'
)
def calc_pipeline(b='7', c='17'):
    # Passing pipeline parameters as operation arguments
    divmod_task = divmod_op(b, c)

# (4) Submit for execution
if __name__ == '__main__':
    #Specify pipeline argument values
    arguments = {'b': '7', 'c': '8'}
    #Submit a pipeline run
    kfp.Client().create_run_from_pipeline_func(calc_pipeline, arguments=arguments)
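my_divmod runs pip install numpy unconditionally, even though a TensorFlow base image already ships numpy (TensorFlow depends on it). A hedged variant is to check importability first with importlib.util.find_spec and only call pip when the module is missing. `ensure_package` is a hypothetical helper; per requirement (1), in a real component it would have to be defined inside the component function.

```python
import importlib
import importlib.util
import subprocess
import sys


def ensure_package(module_name: str, pip_name: str = '') -> bool:
    """Pip-install `pip_name` (default: module_name) only if `module_name`
    cannot be found. Returns True if pip was invoked."""
    if importlib.util.find_spec(module_name) is not None:
        return False  # already available in the base image, skip pip
    subprocess.run(
        [sys.executable, '-m', 'pip', 'install', pip_name or module_name],
        check=True,  # fail the step loudly if the install fails
    )
    importlib.invalidate_caches()  # let the import system see the new package
    return True


if __name__ == '__main__':
    print(ensure_package('json'))  # False: stdlib module, pip is not run
```

Call it before `import numpy as np`, in line with the NOTE in my_divmod that installs should come first.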

(1) Test running the Python function directly:
my_divmod(100, 7)
Output:
MyDivmodOutput(quotient=14, remainder=2, mlpipeline_metrics='{"metrics": [{"name": "quotient", "numberValue": 14.0}, {"name": "remainder", "numberValue": 2.0}]}')
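Because my_divmod runs pip install, calling it locally also installs numpy into your environment. If you only want to check the output format, a numpy-free variant using Python's built-in divmod yields the same values and the same metrics JSON for these inputs. `divmod_builtin` and `metric_values` are hypothetical names, meant for local testing rather than as a component (the module-level namedtuple would break requirement (1)).

```python
import json
from collections import namedtuple

MyDivmodOutput = namedtuple('MyDivmodOutput', ['quotient', 'remainder', 'mlpipeline_metrics'])


def divmod_builtin(dividend: float, divisor: float):
    """Same outputs as my_divmod, but using the built-in divmod (no numpy, no pip)."""
    quotient, remainder = divmod(dividend, divisor)
    metrics = {
        'metrics': [
            {'name': 'quotient', 'numberValue': float(quotient)},
            {'name': 'remainder', 'numberValue': float(remainder)},
        ]
    }
    return MyDivmodOutput(quotient, remainder, json.dumps(metrics))


def metric_values(metrics_json: str) -> dict:
    """Map each metric name to its numberValue, to check the JSON locally."""
    return {m['name']: m['numberValue'] for m in json.loads(metrics_json)['metrics']}


if __name__ == '__main__':
    result = divmod_builtin(100, 7)
    print(result.quotient, result.remainder)         # 14 2
    print(metric_values(result.mlpipeline_metrics))  # {'quotient': 14.0, 'remainder': 2.0}
```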
(2) How to use component outputs
For an operation with a single return value, the output reference can be accessed using task.output or task.outputs['output_name'].

For an operation with multiple return values, the output references can be accessed using task.outputs['output_name'].
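With multiple outputs, the names you use in task.outputs[...] come from the field names of the NamedTuple return annotation (for my_divmod: quotient, remainder, mlpipeline_metrics). A quick local check with the standard typing and inspect modules lists them before you wire up the pipeline. `output_names` and `my_divmod_signature` are hypothetical; depending on the SDK version, KFP may sanitize the names, so treat this as a sanity check rather than the SDK's own logic.

```python
import inspect
from typing import NamedTuple


def my_divmod_signature(dividend: float, divisor: float) -> NamedTuple(
        'MyDivmodOutput',
        [('quotient', float), ('remainder', float), ('mlpipeline_metrics', 'Metrics')]):
    """Stub with the same signature as my_divmod; only the annotation matters."""
    raise NotImplementedError


def output_names(func) -> list:
    """Return the output names implied by a NamedTuple return annotation."""
    ret = inspect.signature(func).return_annotation
    fields = getattr(ret, '_fields', None)
    # Single (non-NamedTuple) return value: read it via task.output instead
    return list(fields) if fields is not None else []


if __name__ == '__main__':
    print(output_names(my_divmod_signature))  # ['quotient', 'remainder', 'mlpipeline_metrics']
```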

Anything the function prints with print() appears in the component's log.
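print() writes to the container's stdout, which is what the step log shows. One general Python detail worth knowing: when stdout is not a terminal, as in a container, Python buffers its output, so passing flush=True makes messages reach the log promptly even if the container is later killed (for example, out of memory). `add_with_logging` and `run_and_capture` are hypothetical names; the second one just lets you check the log lines in a local test.

```python
import contextlib
import io


def add_with_logging(a: float, b: float) -> float:
    """Variant of add that writes its inputs and result to stdout (the step log)."""
    print(f'add_with_logging: a={a}, b={b}', flush=True)
    result = a + b
    print(f'add_with_logging: result={result}', flush=True)
    return result


def run_and_capture(func, *args):
    """Call func and return (result, captured stdout), handy in a local test."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        result = func(*args)
    return result, buf.getvalue()


if __name__ == '__main__':
    result, log = run_and_capture(add_with_logging, 7, 4)
    print(result)  # 11
    print(log, end='')
```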
