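In this tutorial, you will learn how to use Hetero NN. Note that Hetero NN has been upgraded to work much like Homo NN: it now allows extensive customization of models and datasets through the PyTorch backend. Customization specific to Hetero NN is covered in later chapters.
Hetero NN also refines several interfaces, such as the interactive layer interface, which makes its usage logic clearer.
This chapter walks through a basic binary classification task with Hetero-NN. The workflow matches that of other FATE algorithms: you use the reader and transformer interfaces provided by FATE to load tabular data, then feed the data into the algorithm component. The component then trains with the top/bottom models, optimizer, and loss function you define. Usage in this version is largely the same as in earlier versions of FATE.
If you want to understand the principles behind the Hetero-NN algorithm, you can refer to Hetero NN.
First, we upload the data to FATE. We can upload it directly with the pipeline. Here we upload two files: breast_hetero_guest.csv for the guest and breast_hetero_host.csv for the host. Note that this tutorial uses the standalone version; if you are using the cluster version, you need to upload the corresponding data on each machine.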
from pipeline.backend.pipeline import PipeLine # pipeline class
# There are two parties: the guest, whose data includes labels,
# and the host, whose data has no labels.
# The dataset is vertically split.
dense_data_guest = {"name": "breast_hetero_guest", "namespace": "experiment"}
dense_data_host = {"name": "breast_hetero_host", "namespace": "experiment"}
guest = 9999
host = 10000
pipeline_upload = PipeLine().set_initiator(role='guest', party_id=guest).set_roles(guest=guest, host=host)
partition = 4
# upload the data
pipeline_upload.add_upload_data(file="./examples/data/breast_hetero_guest.csv",
                                table_name=dense_data_guest["name"],  # table name
                                namespace=dense_data_guest["namespace"],  # namespace
                                head=1, partition=partition)  # data info
pipeline_upload.add_upload_data(file="./examples/data/breast_hetero_host.csv",
                                table_name=dense_data_host["name"],
                                namespace=dense_data_host["namespace"],
                                head=1, partition=partition)  # data info
pipeline_upload.upload(drop=1)
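The breast dataset is a binary classification dataset with 30 features. It is vertically split: the guest holds 10 features and the labels, while the host holds 20 features.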
import pandas as pd
df = pd.read_csv('../examples/data/breast_hetero_guest.csv')  # adjust the file path to your own environment
df
import pandas as pd
df = pd.read_csv('../examples/data/breast_hetero_host.csv')
df
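To make the vertical split concrete, here is a minimal sketch using two made-up miniature CSV strings rather than the real files. It assumes an `id` column shared by both parties and a label column named `y` on the guest side only; these column names and all values are illustrative assumptions, and the helper `describe` is hypothetical.

```python
import csv
import io

# Made-up miniature stand-ins for the two files; the real files are much wider.
GUEST_CSV = "id,y,x0,x1\n1,1,0.25,-1.0\n2,0,0.5,2.0\n"
HOST_CSV = "id,x0,x1,x2\n1,1.5,0.1,0.2\n2,-0.5,0.3,0.4\n"


def describe(csv_text):
    """Return (has_label, number_of_feature_columns) for a CSV with an 'id' column."""
    header = next(csv.reader(io.StringIO(csv_text)))
    has_label = "y" in header
    n_features = len([col for col in header if col not in ("id", "y")])
    return has_label, n_features


guest_info = describe(GUEST_CSV)
host_info = describe(HOST_CSV)

# Each party sees only its own columns; together they cover the full feature set.
total_features = guest_info[1] + host_info[1]
print("guest:", guest_info, "host:", host_info, "total features:", total_features)
```

In the tutorial's data the same arithmetic gives 10 guest features plus 20 host features, for 30 in total, with labels on the guest side only.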
Once the upload is complete, we can start writing the Pipeline script that submits the FATE task.
import torch as t
from torch import nn
from pipeline.backend.pipeline import PipeLine # pipeline Class
from pipeline import fate_torch_hook
from pipeline.component import HeteroNN, Reader, DataTransform, Intersection # Hetero NN Component, Data IO component, PSI component
from pipeline.interface import Data, Model # data, model for defining the work flow
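Make sure to run the fate_torch_hook function below. It modifies certain torch classes so that the torch layers, Sequential models, optimizers, and loss functions you define in the script can be parsed and submitted by the Pipeline.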
from pipeline import fate_torch_hook
t = fate_torch_hook(t)
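The idea of patching classes so that a script-side definition can be parsed and submitted can be illustrated with a small, purely conceptual sketch. This is not FATE's implementation: `record_init_args`, `ToyLinear`, and `to_config` are hypothetical names, and the sketch only shows one way a class could be made to remember its constructor arguments so they can be serialized.

```python
import json


def record_init_args(cls):
    """Class decorator: remember the keyword arguments passed to __init__."""
    original_init = cls.__init__

    def patched_init(self, **kwargs):
        self.init_kwargs = dict(kwargs)  # keep a copy for later serialization
        original_init(self, **kwargs)

    cls.__init__ = patched_init
    return cls


@record_init_args
class ToyLinear:
    # Hypothetical stand-in for a layer class; not a torch or FATE class.
    def __init__(self, in_features, out_features):
        self.in_features = in_features
        self.out_features = out_features


def to_config(layer):
    """Serialize a patched layer to a JSON string that a job config could carry."""
    payload = {"layer": type(layer).__name__, **layer.init_kwargs}
    return json.dumps(payload, sort_keys=True)


config = to_config(ToyLinear(in_features=10, out_features=2))
print(config)
```

A definition that can be turned into plain configuration like this does not need live model parameters at script time, which is in the same spirit as initializing an optimizer without passing model parameters.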
guest = 9999
host = 10000
pipeline = PipeLine().set_initiator(role='guest', party_id=guest).set_roles(guest=guest, host=host)
guest_train_data = {"name": "breast_hetero_guest", "namespace": "experiment"}
host_train_data = {"name": "breast_hetero_host", "namespace": "experiment"}
# read uploaded dataset
reader_0 = Reader(name="reader_0")
reader_0.get_party_instance(role='guest', party_id=guest).component_param(table=guest_train_data)
reader_0.get_party_instance(role='host', party_id=host).component_param(table=host_train_data)
# The transform component converts the uploaded data to the FATE standard format
data_transform_0 = DataTransform(name="data_transform_0")
data_transform_0.get_party_instance(role='guest', party_id=guest).component_param(with_label=True)
data_transform_0.get_party_instance(role='host', party_id=host).component_param(with_label=False)
# intersection
intersection_0 = Intersection(name="intersection_0")
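Because each party holds different columns for a possibly different set of samples, the rows must be aligned by sample ID before training. The sketch below is a plain-text stand-in for that alignment step, using an ordinary set intersection on made-up IDs; the helper `align_by_id` is hypothetical. A real PSI protocol aims at the same result without revealing the non-shared IDs, which this sketch does not attempt.

```python
def align_by_id(guest_table, host_table):
    """Keep only the samples present on both sides, in a shared sorted order."""
    common_ids = sorted(set(guest_table) & set(host_table))
    guest_aligned = [guest_table[sample_id] for sample_id in common_ids]
    host_aligned = [host_table[sample_id] for sample_id in common_ids]
    return common_ids, guest_aligned, host_aligned


# Made-up tables: sample ID -> feature values held by that party.
guest_table = {"s1": [0.1], "s2": [0.2], "s3": [0.3]}
host_table = {"s2": [1.0, 2.0], "s3": [3.0, 4.0], "s4": [5.0, 6.0]}

ids, guest_rows, host_rows = align_by_id(guest_table, host_table)
print(ids, guest_rows, host_rows)
```

After alignment, row i on the guest side and row i on the host side describe the same sample, which is what the bottom models on both sides rely on.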
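Here we initialize the Hetero-NN component. We use get_party_instance to obtain the guest component and the host component separately. Because the two parties have different model architectures, the model parameters for each party must be specified through its own component instance.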
hetero_nn_0 = HeteroNN(name="hetero_nn_0", epochs=2,
                       interactive_layer_lr=0.01, batch_size=-1, validation_freqs=1,
                       task_type='classification', seed=114514)
guest_nn_0 = hetero_nn_0.get_party_instance(role='guest', party_id=guest)
host_nn_0 = hetero_nn_0.get_party_instance(role='host', party_id=host)
# Guest Bottom, Top Model
guest_bottom = t.nn.Sequential(
    nn.Linear(10, 2),
    nn.ReLU()
)
guest_top = t.nn.Sequential(
    nn.Linear(2, 1),
    nn.Sigmoid()
)
# Host Bottom Model
host_bottom = t.nn.Sequential(
    nn.Linear(20, 2),
    nn.ReLU()
)
# After fate_torch_hook, the nn module provides InteractiveLayer; print it to view its structure
interactive_layer = t.nn.InteractiveLayer(out_dim=2, guest_dim=2, host_dim=2, host_num=1)
print(interactive_layer)
guest_nn_0.add_top_model(guest_top)
guest_nn_0.add_bottom_model(guest_bottom)
host_nn_0.add_bottom_model(host_bottom)
optimizer = t.optim.Adam(lr=0.01) # Notice! After fate_torch_hook, the optimizer can be initialized without model parameter
loss = t.nn.BCELoss()
hetero_nn_0.set_interactive_layer(interactive_layer)
hetero_nn_0.compile(optimizer=optimizer, loss=loss)
InteractiveLayer(
  (activation): ReLU()
  (guest_model): Linear(in_features=2, out_features=2, bias=True)
  (host_model): ModuleList(
    (0): Linear(in_features=2, out_features=2, bias=True)
  )
  (act_seq): Sequential(
    (0): ReLU()
  )
)
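To see how the layer dimensions defined above fit together, the following is a plaintext, single-sample sketch of a forward pass written with the standard library only. It is a sketch under stated assumptions, not FATE's training code: it assumes the interactive layer sums the guest and host dense projections before applying ReLU (the printed structure lists both sub-layers and the activation but not the combination rule), it uses random weights with zero biases, and it ignores the protection FATE applies to data exchanged between parties. All helper names are hypothetical.

```python
import math
import random


def linear(weights, bias, x):
    """Dense layer: weights is a list of rows, one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(values):
    return [max(0.0, v) for v in values]


def sigmoid(values):
    return [1.0 / (1.0 + math.exp(-v)) for v in values]


def init(rng, n_in, n_out):
    """Random weights and zero bias for an n_in -> n_out dense layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)
w_guest_bottom = init(rng, 10, 2)  # guest bottom model: 10 features -> 2
w_host_bottom = init(rng, 20, 2)   # host bottom model: 20 features -> 2
w_inter_guest = init(rng, 2, 2)    # interactive layer, guest side: 2 -> 2
w_inter_host = init(rng, 2, 2)     # interactive layer, host side: 2 -> 2
w_guest_top = init(rng, 2, 1)      # guest top model: 2 -> 1


def forward(x_guest, x_host):
    g = relu(linear(*w_guest_bottom, x_guest))
    h = relu(linear(*w_host_bottom, x_host))
    # Assumed combination rule: sum the two dense projections, then ReLU.
    summed = [a + b for a, b in zip(linear(*w_inter_guest, g), linear(*w_inter_host, h))]
    merged = relu(summed)
    return sigmoid(linear(*w_guest_top, merged))[0]


def bce(p, y):
    """Binary cross-entropy for one predicted probability p and label y."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


x_guest = [rng.uniform(-1, 1) for _ in range(10)]
x_host = [rng.uniform(-1, 1) for _ in range(20)]
p = forward(x_guest, x_host)
print("probability:", p, "loss for label 1:", bce(p, 1))
```

The sketch shows why `out_dim`, `guest_dim`, and `host_dim` are all 2 here: each bottom model emits two values, and the top model expects two inputs.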