ASTGCN Code Walkthrough: Training (work in progress)

Contents

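  • Project overview
  • Configuration file
  • Walkthrough of the main functions (entry point: main)
    • 1. Data processing
    • 2. Preparing the ground truth for testing
    • 3. Wrapping the data in DataLoaders
    • 4. Saving the normalization mean and standard deviation to an .npz file
    • 5. Loss function
    • 6. Loading the model!!! (focus on the attention mechanism and graph convolution)
    • 7. Building the trainer and visual monitoring; before training, computing the validation loss, predicting on the test set, and evaluating the results
    • 8. Training the model
    • 9. Testing on the test set and saving the results to the specified folder (prediction_filename in the config file)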

Project overview

Configuration file

Walkthrough of the main functions (entry point: main)

1. Data processing

# Main data-processing call
all_data = read_and_generate_dataset(graph_signal_matrix_filename,
                                     num_of_weeks,
                                     num_of_days,
                                     num_of_hours,
                                     num_for_predict,
                                     points_per_hour,
                                     merge)

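After the data is loaded, each sample is cut into four parts: week_sample, day_sample, hour_sample, and target. The axes of the first three parts are transposed. For target, only the first of the three features (flow) is kept. The results are collected in all_samples.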

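  • read_and_generate_dataset processes the entire graph-signal file and builds the (X, Y) pairs used for training. Y is a sequence segment whose associated week, day, and hour history segments can all be found; its length is set by num_for_predict. With weeks=1, days=1, hours=3, and num_for_predict=12, 14,965 of the 16,992 time steps qualify: 16992 - 2016 - 11 = 14965. The first 2016 = 7×24×12 steps lack a full week of history, and the last 11 cannot hold a complete 12-step target. These 14,965 samples are split into training, validation, and test sets. The week, day, and hour inputs of each set are then standardized separately. The function returns the standardized data of each set, together with the mean and standard deviation used for the week, day, and recent parts. Both statistics are computed from the training set only.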
def read_and_generate_dataset(graph_signal_matrix_filename,
                              num_of_weeks, num_of_days,
                              num_of_hours, num_for_predict,
                              points_per_hour=12, merge=False):
    """
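    Process the graph-signal matrix file and extract the X and Y the model needs. Return a dict holding the training, validation, and test sets
    plus the normalization statistics, e.g. key='train', value={'week': ..., 'day': ..., 'recent': ..., 'target': ...}
    :param graph_signal_matrix_filename: graph-signal matrix file
    :param num_of_weeks: number of preceding weeks used as history
    :param num_of_days: number of preceding days used as history
    :param num_of_hours: number of most recent hours used as history
    :param num_for_predict: number of points to predict; 12 corresponds to one hour
    :param points_per_hour: determined by the data file; here there are 12 points per hour
    :param merge: whether to merge the training and validation sets to train the model. The data is split 6:2:2 into training,
    validation, and test sets; if merge=True, training_set = training set + validation set, and validation_set is unchanged
    :return: dict with the standardized data and the mean and std used for standardization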
    """

    '''
    Parameters
    ----------
    graph_signal_matrix_filename: str, path of graph signal matrix file

    num_of_weeks, num_of_days, num_of_hours: int

    num_for_predict: int

    points_per_hour: int, default 12, depends on data

    merge: boolean, default False,
           whether to merge training set and validation set to train model

    Returns
    ----------
    feature: np.ndarray,
             shape is (num_of_samples, num_of_batches * points_per_hour,
                       num_of_vertices, num_of_features)

    target: np.ndarray,
            shape is (num_of_samples, num_of_vertices, num_for_predict)

    '''

    # Load the graph-signal .npz file (pems04.npz); ['data'] is a numpy.ndarray of shape (16992, 307, 3) here
    data_seq = np.load(graph_signal_matrix_filename)['data']

    all_samples = []
    for idx in range(data_seq.shape[0]):
        sample = get_sample_indices(data_seq, num_of_weeks, num_of_days,
                                    num_of_hours, idx, num_for_predict,
                                    points_per_hour)
        if not sample:
            continue

        week_sample, day_sample, hour_sample, target = sample

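        # expand_dims turns [12,307,3] into [1,12,307,3], and transpose((0, 2, 3, 1)) gives [1,307,3,12];
        # the leading 1 is the added axis and 12 is one period of data (weeks=1); for hours=3 this axis is 36
        # target[:, :, 0, :] keeps only the first feature (flow) as the prediction target: [1,307,3,12] becomes [1,307,12]

        # all_samples = [(week[1,307,3,12], day[1,307,3,12], hour[1,307,3,36], target[1,307,12]), ...]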
        all_samples.append((
            np.expand_dims(week_sample, axis=0).transpose((0, 2, 3, 1)),
            np.expand_dims(day_sample, axis=0).transpose((0, 2, 3, 1)),
            np.expand_dims(hour_sample, axis=0).transpose((0, 2, 3, 1)),
            np.expand_dims(target, axis=0).transpose((0, 2, 3, 1))[:, :, 0, :]
        ))

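    # all_samples is a list of 14965 = 16992 - 2016 - 11 samples usable for training, validation, or testing.
    # Each sample is a 4-tuple: (week_data (1,307,3,12), one week back; day_data (1,307,3,12), one day back;
    #   hour_data (1,307,3,36), the last 3 hours; target (1,307,12), only the first of the 3 features)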
    split_line1 = int(len(all_samples) * 0.6)
    split_line2 = int(len(all_samples) * 0.8)

    if not merge:
        training_set = [np.concatenate(i, axis=0)
                        for i in zip(*all_samples[:split_line1])]
    else:
        print('Merge training set and validation set!')
        training_set = [np.concatenate(i, axis=0)
                        for i in zip(*all_samples[:split_line2])]

    validation_set = [np.concatenate(i, axis=0)
                      for i in zip(*all_samples[split_line1: split_line2])]
    testing_set = [np.concatenate(i, axis=0)
                   for i in zip(*all_samples[split_line2:])]

    # testing_set = [(2993,307,3,12), (2993,307,3,12), (2993,307,3,36), (2993,307,12)]  [week, day, hour, target]
    # validation_set = [(2993,307,3,12), (2993,307,3,12), (2993,307,3,36), (2993,307,12)]
    # training_set = [(8979,307,3,12), (8979,307,3,12), (8979,307,3,36), (8979,307,12)]  (11972 instead of 8979 when merge=True)

    train_week, train_day, train_hour, train_target = training_set
    val_week, val_day, val_hour, val_target = validation_set
    test_week, test_day, test_hour, test_target = testing_set


    print('training data: week: {}, day: {}, recent: {}, target: {}'.format(
        train_week.shape, train_day.shape,
        train_hour.shape, train_target.shape))
    print('validation data: week: {}, day: {}, recent: {}, target: {}'.format(
        val_week.shape, val_day.shape, val_hour.shape, val_target.shape))
    print('testing data: week: {}, day: {}, recent: {}, target: {}'.format(
        test_week.shape, test_day.shape, test_hour.shape, test_target.shape))

    # Standardize; the first element returned by normalization is {'mean': mean, 'std': std}
    (week_stats, train_week_norm,
     val_week_norm, test_week_norm) = normalization(train_week,
                                                    val_week,
                                                    test_week)

    (day_stats, train_day_norm,
     val_day_norm, test_day_norm) = normalization(train_day,
                                                  val_day,
                                                  test_day)

    (recent_stats, train_recent_norm,
     val_recent_norm, test_recent_norm) = normalization(train_hour,
                                                        val_hour,
                                                        test_hour)

    all_data = {
        'train': {
            'week': train_week_norm,
            'day': train_day_norm,
            'recent': train_recent_norm,
            'target': train_target,
        },
        'val': {
            'week': val_week_norm,
            'day': val_day_norm,
            'recent': val_recent_norm,
            'target': val_target
        },
        'test': {
            'week': test_week_norm,
            'day': test_day_norm,
            'recent': test_recent_norm,
            'target': test_target
        },
        'stats': {
            'week': week_stats,
            'day': day_stats,
            'recent': recent_stats
        }
    }

    return all_data
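The sample count and split sizes quoted above can be checked with a few lines of arithmetic. The sketch below uses two hypothetical helpers, count_valid_samples and split_sizes; neither is part of the ASTGCN code. It mirrors the index conditions in search_data and assumes that all three history counts are at least 1. It also uses the same int() truncation as the split lines in the original function.

```python
# Sketch: reproduce the sample count and the 6:2:2 split sizes of
# read_and_generate_dataset without loading any data.
# Assumption: num_of_weeks, num_of_days, num_of_hours are all >= 1
# (search_data returns an empty list for 0, which get_sample_indices treats as "no sample").

def count_valid_samples(seq_len, num_of_weeks, num_of_days, num_of_hours,
                        num_for_predict, points_per_hour=12):
    """Number of label_start_idx values for which get_sample_indices returns a sample."""
    # The furthest-back history window starts `lookback` steps before the label.
    lookback = points_per_hour * max(num_of_weeks * 7 * 24,
                                     num_of_days * 24,
                                     num_of_hours * 1)
    first_idx = lookback                   # earliest index whose windows all start at >= 0
    last_idx = seq_len - num_for_predict   # latest index whose target still fits in the data
    return max(0, last_idx - first_idx + 1)


def split_sizes(n, merge=False):
    """(train, val, test) sizes, truncating with int() like the original code."""
    split_line1 = int(n * 0.6)
    split_line2 = int(n * 0.8)
    train = split_line2 if merge else split_line1
    return train, split_line2 - split_line1, n - split_line2


n = count_valid_samples(16992, 1, 1, 3, 12)
print(n)                           # 14965
print(split_sizes(n))              # (8979, 2993, 2993)
print(split_sizes(n, merge=True))  # (11972, 2993, 2993)
```

The week window dominates the lookback (2016 steps versus 288 for a day and 36 for three hours). Adding more weeks therefore shrinks the dataset much faster than adding more hours.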


  • get_sample_indices returns the week, day, and recent (hour) data associated with one target:
    week_sample=[12,307,3], day_sample=[12,307,3], hour_sample=[12*3,307,3], target=[12,307,3]
def get_sample_indices(data_sequence, num_of_weeks, num_of_days, num_of_hours,
                       label_start_idx, num_for_predict, points_per_hour=12):
    """
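    Extract the week, day, and recent segments associated with one target segment. The target segment is Y, and its
    associated recent, day, and week segments together form X
    :param data_sequence: the full graph-signal matrix data
    :param num_of_weeks:
    :param num_of_days:
    :param num_of_hours:
    :param label_start_idx: start index of the target segment currently being considered
    :param num_for_predict:
    :param points_per_hour:
    :return: the week, day, and recent data associated with the target, or None if any history window is out of range
    week_sample=[12,307,3], day_sample=[12,307,3], hour_sample=[12*3,307,3], target=[12,307,3]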
    """

    '''
    Parameters
    ----------
    data_sequence: np.ndarray
                   shape is (sequence_length, num_of_vertices, num_of_features)

    num_of_weeks, num_of_days, num_of_hours: int

    label_start_idx: int, the first index of predicting target

    num_for_predict: int,
                     the number of points will be predicted for each sample

    points_per_hour: int, default 12, number of points per hour

    Returns
    ----------
    week_sample: np.ndarray
                 shape is (num_of_weeks * points_per_hour,
                           num_of_vertices, num_of_features)

    day_sample: np.ndarray
                 shape is (num_of_days * points_per_hour,
                           num_of_vertices, num_of_features)

    hour_sample: np.ndarray
                 shape is (num_of_hours * points_per_hour,
                           num_of_vertices, num_of_features)

    target: np.ndarray
            shape is (num_for_predict, num_of_vertices, num_of_features)
    '''

    week_indices = search_data(data_sequence.shape[0], num_of_weeks,
                               label_start_idx, num_for_predict,
                               7 * 24, points_per_hour)
    if not week_indices:
        return None

    day_indices = search_data(data_sequence.shape[0], num_of_days,
                              label_start_idx, num_for_predict,
                              24, points_per_hour)
    if not day_indices:
        return None

    hour_indices = search_data(data_sequence.shape[0], num_of_hours,
                               label_start_idx, num_for_predict,
                               1, points_per_hour)
    if not hour_indices:
        return None

    week_sample = np.concatenate([data_sequence[i: j]
                                  for i, j in week_indices], axis=0)
    day_sample = np.concatenate([data_sequence[i: j]
                                 for i, j in day_indices], axis=0)
    hour_sample = np.concatenate([data_sequence[i: j]
                                  for i, j in hour_indices], axis=0)
    target = data_sequence[label_start_idx: label_start_idx + num_for_predict]


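    # Debug output: printed for every index of the sequence, so it is verbose on a full run
    print('Associated segments (week, day, recent) for the current sample:')
    print('current index:', label_start_idx)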
    print('week_indices:',week_indices)
    print('day_indices:', day_indices)
    print('hour_indices:', hour_indices)
    print(' ')
    print('week_sample:',week_sample.shape)
    print('day_sample:',day_sample.shape)
    print('hour_sample:',hour_sample.shape)
    print('target:',target.shape)

    return week_sample, day_sample, hour_sample, target
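Each history window in search_data has length num_for_predict, because end_idx = start_idx + num_for_predict. The 12-step shapes above therefore hold because num_for_predict and points_per_hour are both 12 here. The snippet below is a small sketch of the axis shuffling that read_and_generate_dataset applies to these samples. It uses random placeholder arrays with the shapes from the docstring, and to_model_layout is a hypothetical helper name.

```python
import numpy as np

# Dummy arrays with the shapes get_sample_indices returns for
# num_of_weeks=1, num_of_days=1, num_of_hours=3, num_for_predict=12
# (307 sensors, 3 features; values are random placeholders).
rng = np.random.default_rng(0)
week_sample = rng.random((12, 307, 3))
day_sample = rng.random((12, 307, 3))
hour_sample = rng.random((36, 307, 3))
target = rng.random((12, 307, 3))

def to_model_layout(x):
    """(T, N, F) -> (1, N, F, T): add a leading axis, then move time to the last axis."""
    return np.expand_dims(x, axis=0).transpose((0, 2, 3, 1))

week_x = to_model_layout(week_sample)
hour_x = to_model_layout(hour_sample)
# Same as the original target line: keep only feature 0 (flow) -> (1, N, T)
target_y = to_model_layout(target)[:, :, 0, :]

print(week_x.shape)    # (1, 307, 3, 12)
print(hour_x.shape)    # (1, 307, 3, 36)
print(target_y.shape)  # (1, 307, 12)

# Entry [0, n, f, t] of the new layout is entry [t, n, f] of the old one.
assert week_x[0, 5, 2, 7] == week_sample[7, 5, 2]
assert target_y[0, 10, 3] == target[3, 10, 0]
```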

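  • search_data returns the indices of the X associated with a training/testing target. These are the index ranges of the related week, day, and hour segments, given as [(start_idx, end_idx)]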
def search_data(sequence_length, num_of_batches, label_start_idx,
                num_for_predict, units, points_per_hour):
    """

    :param sequence_length:
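    :param num_of_batches: number of periods taken for recent, day, or week (set in the config file), int
    :param label_start_idx: start index of the candidate target segment; the caller iterates over every index starting from 0
    :param num_for_predict:
    :param units:
    :param points_per_hour:
    :return: a list whose length is the number of periods set in the config file; e.g. for week=1 it holds a single 2-tuple such as [(0,12)].
    The first number of each tuple is the start index, the second the end index = start_idx + num_for_predict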
    """

    '''
    Parameters
    ----------
    sequence_length: int, length of all history data

    num_of_batches: int, the number of batches will be used for training

    label_start_idx: int, the first index of predicting target

    num_for_predict: int,
                     the number of points will be predicted for each sample

    units: int, week: 7 * 24, day: 24, recent(hour): 1

    points_per_hour: int, number of points per hour, depends on data

    Returns
    ----------
    list[(start_idx, end_idx)]
    '''

    if points_per_hour <= 0:
        raise ValueError("points_per_hour should be greater than 0!")

    # The target segment (label_start_idx + num_for_predict) must not run past the end of the sequence
    if label_start_idx + num_for_predict > sequence_length:
        return None

    x_idx = []
    for i in range(1, num_of_batches + 1):
        start_idx = label_start_idx - points_per_hour * units * i
        end_idx = start_idx + num_for_predict
        if start_idx >= 0:
            x_idx.append((start_idx, end_idx))
        else:
            return None

    if len(x_idx) != num_of_batches:
        return None

    return x_idx[::-1]
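To see what search_data returns in practice, the following condensed copy of the function above can be called for the first valid label index, 2016. The copy has the same behavior, with the redundant length check folded into the loop. The printed indices show two things. The three recent windows are contiguous and end right at the label. The day and week windows lie exactly 288 and 2016 steps earlier, at the same time of day.

```python
def search_data(sequence_length, num_of_batches, label_start_idx,
                num_for_predict, units, points_per_hour):
    # Condensed copy of search_data above, for experimentation
    if points_per_hour <= 0:
        raise ValueError("points_per_hour should be greater than 0!")
    if label_start_idx + num_for_predict > sequence_length:
        return None
    x_idx = []
    for i in range(1, num_of_batches + 1):
        start_idx = label_start_idx - points_per_hour * units * i
        if start_idx < 0:
            return None
        x_idx.append((start_idx, start_idx + num_for_predict))
    return x_idx[::-1]  # oldest window first


label = 2016  # first index with a full week of history (7 * 24 * 12)
print(search_data(16992, 1, label, 12, 7 * 24, 12))  # [(0, 12)]
print(search_data(16992, 1, label, 12, 24, 12))      # [(1728, 1740)]
print(search_data(16992, 3, label, 12, 1, 12))       # [(1980, 1992), (1992, 2004), (2004, 2016)]
print(search_data(16992, 1, 2015, 12, 7 * 24, 12))   # None: one step too early for the week window
print(search_data(16992, 1, 16981, 12, 1, 12))       # None: the target would run past the end
```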

2. Preparing the ground truth for testing

The target part of testing_set, of shape (2993, 307, 12), is transposed to (2993, 12, 307) and then reshaped to (2993, 3684), where 3684 = 12 × 307.

    # test set ground truth: true_value has shape (2993, 3684)
    true_value = (all_data['test']['target'].transpose((0, 2, 1))
                  .reshape(all_data['test']['target'].shape[0], -1))
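Transposing before reshaping makes the flattened layout time-major. Column t*N + v holds forecast step t of sensor v, so the first i*N columns are exactly the first i forecast steps. The toy example below checks this with a small array. It does not show whether evaluate actually slices true_value this way, because that function is not shown in this post.

```python
import numpy as np

# Toy target: 2 samples, N=3 sensors, T=4 forecast steps -> shape (2, 3, 4)
num_samples, N, T = 2, 3, 4
target = np.arange(num_samples * N * T).reshape(num_samples, N, T)

# Same operation as the original: (samples, N, T) -> (samples, T, N) -> (samples, T*N)
true_value = target.transpose((0, 2, 1)).reshape(target.shape[0], -1)
print(true_value.shape)  # (2, 12)

# Column t*N + v holds step t of sensor v, so the flattening is time-major.
t, v = 2, 1
assert true_value[0, t * N + v] == target[0, v, t]

# Consequence: the first i*N columns are exactly the first i forecast steps.
i = 2
first_i_steps = true_value[:, :i * N]
assert np.array_equal(first_i_steps,
                      target[:, :, :i].transpose((0, 2, 1)).reshape(num_samples, -1))
```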

3. Wrapping the data in DataLoaders

Note: 1. How should multiple GPUs be handled?

    # training set data loader
    train_loader = gluon.data.DataLoader(
                        gluon.data.ArrayDataset(
                            nd.array(all_data['train']['week'], ctx=ctx),
                            nd.array(all_data['train']['day'], ctx=ctx),
                            nd.array(all_data['train']['recent'], ctx=ctx),
                            nd.array(all_data['train']['target'], ctx=ctx)
                        ),
                        batch_size=batch_size,
                        shuffle=True
    )

    # validation set data loader
    val_loader = gluon.data.DataLoader(
                    gluon.data.ArrayDataset(
                        nd.array(all_data['val']['week'], ctx=ctx),
                        nd.array(all_data['val']['day'], ctx=ctx),
                        nd.array(all_data['val']['recent'], ctx=ctx),
                        nd.array(all_data['val']['target'], ctx=ctx)
                    ),
                    batch_size=batch_size,
                    shuffle=False
    )

    # testing set data loader
    test_loader = gluon.data.DataLoader(
                    gluon.data.ArrayDataset(
                        nd.array(all_data['test']['week'], ctx=ctx),
                        nd.array(all_data['test']['day'], ctx=ctx),
                        nd.array(all_data['test']['recent'], ctx=ctx),
                        nd.array(all_data['test']['target'], ctx=ctx)
                    ),
                    batch_size=batch_size,
                    shuffle=False
    )
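Conceptually, each loader zips the four arrays sample by sample and slices them into aligned mini-batches. The NumPy sketch below imitates that. iterate_batches is a hypothetical helper, not part of Gluon. It keeps the final partial batch, which, as far as I know, matches Gluon's default (last_batch='keep'). Only the training loader shuffles. The validation and test loaders keep their order, presumably so that predictions line up row by row with true_value.

```python
import numpy as np

def iterate_batches(arrays, batch_size, shuffle=False, seed=None):
    """Yield tuples of aligned mini-batches from equally long arrays.

    Hypothetical stand-in for gluon.data.ArrayDataset + DataLoader: element k of
    every array belongs to sample k, and the last batch may be smaller.
    """
    n = len(arrays[0])
    assert all(len(a) == n for a in arrays), "all arrays need the same first dimension"
    order = np.arange(n)
    if shuffle:
        np.random.default_rng(seed).shuffle(order)
    for start in range(0, n, batch_size):
        idx = order[start:start + batch_size]
        yield tuple(a[idx] for a in arrays)

# Toy stand-ins for week/day/recent/target: 10 samples, 4 sensors,
# each sample filled with its own index so alignment can be checked.
n, N = 10, 4
week = np.arange(n)[:, None, None, None] * np.ones((n, N, 3, 12))
day = np.zeros((n, N, 3, 12))
recent = np.zeros((n, N, 3, 36))
target = np.arange(n)[:, None, None] * np.ones((n, N, 12))

batches = list(iterate_batches((week, day, recent, target), batch_size=4, shuffle=True, seed=0))
print(len(batches))                      # 3
print([b[3].shape[0] for b in batches])  # [4, 4, 2]
```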

4. Saving the normalization (Z-score) mean and standard deviation to an .npz file

# save Z-score mean and std
stats_data = {}
for type_ in ['week', 'day', 'recent']:
    stats = all_data['stats'][type_]
    stats_data[type_ + '_mean'] = stats['mean']
    stats_data[type_ + '_std'] = stats['std']

# Save several arrays into a single file in compressed .npz format.
# The dict is unpacked into keyword arguments, and each array is stored under its keyword name.
np.savez_compressed(
    os.path.join(params_path, 'stats_data'),
    **stats_data
)
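Only the week/day/recent inputs are standardized; the target stays on its original scale. The saved statistics are mainly needed to normalize new inputs the same way at inference time. The sketch below shows that round trip. The normalization function here is a hypothetical re-implementation that follows the description above (statistics from the training split only). Taking the mean and std over the sample axis is an assumption, because the original normalization code is not shown.

```python
import os
import tempfile
import numpy as np

def normalization(train, val, test):
    """Z-score with statistics from the training split only.

    Hypothetical re-implementation for illustration; reducing over axis 0
    (the sample axis) is an assumption.
    """
    mean = train.mean(axis=0, keepdims=True)
    std = train.std(axis=0, keepdims=True)
    norm = lambda x: (x - mean) / std
    return {'mean': mean, 'std': std}, norm(train), norm(val), norm(test)

rng = np.random.default_rng(0)
train, val, test = (rng.normal(50.0, 10.0, (k, 4, 3, 12)) for k in (60, 20, 20))
stats, train_n, val_n, test_n = normalization(train, val, test)

with tempfile.TemporaryDirectory() as params_path:
    path = os.path.join(params_path, 'stats_data')
    np.savez_compressed(path, week_mean=stats['mean'], week_std=stats['std'])
    # savez_compressed appends the .npz extension when the name lacks it
    with np.load(path + '.npz') as loaded:
        mean, std = loaded['week_mean'], loaded['week_std']

# Normalizing with the reloaded statistics reproduces the original result
print(np.allclose((test - mean) / std, test_n))  # True
```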

5. Loss function

Note: 1. Could a different loss be used? Is this one the best choice?

# loss function: Gluon's L2Loss, i.e. 0.5 * squared error averaged over the non-batch axes (half the MSE)
loss_function = gluon.loss.L2Loss()
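On the question of alternatives, it helps to see what the losses compute. The NumPy functions below reproduce the documented formulas of gluon.loss.L2Loss and gluon.loss.L1Loss with batch axis 0. They are re-implementations for illustration, not the Gluon code. L2 penalizes large errors quadratically, so rare spikes dominate the gradient. L1 optimizes the MAE that evaluate reports. Gluon also offers HuberLoss as a compromise. Which one works best here is an empirical question.

```python
import numpy as np

def l2_loss(pred, label):
    """Per-sample 0.5 * squared error, averaged over all non-batch axes (like gluon.loss.L2Loss)."""
    sq = 0.5 * np.square(pred - label)
    return sq.reshape(sq.shape[0], -1).mean(axis=1)

def l1_loss(pred, label):
    """Per-sample mean absolute error (like gluon.loss.L1Loss)."""
    ae = np.abs(pred - label)
    return ae.reshape(ae.shape[0], -1).mean(axis=1)

pred = np.array([[1.0, 2.0], [3.0, 4.0]])
label = np.ones((2, 2))
print(l2_loss(pred, label))  # [0.25 3.25]
print(l1_loss(pred, label))  # [0.5 2.5]
```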

6. Loading the model!!! (focus on the attention mechanism and graph convolution)

Notes: 1. model structure; 2. multiple GPUs; 3. model inputs and outputs; 4. understand the get_backbones function; 5. model parameter initialization

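    all_backbones = get_backbones(args.config, adj_filename, ctx)
    net = model(num_for_predict, all_backbones)
    net.initialize(ctx=ctx)
    # Run one validation batch so that Gluon's deferred initialization can infer every parameter shape
    for val_w, val_d, val_r, val_t in val_loader:
        net([val_w, val_d, val_r])
        break
    # With all shapes known, re-initialize every parameter with the custom MyInit initializer
    net.initialize(ctx=ctx, init=MyInit(), force_reinit=True)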
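The initialize / one forward pass / force_reinit sequence follows the deferred-initialization pattern: a layer's weight shape depends on its input, so it is only allocated on the first call. The toy class below illustrates the pattern in plain NumPy. LazyDense is hypothetical and is not how Gluon implements it. The uniform initializer merely stands in for MyInit, whose definition is not shown in this post.

```python
import numpy as np

class LazyDense:
    """Toy layer whose weight shape is only known after the first forward pass.

    Hypothetical illustration of deferred initialization, not the Gluon implementation.
    """
    def __init__(self, units):
        self.units = units
        self.weight = None
        self.init_fn = None

    def initialize(self, init_fn, force_reinit=False):
        self.init_fn = init_fn
        if self.weight is not None and force_reinit:
            self.weight = init_fn(self.weight.shape)  # shape already known: re-draw now

    def __call__(self, x):
        if self.weight is None:  # deferred: infer in_units from the first input
            self.weight = self.init_fn((x.shape[-1], self.units))
        return x @ self.weight

rng = np.random.default_rng(0)
layer = LazyDense(units=12)
layer.initialize(lambda shape: np.zeros(shape))   # like net.initialize(ctx=ctx)
_ = layer(np.ones((2, 36)))                       # one batch fixes the shape to (36, 12)
layer.initialize(lambda shape: rng.uniform(-0.1, 0.1, shape), force_reinit=True)  # like MyInit
print(layer.weight.shape)                         # (36, 12)
```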

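7. Building the trainer and visual monitoring; before training, computing the validation loss, predicting on the test set, and evaluating the results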

    # initialize a trainer to train model
    trainer = gluon.Trainer(net.collect_params(), optimizer,
                            {'learning_rate': learning_rate})

    # initialize a SummaryWriter to write information into logs dir
    sw = SummaryWriter(logdir=params_path, flush_secs=5)

    # compute validation loss before training
    compute_val_loss(net, val_loader, loss_function, sw, epoch=0)

    # compute testing set MAE, RMSE, MAPE before training
    evaluate(net, test_loader, true_value, num_of_vertices, sw, epoch=0)
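evaluate itself is not shown in this post. The sketch below shows how the three metrics named in the comment (MAE, RMSE, MAPE) can be computed on flattened arrays shaped like true_value. Masking zero ground-truth values in MAPE is an assumption. It avoids division by zero for sensors that report zero flow, but the original code may handle this differently.

```python
import numpy as np

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

def rmse(y_true, y_pred):
    return np.sqrt(np.mean(np.square(y_true - y_pred)))

def masked_mape(y_true, y_pred, null_val=0.0):
    """MAPE as a fraction, skipping positions where the ground truth equals null_val (an assumption)."""
    mask = y_true != null_val
    return np.mean(np.abs(y_pred[mask] - y_true[mask]) / np.abs(y_true[mask]))

y_true = np.array([[1.0, 4.0, 0.0]])
y_pred = np.array([[2.0, 4.0, 5.0]])
print(mae(y_true, y_pred))          # 2.0
print(rmse(y_true, y_pred))
print(masked_mape(y_true, y_pred))  # 0.5
```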

8. Training the model

Note: 1. Analyze the evaluate method

    # train model
    global_step = 1
    for epoch in range(1, epochs + 1):

        for train_w, train_d, train_r, train_t in train_loader:

            start_time = time()

            with autograd.record():
                output = net([train_w, train_d, train_r])
                print('model output:', len(output), len(output[0]), len(output[0][0]))  # (batch_size, 307, 12)
                print('output for the first sensor of the first sample:', output[0][0])  # length equals num_for_predict in the config file

                l = loss_function(output, train_t)
            l.backward()
            trainer.step(train_t.shape[0])
            training_loss = l.mean().asscalar()

            sw.add_scalar(tag='training_loss',
                          value=training_loss,
                          global_step=global_step)

            print('global step: %s, training loss: %.2f, time: %.2fs'
                  % (global_step, training_loss, time() - start_time))
            global_step += 1

        # logging the gradients of parameters for checking convergence
        for name, param in net.collect_params().items():
            try:
                sw.add_histogram(tag=name + "_grad",
                                 values=param.grad(),
                                 global_step=global_step,
                                 bins=1000)
            except:
                print("can't plot histogram of {}_grad".format(name))

        # compute validation loss
        # after each epoch, compute the loss on the validation set
        compute_val_loss(net, val_loader, loss_function, sw, epoch)

        # evaluate the model on testing set
        # after each epoch, predict on the test set again and evaluate the results
        evaluate(net, test_loader, true_value, num_of_vertices, sw, epoch)

        params_filename = os.path.join(params_path,
                                       '%s_epoch_%s.params' % (model_name,
                                                               epoch))
        net.save_parameters(params_filename)
        print('save parameters to file: %s' % (params_filename))

    # close SummaryWriter
    sw.close()
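In the loop, l is a vector with one loss per sample. Calling backward on it accumulates the sum of the per-sample gradients. trainer.step(train_t.shape[0]) then rescales the gradients by 1/batch_size before updating. The NumPy check below confirms that, for a linear model and plain SGD (chosen only for illustration; the actual optimizer comes from the config), this equals one gradient step on the batch-mean loss. The model and the helper functions are hypothetical.

```python
import numpy as np

# Linear model y = x @ w with per-sample L2Loss l_k = 0.5 * mean((x_k @ w - y_k)^2)
rng = np.random.default_rng(0)
batch_size, in_dim, out_dim = 8, 5, 3
x = rng.normal(size=(batch_size, in_dim))
y = rng.normal(size=(batch_size, out_dim))
w = rng.normal(size=(in_dim, out_dim))

def per_sample_grad(xk, yk, w):
    """Gradient of 0.5 * mean((xk @ w - yk)^2) with respect to w for one sample."""
    residual = xk @ w - yk  # shape (out_dim,)
    return np.outer(xk, residual) / residual.size

# backward() on the loss vector accumulates the SUM of the per-sample gradients
grad_sum = sum(per_sample_grad(x[k], y[k], w) for k in range(batch_size))

# step(batch_size) rescales by 1 / batch_size before the (SGD) update ...
lr = 0.1
w_step = w - lr * grad_sum / batch_size

# ... which equals one SGD step on the mean loss over the batch
def mean_loss_grad(x, y, w):
    residual = x @ w - y  # (batch_size, out_dim)
    return x.T @ residual / residual.size

w_mean = w - lr * mean_loss_grad(x, y, w)
print(np.allclose(w_step, w_mean))  # True
```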

9. Testing on the test set and saving the results to the specified folder (prediction_filename in the config file)

Notes: 1. Analyze the predict method; 2. try it on a small part of test_loader; 3. how to load the model on GPU, CPU, multiple GPUs, etc.

    # After all epochs, if the config requests test-set predictions, save them to prediction_filename
    if 'prediction_filename' in training_config:
        prediction_path = training_config['prediction_filename']

        prediction = predict(net, test_loader)

        np.savez_compressed(
            os.path.normpath(prediction_path),
            prediction=prediction,
            ground_truth=all_data['test']['target']
        )
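A quick way to inspect the saved file without rerunning the model is to reload it and compare the two stored arrays. The sketch below does this with placeholder arrays in a temporary directory. The file name 'prediction' is hypothetical. The (4, 307, 12) prediction shape is also an assumption, since the output shape of predict is not shown here. If the configured prediction_filename lacks the .npz extension, NumPy appends it.

```python
import os
import tempfile
import numpy as np

# Placeholder arrays; the real ones come from predict(net, test_loader)
# and all_data['test']['target'] (shape (2993, 307, 12) in this post).
prediction = np.zeros((4, 307, 12), dtype=np.float32)
ground_truth = np.ones((4, 307, 12), dtype=np.float32)

with tempfile.TemporaryDirectory() as out_dir:
    prediction_path = os.path.join(out_dir, 'prediction')  # hypothetical name, no extension
    np.savez_compressed(os.path.normpath(prediction_path),
                        prediction=prediction, ground_truth=ground_truth)
    saved_file = prediction_path + '.npz'  # numpy appends .npz when it is missing
    with np.load(saved_file) as f:
        keys = sorted(f.files)
        loaded_pred = f['prediction']
        loaded_true = f['ground_truth']

print(keys)                                      # ['ground_truth', 'prediction']
print(np.abs(loaded_pred - loaded_true).mean())  # 1.0 for these placeholders
```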
