Neural Networks Hyperparameter Tuning in TensorFlow 2.0, by SiDdhartha (original link)
When building machine learning models, you need to choose various hyperparameters, such as the dropout rate in a layer or the learning rate. These decisions impact model metrics, such as accuracy. Therefore, an important step in the machine learning workflow is to identify the best hyperparameters for your problem, which often involves experimentation. This process is known as "Hyperparameter Optimization" or "Hyperparameter Tuning". Typically people use grid search, but grid search is computationally very expensive and less interactive. To solve such problems, TensorFlow 2.0 provides the HParams dashboard in TensorBoard, which can easily be visualized within a notebook.
(Translator's note: TensorFlow is an open-source software library used for machine learning across a range of perception and language-understanding tasks.)
The HParams dashboard in TensorBoard provides several tools to help with this process of identifying the best experiment or most promising sets of hyperparameters.
This tutorial will focus on the following steps:
- Experiment setup and HParams summary
- Adapt TensorFlow runs to log hyperparameters and metrics
- Start runs and log them all under one parent directory
- Visualize the results in TensorBoard’s HParams dashboard
Start by installing TF 2.0 and loading the TensorBoard notebook extension:
# !pip install -q tf-nightly-2.0-preview
!pip install tf-nightly-gpu-2.0-preview
# Load the TensorBoard notebook extension
%load_ext tensorboard
# Clear any logs from previous runs
!rm -rf ./logs/
0.0 The Dataset
I have used the Titanic: Machine Learning from Disaster dataset from Kaggle; you can download it and find the dataset's description on Kaggle. I have used Google Colab and hence uploaded the data to Google Drive.
0.1 Mount Google Drive
I have uploaded the data to Google Drive. Learn how to use data from Google Drive here.
from google.colab import drive
drive.mount('/content/drive')
1.0 Import libraries
from __future__ import absolute_import, division, print_function, unicode_literals
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorboard.plugins.hparams import api as hp
from tensorflow import feature_column
from tensorflow.keras import layers
from sklearn.model_selection import train_test_split
print("GPU Available: ", tf.test.is_gpu_available())
2.0 Load and preprocess Data
2.1 Use Pandas to create a dataframe
data = pd.read_csv('drive/My Drive/collab data/titanic/train.csv')
data.head(5)
2.2 Missing Data
2.2.1 Check missing values
data.isnull().sum()
2.2.2 Missing value handling
# Fill missing 'Age' values with the rounded mean and missing 'Embarked' values with the mode
mean_value = round(data['Age'].mean())
mode_value = data['Embarked'].mode()[0]
value = {'Age': mean_value, 'Embarked': mode_value}
data.fillna(value=value, inplace=True)
# Drop any column that still contains missing values (e.g. 'Cabin')
data.dropna(axis=1, inplace=True)
data.shape
2.3 Explore the data with the pandas_profiling library
import pandas_profiling as pdpf
pdpf.ProfileReport(data)
3.0 Train, val, test Split
We will divide the data into train, validation, and test sets with a 3:1:1 ratio: the first split holds out 20% for test, and the second takes 25% of the remaining 80% (another 20% of the original data), leaving a 60/20/20 split.
train, test = train_test_split(data, test_size=0.2)
train, val = train_test_split(train, test_size=0.25)
print(len(train), 'train examples')
print(len(val), 'validation examples')
print(len(test), 'test examples')
>>534 train examples
>>178 validation examples
>>179 test examples
4.0 Input pipeline
4.1 Create an input pipeline using tf.data
# A utility method to create a tf.data dataset from a Pandas Dataframe
def df_to_dataset(dataframe, shuffle=True, batch_size=32):
    dataframe = dataframe.copy()
    labels = dataframe.pop('Survived')
    ds = tf.data.Dataset.from_tensor_slices((dict(dataframe), labels))
    if shuffle:
        ds = ds.shuffle(buffer_size=len(dataframe))
    ds = ds.batch(batch_size)
    return ds
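The training code in section 6 uses train_ds and val_ds, which the article never shows being constructed. A minimal sketch, assuming the default batch size of 32 and the train/val/test frames from section 3:
# Assumed construction of the datasets used later in train_test_model();
# batch_size=32 is a guess, since the article never states the value it used.
train_ds = df_to_dataset(train)
val_ds = df_to_dataset(val, shuffle=False)
test_ds = df_to_dataset(test, shuffle=False)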
4.2 Feature columns
4.2.1 Decide which types of features you have in the data
# numerical features
num_c = ['Age','Fare','Parch','SibSp']
bucket_c = ['Age'] # bucketized numerical feature
# categorical features
cat_i_c = ['Embarked', 'Pclass','Sex'] # indicator columns
cat_e_c = ['Ticket'] # embedding column
4.2.2 Scaler function
def get_scal(feature):
    # Returns a min-max normalizer_fn closed over the training-set min/max of `feature`
    def minmax(x):
        mini = train[feature].min()
        maxi = train[feature].max()
        return (x - mini)/(maxi - mini)
    return minmax
4.2.3 Create Feature Columns
# Numerical columns
feature_columns = []
for header in num_c:
    scal_input_fn = get_scal(header)
    feature_columns.append(feature_column.numeric_column(header, normalizer_fn=scal_input_fn))
# Bucketized columns
Age = feature_column.numeric_column("Age")
age_buckets = feature_column.bucketized_column(Age, boundaries=[18, 25, 30, 35, 40, 45, 50, 55, 60, 65])
feature_columns.append(age_buckets)
# Categorical indicator columns
for feature_name in cat_i_c:
    vocabulary = data[feature_name].unique()
    cat_c = tf.feature_column.categorical_column_with_vocabulary_list(feature_name, vocabulary)
    one_hot = feature_column.indicator_column(cat_c)
    feature_columns.append(one_hot)
# Categorical embedding columns
for feature_name in cat_e_c:
    vocabulary = data[feature_name].unique()
    cat_c = tf.feature_column.categorical_column_with_vocabulary_list(feature_name, vocabulary)
    embedding = feature_column.embedding_column(cat_c, dimension=50)
    feature_columns.append(embedding)
# Crossed columns
vocabulary = data['Sex'].unique()
Sex = tf.feature_column.categorical_column_with_vocabulary_list('Sex', vocabulary)
crossed_feature = feature_column.crossed_column([age_buckets, Sex], hash_bucket_size=1000)
crossed_feature = feature_column.indicator_column(crossed_feature)
feature_columns.append(crossed_feature)
len(feature_columns)
5.0 Experiment setup and the HParams experiment summary
Experiment with five hyperparameters in the model:
- Number of units in the first dense layer
- Number of units in the second dense layer
- Dropout rate in the dropout layer
- Optimizer
- L2 Regularization parameter
List the values to try, and log an experiment configuration to TensorBoard. This step is optional: you can provide domain information to enable more precise filtering of hyperparameters in the UI, and you can specify which metrics should be displayed. Here we use the 'accuracy' metric, but you can choose any other metric and use as many hyperparameters as you want. In the hp.HParam function you specify the parameter's name and its domain of values.
HP_NUM_UNITS1 = hp.HParam('num_units 1', hp.Discrete([4,8,16]))
HP_NUM_UNITS2 = hp.HParam('num_units 2', hp.Discrete([4,8]))
HP_DROPOUT = hp.HParam('dropout', hp.RealInterval(0.2, 0.5))
HP_OPTIMIZER = hp.HParam('optimizer', hp.Discrete(['adam', 'sgd','RMSprop']))
HP_L2 = hp.HParam('l2 regularizer', hp.RealInterval(.001,.01))
METRIC_ACCURACY = 'accuracy'
with tf.summary.create_file_writer('logs/hparam_tuning').as_default():
    hp.hparams_config(
        hparams=[HP_NUM_UNITS1, HP_NUM_UNITS2, HP_DROPOUT, HP_L2, HP_OPTIMIZER],
        metrics=[hp.Metric(METRIC_ACCURACY, display_name='Accuracy')],
    )
If you choose to skip this step, you can use a string literal wherever you would otherwise use an HParam value: e.g., hparams['dropout'] instead of hparams[HP_DROPOUT].
6.0 Adapt TensorFlow runs to log hyperparameters and metrics
The model will be quite simple: an input feature layer, two hidden dense layers with a dropout layer between them, and an output dense layer with a sigmoid activation. The training code will look familiar, although the hyperparameters are no longer hardcoded. Instead, the hyperparameters are provided in an hparams dictionary and used throughout the training function:
6.1 Create a feature layer
Now that we have defined our feature columns, we will use a DenseFeatures layer to input them to our Keras model.
feature_layer = tf.keras.layers.DenseFeatures(feature_columns)
6.2 Define model in a function
def train_test_model(hparams):
    model = tf.keras.Sequential([
        feature_layer,
        # Use the tuned L2 value rather than a hardcoded 0.001, so HP_L2 actually takes effect
        layers.Dense(hparams[HP_NUM_UNITS1], kernel_regularizer=tf.keras.regularizers.l2(hparams[HP_L2]), activation='relu'),
        layers.Dropout(hparams[HP_DROPOUT]),
        layers.Dense(hparams[HP_NUM_UNITS2], kernel_regularizer=tf.keras.regularizers.l2(hparams[HP_L2]), activation='relu'),
        layers.Dense(1, activation='sigmoid')
    ])
    model.compile(optimizer=hparams[HP_OPTIMIZER],
                  loss='binary_crossentropy',
                  metrics=['accuracy'])
    model.fit(train_ds,
              validation_data=val_ds,
              epochs=5)
    _, accuracy = model.evaluate(val_ds)
    return accuracy
6.3 Log an hparams summary
For each run, log an hparams summary with the hyperparameters and final accuracy:
def run(run_dir, hparams):
    with tf.summary.create_file_writer(run_dir).as_default():
        hp.hparams(hparams)  # record the values used in this trial
        accuracy = train_test_model(hparams)
        tf.summary.scalar(METRIC_ACCURACY, accuracy, step=1)
7.0 Start runs and log them all under one parent directory
You can now try multiple experiments, training each one with a different set of hyperparameters.
For simplicity, use a grid search: try all combinations of the discrete parameters and just the lower and upper bounds of the real-valued parameters. For more complex scenarios, it might be more effective to choose each hyperparameter value randomly (this is called a random search; a sketch follows the grid-search code below). There are more advanced methods that can be used.
Run a few experiments, which will take a few minutes:
session_num = 0
for num_units1 in HP_NUM_UNITS1.domain.values:
    for num_units2 in HP_NUM_UNITS2.domain.values:
        for dropout_rate in (HP_DROPOUT.domain.min_value, HP_DROPOUT.domain.max_value):
            for l2 in (HP_L2.domain.min_value, HP_L2.domain.max_value):
                for optimizer in HP_OPTIMIZER.domain.values:
                    hparams = {
                        HP_NUM_UNITS1: num_units1,
                        HP_NUM_UNITS2: num_units2,
                        HP_DROPOUT: dropout_rate,
                        HP_L2: l2,
                        HP_OPTIMIZER: optimizer
                    }
                    run_name = "run-%d" % session_num
                    print('--- Starting trial: %s' % run_name)
                    print({h.name: hparams[h] for h in hparams})
                    run('logs/hparam_tuning/' + run_name, hparams)
                    session_num += 1
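As an alternative to the exhaustive grid above, a random search can cover the same domains with far fewer trials. This is a minimal sketch, not the article's code; it reuses the HParam objects from section 5 and the run() helper from section 6.3, and the trial count of 20 is an arbitrary choice:
import random

def sample_hparams():
    # Draw one value for each hyperparameter from its declared domain
    return {
        HP_NUM_UNITS1: random.choice(HP_NUM_UNITS1.domain.values),
        HP_NUM_UNITS2: random.choice(HP_NUM_UNITS2.domain.values),
        HP_DROPOUT: random.uniform(HP_DROPOUT.domain.min_value, HP_DROPOUT.domain.max_value),
        HP_L2: random.uniform(HP_L2.domain.min_value, HP_L2.domain.max_value),
        HP_OPTIMIZER: random.choice(HP_OPTIMIZER.domain.values),
    }

for session_num in range(20):  # arbitrary number of random trials
    run('logs/hparam_tuning/random-%d' % session_num, sample_hparams())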
8.0 Visualize the results in TensorBoard’s HParams plugin
%tensorboard --logdir logs/hparam_tuning
The following screenshots are for demonstration purposes only; run the code to see the actual TensorBoard.
The left pane of the dashboard provides filtering capabilities that are active across all the views in the HParams dashboard:
- Filter which hyperparameters/metrics are shown in the dashboard
- Filter which hyperparameter/metrics values are shown in the dashboard
- Filter on run status (running, success, …)
- Sort by hyperparameter/metric in the table view
- Number of session groups to show (useful for performance when there are many experiments)
The HParams dashboard has three different views, with various useful information:
- The Table View lists the runs, their hyperparameters, and their metrics.
- The Parallel Coordinates View shows each run as a line going through an axis for each hyperparameter and metric. Click and drag the mouse on any axis to mark a region which will highlight only the runs that pass through it. This can be useful for identifying which groups of hyperparameters are most important. The axes themselves can be re-ordered by dragging them.
- The Scatter Plot View shows plots comparing each hyperparameter/metric with each metric. This can help identify correlations. Click and drag to select a region in a specific plot and highlight those sessions across the other plots.