https://www.tensorflow.org/versions/r0.12/tutorials/wide_and_deep/index.html#tensorflow-wide-deep-learning-tutorial
Blog post explaining forward propagation and backpropagation: http://blog.csdn.net/zhangjunhit/article/details/53501680
In the previous TensorFlow Linear Model Tutorial (https://www.tensorflow.org/versions/r0.12/tutorials/wide/index.html), we trained a logistic regression model on the Census Income Dataset (https://archive.ics.uci.edu/ml/datasets/Census+Income) to predict the probability that an individual has an annual income of over 50,000 dollars. TensorFlow is great for training deep neural networks too, and you might be wondering which one to choose. Well, why not both? Would it be possible to combine the strengths of both in one model?
In this tutorial, we’ll introduce how to use the TF.Learn API to jointly train a wide linear model and a deep feed-forward neural network. This approach combines the strengths of memorization and generalization. It’s useful for generic large-scale regression and classification problems with sparse input features (e.g., categorical features with a large number of possible feature values). If you’re interested in learning more about how Wide & Deep Learning works, please check out our research paper (https://arxiv.org/abs/1606.07792).
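The figure above shows a comparison of a wide model (logistic regression with sparse features and transformations), a deep model (feed-forward neural network with an embedding layer and several hidden layers), and a Wide & Deep model (joint training of both). At a high level, there are only 3 steps to configure a wide, deep, or Wide & Deep model using the TF.Learn API:
1. Select features for the wide part: choose the sparse base columns and crossed columns you want to use.
2. Select features for the deep part: choose the continuous columns, the embedding dimension for each categorical column, and the hidden layer sizes.
3. Put them all together in a Wide & Deep model (DNNLinearCombinedClassifier).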
(Define the base feature columns)
First, let’s define the base categorical and continuous feature columns that we’ll use. These base columns will be the building blocks used by both the wide part and the deep part of the model.
import tensorflow as tf
# Categorical base columns
# Columns with a small, known vocabulary list their keys explicitly
gender = tf.contrib.layers.sparse_column_with_keys(column_name="gender", keys=["Female", "Male"])
race = tf.contrib.layers.sparse_column_with_keys(column_name="race", keys=[
    "Amer-Indian-Eskimo", "Asian-Pac-Islander", "Black", "Other", "White"])
# Columns with many (or unknown) values are hashed into a fixed number of buckets
education = tf.contrib.layers.sparse_column_with_hash_bucket("education", hash_bucket_size=1000)
relationship = tf.contrib.layers.sparse_column_with_hash_bucket("relationship", hash_bucket_size=100)
workclass = tf.contrib.layers.sparse_column_with_hash_bucket("workclass", hash_bucket_size=100)
occupation = tf.contrib.layers.sparse_column_with_hash_bucket("occupation", hash_bucket_size=1000)
native_country = tf.contrib.layers.sparse_column_with_hash_bucket("native_country", hash_bucket_size=1000)
# Continuous base columns
age = tf.contrib.layers.real_valued_column("age")
# Transformation: bucketize age so the linear model can learn a separate weight per age range
age_buckets = tf.contrib.layers.bucketized_column(age, boundaries=[18, 25, 30, 35, 40, 45, 50, 55, 60, 65])
education_num = tf.contrib.layers.real_valued_column("education_num")
capital_gain = tf.contrib.layers.real_valued_column("capital_gain")
capital_loss = tf.contrib.layers.real_valued_column("capital_loss")
hours_per_week = tf.contrib.layers.real_valued_column("hours_per_week")
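As a rough illustration of what two of these columns do to a raw value, here is a plain-Python sketch. The bucket rule mirrors the boundaries passed to bucketized_column above, with buckets closed on the left (a value equal to a boundary starts a new bucket). The helpers `bucketize` and `hash_bucket` are hypothetical; `hash_bucket` uses md5 as a stand-in for TensorFlow's own string hash, so its bucket IDs are illustrative only.

```python
import bisect
import hashlib

# Same boundaries as age_buckets above: 10 boundaries -> 11 buckets,
# (-inf, 18), [18, 25), ..., [60, 65), [65, +inf).
AGE_BOUNDARIES = [18, 25, 30, 35, 40, 45, 50, 55, 60, 65]


def bucketize(value, boundaries):
    """Return the index of the bucket that `value` falls into."""
    # bisect_right sends a value equal to a boundary into the bucket
    # that starts at that boundary (left-closed buckets).
    return bisect.bisect_right(boundaries, value)


def hash_bucket(value, hash_bucket_size):
    """Map a string to an integer ID in [0, hash_bucket_size).

    Illustrative only: md5 stands in for TensorFlow's internal string hash,
    so these IDs will not match the ones TensorFlow produces.
    """
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % hash_bucket_size


print([bucketize(a, AGE_BOUNDARIES) for a in (17, 18, 24, 40, 70)])
# -> [0, 1, 1, 5, 10]
print(hash_bucket("Bachelors", 1000))
```

Hashing trades exactness for a fixed memory budget: distinct values may collide in the same bucket, which is why the larger vocabularies above get larger hash_bucket_size values.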
The wide model is a linear model with a wide set of sparse and crossed feature columns:
wide_columns = [
    gender, native_country, education, occupation, workclass, relationship, age_buckets,
    # Crossed columns let the linear model memorize specific feature combinations
    tf.contrib.layers.crossed_column([education, occupation], hash_bucket_size=int(1e4)),
    tf.contrib.layers.crossed_column([native_country, occupation], hash_bucket_size=int(1e4)),
    tf.contrib.layers.crossed_column([age_buckets, education, occupation], hash_bucket_size=int(1e6))]
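Conceptually, a crossed column takes the Cartesian product of the values of its input columns and hashes each combination into hash_bucket_size buckets, so the linear model learns one weight per (hashed) combination. The plain-Python sketch below illustrates that idea. The helper `cross_ids`, the "_X_" separator, and the md5 hash are illustrative assumptions rather than TensorFlow's actual implementation, so the IDs will not match TensorFlow's.

```python
import hashlib
import itertools


def cross_ids(columns, hash_bucket_size):
    """Hash every combination of values drawn from several sparse columns.

    `columns` is a list of lists; each inner list holds the values one
    example has for that column (usually a single value).
    Hypothetical stand-in for crossed_column.
    """
    ids = []
    for combo in itertools.product(*columns):
        key = "_X_".join(str(v) for v in combo)  # e.g. "Bachelors_X_Exec-managerial"
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        ids.append(int(digest, 16) % hash_bucket_size)
    return ids


# One example crossed on education x occupation, as in the first crossed column above.
print(cross_ids([["Bachelors"], ["Exec-managerial"]], hash_bucket_size=int(1e4)))
```

Because each combination maps to its own bucket weight, a combination that never occurs in the training data never receives a training signal, which is the generalization limit of crossed columns.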
Wide models with crossed feature columns can memorize sparse interactions between features effectively. That being said, one limitation of crossed feature columns is that they do not generalize to feature combinations that have not appeared in the training data. Let’s add a deep model with embeddings to fix that.
The deep model is a feed-forward neural network, as shown in the previous figure. Each of the sparse, high-dimensional categorical features is first converted into a low-dimensional, dense, real-valued vector, often referred to as an embedding vector. These low-dimensional dense embedding vectors are concatenated with the continuous features and then fed into the hidden layers of the neural network in the forward pass. The embedding values are initialized randomly and are trained along with all other model parameters to minimize the training loss. If you’re interested in learning more about embeddings, check out the TensorFlow tutorial on Vector Representations of Words, or Word Embedding on Wikipedia.
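To make the data flow concrete, here is a minimal NumPy sketch of one forward pass through the deep part for a single example: look up an embedding per categorical column, concatenate with the continuous values, and run the hidden layers. It is not TF.Learn code. The vocabulary sizes, the use of only two categorical columns, the random initialization scale, and the ReLU activation are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical vocabulary sizes; dimension 8 mirrors embedding_column(..., dimension=8).
VOCAB = {"workclass": 100, "occupation": 1000}
EMB_DIM = 8
N_CONTINUOUS = 5       # age, education_num, capital_gain, capital_loss, hours_per_week
HIDDEN = [100, 50]     # same sizes as dnn_hidden_units

# One randomly initialized embedding table per categorical column.
tables = {name: rng.normal(0.0, 0.1, size=(n, EMB_DIM)) for name, n in VOCAB.items()}

# Randomly initialized dense layers: input -> 100 -> 50 -> 1 logit.
sizes = [len(VOCAB) * EMB_DIM + N_CONTINUOUS] + HIDDEN + [1]
layers = [(rng.normal(0.0, 0.1, size=(a, b)), np.zeros(b))
          for a, b in zip(sizes[:-1], sizes[1:])]


def deep_logit(cat_ids, continuous):
    """Forward pass for one example: embed, concatenate, run the layers."""
    embedded = [tables[name][cat_ids[name]] for name in VOCAB]  # each has shape (8,)
    x = np.concatenate(embedded + [np.asarray(continuous, dtype=float)])
    for i, (w, b) in enumerate(layers):
        x = x @ w + b
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)  # ReLU on hidden layers only
    return float(x[0])


print(deep_logit({"workclass": 3, "occupation": 42}, [39, 13, 2174, 0, 40]))
```

During training, gradients of the loss flow back into both the layer weights and the rows of `tables` that were looked up, which is what "trained along with all other model parameters" means for embeddings.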
We’ll configure the embeddings for the categorical columns using embedding_column, and concatenate them with the continuous columns:
deep_columns = [
    # Each categorical column is mapped to an 8-dimensional embedding
    tf.contrib.layers.embedding_column(workclass, dimension=8),
    tf.contrib.layers.embedding_column(education, dimension=8),
    tf.contrib.layers.embedding_column(gender, dimension=8),
    tf.contrib.layers.embedding_column(relationship, dimension=8),
    tf.contrib.layers.embedding_column(native_country, dimension=8),
    tf.contrib.layers.embedding_column(occupation, dimension=8),  # note: race is not included in the deep part
    # Continuous columns are fed in as-is
    age, education_num, capital_gain, capital_loss, hours_per_week]
The higher the dimension of the embedding is, the more degrees of freedom the model will have to learn the representations of the features. For simplicity, we set the dimension to 8 for all feature columns here. Empirically, a more informed choice for the number of dimensions is to start with a value on the order of log2(n) or k*(n^(1/4)), where n is the number of unique values in a feature column and k is a small constant (usually smaller than 10).
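Both rules of thumb are easy to evaluate for a given column size. A short sketch; rounding up to a whole dimension and the default k=4 are arbitrary choices made here for illustration.

```python
import math


def suggested_embedding_dims(n, k=4):
    """Rule-of-thumb embedding sizes for a column with n unique values.

    Returns (log2(n), k * n**(1/4)), each rounded up to a whole dimension.
    k=4 is just an example of a 'small constant (usually smaller than 10)'.
    """
    return math.ceil(math.log2(n)), math.ceil(k * n ** 0.25)


for n in (100, 1000, 10000):
    print(n, suggested_embedding_dims(n))
```

For example, a column with about 1,000 distinct values gives 10 dimensions under the first rule and 23 under the second (with k = 4); either is only a starting point for tuning.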
Through dense embeddings, deep models can generalize better and make predictions on feature pairs that were previously unseen in the training data. However, it is difficult to learn effective low-dimensional representations for feature columns when the underlying interaction matrix between two feature columns is sparse and high-rank. In such cases, the interaction between most feature pairs should be zero except a few, but dense embeddings will lead to nonzero predictions for all feature pairs, and thus can over-generalize. On the other hand, linear models with crossed features can memorize these “exception rules” effectively with fewer model parameters.
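A small NumPy sketch of the over-generalization argument, under a simplified model where the score of a feature pair is the dot product of the two embeddings (the real deep part passes embeddings through hidden layers instead). The 5 x 5 interaction matrix is hypothetical: it is sparse (10 of 25 entries nonzero) yet full-rank. Its best 1-dimensional dot-product fit, obtained by truncated SVD, gives every pair the same nonzero score.

```python
import numpy as np

# Hypothetical interaction matrix between two 5-valued categorical columns:
# value i of column A interacts only with values i and i+1 (mod 5) of column B.
n = 5
target = np.eye(n) + np.roll(np.eye(n), 1, axis=1)  # 10 nonzeros out of 25

# Truncated SVD gives the best rank-d approximation in squared error, i.e. the
# best any d-dimensional dot-product embedding of the two columns can do.
d = 1
u, s, vt = np.linalg.svd(target)
approx = (u[:, :d] * s[:d]) @ vt[:d, :]

print(np.linalg.matrix_rank(target))
print(np.count_nonzero(target), np.count_nonzero(np.abs(approx) > 1e-9))
print(np.round(approx, 2))
```

The 15 pairs that should score zero all receive a nonzero score, while a crossed column would keep a separate weight for each of the 10 true interactions and none for the rest.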
Now, let’s see how to jointly train wide and deep models and allow them to complement each other’s strengths and weaknesses.
The wide models and deep models are combined by summing up their final output log odds as the prediction, then feeding the prediction to a logistic loss function. All the graph definition and variable allocations have already been handled for you under the hood, so you simply need to create a DNNLinearCombinedClassifier:
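The combination rule can be written out in a few lines of plain Python for a single example. This illustrates the math only, not how DNNLinearCombinedClassifier is implemented; the logit values are made up.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def wide_deep_loss(wide_logit, deep_logit, label):
    """Sum the log odds of both parts, squash with a sigmoid,
    and score the result with logistic (log) loss."""
    logit = wide_logit + deep_logit
    p = sigmoid(logit)
    loss = -(label * math.log(p) + (1 - label) * math.log(1 - p))
    return p, loss


p, loss = wide_deep_loss(wide_logit=0.7, deep_logit=-0.7, label=1)
print(p, loss)  # p = 0.5, loss = log(2), about 0.693
```

Because a single loss is computed on the summed logit, its gradient updates the wide and the deep parameters together. That is what distinguishes joint training from training two models separately and averaging their predictions.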
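m = tf.contrib.learn.DNNLinearCombinedClassifier(
    model_dir="models",  # directory where checkpoints and summaries are written
    linear_feature_columns=wide_columns,
    dnn_feature_columns=deep_columns,
    dnn_hidden_units=[100, 50])  # two hidden layers with 100 and 50 units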
Before we train the model, let’s read in the Census dataset as we did in the TensorFlow Linear Model tutorial. The code for input data processing is provided here again for your convenience:
import tempfile
import urllib.request
import pandas as pd
import tensorflow as tf
# Define the column names for the data sets
COLUMNS = ["age", "workclass", "fnlwgt", "education", "education_num",
           "marital_status", "occupation", "relationship", "race", "gender",
           "capital_gain", "capital_loss", "hours_per_week", "native_country",
           "income_bracket"]
CATEGORICAL_COLUMNS = ["workclass", "education", "marital_status", "occupation", "relationship", "race",
                       "gender", "native_country"]
CONTINUOUS_COLUMNS = ["age", "education_num", "capital_gain", "capital_loss", "hours_per_week"]
LABEL_COLUMN = "label"
# Download the training and test data to temporary files.
# Alternatively, you can download them yourself and read from your own paths instead.
train_file = tempfile.NamedTemporaryFile()
test_file = tempfile.NamedTemporaryFile()
urllib.request.urlretrieve("https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data", train_file.name)
urllib.request.urlretrieve("https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.test", test_file.name)
# Read the training and test data sets into Pandas dataframes
# (the first line of adult.test is not a data row, hence skiprows=1)
df_train = pd.read_csv(train_file.name, names=COLUMNS, skipinitialspace=True)
df_test = pd.read_csv(test_file.name, names=COLUMNS, skipinitialspace=True, skiprows=1)
# Label is 1 for incomes over 50K (test labels carry a trailing period, e.g. ">50K.")
df_train[LABEL_COLUMN] = (df_train["income_bracket"].apply(lambda x: ">50K" in x)).astype(int)
df_test[LABEL_COLUMN] = (df_test["income_bracket"].apply(lambda x: ">50K" in x)).astype(int)
def input_fn(df):
    # Creates a dictionary mapping from each continuous feature column name (k)
    # to the values of that column stored in a constant Tensor
    continuous_cols = {k: tf.constant(df[k].values) for k in CONTINUOUS_COLUMNS}
    # Creates a dictionary mapping from each categorical feature column name (k)
    # to the values of that column stored in a tf.SparseTensor
    # (the dense_shape argument was called shape before TensorFlow 1.0)
    categorical_cols = {k: tf.SparseTensor(
        indices=[[i, 0] for i in range(df[k].size)],
        values=df[k].values,
        dense_shape=[df[k].size, 1])
        for k in CATEGORICAL_COLUMNS}
    # Merges the two dictionaries into one
    feature_cols = dict(continuous_cols)
    feature_cols.update(categorical_cols)
    label = tf.constant(df[LABEL_COLUMN].values)
    return feature_cols, label
def train_input_fn():
    return input_fn(df_train)
def eval_input_fn():
    return input_fn(df_test)
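For reference, the layout that input_fn gives each categorical column can be reproduced without TensorFlow: every example becomes one row holding a single entry in column 0. The helper name `sparse_components` is hypothetical and only mirrors the three arguments passed to tf.SparseTensor.

```python
def sparse_components(values):
    """Return (indices, values, dense_shape) laid out the way input_fn does:
    one row per example and a single column, so row i holds values[i]."""
    indices = [[i, 0] for i in range(len(values))]
    return indices, list(values), [len(values), 1]


idx, vals, shape = sparse_components(["Private", "State-gov", "Self-emp-not-inc"])
print(idx)    # [[0, 0], [1, 0], [2, 0]]
print(shape)  # [3, 1]
```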
After reading in the data, you can train and evaluate the model:
m.fit(input_fn=train_input_fn, steps=200)
results = m.evaluate(input_fn=eval_input_fn, steps=1)
for key in sorted(results):
    print("%s: %s" % (key, results[key]))
The first line of the output should be something like accuracy: 0.84429705. We can see that the accuracy improved from about 83.6% with a wide-only linear model to about 84.4% with a Wide & Deep model. If you’d like to see a working end-to-end example, you can download our example code (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/learn/wide_n_deep_tutorial.py).
Note that this tutorial is just a quick example on a small dataset to get you familiar with the API. Wide & Deep Learning will be even more powerful if you try it on a large dataset with many sparse feature columns that have a large number of possible feature values. Again, feel free to take a look at our research paper (https://arxiv.org/abs/1606.07792) for more ideas about how to apply Wide & Deep Learning to real-world, large-scale machine learning problems.