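Discrete-valued features
One-hot representation
Feature crossing (cross product)
After crossing:
Crossing sparse features captures co-occurrence information
This gives the model a memorization effect
Advantages: effective and widely used in industry
Disadvantages: requires manual feature design; may overfit, since crossing every feature with every other amounts to memorizing each individual sample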
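The crossing step above can be sketched in plain Python. This is a minimal illustration, not code from the original notes: the helper names `one_hot` and `cross` and the two example features (gender, language) are assumptions made for the example.

```python
from itertools import product


def one_hot(value, vocabulary):
    """Return a one-hot list for `value` over an ordered vocabulary."""
    return [1 if value == item else 0 for item in vocabulary]


def cross(vec_a, vec_b):
    """Cross two one-hot vectors: one slot per (a, b) value pair."""
    return [a * b for a, b in product(vec_a, vec_b)]


# Hypothetical features, chosen only for illustration.
genders = ["male", "female"]
languages = ["zh", "en", "fr"]

gender_vec = one_hot("female", genders)    # [0, 1]
language_vec = one_hot("en", languages)    # [0, 1, 0]

# The crossed vector has 2 * 3 = 6 slots; only the slot for the
# co-occurring pair (female, en) is set.
crossed = cross(gender_vec, language_vec)
print(crossed)  # [0, 0, 0, 0, 1, 0]
```

The crossed vector grows with the product of the vocabulary sizes, which is why crossing every feature with every other quickly approaches one slot per sample.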
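Vector representation
e.g. vocabulary = {AI, you, I, China}; "he" = [0.3, 0.2, 0.6, ...] (an n-dimensional vector)
Word2vec tool
Vector conversion: each word is mapped to a vector, and the distance between vectors measures the distance between the words
Advantages:
Carries semantic information; different vectors are correlated with each other
Generalizes to feature combinations that have not been seen before
Requires less manual work
Disadvantages:
Over-generalization: may recommend products that are not very relevant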
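To make "distance between vectors" concrete, here is a small sketch that compares word vectors with cosine similarity. The notes do not say which distance is used, so cosine similarity is an assumption, and the 3-dimensional vectors below are made-up values rather than real Word2vec output.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical embeddings; real ones would be learned from data.
embeddings = {
    "you": [0.3, 0.2, 0.6],
    "I": [0.25, 0.3, 0.55],
    "China": [0.9, -0.4, 0.1],
}

sim_close = cosine_similarity(embeddings["you"], embeddings["I"])
sim_far = cosine_similarity(embeddings["you"], embeddings["China"])

# With these values the two pronouns are closer to each other
# than either is to "China".
print(sim_close > sim_far)  # True
```

Unlike one-hot vectors, which are all mutually orthogonal, dense vectors like these can express that two words are related even if they never co-occurred in a crossed feature.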
Building the model
# Functional API
from tensorflow import keras  # assumes the tf.keras distribution of Keras

# Input layer; its shape is taken from the training features
inputs = keras.layers.Input(shape=x_train.shape[1:])
# Deep part: each layer is called on a tensor, like a function
hidden1 = keras.layers.Dense(30, activation="relu")(inputs)
hidden2 = keras.layers.Dense(30, activation="relu")(hidden1)
# Combine the wide part (the raw input) with the deep part
concat = keras.layers.concatenate([inputs, hidden2])
# Output layer of the network
output = keras.layers.Dense(1)(concat)
# Finalize the model
model = keras.models.Model(inputs=[inputs], outputs=[output])
model.summary()
model.compile(loss="mean_squared_error", optimizer="sgd")
callbacks = [keras.callbacks.EarlyStopping(patience=5, min_delta=1e-2)]
# Subclassing API
class WideDeepModel(keras.models.Model):
    def __init__(self):
        super().__init__()
        # Define the layers used by the model
        self.hidden1_layer = keras.layers.Dense(30, activation="relu")
        self.hidden2_layer = keras.layers.Dense(30, activation="relu")
        self.output_layer = keras.layers.Dense(1)

    def call(self, inputs):
        # Forward pass: deep part first
        hidden1 = self.hidden1_layer(inputs)
        hidden2 = self.hidden2_layer(hidden1)
        # Combine the wide part (the raw input) with the deep part
        concat = keras.layers.concatenate([inputs, hidden2])
        # Output layer of the network
        output = self.output_layer(concat)
        return output

# Finalize the model
model = keras.models.Sequential([WideDeepModel()])
model.build(input_shape=(None, 8))
model.summary()
model.compile(loss="mean_squared_error", optimizer="sgd")
callbacks = [keras.callbacks.EarlyStopping(patience=5, min_delta=1e-2)]
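# Multi-input model: separate inputs for the wide and deep parts
input_wide = keras.layers.Input(shape=[5])
input_deep = keras.layers.Input(shape=[6])
hidden1 = keras.layers.Dense(30, activation="relu")(input_deep)
hidden2 = keras.layers.Dense(30, activation="relu")(hidden1)
concat = keras.layers.concatenate([input_wide, hidden2])
output = keras.layers.Dense(1)(concat)
model = keras.models.Model(inputs=[input_wide, input_deep], outputs=[output])
# The model must be compiled before fit(); same settings as the models above
model.compile(loss="mean_squared_error", optimizer="sgd")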
# Wide input: the first 5 features; deep input: the last 6 features
x_train_scaled_wide = x_train_scaled[:, :5]
x_train_scaled_deep = x_train_scaled[:, 2:]
x_valid_scaled_wide = x_valid_scaled[:, :5]
x_valid_scaled_deep = x_valid_scaled[:, 2:]
x_test_scaled_wide = x_test_scaled[:, :5]
x_test_scaled_deep = x_test_scaled[:, 2:]
# Pass the inputs as a list, in the same order as the model's inputs
history = model.fit([x_train_scaled_wide, x_train_scaled_deep], y_train,
                    validation_data=([x_valid_scaled_wide, x_valid_scaled_deep], y_valid),
                    callbacks=callbacks)
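The column slices used for the two inputs can be checked on a small stand-in array. This is only a sketch: the array `x` below is made-up data, and the width of 8 features is an assumption taken from the `input_shape=(None, 8)` used earlier in these notes.

```python
import numpy as np

# Hypothetical stand-in for x_train_scaled: 4 samples, 8 features.
x = np.arange(32, dtype=float).reshape(4, 8)

x_wide = x[:, :5]  # features 0..4, matches Input(shape=[5])
x_deep = x[:, 2:]  # features 2..7, matches Input(shape=[6])

print(x_wide.shape, x_deep.shape)  # (4, 5) (4, 6)

# Features 2, 3 and 4 are fed to both the wide and the deep part.
shared_from_wide = x_wide[:, 2:]
shared_from_deep = x_deep[:, :3]
print(np.array_equal(shared_from_wide, shared_from_deep))  # True
```

The overlap is a choice, not a requirement: the wide and deep inputs may share features, use disjoint sets, or both receive every feature.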
Test set
model.evaluate([x_test_scaled_wide, x_test_scaled_deep], y_test)