20190709 Work Progress

  1. 60-epoch test

pai -name pytorch -project algo_public_dev -Dpython=3.6 -Dscript="file:///apsarapangu/disk1/hengsong.lhs/origin_deep_cluster_odps_5.tar.gz" -DentryFile="clusterUsingPrecenter.py" -Dtables="odps://graph_embedding/tables/hs_jingyan_query_related_video_pool_2_3,odps://graph_embedding/tables/hs_jingyan_query_related_top_query_1" -Doutputs="odps://graph_embedding/tables/hs_jingyan_query_cluster_result_title5,odps://graph_embedding/tables/hs_jingyan_query_cluster_result_title5_0,odps://graph_embedding/tables/hs_jingyan_query_cluster_result_title5_1,odps://graph_embedding/tables/hs_jingyan_query_cluster_result_title5_2,odps://graph_embedding/tables/hs_jingyan_query_cluster_result_title5_3,odps://graph_embedding/tables/hs_jingyan_query_cluster_result_title5_4" -Dbucket="oss://bucket-automl/" -Darn="acs:ram::1293303983251548:role/graph2018" -Dhost="cn-hangzhou.oss-internal.aliyun-inc.com" -DuserDefinedParameters="" -DworkerCount=10;

  1. 60 epoch with same center

pai -name pytorch -project algo_public_dev -Dpython=3.6 -Dscript="file:///apsarapangu/disk1/hengsong.lhs/origin_deep_cluster_odps_5.tar.gz" -DentryFile="clusterUsingSameCenter.py" -Dtables="odps://graph_embedding/tables/hs_jingyan_query_related_video_pool_2_3,odps://graph_embedding/tables/hs_jingyan_query_related_top_query_1" -Doutputs="odps://graph_embedding/tables/hs_jingyan_query_cluster_result_title6,odps://graph_embedding/tables/hs_jingyan_query_cluster_result_title6_0,odps://graph_embedding/tables/hs_jingyan_query_cluster_result_title6_1,odps://graph_embedding/tables/hs_jingyan_query_cluster_result_title6_2,odps://graph_embedding/tables/hs_jingyan_query_cluster_result_title6_3,odps://graph_embedding/tables/hs_jingyan_query_cluster_result_title6_4" -Dbucket="oss://bucket-automl/" -Darn="acs:ram::1293303983251548:role/graph2018" -Dhost="cn-hangzhou.oss-internal.aliyun-inc.com" -DuserDefinedParameters="" -DworkerCount=10;

https://logview.alibaba-inc.com/logview/?h=http://service-corp.odps.aliyun-inc.com/api&p=graph_embedding&i=20190709032927801g0wmw8y_7387d687_4759_4cb9_8dbd_86d52f8bfd42&token=b2x0T2FCNHJ2dzhjYzlMY1J3SzJOaGx4alB3PSxPRFBTX09CTzoxMjkzMzAzOTgzMjUxNTQ4LDE1NjMyNDc3NzMseyJTdGF0ZW1lbnQiOlt7IkFjdGlvbiI6WyJvZHBzOlJlYWQiXSwiRWZmZWN0IjoiQWxsb3ciLCJSZXNvdXJjZSI6WyJhY3M6b2RwczoqOnByb2plY3RzL2dyYXBoX2VtYmVkZGluZy9pbnN0YW5jZXMvMjAxOTA3MDkwMzI5Mjc4MDFnMHdtdzh5XzczODdkNjg3XzQ3NTlfNGNiOV84ZGJkXzg2ZDUyZjhiZmQ0MiJdfV0sIlZlcnNpb24iOiIxIn0=

  1. 60 epoch without same center 1000 class

pai -name pytorch -project algo_public_dev -Dpython=3.6 -Dscript="file:///apsarapangu/disk1/hengsong.lhs/origin_deep_cluster_odps_5.tar.gz" -DentryFile="clusterUsingPrecenter.py" -Dtables="odps://graph_embedding/tables/hs_jingyan_query_related_video_pool_2_3,odps://graph_embedding/tables/hs_jingyan_query_related_top_query_1" -Doutputs="odps://graph_embedding/tables/hs_jingyan_query_cluster_result_title7,odps://graph_embedding/tables/hs_jingyan_query_cluster_result_title7_0,odps://graph_embedding/tables/hs_jingyan_query_cluster_result_title7_1,odps://graph_embedding/tables/hs_jingyan_query_cluster_result_title7_2,odps://graph_embedding/tables/hs_jingyan_query_cluster_result_title7_3,odps://graph_embedding/tables/hs_jingyan_query_cluster_result_title7_4" -Dbucket="oss://bucket-automl/" -Darn="acs:ram::1293303983251548:role/graph2018" -Dhost="cn-hangzhou.oss-internal.aliyun-inc.com" -DuserDefinedParameters="" -DworkerCount=10;

  1. 60 epoch without same center --result


    [Image 1]
    Swimwear (泳衣类)

    [Image 2]
    Children and infants (儿童婴儿)

    Women's shoes, trendy shoes (女鞋-潮鞋)

    [Image 3]
    Women's slippers (拖鞋女)
  2. Settling the title problem:

pai -name pytorch -project algo_public_dev -Dpython=3.6 -Dscript="file:///apsarapangu/disk1/hengsong.lhs/origin_deep_cluster_odps_5.tar.gz" -DentryFile="test_query_with_title.py" -Dtables="odps://graph_embedding/tables/hs_jingyan_query_related_video_pool_2_3,odps://graph_embedding/tables/hs_jingyan_query_related_top_query_3" -Doutputs="odps://graph_embedding/tables/hs_query_title_6" -Dbucket="oss://bucket-automl/" -Darn="acs:ram::1293303983251548:role/graph2018" -Dhost="cn-hangzhou.oss-internal.aliyun-inc.com" -DuserDefinedParameters="" -DworkerCount=1;

pai -name pytorch -project algo_public_dev -Dpython=3.6 -Dscript="file:///apsarapangu/disk1/hengsong.lhs/origin_deep_cluster_odps_5.tar.gz" -DentryFile="test_query_with_title.py" -Dtables="odps://graph_embedding/tables/hs_tmp_video_emb_0,odps://graph_embedding/tables/hs_jingyan_query_related_top_query_128" -Doutputs="odps://graph_embedding/tables/hs_query_title_4" -Dbucket="oss://bucket-automl/" -Darn="acs:ram::1293303983251548:role/graph2018" -Dhost="cn-hangzhou.oss-internal.aliyun-inc.com" -DuserDefinedParameters="" -DworkerCount=1;

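Computing the Euclidean distance between the query and title word vectors and keeping the top K closest matches already gives very good results, much better than the autoencoder.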
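A minimal sketch of the distance ranking described above, using only the standard library. The function name `top_k_nearest` and the toy 3-dimensional vectors are made up for illustration; the real job (`test_query_with_title.py`) is not shown in these notes and may work differently.

```python
import heapq
import math

def top_k_nearest(query_vec, title_vecs, k):
    """Return (index, distance) pairs for the k titles closest to the query."""
    scored = ((i, math.dist(query_vec, vec)) for i, vec in enumerate(title_vecs))
    # Smallest Euclidean distance means most similar, so keep the k smallest.
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])

# Toy vectors standing in for real word embeddings (hypothetical data).
titles = [
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [5.0, 5.0, 5.0],
    [0.0, 2.0, 0.0],
]
query = [1.0, 0.0, 0.0]
nearest = top_k_nearest(query, titles, k=2)
```

Here `nearest` holds the two closest titles, nearest first. A brute-force scan like this is only practical for small pools; for large tables an approximate nearest-neighbor job is the usual choice.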

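hs_query_title_1: title-to-query matching results
hs_query_title_2: title-to-title matching results
hs_query_title_3: video_emb-to-video_emb matching results
video_emb-to-query matching results cannot be produced for now, because the word vectors from alinlp only come in 50/100/200 dimensions, while video_emb is 128-dimensional.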
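The dimension problem can be made concrete with a small check. This is only an illustration of why the distance is undefined; the constant and function names are hypothetical, and the sizes are the ones quoted in the notes.

```python
import math

ALINLP_DIMS = (50, 100, 200)  # word-vector sizes mentioned in the notes
VIDEO_EMB_DIM = 128           # video_emb size mentioned in the notes

def euclidean(a, b):
    """Euclidean distance that fails loudly when the dimensions differ."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    return math.dist(a, b)

def comparable_dims(video_dim=VIDEO_EMB_DIM, word_dims=ALINLP_DIMS):
    """Return the word-vector sizes that match the video embedding size."""
    return [d for d in word_dims if d == video_dim]

matching = comparable_dims()  # empty: none of 50/100/200 equals 128
```

Since no available size matches, one of the two embeddings would have to be retrained or projected into the other's space before a query-to-video distance makes sense.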

create table if not exists graph_embedding.hs_heter_graph_embedding_out_nearest_neighbor_006(
node_id bigint,
emb string
) LIFECYCLE 14;

hs_heter_graph_embedding_out_nearest_neighbor_006

PAI -name am_vsearch_nearest_neighbor_014 -project algo_market
-Dcluster="{"worker":{"count":40,"gpu":100}}"
-Ddim=100
-Did_col="node_id"
-Dvector_col="emb"
-Dinput_slice=40
-Dtopk=100
-Dnprob=512
-Dmetric="l2"
-Dinput="odps://graph_embedding/tables/hs_heter_graph_embedding_video_recall_"
-Dquery="odps://graph_embedding/tables/hs_heter_graph_embedding_ave_info_"
-Doutputs="odps://graph_embedding/tables/hs_heter_graph_embedding_out_nearest_neighbor_006"
-DenableDynamicCluster=true -DmaxTrainingTimeInHour=60;
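Because the job reads vectors from a string column (`-Dvector_col="emb"`) with a fixed `-Ddim=100`, it can help to validate rows before submitting. The sketch below assumes the values are comma-separated, which these notes do not confirm; `parse_emb` is a hypothetical helper, not part of PAI.

```python
def parse_emb(emb, dim=100, sep=","):
    """Parse an embedding stored as a delimited string into a list of floats.

    The separator is an assumption; the notes only say `emb` is a string column.
    """
    values = [float(tok) for tok in emb.split(sep) if tok.strip()]
    if len(values) != dim:
        raise ValueError(f"expected {dim} values, got {len(values)}")
    return values

# Tiny example with dim=3 so the result is easy to read.
vector = parse_emb("0.5,1,-2", dim=3)
```

Running a check like this on a sample of rows would catch truncated or wrongly sized vectors before a long GPU job starts.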

  1. 10k query results

http://logview.odps.aliyun-inc.com:8080/logview/?h=http://service-corp.odps.aliyun-inc.com/api&p=graph_embedding&i=20190709130124349g21gaf3_0c7b0914_1272_4427_9460_5829509135f5&token=RjVtcFRtYUVkc043WVl6dnNNRjRzaldKekxzPSxPRFBTX09CTzoxMjkzMzAzOTgzMjUxNTQ4LDE1NjMyODIwOTAseyJTdGF0ZW1lbnQiOlt7IkFjdGlvbiI6WyJvZHBzOlJlYWQiXSwiRWZmZWN0IjoiQWxsb3ciLCJSZXNvdXJjZSI6WyJhY3M6b2RwczoqOnByb2plY3RzL2dyYXBoX2VtYmVkZGluZy9pbnN0YW5jZXMvMjAxOTA3MDkxMzAxMjQzNDlnMjFnYWYzXzBjN2IwOTE0XzEyNzJfNDQyN185NDYwXzU4Mjk1MDkxMzVmNSJdfV0sIlZlcnNpb24iOiIxIn0=

The results are in hs_query_title_6.

  1. Querying a partitioned table
    select * from tbcdm.dim_tb_itm where ds=max_pt('tbcdm.dim_tb_itm') limit 10;
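As far as I understand it, `max_pt` picks the largest partition value that actually contains data. A rough Python imitation of that idea, with made-up partition names and row counts (the function below is an illustration, not the ODPS implementation):

```python
def max_pt(partition_rows):
    """Mimic the idea of max_pt: the largest partition name that contains data.

    partition_rows maps a partition value (e.g. ds) to its row count.
    """
    non_empty = [name for name, rows in partition_rows.items() if rows > 0]
    if not non_empty:
        raise ValueError("no partition contains data")
    # Names are compared as strings, so yyyymmdd values sort by date.
    return max(non_empty)

partitions = {"20190707": 120, "20190708": 95, "20190709": 0}  # made-up counts
latest = max_pt(partitions)
```

In this toy case the newest partition is empty, so the previous day is chosen, which is why the query above avoids hard-coding a `ds` value.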
