20190703 Progress

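  1. Clustering with preset center points (title data)

| Cluster count | Table name | Convergence epoch | Convergence time | Notes |
| --- | --- | --- | --- | --- |
| 10 | 2 | 458 | 02:08:08 | With preset center points. Results collapse toward a few clusters; the preset centers do not steer the assignments well. |
| 10 | 2 | 458 | 02:08:08 | Without preset center points. Results collapse toward a few clusters; this may be a bug in the program. |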
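
To make the "collapse toward a few clusters" symptom measurable, the sketch below seeds a plain Lloyd-style k-means with preset centers and reports the share of samples per cluster. It is only an illustration, not the code in `clusterUsingPrecenter.py`: the function names (`kmeans_from_preset`, `cluster_shares`) and the toy 2-D points are assumptions made for this example.

```python
from collections import Counter


def nearest(point, centers):
    """Index of the center closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centers[k])),
    )


def kmeans_from_preset(points, preset_centers, max_epochs=100):
    """Lloyd-style k-means seeded with preset centers (hypothetical helper).

    Returns (centers, labels, epoch), where `epoch` is the first epoch in
    which no assignment changed, or `max_epochs` if that never happened.
    """
    centers = [list(c) for c in preset_centers]
    labels = None
    for epoch in range(1, max_epochs + 1):
        new_labels = [nearest(p, centers) for p in points]
        if new_labels == labels:  # no assignment changed: converged
            return centers, labels, epoch
        labels = new_labels
        for k in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == k]
            if members:  # an empty cluster keeps its preset center
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels, max_epochs


def cluster_shares(labels, k):
    """Fraction of samples assigned to each of the k clusters."""
    counts = Counter(labels)
    return [counts.get(i, 0) / len(labels) for i in range(k)]


# Toy data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]

# Well-placed preset centers split the data evenly.
_, good_labels, good_epoch = kmeans_from_preset(points, [(0, 0), (10, 10)])

# A badly placed preset center attracts nothing, so everything collapses
# into cluster 0.
_, bad_labels, _ = kmeans_from_preset(points, [(0, 0), (100, 100)])

print(cluster_shares(good_labels, 2))
print(cluster_shares(bad_labels, 2))
```

A share vector dominated by one or two entries, with or without preset centers, would match the behavior noted in the table.
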
  1. Query cluster commands

pai -name pytorch -project algo_public_dev -Dpython=3.6 -Dscript="file:///apsarapangu/disk1/hengsong.lhs/origin_deep_cluster_odps_6.tar.gz" -DentryFile="getClusterCenterofQuery.py" -Dtables="odps://graph_embedding/tables/hs_jingyan_query_related_top_query_1" -Doutputs="odps://graph_embedding/tables/hs_jingyan_query_cluster_result_1" -Dbucket="oss://bucket-automl/" -Darn="acs:ram::1293303983251548:role/graph2018" -Dhost="cn-hangzhou.oss-internal.aliyun-inc.com" -DuserDefinedParameters="--dataset mine_dataset" -DworkerCount=10;

pai -name pytorch -project algo_public_dev -Dpython=3.6 -Dscript="file:///apsarapangu/disk1/hengsong.lhs/origin_deep_cluster_odps_5.tar.gz" -DentryFile="clusterUsingPrecenter.py" -Dtables="odps://graph_embedding/tables/hs_jingyan_query_related_video_pool_2_2" -Dbucket="oss://bucket-automl/" -Darn="acs:ram::1293303983251548:role/graph2018" -Dhost="cn-hangzhou.oss-internal.aliyun-inc.com" -DuserDefinedParameters="--dataset mine_dataset" -DworkerCount=10;

pai -name pytorch -project algo_public_dev -Dpython=3.6 -Dscript="file:///apsarapangu/disk1/hengsong.lhs/origin_deep_cluster_odps_5.tar.gz" -DentryFile="clusterUsingPrecenter.py" -Dtables="odps://graph_embedding/tables/hs_jingyan_query_related_video_pool_2_3" -Doutputs="odps://graph_embedding/tables/hs_jingyan_query_cluster_result_title_1" -Dbucket="oss://bucket-automl/" -Darn="acs:ram::1293303983251548:role/graph2018" -Dhost="cn-hangzhou.oss-internal.aliyun-inc.com" -DuserDefinedParameters="--dataset mine_dataset" -DworkerCount=10;

  1. Query clustering still fails to converge

pai -name pytorch -project algo_public_dev -Dpython=3.6 -Dscript="file:///apsarapangu/disk1/hengsong.lhs/origin_deep_cluster_odps_6.tar.gz" -DentryFile="getClusterCenterofQuery.py" -Dtables="odps://graph_embedding_intern/tables/zj_gul_videos_embedding_infos_" -Dbucket="oss://bucket-automl/" -Darn="acs:ram::1293303983251548:role/graph2018" -Dhost="cn-hangzhou.oss-internal.aliyun-inc.com" -DuserDefinedParameters="--dataset mine_dataset" -DworkerCount=10;
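
One way to pin down "fails to converge" is to track the fraction of samples whose cluster assignment changes between consecutive epochs. The sketch below is an assumption about how such a check could look, not the criterion used in `getClusterCenterofQuery.py`. It assumes cluster IDs are comparable across epochs (true when centers are carried over, not when clusters are re-initialized each epoch); `tol` and `patience` are hypothetical parameters.

```python
def changed_fraction(prev_labels, labels):
    """Fraction of samples whose cluster ID differs between two epochs."""
    if not labels or len(prev_labels) != len(labels):
        raise ValueError("label lists must be non-empty and of equal length")
    changed = sum(a != b for a, b in zip(prev_labels, labels))
    return changed / len(labels)


def first_converged_epoch(label_history, tol=0.01, patience=2):
    """Return the 1-based epoch at which the changed fraction has stayed at
    or below `tol` for `patience` consecutive epochs, or None if it never
    does. `label_history[0]` holds the labels of epoch 1."""
    streak = 0
    for i in range(1, len(label_history)):
        if changed_fraction(label_history[i - 1], label_history[i]) <= tol:
            streak += 1
            if streak >= patience:
                return i + 1
        else:
            streak = 0
    return None


# Assignments settle after epoch 2 and stay fixed.
stable = [[0, 0, 1, 1], [0, 1, 1, 1], [0, 1, 1, 1], [0, 1, 1, 1]]

# Assignments flip back and forth every epoch.
oscillating = [[0, 1], [1, 0], [0, 1], [1, 0]]

print(first_converged_epoch(stable))
print(first_converged_epoch(oscillating))
```

Logging the changed fraction per epoch also helps distinguish slow convergence from oscillation, which call for different fixes.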

  1. Threshold selection: use the hard threshold as the initial threshold, then optimize starting from it
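
The note does not say how the threshold is optimized. As one possible reading, the sketch below starts from a hard threshold and refines it with a shrinking-step local search. Everything here is an assumption: the `score` callback stands in for whatever quality metric is actually used, and the initial value `0.5`, the step sizes, and the `[0, 1]` bounds are placeholders.

```python
def refine_threshold(score, initial, step=0.1, min_step=1e-3, lo=0.0, hi=1.0):
    """Local search around `initial`, maximizing `score(threshold)`.

    Tries one step down and one step up; keeps the better candidate if it
    improves the score, otherwise halves the step. Stops once the step
    falls below `min_step`.
    """
    best_t, best_s = initial, score(initial)
    while step >= min_step:
        improved = False
        for cand in (best_t - step, best_t + step):
            if lo <= cand <= hi:
                s = score(cand)
                if s > best_s:
                    best_t, best_s, improved = cand, s, True
        if not improved:
            step /= 2
    return best_t, best_s


# Hypothetical objective with its maximum at 0.62; a real one would be
# computed from clustering results.
def toy_score(t):
    return -(t - 0.62) ** 2


hard_threshold = 0.5  # placeholder initial value
best_t, best_s = refine_threshold(toy_score, hard_threshold)
print(best_t)
```

This kind of search only finds a local optimum near the starting point, which is reasonable when the hard threshold is already a sensible guess.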

  2. video_emb test

20190703095509454gpys0yyi2

http://logview.odps.aliyun-inc.com:8080/logview/?h=http://service-corp.odps.aliyun-inc.com/api&p=graph_embedding&i=20190703095514748g61kw8y_24c3488a_91cb_47ca_8ad0_5c86b72c422e&token=TTV2b0FJeTJxUks1RHMyNmRFeGlad0E4RXU4PSxPRFBTX09CTzoxMjkzMzAzOTgzMjUxNTQ4LDE1NjI3NTI1MjAseyJTdGF0ZW1lbnQiOlt7IkFjdGlvbiI6WyJvZHBzOlJlYWQiXSwiRWZmZWN0IjoiQWxsb3ciLCJSZXNvdXJjZSI6WyJhY3M6b2RwczoqOnByb2plY3RzL2dyYXBoX2VtYmVkZGluZy9pbnN0YW5jZXMvMjAxOTA3MDMwOTU1MTQ3NDhnNjFrdzh5XzI0YzM0ODhhXzkxY2JfNDdjYV84YWQwXzVjODZiNzJjNDIyZSJdfV0sIlZlcnNpb24iOiIxIn0=

  1. title_2 test without precenter

20190703100237695gnucrzvj2

http://logview.odps.aliyun-inc.com:8080/logview/?h=http://service-corp.odps.aliyun-inc.com/api&p=graph_embedding&i=2019070310030268gammw8y_f7d2a672_0d87_425a_bcbb_bc145bfcef58&token=TGdkcDgyZGg3M2pJMXhBZHQ0bHJYN0VmVTZBPSxPRFBTX09CTzoxMjkzMzAzOTgzMjUxNTQ4LDE1NjI3NTI5ODMseyJTdGF0ZW1lbnQiOlt7IkFjdGlvbiI6WyJvZHBzOlJlYWQiXSwiRWZmZWN0IjoiQWxsb3ciLCJSZXNvdXJjZSI6WyJhY3M6b2RwczoqOnByb2plY3RzL2dyYXBoX2VtYmVkZGluZy9pbnN0YW5jZXMvMjAxOTA3MDMxMDAzMDI2OGdhbW13OHlfZjdkMmE2NzJfMGQ4N180MjVhX2JjYmJfYmMxNDViZmNlZjU4Il19XSwiVmVyc2lvbiI6IjEifQ==

  1. title_2 test with precenter

20190703095801980grh5nu69

http://logview.odps.aliyun-inc.com:8080/logview/?h=http://service-corp.odps.aliyun-inc.com/api&p=graph_embedding&i=20190703100046977g23mw8y_977a3163_416d_4a42_b837_c093abc7a717&token=dUl2OFFWT3VFNWMwbEpCTjlxMisybHpaNGZBPSxPRFBTX09CTzoxMjkzMzAzOTgzMjUxNTQ4LDE1NjI3NTI4NTIseyJTdGF0ZW1lbnQiOlt7IkFjdGlvbiI6WyJvZHBzOlJlYWQiXSwiRWZmZWN0IjoiQWxsb3ciLCJSZXNvdXJjZSI6WyJhY3M6b2RwczoqOnByb2plY3RzL2dyYXBoX2VtYmVkZGluZy9pbnN0YW5jZXMvMjAxOTA3MDMxMDAwNDY5NzdnMjNtdzh5Xzk3N2EzMTYzXzQxNmRfNGE0Ml9iODM3X2MwOTNhYmM3YTcxNyJdfV0sIlZlcnNpb24iOiIxIn0=
