Machine Learning Foundations: A Case Study Approach, Week 4

Assignment code:

import graphlab
# Limit number of worker processes. This preserves system memory, which prevents hosted notebooks from crashing.
graphlab.set_runtime_config('GRAPHLAB_DEFAULT_NUM_PYLAMBDA_WORKERS', 4)
# Load the data
people = graphlab.SFrame("people_wiki.gl/")
# Build a word-count vector for each article (tokenization)
people["word_count"] = graphlab.text_analytics.count_words(people["text"])
# Compute TF-IDF
tfidf = graphlab.text_analytics.tf_idf(people["word_count"])
people['tfidf'] = tfidf
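
To make the two preprocessing steps above concrete, here is a minimal sketch in plain Python, without GraphLab. It assumes whitespace tokenization and the common weighting count × log(N / document frequency); GraphLab's own tokenizer and exact weighting may differ in details. The helper names and the three toy documents are made up for illustration.

```python
import math
from collections import Counter

def count_words(text):
    """Return a {word: count} dict, assuming lowercase whitespace tokenization."""
    return dict(Counter(text.lower().split()))

def tf_idf(word_counts):
    """Weight each count by log(N / df), where df is the number of
    documents that contain the word (an assumed, common formulation)."""
    n_docs = len(word_counts)
    doc_freq = Counter()
    for counts in word_counts:
        doc_freq.update(counts.keys())
    return [
        {w: c * math.log(n_docs / float(doc_freq[w])) for w, c in counts.items()}
        for counts in word_counts
    ]

# Toy corpus (hypothetical, not taken from people_wiki.gl)
docs = [
    "the piano man plays the piano",
    "the football player",
    "the singer sings",
]
counts = [count_words(d) for d in docs]
weights = tf_idf(counts)

print(counts[0]["the"])   # 2
print(weights[0]["the"])  # 0.0, because "the" appears in every document
```

A word that occurs in every document gets weight 0, which is why common words dominate raw counts but drop out of the TF-IDF ranking.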

1 . Top word count words for Elton John

elton = people[people["name"] == "Elton John"]
elton[["word_count"]].stack("word_count",new_column_name = ["word","count"]).sort("count",ascending = False)

Output:

[Image 1: output table for the cell above]

2 . Top TF-IDF words for Elton John

elton[["tfidf"]].stack("tfidf",new_column_name = ["word","tfidf"]).sort("tfidf",ascending = False)

Output:

[Image 2: output table for the cell above]
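
The stack-then-sort pattern used for questions 1 and 2 amounts to ranking a dictionary's entries by value. A rough plain-Python equivalent, assuming the per-article column is an ordinary {word: value} dict, might look like this. The function name and the numbers are hypothetical and are not Elton John's real counts.

```python
def top_items(weights, k=3):
    """Return the k (word, value) pairs with the largest values, in descending order."""
    return sorted(weights.items(), key=lambda item: item[1], reverse=True)[:k]

# Made-up counts for illustration only
word_count = {"the": 27, "in": 18, "john": 7, "piano": 2}

print(top_items(word_count, k=2))  # [('the', 27), ('in', 18)]
```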

3 . The cosine distance between 'Elton John's and 'Victoria Beckham's articles (represented with TF-IDF) falls within which range?
4 . The cosine distance between 'Elton John's and 'Paul McCartney's articles (represented with TF-IDF) falls within which range?
5 . Who is closer to 'Elton John', 'Victoria Beckham' or 'Paul McCartney'?

victoria = people[people['name'] == 'Victoria Beckham']
paul = people[people["name"] == "Paul McCartney"]
graphlab.distances.cosine(elton['tfidf'][0],victoria['tfidf'][0])
graphlab.distances.cosine(elton["tfidf"][0],paul["tfidf"][0])

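Output:
0.9567006376655429
0.8250310029221779

The distance to Paul McCartney's article is smaller (about 0.825 versus 0.957), so by TF-IDF cosine distance Paul McCartney is closer to Elton John than Victoria Beckham is.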
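
For reference, cosine distance is 1 minus the cosine similarity of the two vectors. The sketch below computes it for sparse vectors stored as {word: weight} dicts, which is how the tfidf column is laid out. It is an illustration under that assumption, not GraphLab's implementation, and the toy vectors are made up.

```python
import math

def cosine_distance(a, b):
    """Return 1 - (a . b) / (|a| * |b|) for sparse {key: value} vectors."""
    dot = sum(value * b[key] for key, value in a.items() if key in b)
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)

# Hypothetical TF-IDF vectors
pianist = {"piano": 2.0, "singer": 1.0}
musician = {"piano": 1.0, "football": 1.0}
athlete = {"football": 3.0}

print(cosine_distance(pianist, athlete))   # 1.0, no shared words
print(cosine_distance(pianist, musician))  # about 0.37, one shared word
```

With non-negative weights such as TF-IDF, the distance stays between 0 (same direction) and 1 (no words in common), so a smaller value means more similar articles.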

# Build two nearest-neighbor models over the same articles:
# one on the TF-IDF vectors and one on the raw word counts
knn_tfdif_model = graphlab.nearest_neighbors.create(people,features = ["tfidf"],label = "name",distance = "cosine")
knn_wordcount_model = graphlab.nearest_neighbors.create(people,features = ["word_count"],label = "name",distance = "cosine")
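
Conceptually, a nearest-neighbor query ranks every article by its distance to the query article. A brute-force sketch in plain Python, assuming sparse {word: weight} vectors and cosine distance, is shown below. The labels and vectors are hypothetical, and GraphLab may use a different search strategy internally.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity for sparse {key: value} vectors."""
    dot = sum(value * b[key] for key, value in a.items() if key in b)
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return 1.0 - dot / (norm_a * norm_b)

def nearest_neighbors(query_label, vectors, k=2):
    """Rank all labels by cosine distance to the query and keep the k closest."""
    query = vectors[query_label]
    ranked = sorted(
        (cosine_distance(query, vec), label) for label, vec in vectors.items()
    )
    return [(label, dist) for dist, label in ranked[:k]]

# Hypothetical feature vectors keyed by article name
vectors = {
    "A": {"piano": 2.0, "singer": 1.0},
    "B": {"piano": 1.0, "singer": 1.0, "band": 1.0},
    "C": {"football": 3.0},
}

for label, dist in nearest_neighbors("A", vectors, k=3):
    print(label, round(dist, 4))
```

Because the query article is itself part of the dataset, it comes back first with a distance of essentially 0. The nearest other article is the second entry, which is also the row to read in the query outputs.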

6 . Who is the nearest neighbor to 'Elton John' using raw word counts?
8 . Who is the nearest neighbor to 'Victoria Beckham' using raw word counts?

knn_wordcount_model.query(elton)
knn_wordcount_model.query(victoria)

Output:

[Image 3: output table for knn_wordcount_model.query(elton)]
[Image 4: output table for knn_wordcount_model.query(victoria)]

7 . Who is the nearest neighbor to 'Elton John' using TF-IDF?
9 . Who is the nearest neighbor to 'Victoria Beckham' using TF-IDF?

knn_tfdif_model.query(elton)
knn_tfdif_model.query(victoria)

Output:

[Image 5: output table for knn_tfdif_model.query(elton)]
[Image 6: output table for knn_tfdif_model.query(victoria)]
