



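1. Install JDK 1.8 and configure its environment variables.


2. Install Anaconda and configure its environment variables.


3. Install Spark (1.6.1) and configure its environment variables.


4. Install Hadoop (2.6.0) and configure its environment variables.


5. Install PyCharm.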
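
The checklist above relies on several environment variables being visible to Python. As a minimal sketch, assuming the conventional names JAVA_HOME, SPARK_HOME, and HADOOP_HOME (the post does not name the variables it sets), the hypothetical helper `missing_env_vars` reports which ones are still unset:

```python
import os

# Assumed variable names; the post does not say which ones it sets.
REQUIRED_VARS = ("JAVA_HOME", "SPARK_HOME", "HADOOP_HOME")


def missing_env_vars(env=None, required=REQUIRED_VARS):
    """Return the required variables that are unset or empty in env."""
    if env is None:
        env = os.environ
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Not set: " + ", ".join(missing))
    else:
        print("All required variables are set.")
```

Running this from the interpreter that PyCharm uses is one way to confirm that the IDE sees the same variables as the shell.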







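1. The Python that ships with CentOS 7

 # python            # run python to see the bundled version


 # which python         # show where the bundled python is located


 # cd /usr/bin/
 # ls -al python*         # list the installed python binaries and symlinks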



2. Install the dependencies that Python 3.5 may need

 # yum install openssl-devel bzip2-devel expat-devel gdbm-devel readline-devel sqlite-devel

3. Download Python

 # cd /opt/
 # wget ""

4. Extract the downloaded archive

 # tar -zxvf Python-3.5.0.tgz

5. Configure and build

 # sudo mkdir /usr/local/python3
 # sudo Python-3.5.0/configure --prefix=/usr/local/python3
 # sudo make
 # sudo make install

6. Back up the old python and link the new one

 # cd /usr/bin/
 # sudo mv python python.bak
 # sudo ln -s /usr/local/python3/bin/python3  /usr/bin/python

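7. Update the yum launcher script

 # vim /usr/bin/yum
 Change the first line of the file from #!/usr/bin/python to #!/usr/bin/python2.7 so that yum keeps running on Python 2.7 after /usr/bin/python is relinked to Python 3.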
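
The shebang edit in step 7 can also be scripted instead of done by hand in vim. The following is only a sketch: `pin_shebang` is a hypothetical helper, it works on the script text rather than on the file itself, and it assumes the first line is exactly `#!/usr/bin/python`.

```python
def pin_shebang(script_text, old="#!/usr/bin/python", new="#!/usr/bin/python2.7"):
    """Return script_text with its first line changed from old to new.

    The text is returned unchanged if the first line is not exactly old,
    so running the helper twice is harmless.
    """
    first, sep, rest = script_text.partition("\n")
    if first.strip() == old:
        return new + sep + rest
    return script_text


# Small in-memory example; no file is touched here.
sample = "#!/usr/bin/python\nimport sys\n"
print(pin_shebang(sample).splitlines()[0])  # prints "#!/usr/bin/python2.7"
```

To apply it to the real script, one would read /usr/bin/yum as root, pass its contents through the helper, and write the result back after taking a backup.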



III. Python demo






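import jieba
import matplotlib.pyplot as plt
from wordcloud import WordCloud

# Read the whole text; adjust the path (and the encoding, if needed) for your machine.
text = open('G:\\book\\tlbb.txt', 'r').read()

# Count every segmented word that is longer than one character.
s = {}
for w in jieba.cut(text):
    if len(w) > 1:
        s[w] = s.get(w, 0) + 1

# Sort by frequency, highest first; item[1] is the count.
word = sorted(s.items(), key=lambda item: item[1], reverse=True)
word = word[1:1000]  # keeps ranks 2-1000, skipping the single most frequent word
# print(word[:100])

# Recent wordcloud releases expect a dict of frequencies, hence dict(word).
wordcloud = WordCloud(font_path = 'D:\\Anaconda\\anaconda\\Lib\\site-packages\\matplotlib\\mpl-data\\fonts\\ttf\\MSYHBD.TTF').fit_words(dict(word))

# Display the rendered cloud.
plt.imshow(wordcloud)
plt.axis('off')
plt.show()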
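
The manual counting loop in the demo can be written more compactly with `collections.Counter` from the standard library. This is a sketch rather than the demo's own method: `top_words` and `sample_tokens` are hypothetical names, the token list stands in for the output of `jieba.cut()`, and unlike the demo it keeps the most frequent word.

```python
from collections import Counter


def top_words(tokens, limit=1000, min_length=2):
    """Count tokens of at least min_length characters; return the most frequent ones."""
    counts = Counter(t for t in tokens if len(t) >= min_length)
    return counts.most_common(limit)


# Hypothetical pre-tokenized input standing in for the output of jieba.cut().
sample_tokens = ["spark", "python", "a", "spark", "hadoop", "python", "spark"]
print(top_words(sample_tokens, limit=2))  # prints [('spark', 3), ('python', 2)]
```

Passing `dict(top_words(tokens))` to `fit_words` would then give the word cloud its frequencies.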




IV. Python on Spark





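from pyspark import SparkContext

sc = SparkContext("local", "Simple App")
doc = sc.parallelize([['a', 'b', 'c'], ['b', 'd', 'd']])

# Build a word -> integer id dictionary on the driver and broadcast it to the workers.
words = doc.flatMap(lambda d: d).distinct().collect()
word_dict = {w: i for i, w in enumerate(words)}
word_dict_b = sc.broadcast(word_dict)

def wordCountPerDoc(d):
    """Count how often each word id occurs in one document."""
    counts = {}
    wd = word_dict_b.value
    for w in d:
        if wd[w] in counts:
            counts[wd[w]] += 1
        else:
            counts[wd[w]] = 1
    return counts

# Apply the counter to every document and print the per-document counts.
print(doc.map(wordCountPerDoc).collect())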


D:\Anaconda\anaconda\python.exe E:/pythonworkspace/pythontest001/Test001/
[{0: 1, 1: 1, 2: 1}, {2: 1, 3: 2}]
17/11/21 15:00:33 INFO SparkContext: Invoking stop() from shutdown hook

Process finished with exit code 0
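
To see what the job computes without starting Spark, the same logic can be traced in plain Python. This is a sketch under stated assumptions: `build_word_index` and `word_count_per_doc` are hypothetical names, and ids are assigned in order of first appearance, whereas Spark's `distinct()` gives no ordering guarantee, so the ids in the Spark output above may differ.

```python
def build_word_index(docs):
    """Assign each distinct word an integer id, in order of first appearance."""
    index = {}
    for doc in docs:
        for word in doc:
            if word not in index:
                index[word] = len(index)
    return index


def word_count_per_doc(doc, index):
    """Count how often each word id occurs in a single document."""
    counts = {}
    for word in doc:
        word_id = index[word]
        counts[word_id] = counts.get(word_id, 0) + 1
    return counts


docs = [["a", "b", "c"], ["b", "d", "d"]]
index = build_word_index(docs)
print([word_count_per_doc(d, index) for d in docs])
# prints [{0: 1, 1: 1, 2: 1}, {1: 1, 3: 2}]
```

In the Spark version, the index plays the role of the broadcast dictionary and the list comprehension corresponds to `doc.map(...)`.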
