Background:
We manage a Hadoop cluster and need to run operations in batch across machines: for example, editing the hosts file, or syncing configuration files and jar packages to every node.
Why Fabric:
1. I first saw Fabric introduced on Instagram's engineering blog;
2. On a previous crawler team we used Fabric to manage 200+ Alibaba Cloud machines, so its reliability and stability are proven in practice;
3. The learning curve is shallow and installation is easy.
What Fabric is:
Fabric's official site: http://www.fabfile.org/
The official definition:
Fabric is a Python (2.5-2.7) library and command-line tool for streamlining the use of SSH for application deployment or systems administration tasks.
My take:
Fabric is a Python module (more proof of Python's glue-language prowess) that uses SSH under the hood to run batch deployments or system administration tasks against a cluster of target machines.
Installing Fabric:
$ pip install fabric
This installs Fabric together with its dependencies.
Type fab at the command line; if you see output like the following, Fabric is installed correctly:
Usage: fab [options] [:arg1,arg2=val2,host=foo,hosts='h1;h2',...] …
Using Fabric:
1. Create a directory to work in
$ cd ~
$ mkdir -p develop/python/fabricDev
$ cd develop/python/fabricDev
2. Create a file named fabfile.py, add the following code, and save:
#encoding=utf-8
from fabric.api import run

def host_type():
    run('uname -s')
Explanation:
The fab command looks for a file named fabfile.py in the current directory.
Every function defined in fabfile.py becomes a command;
here we define a host_type command that reports the operating system name.
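The discovery step can be modeled in plain Python. The sketch below uses a hypothetical find_tasks helper that collects the public functions defined in a module, which is roughly what fab does with fabfile.py; the real tool also handles decorators, task classes, and submodules.

```python
import inspect
import types

def find_tasks(module):
    """Collect public functions defined in a module.

    A simplified model of how `fab` turns the functions in
    fabfile.py into runnable commands (not Fabric's actual
    implementation).
    """
    tasks = {}
    for name, obj in vars(module).items():
        if name.startswith('_'):
            continue  # private helpers are not exposed as commands
        if inspect.isfunction(obj):
            tasks[name] = obj
    return tasks

# Build a stand-in "fabfile" module containing one task.
fabfile = types.ModuleType('fabfile')
exec("def host_type():\n    return 'uname -s'", fabfile.__dict__)

print(sorted(find_tasks(fabfile)))  # → ['host_type']
```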
3. Run Fabric locally to get this machine's host_type
$ fab -H localhost host_type
Explanation: -H localhost tells Fabric to run against the local machine.
Output:
liuyufan@liumatoMacBook-Pro ~/github/python/fabric % fab -H localhost host_type
[liuyufan@localhost:22] Executing task 'host_type'
[liuyufan@localhost:22] run: uname -s
[liuyufan@localhost:22] out: Darwin
[liuyufan@localhost:22] out:
Done.
Disconnecting from localhost... done.
4. Get the host_type of the 4 machines in the Hadoop cluster
I run a 4-VM Hadoop cluster on my own machine, and now I want Fabric to fetch host_type from all four. All that is needed is to configure the SSH ip, port, username, and password for each machine.
Edit fabfile.py and save:
#encoding=utf-8
from fabric.api import run, env  # (new) env holds execution environment settings

env.hosts = [  # (new) the machines to run against
    '[email protected]:22',
    '[email protected]:22',
    '[email protected]:22',
    '[email protected]:22',
]
env.password = 'patrick'  # (new)

def host_type():
    run('uname -s')
Run:
$ fab host_type
Output:
liuyufan@liumatoMacBook-Pro ~/github/python/fabric % fab host_type
[[email protected]:22] Executing task 'host_type'
[[email protected]:22] run: uname -s
[[email protected]:22] out: Linux
[[email protected]:22] out:
[[email protected]:22] Executing task 'host_type'
[[email protected]:22] run: uname -s
[[email protected]:22] out: Linux
[[email protected]:22] out:
[[email protected]:22] Executing task 'host_type'
[[email protected]:22] run: uname -s
[[email protected]:22] out: Linux
[[email protected]:22] out:
[[email protected]:22] Executing task 'host_type'
[[email protected]:22] run: uname -s
[[email protected]:22] out: Linux
[[email protected]:22] out:
Done.
Disconnecting from [email protected]... done.
Disconnecting from [email protected]... done.
Disconnecting from [email protected]... done.
Disconnecting from [email protected]... done.
As the output shows, Fabric runs host_type against each machine sequentially, in the order they appear in env.hosts. Fabric also supports parallel execution; let's try that next.
5. Get host_type from the 4 machines in parallel
Edit fabfile.py and save:
#encoding=utf-8
from fabric.api import run, env, parallel  # (new) parallel decorator

env.hosts = [
    '[email protected]:22',
    '[email protected]:22',
    '[email protected]:22',
    '[email protected]:22',
]
env.password = 'patrick'

@parallel  # (new)
def host_type():
    run('uname -s')
Explanation:
Each task is forked into a new process per host, and Fabric uses a sliding-window scheme to avoid spawning too many processes at once. You can cap the pool explicitly with @parallel(pool_size=5).
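The bounded-pool idea can be modeled without Fabric: at most pool_size hosts are worked on at once, and a new task starts as soon as a slot frees up. A sketch using a thread pool in place of Fabric's per-host processes (hypothetical hosts, threads instead of processes for brevity):

```python
from concurrent.futures import ThreadPoolExecutor

def host_type(host):
    # Stand-in for run('uname -s') against one remote host.
    return host, 'Linux'

hosts = ['192.168.1.%d' % n for n in (201, 202, 203, 204)]

# max_workers plays the role of @parallel(pool_size=...):
# never more than that many hosts are in flight at once.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = dict(pool.map(host_type, hosts))

print(results['192.168.1.201'])  # → Linux
```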
Run:
$ fab host_type
Output:
liuyufan@liumatoMacBook-Pro ~/github/python/fabric % fab host_type
[[email protected]:22] Executing task 'host_type'
[[email protected]:22] Executing task 'host_type'
[[email protected]:22] Executing task 'host_type'
[[email protected]:22] Executing task 'host_type'
[[email protected]:22] run: uname -s
[[email protected]:22] run: uname -s
[[email protected]:22] run: uname -s
[[email protected]:22] run: uname -s
[[email protected]:22] out: Linux
[[email protected]:22] out:
[[email protected]:22] out: Linux
[[email protected]:22] out:
[[email protected]:22] out: Linux
[[email protected]:22] out:
[[email protected]:22] out: Linux
[[email protected]:22] out:
Done.
With that, Fabric is up and running against the cluster.
Whether to use parallel execution is a trade-off that depends on the task. Sequential execution is fail-fast: with the 4 machines above, if machine 3 happens to lose its network connection, Fabric errors out and exits as soon as it tries to connect to it, so the next run only needs to pick up from machine 3, putting machines 3 and 4 back on the execution list.
With parallel execution, machines 1, 2, and 4 may succeed while machine 3 fails, and you have to dig through the output to find which host failed. With 100 machines where, say, numbers 42, 57, and 69 happen to fail, hunting through 100 hosts' worth of output becomes very tedious.
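One way to tame the second problem is to record failures per host and feed only the failed hosts into the next run. A minimal sketch with a hypothetical run_on helper that simulates per-host success and failure (in a real fabfile you would collect the failures however your task surfaces them):

```python
def run_on(host, task, broken):
    """Simulate running a task on one host; hosts in `broken`
    fail the way a dropped connection would."""
    if host in broken:
        raise ConnectionError(host)
    return task(host)

hosts = ['192.168.1.%d' % n for n in range(201, 205)]
broken = {'192.168.1.203'}  # pretend this host's network is down

failed = []
for host in hosts:
    try:
        run_on(host, lambda h: 'Linux', broken)
    except ConnectionError:
        failed.append(host)

# Only the failed hosts need to go into the next run's env.hosts.
print(failed)  # → ['192.168.1.203']
```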
6. Other useful Fabric commands
# imports needed for the snippets below:
# from fabric.api import run, cd, get, put, prefix, sudo
# from fabric.contrib.files import exists

# 1. Run shell commands directly
run('cat /var/crawl/client.xml | grep ...')
run('cmd 2')
run('cmd 3')
# 2. Change directory, then run
with cd('/var/crawl'):
    run('echo hi >> test.txt')
# 3. Check whether a file or directory exists
if exists('/var/crawl/client.xml'):
    print 'Config file exists'
else:
    print 'Config file does not exist'
# 4. Download a file from the remote server
get('/remote/path/to/file', '/local/path/')
# 5. Upload a file to the remote server
put('/local/path/to/file', '/remote/path')
# 6. Nested contexts
with prefix('cd ~/shark-0.9.1/bin/'):
    with prefix('chmod +x *.sh'):
        run('shark-shell.sh')
# 7. sudo
sudo('mkdir /var/www/new_docroot', user='www-data')
# 8. Capture a command's output and reuse it
files = run('ls')
run('ls -l ' + files)
The operations above cover most day-to-day ops needs.
=== END ===