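Pexpect is a pure-Python module for spawning child programs and controlling them automatically, which makes it well suited to scripting interactions that a person would otherwise type by hand. For a project I read through parts of the pexpect API documentation and source code. This post explains how to use some of its key functions, quotes excerpts of the internal source for analysis, and walks through my own example: log in to a remote machine over SSH, run a Hive job there, and collect status information while it runs. The example hardcodes its parameters; depending on your needs it can be wrapped into a more general function that takes them as arguments.
The example code is as follows: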
from logger import logger
try:
    import getpass
    import pexpect
    from pexpect import pxssh  # pxssh ships inside the pexpect package
except ImportError as e:
    logger.error('the pexpect module needs to be installed: ' + str(e))

def test_case3(self):
    try:
        connectHandler = pxssh.pxssh(timeout=100)
        # binary append: pexpect writes bytes when no encoding is set
        fpwrite = open('./hadoop_logs/test.log', 'ab')
        connectHandler.login('192.168.10.70', 'hadoop', 'hadoop')
        connectHandler.sendline('sudo sh /home/hadoop/.verlink/testhive.sh 2>&1')
        #connectHandler.sendline('sleep 5')
        connectHandler.logfile = fpwrite
        flag = connectHandler.expect(['password', pexpect.EOF, pexpect.TIMEOUT], timeout=2)
        if flag == 2:
            logger.info('[Worker] pexpect.TIMEOUT')
            logger.info('[Worker] ' + str(connectHandler))
            connectHandler.prompt()
        elif flag == 1:
            logger.info('[Worker] pexpect.EOF')
            #logger.info('[Worker] ' + str(connectHandler))
            connectHandler.prompt()
        else:
            logger.info('[Worker] COMMAND NEED PASSWORD AND HAS GIVEN AUTOMATIC')
            connectHandler.sendline("hadoop")
            connectHandler.prompt()
        print(connectHandler.before)
        connectHandler.logout()
        print(connectHandler.pid)
        logger.info('[Worker] job exitstatus: ' + str(connectHandler.exitstatus))
    except Exception as e:
        print("pxssh failed on login.")
        print(str(e))
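The code above imports three modules: pexpect, pxssh and getpass. pxssh ships with pexpect (its pxssh class is a subclass of pexpect.spawn), so it is available as soon as pexpect is installed; in recent releases it is imported with `from pexpect import pxssh`. getpass is part of the Python standard library rather than pexpect; it is normally used to ask for a password interactively, and this example does not actually call it.
The function test_case3 is a test case that shows the whole flow with pexpect: log in, run a command, and collect the resulting information.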
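As a rough sketch of the parameterized wrapper mentioned in the introduction, the hardcoded steps of test_case3 could be folded into one function. The name run_remote, its arguments and its return value are hypothetical choices for illustration, not part of pexpect. Unlike test_case3, the sketch returns right away on EOF instead of waiting for a prompt, and pexpect is imported inside the function so that the module still loads on a machine where it is missing.

```python
def run_remote(host, user, password, command, sudo_password=None,
               login_timeout=100, sudo_wait=2):
    """Log in over SSH, run one command, return (output, ssh_exit_status).

    Hypothetical helper: name and parameters are illustrative only.
    """
    # Imported lazily so this module can be loaded without pexpect installed.
    import pexpect
    from pexpect import pxssh

    session = pxssh.pxssh(timeout=login_timeout)
    session.login(host, user, password)
    session.sendline(command)
    # 0: a password prompt appeared, 1: connection closed, 2: nothing in time
    index = session.expect(['password', pexpect.EOF, pexpect.TIMEOUT],
                           timeout=sudo_wait)
    if index == 1:
        # EOF: the connection is gone, so there is no prompt to wait for.
        output = session.before
        session.close()
        return output, session.exitstatus
    if index == 0:
        # Assumption: sudo accepts the login password unless told otherwise.
        session.sendline(password if sudo_password is None else sudo_password)
    session.prompt()            # wait until the command has finished
    output = session.before     # everything printed before the prompt
    session.logout()
    return output, session.exitstatus
```

With the values from the example, a call would look like run_remote('192.168.10.70', 'hadoop', 'hadoop', 'sudo sh /home/hadoop/.verlink/testhive.sh 2>&1').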
line1: connectHandler = pxssh.pxssh(timeout=100)
This line creates a pxssh object. The full constructor signature, taken from the source, is shown below; set its parameters to suit your own use case.
def __init__ (self, timeout=30, maxread=2000, searchwindowsize=None,
              logfile=None, cwd=None, env=None, ignore_sighup=True, echo=True,
              options={}, encoding=None, codec_errors='strict'):
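To illustrate the constructor parameters, here is a small sketch that sets timeout and options explicitly. The helper name make_session is hypothetical; the options dictionary is the parameter from the signature above, and as far as I can tell from the pxssh source each entry is passed to the ssh client as a `-o key=value` option.

```python
def make_session(timeout=100, ssh_options=None):
    """Build a pxssh session with explicit settings (hypothetical helper)."""
    # Imported lazily: pexpect is only needed when a session is built.
    from pexpect import pxssh
    # Each entry in options is handed to the ssh client as "-o key=value".
    return pxssh.pxssh(timeout=timeout, options=ssh_options or {})
```

For instance, make_session(ssh_options={'ConnectTimeout': '10'}) would ask ssh to give up on an unreachable host after ten seconds.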
line2: connectHandler.login ('192.168.10.70', 'hadoop', 'hadoop')
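This calls pxssh's login() method; the arguments are the host, the user and the password. If key-based trust is already set up between the machines and no password is required, nothing needs to change here: login() only sends the password if the server asks for it (a just-in-case logic inside the method). Once the call returns, you are logged in to the target machine.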
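login() can fail, for example on a wrong password. A minimal sketch of handling that separately from other errors, assuming the pxssh.ExceptionPxssh exception that pxssh raises on login problems; the helper name open_session and the logger name are hypothetical.

```python
import logging

log = logging.getLogger('pxssh_demo')  # hypothetical logger name


def open_session(host, user, password, timeout=100):
    """Return a logged-in pxssh session, or None if the login fails."""
    # Imported lazily so this module loads even where pexpect is missing.
    from pexpect import pxssh

    session = pxssh.pxssh(timeout=timeout)
    try:
        session.login(host, user, password)
    except pxssh.ExceptionPxssh as error:
        # Raised by pxssh when it cannot complete the login.
        log.error('login to %s failed: %s', host, error)
        return None
    return session
```

Returning None keeps the caller's code simple; re-raising would be just as reasonable if a failed login should abort the whole job.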
line3: connectHandler.sendline ('sudo sh /home/hadoop/.verlink/testhive.sh 2>&1')
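sendline() sends a command to the target machine. Between a successful login() and logout(), everything sent through connectHandler runs on the target machine. Because this command uses sudo, it will ask for a password before granting root privileges, so there is an interactive step, which the following code handles: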
flag = connectHandler.expect(['password', pexpect.EOF, pexpect.TIMEOUT], timeout=2)
if flag == 2:
    logger.info('[Worker] pexpect.TIMEOUT')
    logger.info('[Worker] ' + str(connectHandler))
    connectHandler.prompt()
elif flag == 1:
    logger.info('[Worker] pexpect.EOF')
    #logger.info('[Worker] ' + str(connectHandler))
    connectHandler.prompt()
else:
    logger.info('[Worker] COMMAND NEED PASSWORD AND HAS GIVEN AUTOMATIC')
    connectHandler.sendline("hadoop")
    connectHandler.prompt()
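This calls expect(), which states what output I expect to see next. Because the command starts with sudo, I expect a password prompt, so 'password' is the first entry in expect(['password', pexpect.EOF, pexpect.TIMEOUT]). A pattern can be a plain word or a regular expression (plain strings are themselves treated as regular expressions), and a compiled re pattern is accepted as well. The trailing timeout=2 is my own choice. When the argument is omitted, expect() falls back to the session's own timeout, that is, the value passed to the constructor (100 here). If no password prompt is coming I do not want to wait that long, so I set the timeout to 2 seconds: if the shell has not asked for a password by then, the code simply moves on instead of blocking in expect(). The return values 0, 1 and 2 are the indexes of 'password', pexpect.EOF and pexpect.TIMEOUT in the list.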
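The signature of expect() in the source, for reference:
def expect(self, pattern, timeout=-1, searchwindowsize=-1, async=False):
pattern can be a single pattern or a list of patterns, and timeout is a per-call timeout that you can set yourself. (Newer pexpect releases spell the last parameter async_, because async became a reserved word in Python 3.7.)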
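The index-based branching can be made a little easier to read by naming the outcomes instead of comparing against 0, 1 and 2. The following is a minimal sketch under that idea; answer_sudo and OUTCOMES are hypothetical names, and the pattern '[Pp]assword' is an assumption about how the sudo prompt is worded on the target machine.

```python
OUTCOMES = ('password', 'eof', 'timeout')  # same order as the pattern list


def answer_sudo(session, sudo_password, wait=2):
    """Answer a sudo password prompt if one appears within `wait` seconds.

    Returns 'password', 'eof' or 'timeout' so the caller can decide what
    to do next (for example, skip prompt() after 'eof').
    """
    import pexpect  # lazy import, only the EOF/TIMEOUT markers are needed

    # A plain string is treated as a regular expression by expect().
    patterns = ['[Pp]assword', pexpect.EOF, pexpect.TIMEOUT]
    index = session.expect(patterns, timeout=wait)
    outcome = OUTCOMES[index]
    if outcome == 'password':
        session.sendline(sudo_password)
    return outcome
```

Because EOF and TIMEOUT are in the pattern list, expect() returns their index instead of raising them as exceptions, which is what makes this style of dispatch possible.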
line5: connectHandler.logfile = fpwrite
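This sets where the session's traffic is logged; fpwrite is the file handle opened earlier in append mode. While the program runs, you can keep checking that file to follow the progress of the remote command.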
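One thing to keep in mind: logfile records both directions, so as far as I can tell anything passed to sendline() after the assignment (including the sudo password) ends up in the file. pexpect also has the attributes logfile_read and logfile_send, which log only one direction each. A minimal sketch that logs only what the remote side prints; the helper name log_output_only is hypothetical.

```python
def log_output_only(session, path):
    """Log only the data read from the child, not what is sent to it."""
    # Binary append: without an encoding, pexpect hands bytes to the log.
    handle = open(path, 'ab')
    session.logfile_read = handle   # leave session.logfile unset
    return handle                   # caller closes it after logout()
```

The caller would use it as fpwrite = log_output_only(connectHandler, './hadoop_logs/test.log') and close fpwrite once the session has ended.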
line6: connectHandler.prompt()
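prompt() waits for the shell prompt. Once the prompt appears, the previous shell command has finished and its output has been captured, so execution can continue. The source of prompt() is shown below.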
def prompt(self, timeout=-1):
    '''Match the next shell prompt.

    This is little more than a short-cut to the :meth:`~pexpect.spawn.expect`
    method. Note that if you called :meth:`login` with
    ``auto_prompt_reset=False``, then before calling :meth:`prompt` you must
    set the :attr:`PROMPT` attribute to a regex that it will use for
    matching the prompt.

    Calling :meth:`prompt` will erase the contents of the :attr:`before`
    attribute even if no prompt is ever matched. If timeout is not given or
    it is set to -1 then self.timeout is used.

    :return: True if the shell prompt was matched, False if the timeout was
             reached.
    '''
    if timeout == -1:
        timeout = self.timeout
    i = self.expect([self.PROMPT, TIMEOUT], timeout=timeout)
    if i==1:
        return False
    return True
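Since prompt() returns False on timeout rather than raising, its return value is worth checking; the example program ignores it. A minimal sketch of a helper that does check it (the name run_and_wait and the choice of RuntimeError are assumptions for illustration):

```python
def run_and_wait(session, command, timeout=-1):
    """Send one command and return its output once the prompt is back."""
    session.sendline(command)
    # timeout=-1 makes prompt() fall back to session.timeout.
    if not session.prompt(timeout=timeout):
        raise RuntimeError('no shell prompt after %r' % (command,))
    # before holds everything read up to the prompt, typically starting
    # with the echoed command line itself.
    return session.before
```

A long-running Hive job would be called with a generous timeout, for example run_and_wait(connectHandler, 'sh job.sh', timeout=3600), where job.sh is a placeholder script name.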
If the shell prompt is matched within the timeout, the script can go on to the next command. The member variable self.PROMPT is defined in the source as:
self.UNIQUE_PROMPT = "\[PEXPECT\][\$\#] "
self.PROMPT = self.UNIQUE_PROMPT
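That is, the usual Linux prompt character $ (normal user) or # (root), preceded by the literal marker [PEXPECT]. By default login() resets the remote shell's prompt to this unique string, so the prompt cannot be confused with ordinary command output.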
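What this pattern does and does not match can be checked with the standard re module alone. The sketch below copies the pattern as a raw string; has_pxssh_prompt is a hypothetical helper name.

```python
import re

# Same pattern as UNIQUE_PROMPT in the pxssh source, written as a raw string.
UNIQUE_PROMPT = r"\[PEXPECT\][\$\#] "
prompt_re = re.compile(UNIQUE_PROMPT)


def has_pxssh_prompt(text):
    """True if text contains the unique prompt that pxssh waits for."""
    return prompt_re.search(text) is not None


print(has_pxssh_prompt("[PEXPECT]$ "))       # normal user prompt
print(has_pxssh_prompt("[PEXPECT]# "))       # root prompt
print(has_pxssh_prompt("hadoop@host:~$ "))   # an ordinary prompt
```

The first two calls print True and the last prints False: a plain $ or # without the [PEXPECT] marker is not treated as the prompt.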
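Finally:
connectHandler.logout()
At this point every command sent to the remote machine has finished, and logout() ends the session.
Afterwards you can read the pid, the exit status (exitstatus, which belongs to the local ssh child process) and other attributes. The source below initializes each of these fields, and its comments are a useful reference: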
def __init__(self, timeout=30, maxread=2000, searchwindowsize=None,
             logfile=None, encoding=None, codec_errors='strict'):
    self.stdin = sys.stdin
    self.stdout = sys.stdout
    self.stderr = sys.stderr

    self.searcher = None
    self.ignorecase = False
    self.before = None
    self.after = None
    self.match = None
    self.match_index = None
    self.terminated = True
    self.exitstatus = None
    self.signalstatus = None
    # status returned by os.waitpid
    self.status = None
    # the child file descriptor is initially closed
    self.child_fd = -1
    self.timeout = timeout
    self.delimiter = EOF
    self.logfile = logfile
    # input from child (read_nonblocking)
    self.logfile_read = None
    # output to send (send, sendline)
    self.logfile_send = None
    # max bytes to read at one time into buffer
    self.maxread = maxread
    # This is the read buffer. See maxread.
    self.buffer = bytes() if (encoding is None) else text_type()
    # Data before searchwindowsize point is preserved, but not searched.
    self.searchwindowsize = searchwindowsize
    # Delay used before sending data to child. Time in seconds.
    # Most Linux machines don't like this to be below 0.03 (30 ms).
    self.delaybeforesend = 0.05
    # Used by close() to give kernel time to update process status.
    # Time in seconds.
    self.delayafterclose = 0.1
    # Used by terminate() to give kernel time to update process status.
    # Time in seconds.
    self.delayafterterminate = 0.1
    self.softspace = False
    self.name = '<' + repr(self) + '>'
    self.closed = True

    # Unicode interface
    self.encoding = encoding
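For reference, here is the output of one run that took the TIMEOUT branch (apparently with sleep 5 as the command): the str(connectHandler) dump, followed by the printed before, the pid, and the logged exit status.
command: /usr/bin/ssh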
args: ['/usr/bin/ssh', '-q', '-l', 'hadoop', '192.168.10.70']
searcher: None
buffer (last 100 chars): 'sleep 5\r\n'
before (last 100 chars): 'sleep 5\r\n'
after:
match:
match_index: 2
exitstatus: None
flag_eof: False
pid: 20442
child_fd: 7
closed: False
timeout: 100
delimiter:
logfile:
logfile_read: None
logfile_send: None
maxread: 2000
ignorecase: False
searchwindowsize: None
delaybeforesend: 0.05
delayafterclose: 0.1
delayafterterminate: 0.1
sleep 5
20442
INFO:yoho_scheduler:[Worker] job exitstatus: 0
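The exitstatus logged above describes the ssh session as a whole. If the exit code of one particular remote command is needed, a common shell-level approach (not part of the original example) is to send echo $? right after the command's prompt has returned and parse the result. A minimal sketch; parse_exit_code and remote_exit_code are hypothetical names.

```python
def parse_exit_code(before):
    """Extract the integer printed by `echo $?` from session.before."""
    if isinstance(before, bytes):
        before = before.decode('ascii', 'replace')
    # before usually holds the echoed command followed by its output;
    # the exit code is the last non-empty line.
    lines = [line.strip() for line in before.splitlines() if line.strip()]
    return int(lines[-1])


def remote_exit_code(session):
    """Ask the remote shell for the exit code of the previous command."""
    session.sendline('echo $?')
    session.prompt()
    return parse_exit_code(session.before)


print(parse_exit_code(b'echo $?\r\n0\r\n'))  # prints 0
```

remote_exit_code() has to be called before logout() and before any other command is sent, because $? only refers to the most recent command.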