Pexpect is a pure Python module for spawning child processes and controlling them automatically, which makes it well suited to simulating a human driving a program. For a project I read through parts of the pexpect API documentation and source code, so here I explain the key functions, quoting internal source where it helps the analysis, and give my own example. The scenario is logging in to a remote machine over SSH, running some Hive jobs, and capturing the various status information produced while they run. For real use you would wrap this in a more general function that takes its parameters as arguments; the example below simply hard-codes them.
The example code is as follows:
from logger import logger
try:
    import pxssh
    import getpass
    import pexpect
except Exception, e:
    logger.error('your system needs the pexpect module installed: ' + str(e))

def test_case3(self):
    try:
        connectHandler = pxssh.pxssh(timeout=100)
        fpwrite = open('./hadoop_logs/test.log', 'a')
        connectHandler.login('192.168.10.70', 'hadoop', 'hadoop')
        connectHandler.sendline('sudo sh /home/hadoop/.verlink/testhive.sh 2>&1')
        #connectHandler.sendline('sleep 5')
        connectHandler.logfile = fpwrite
        flag = connectHandler.expect(['password', pexpect.EOF, pexpect.TIMEOUT], timeout=2)
        if flag == 2:
            logger.info('[Worker] pexpect.TIMEOUT')
            logger.info('[Worker] ' + str(connectHandler))
            connectHandler.prompt()
        elif flag == 1:
            logger.info('[Worker] pexpect.EOF')
            #logger.info('[Worker] ' + str(connectHandler))
            connectHandler.prompt()
        else:
            logger.info('[Worker] COMMAND NEED PASSWORD AND HAS GIVEN AUTOMATIC')
            connectHandler.sendline("hadoop")
            connectHandler.prompt()
        print connectHandler.before
        connectHandler.logout()
        print connectHandler.pid
        logger.info('[Worker] job exitstatus: ' + str(connectHandler.exitstatus))
    except Exception, e:
        print "pxssh failed on login."
        print str(e)

The import section above needs three modules: pexpect, pxssh, and getpass. pxssh and getpass are both built on top of pexpect and ship with it, so once pexpect is installed all three are available and only need to be imported.
The function test_case3 is a test case that nicely shows the whole process of using pexpect to simulate a login, run a command, and collect its output.
line1: connectHandler = pxssh.pxssh(timeout=100) declares a pxssh object. The full constructor signature, which I looked up in the source, is shown below; you can configure its parameters to match your own needs:
def __init__ (self, timeout=30, maxread=2000, searchwindowsize=None, logfile=None, cwd=None, env=None, ignore_sighup=True, echo=True, options={}, encoding=None, codec_errors='strict'):
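As a quick sketch of using this constructor (assuming a pexpect 4 environment, where pxssh lives inside the pexpect package; the older version used in this article imports it as a standalone pxssh module), note that constructing the handler spawns nothing; the ssh child process is only created later by login():

```python
from pexpect import pxssh

# Construct the handler with a larger default timeout and read buffer.
# timeout is what expect()/prompt() fall back to when no per-call
# timeout is given; maxread is the max bytes read at once into the buffer.
s = pxssh.pxssh(timeout=100, maxread=4000)

print(s.timeout)   # 100
print(s.maxread)   # 4000
```

No network connection is made here, so this is safe to run anywhere pexpect is installed.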
line2: connectHandler.login ('192.168.10.70', 'hadoop', 'hadoop') calls pxssh's internal login() method; the arguments are host, user, and password. If the machines already trust each other via SSH keys, no password is actually required, and you do not need to change anything here: internally the password is only used just in case. After this call you are logged in to the target machine.
line3: connectHandler.sendline ('sudo sh /home/hadoop/.verlink/testhive.sh 2>&1') — the sendline function sends a command to the target machine. Between a successful login and the eventual logout, every command sent through connectHandler is executed on the remote host. Because this command uses sudo, the shell may ask for a password to elevate privileges, so an interactive exchange is needed; the code below handles exactly that.
flag = connectHandler.expect(['password', pexpect.EOF, pexpect.TIMEOUT], timeout=2)
if flag == 2:
    logger.info('[Worker] pexpect.TIMEOUT')
    logger.info('[Worker] ' + str(connectHandler))
    connectHandler.prompt()
elif flag == 1:
    logger.info('[Worker] pexpect.EOF')
    #logger.info('[Worker] ' + str(connectHandler))
    connectHandler.prompt()
else:
    logger.info('[Worker] COMMAND NEED PASSWORD AND HAS GIVEN AUTOMATIC')
    connectHandler.sendline("hadoop")
    connectHandler.prompt()

This calls the expect() function, which states what output we expect to see next. Since we ran sudo, we expect a password prompt, hence the 'password' entry in expect(['password', pexpect.EOF, pexpect.TIMEOUT]). The patterns can be literal strings, regular expressions, or precompiled re patterns. The timeout=2 at the end is a per-call setting; when it is omitted, the global timeout given to the constructor (here 100) is used instead. Because I do not want to wait that long when no password prompt ever appears, I set it to 2: if the shell prompt does not ask for a password within two seconds, execution simply continues with the following statements instead of blocking on expect. The return values 0, 1, and 2 correspond to the three list entries 'password', pexpect.EOF, and pexpect.TIMEOUT, in that order.
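The index semantics can be illustrated without a live connection. The sketch below is plain re, not pexpect itself, and is a simplification (real expect() prefers the match that occurs earliest in the output stream, with list order breaking ties), but it shows how the returned integer maps to the pattern list:

```python
import re

# Patterns in the order handed to expect(); pexpect accepts plain
# strings too and compiles them to regexes itself.
patterns = [re.compile('password'), re.compile('assword:')]

def first_match(output, pats):
    # expect() returns the list index of the pattern that matched.
    for i, pat in enumerate(pats):
        if pat.search(output):
            return i
    return None  # real pexpect would keep reading until EOF/TIMEOUT

print(first_match('[sudo] password for hadoop: ', patterns))  # 0
```

Index 0 here plays the same role as flag == 0 in the code above: the 'password' pattern fired, so a password should be sent.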
Here is the signature of expect() from the source, for reference:
def expect(self, pattern, timeout=-1, searchwindowsize=-1, async=False)

pattern is a list of patterns, and timeout is a per-call timeout you can define yourself.
line5: connectHandler.logfile = fpwrite sets the destination for the shell session's output; fpwrite is the append-mode file handle opened earlier. While the program runs, you can keep checking this file to watch the execution progress.
line6: connectHandler.prompt() waits for the shell prompt. Once the prompt appears, the previous shell command has completed and its output has been captured, so execution can continue. Here is the source of the prompt() function:
def prompt(self, timeout=-1):
    '''Match the next shell prompt.

    This is little more than a short-cut to the
    :meth:`~pexpect.spawn.expect` method. Note that if you called
    :meth:`login` with ``auto_prompt_reset=False``, then before calling
    :meth:`prompt` you must set the :attr:`PROMPT` attribute to a regex
    that it will use for matching the prompt.

    Calling :meth:`prompt` will erase the contents of the :attr:`before`
    attribute even if no prompt is ever matched. If timeout is not given
    or it is set to -1 then self.timeout is used.

    :return: True if the shell prompt was matched, False if the timeout
             was reached.
    '''
    if timeout == -1:
        timeout = self.timeout
    i = self.expect([self.PROMPT, TIMEOUT], timeout=timeout)
    if i == 1:
        return False
    return True

If the shell prompt arrives within the timeout, the next command can be executed. The member variable self.PROMPT, which I looked up in the source, is defined as:
self.UNIQUE_PROMPT = "\[PEXPECT\][\$\#] "
self.PROMPT = self.UNIQUE_PROMPT

that is, a unique prompt ending in the $ or # familiar from Linux shells.
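To be precise, UNIQUE_PROMPT is a regex: after login, pxssh resets the remote shell prompt to the literal string "[PEXPECT]$ " (or "[PEXPECT]# " for root) so that it can be matched unambiguously rather than guessing at arbitrary user prompts. A quick stdlib check of the pattern:

```python
import re

# The prompt regex pxssh installs on the remote shell after login.
UNIQUE_PROMPT = r"\[PEXPECT\][\$\#] "

print(bool(re.search(UNIQUE_PROMPT, "[PEXPECT]$ ")))      # True - normal user
print(bool(re.search(UNIQUE_PROMPT, "[PEXPECT]# ")))      # True - root
print(bool(re.search(UNIQUE_PROMPT, "hadoop@host:~$ ")))  # False - default prompt
```

This is why prompt() works reliably: it only has to expect this one fixed pattern, which is what the docstring above means about setting PROMPT yourself when auto_prompt_reset=False.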
Finally,
connectHandler.logout() — at this point all commands on the remote machine have finished, and we log out of the system.
Afterwards you can read the pid of the run, the exit code exitstatus, and other information. Below is the source with its comments explaining each field, which is very useful:
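As background for the exitstatus field: it is decoded from the raw status that os.waitpid returns once the child terminates (it stays None while the child is running, or when the child was killed by a signal, in which case signalstatus is set instead). A stdlib-only sketch of that decoding, using a forked child rather than pexpect itself:

```python
import os

# Run a child that exits with code 3, then reap it with os.waitpid,
# much as pexpect does internally when the session ends.
pid = os.fork()
if pid == 0:
    os._exit(3)                  # child: terminate with exit code 3
_, status = os.waitpid(pid, 0)   # parent: raw 16-bit waitpid status

if os.WIFEXITED(status):
    exitstatus = os.WEXITSTATUS(status)  # the value pexpect exposes
else:
    exitstatus = None                    # killed by a signal instead

print(exitstatus)  # 3
```

So when the log below shows "job exitstatus: 0", the remote command chain finished with a clean exit code 0.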
def __init__(self, timeout=30, maxread=2000, searchwindowsize=None,
             logfile=None, encoding=None, codec_errors='strict'):
    self.stdin = sys.stdin
    self.stdout = sys.stdout
    self.stderr = sys.stderr
    self.searcher = None
    self.ignorecase = False
    self.before = None
    self.after = None
    self.match = None
    self.match_index = None
    self.terminated = True
    self.exitstatus = None
    self.signalstatus = None
    # status returned by os.waitpid
    self.status = None
    # the child file descriptor is initially closed
    self.child_fd = -1
    self.timeout = timeout
    self.delimiter = EOF
    self.logfile = logfile
    # input from child (read_nonblocking)
    self.logfile_read = None
    # output to send (send, sendline)
    self.logfile_send = None
    # max bytes to read at one time into buffer
    self.maxread = maxread
    # This is the read buffer. See maxread.
    self.buffer = bytes() if (encoding is None) else text_type()
    # Data before searchwindowsize point is preserved, but not searched.
    self.searchwindowsize = searchwindowsize
    # Delay used before sending data to child. Time in seconds.
    # Most Linux machines don't like this to be below 0.03 (30 ms).
    self.delaybeforesend = 0.05
    # Used by close() to give kernel time to update process status.
    # Time in seconds.
    self.delayafterclose = 0.1
    # Used by terminate() to give kernel time to update process status.
    # Time in seconds.
    self.delayafterterminate = 0.1
    self.softspace = False
    self.name = '<' + repr(self) + '>'
    self.closed = True
    # Unicode interface
    self.encoding = encoding
For reference, here is the state dump str(connectHandler) from one run, followed by the program's printed output and final log line:

command: /usr/bin/ssh
args: ['/usr/bin/ssh', '-q', '-l', 'hadoop', '192.168.10.70']
searcher: None
buffer (last 100 chars): 'sleep 5\r\n'
before (last 100 chars): 'sleep 5\r\n'
after: <class 'pexpect.exceptions.TIMEOUT'>
match: <class 'pexpect.exceptions.TIMEOUT'>
match_index: 2
exitstatus: None
flag_eof: False
pid: 20442
child_fd: 7
closed: False
timeout: 100
delimiter: <class 'pexpect.exceptions.EOF'>
logfile: <open file './hadoop_logs/test.log', mode 'a' at 0x7f370df6a300>
logfile_read: None
logfile_send: None
maxread: 2000
ignorecase: False
searchwindowsize: None
delaybeforesend: 0.05
delayafterclose: 0.1
delayafterterminate: 0.1

sleep 5
20442
INFO:yoho_scheduler:[Worker] job exitstatus: 0