Program Description

Without an auto-restart script:
- Spark Streaming jobs are often unstable: network latency or the loss of a few nodes can make a batch run far longer than its interval, hurting the job's real-time behavior.
- A Spark Streaming job can also go down for uncontrollable reasons, interrupting data processing.

What this script does:
- It looks up the job's application ID in YARN to determine whether the job is down; if so, it starts the Spark job again.
- It reads the current batch's processing time from the YARN web UI; if the batch has been running too long, it restarts the Spark job.

Design
- Get the job's ApplicationId:

yarn application -list | grep Kafka2Spark2HBase

- Use Python's built-in subprocess module to run this shell command and read its output; field 0 of the returned line is the applicationId:
import subprocess

# Run the yarn CLI and capture its stdout.
application = subprocess.Popen('yarn application -list | grep Kafka2Spark2HBase', shell=True, stdout=subprocess.PIPE)
application.wait()
line = application.stdout.read()
# The output columns are tab-separated, so split on any whitespace;
# field 0 is the applicationId.
fields = line.split()
applicationId = fields[0] if fields else ''
print line
print applicationId
Kafka2Spark2HBase is the name of my Spark Streaming job; replace it with your own application name. If the lookup comes back empty, the job is not running, which triggers the crash-restart logic (it is in the full code below).
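If your YARN version supports it, filtering on application state avoids matching finished runs of a job with the same name; this is an optional refinement, not part of the original script:

yarn application -list -appStates RUNNING | grep Kafka2Spark2HBase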
- Use the ApplicationId to fetch the job's YARN proxy page (which serves the Spark UI) and read how long the currently running batch has taken:
import urllib2
from bs4 import BeautifulSoup

url = 'http://ip:8088/proxy/' + applicationId
page = urllib2.urlopen(url)
contents = page.read()
soup = BeautifulSoup(contents, "html.parser")
# The active-jobs table on the Spark UI Jobs page holds the batch
# that is currently running.
table = soup.find('table', id='activeJob-table')
tbody = table.find('tbody')
tds = tbody.find_all('td')
job_id = int(tds[0].get_text().strip())
job_time = tds[3].get_text().strip()  # e.g. "0.3 s" or "2.1 min"
time_num = float(job_time.split(' ', 1)[0])
time_unit = job_time.split(' ', 1)[1]
print job_id, job_time
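The duration string carries mixed units ('ms', 's', 'min', 'h'), which is why the full script below branches on the unit. As an alternative sketch (not part of the original script, and assuming those four units are the only ones the Spark UI emits), the duration can be normalized to seconds first:

# Hypothetical helper: convert a Spark UI duration string to seconds.
UNIT_SECONDS = {'ms': 0.001, 's': 1.0, 'min': 60.0, 'h': 3600.0}

def duration_seconds(job_time):
    num, unit = job_time.split(' ', 1)
    return float(num) * UNIT_SECONDS[unit]

# Example: duration_seconds('2.1 min') returns 126.0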
Full code (Python 2)
import os
import sys
import time
import subprocess
import urllib2
from bs4 import BeautifulSoup

# Look up the running application by name. The output columns are
# tab-separated, so split on any whitespace; field 0 is the applicationId.
application = subprocess.Popen('yarn application -list | grep Kafka2Spark2HBase', shell=True, stdout=subprocess.PIPE)
application.wait()
line = application.stdout.read()
fields = line.split()
applicationId = fields[0] if fields else ''

# No matching application: the job is down, so start it again.
if len(applicationId) == 0:
    print '============================='
    print time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())
    os.system('/home/spark/streaming-start.sh')
    sys.exit("Job was down; restarted.")

# One predicate per time unit: a batch is considered healthy only if it
# finishes within about a minute.
def case_ms(t):
    return True   # milliseconds: always fine

def case_s(t):
    return t < 59   # seconds: healthy below 59 s

def case_min(t):
    return False   # minutes: too slow

def case_h(t):
    return False   # hours: far too slow

def default(t):
    return False   # unknown unit: treat as too slow

# Fetch the Spark UI Jobs page through the YARN proxy and read the
# duration of the active job, i.e. the currently running batch.
url = 'http://tz-nn-01.rdsp.com:8088/proxy/' + applicationId
page = urllib2.urlopen(url)
contents = page.read()
soup = BeautifulSoup(contents, "html.parser")
table = soup.find('table', id='activeJob-table')
tbody = table.find('tbody')
tds = tbody.find_all('td')
job_id = int(tds[0].get_text().strip())
job_time = tds[3].get_text().strip()  # e.g. "0.3 s" or "2.1 min"
time_num = float(job_time.split(' ', 1)[0])
time_unit = job_time.split(' ', 1)[1]

flag = False
if job_id > 0:
    if time_unit == 'ms':
        flag = case_ms(time_num)
    elif time_unit == 's':
        flag = case_s(time_num)
    elif time_unit == 'min':
        flag = case_min(time_num)
    elif time_unit == 'h':
        flag = case_h(time_num)
    else:
        flag = default(time_num)
if job_id == 0:
    # The first batch may legitimately run long; allow it up to 2 hours.
    flag = True
    if time_unit == 'h' and time_num > 2:
        flag = False

print time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()), flag, applicationId, job_id, job_time

if not flag:
    print '========================================'
    print time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()), flag, applicationId, job_id, job_time
    os.system('yarn application -kill ' + applicationId)
    os.system('/home/spark/streaming-start.sh')
    print 'Batch timed out; job restarted. Batch ID:', job_id
    print '========================================'
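To act as a watchdog, the script has to run on a schedule. The original does not say how it is triggered; one common approach (an assumption, with example paths) is a crontab entry that runs it every minute and appends its output to a log:

# Hypothetical crontab entry: run the watchdog once a minute (paths are examples).
* * * * * /usr/bin/python /home/spark/spark-watchdog.py >> /home/spark/spark-watchdog.log 2>&1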