Presto on YARN Solution

Deploying Presto on a YARN-Based Cluster

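Unlike Spark, Presto does not support YARN out of the box. Spark integrates well with YARN: with a few changes to the launch scripts and the cluster environment, Spark jobs run on YARN. Presto instead relies on Apache Slider to run on YARN. YARN is a general-purpose resource management system that provides unified resource management and scheduling for the applications above it, which brings large benefits in cluster utilization, unified resource management, and data sharing. The approach described here therefore submits Presto to YARN as an application.

Presto on YARN cannot be installed directly from the official binary package. The presto-yarn project has to be built first, producing presto-yarn-package-1.6-SNAPSHOT-0.184.zip, and Slider then deploys that package. There is very little material online about Presto on YARN, so you mostly have to rely on the not very detailed official documentation and troubleshoot errors as they appear. If your Hadoop is a CDH distribution, you must specify the CDH version when building Slider, because the default is Apache Hadoop. I ran into many pitfalls while building, installing, and deploying, so the build, installation, debugging, and deployment process is recorded here for reference.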

 

I. Basic Environment Requirements

•  Linux

•  Java 8, 64-bit

•  Presto-0.184

•  ZooKeeper

•  Hadoop

•  YARN

II. Presto Basic Configuration and Architecture

•  Presto architecture diagram

[Figure 1: Presto architecture diagram]

•  Presto query execution flow diagram

[Figure 2: Presto query execution flow]

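•  Connectors

•   Presto can read Hive data from the following Hadoop versions. Supported file formats: Text, SequenceFile, RCFile, ORC.

No.   Hadoop distribution    Connector name
1     Apache Hadoop 1.x      hive-hadoop1
2     Apache Hadoop 2.x      hive-hadoop2
3     Cloudera CDH4          hive-cdh4
4     Cloudera CDH5          hive-cdh5

•   In addition, a remote Hive metastore is required; local or embedded mode is not supported. Presto does not use MapReduce and only needs HDFS.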

III. Installation and Configuration Steps

1. Build presto-on-yarn

(1) Download: https://github.com/prestodb/presto-yarn/

(2) Build: mvn clean package -Dpresto.version=0.184

[Figure 3: screenshot of the Maven build output]

As the build output shows, the build has completed. presto-yarn-package-1.6-SNAPSHOT-0.184.zip can be found in the build output directory.

Inspect the directory structure of presto-yarn-package-1.6-SNAPSHOT-0.184.zip.
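
One way to look at the package layout without unpacking it is a few lines of Python. This is only a sketch: the helper name list_package is made up for illustration, and the path assumes the zip sits in the current directory.

```python
import os
import zipfile


def list_package(zip_path):
    """Return the sorted entry names stored in a zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return sorted(archive.namelist())


# Assumed location: adjust to wherever the build placed the package.
package = "presto-yarn-package-1.6-SNAPSHOT-0.184.zip"
if os.path.exists(package):
    for name in list_package(package):
        print(name)
```

The listing should include metainfo.xml, which the Slider install log later in this post reports reading from the package.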


2. Build Slider

Download: https://archive.apache.org/dist/incubator/slider/

Build Slider, specifying the CDH version and upgrading the JDK. The official build guide is here:

https://slider.incubator.apache.org/developing/building.html

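(1) Download the source: apache-slider-0.91.0-incubating-source-release.tar.gz

(2) Modify the Maven configuration

To build against the CDH versions of Hadoop (2.6.0-cdh5.7.0) and HBase (1.2.0-cdh5.7.0), edit the pom file in the source directory and set the following versions: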

2.6.0-cdh5.7.0
1.2.0-cdh5.7.0
1.7.0

Comment out the dependency on hadoop-minicluster in slider-core and slider-funtest.


(3) Build: mvn clean package -Dmaven.test.skip=true -DskipTests

[Figure 4: build output screenshot]


Find slider-0.91.0-incubating-all.tar.gz in the build output directory.

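(4) Configure Slider, following the detailed official documentation:

https://prestodb.io/presto-yarn/installation-yarn-manual.html#deploying-presto-on-a-yarn-based-cluster

After extracting the archive, find the following two configuration files in the conf directory.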

slider-client.xml


    
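<configuration>
  <property>
    <name>slider.client.resource.origin</name>
    <value>conf/slider-client.xml</value>
    <description>This is just for diagnostics</description>
  </property>
  <property>
    <name>slider.security.protocol.acl</name>
    <value>*</value>
  </property>
  <property>
    <name>slider.yarn.queue</name>
    <value>root.presto</value>
    <description>the name of the YARN queue to use.</description>
  </property>
  <property>
    <name>slider.yarn.queue.priority</name>
    <value>1</value>
    <description>the priority of the application.</description>
  </property>
  <property>
    <name>slider.am.login.keytab.required</name>
    <value>false</value>
    <description>Declare that a keytab must be provided.</description>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.application.classpath</name>
    <value>$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/share/hadoop/common/*,$HADOOP_COMMON_HOME/share/hadoop/common/lib/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,$HADOOP_YARN_HOME/share/hadoop/yarn/*,$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>master1:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master2:8030</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://common/user/hadoop/.slider/</value>
  </property>
  <property>
    <name>slider.zookeeper.quorum</name>
    <value>node1:2181,node2:2181,node3:2181</value>
  </property>
</configuration>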
    

slider-env.sh configuration

export JAVA_HOME=${JAVA_HOME}
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR}
export SLIDER_JVM_OPTS="-server -Xmx40g -Xms4g -Xmn8g"
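
Before moving on, it can help to confirm that slider-client.xml actually contains the properties the cluster needs. The sketch below parses Hadoop-style configuration XML into a dictionary and reports missing entries. The function names and the REQUIRED list are assumptions made for this example (the property names come from the file above); Slider itself does not ship this check.

```python
import xml.etree.ElementTree as ET


def load_hadoop_conf(xml_text):
    """Parse Hadoop-style <configuration> XML into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    conf = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name:
            conf[name.strip()] = (prop.findtext("value") or "").strip()
    return conf


def missing_keys(conf, required):
    """Return the required property names that are absent or empty."""
    return [key for key in required if not conf.get(key)]


# Assumption for this sketch: these three properties are treated as mandatory.
REQUIRED = ["slider.zookeeper.quorum", "yarn.resourcemanager.address", "fs.defaultFS"]

sample = """<configuration>
  <property>
    <name>slider.zookeeper.quorum</name>
    <value>node1:2181,node2:2181,node3:2181</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>master1:8032</value>
  </property>
</configuration>"""

print(missing_keys(load_hadoop_conf(sample), REQUIRED))  # prints ['fs.defaultFS']
```

To check a real file, read conf/slider-client.xml into a string and pass it to load_hadoop_conf in place of the inline sample.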

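Extract presto-yarn-package-1.6-SNAPSHOT-0.184.zip to obtain appConfig-default.json and resources-default.json, then edit their settings to define the resources allocated to Presto. The configuration used here is shown below (official configuration reference: https://prestodb.io/presto-yarn/installation-yarn-configuration-options.html).

appConfig.json configuration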

{
  "schema": "http://example.org/specification/v2.0.0",
  "metadata": {
  },
  "global": {
    "java_home": "/usr/local/java",
    "site.global.app_user": "hadoop",
    "site.global.user_group": "hadoop",
    "site.global.app_name": "presto-server-0.184",
    "site.global.data_dir": "/data01/presto/data",
    "site.global.config_dir": "/data01/presto/etc",
    "zookeeper.quorum" : "node1:2181,node2:2181,node3:2181",
    "application.def": "hdfs://common/user/hadoop/.slider/package/presto/presto-yarn-package-1.6-SNAPSHOT-0.184.zip",
    "site.global.singlenode": "false",
    "site.global.coordinator_host": "${COORDINATOR_HOST}",
    "site.global.app_pkg_plugin": "${AGENT_WORK_ROOT}/app/definition/package/plugins/",
    "site.global.presto_query_max_memory": "512GB",
    "site.global.presto_query_max_memory_per_node":"20GB",
    "site.global.presto_server_port": "18088",
    "site.global.catalog": "{'hive': ['connector.name=hive-hadoop2','hive.config.resources=/usr/local/hadoop/etc/hadoop/core-site.xml,/usr/local/hadoop/etc/hadoop/hdfs-site.xml','hive.metastore.uri=thrift://10.134.81.70:9083'], 'tpch': ['connector.name=tpch']}",
    "site.global.jvm_args": "['-server', '-Xmx40960M', '-XX:+UseG1GC', '-XX:G1HeapRegionSize=320M', '-XX:+UseGCOverheadLimit', '-XX:+ExplicitGCInvokesConcurrent', '-XX:+HeapDumpOnOutOfMemoryError', '-XX:OnOutOfMemoryError=kill -9 %p']",
    "site.global.log_properties": "['com.facebook.presto.hive=WARN','com.facebook.presto.server=INFO']",
    "site.global.additional_config_properties":"['task.max-worker-threads=50', 'distributed-joins-enabled=true']"
  },
  "components": {
    "slider-appmaster": {
      "jvm.heapsize": "1024M"
    },
    "MYAPP_COMPONENT": {
    }
  }
}
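
The site.global.catalog value is a quoted string that itself holds a dictionary of catalog names mapped to lists of key=value lines. As I understand the presto-yarn documentation, each catalog name ends up as its own properties file for Presto. The sketch below only illustrates that expansion so the nested quoting is easier to reason about; catalog_files is a hypothetical helper, not the package's actual code, and the catalog string is shortened from the one above.

```python
import ast


def catalog_files(catalog_value):
    """Expand a site.global.catalog string into {file_name: file_text}."""
    # literal_eval parses the dict literal without executing arbitrary code.
    catalogs = ast.literal_eval(catalog_value)
    return {
        f"{name}.properties": "\n".join(lines) + "\n"
        for name, lines in catalogs.items()
    }


catalog = (
    "{'hive': ['connector.name=hive-hadoop2',"
    "'hive.metastore.uri=thrift://10.134.81.70:9083'],"
    " 'tpch': ['connector.name=tpch']}"
)

for file_name, text in catalog_files(catalog).items():
    print(f"--- {file_name}")
    print(text, end="")
```

If literal_eval raises a SyntaxError on your own value, the quoting inside the JSON string is most likely unbalanced, which is an easy mistake to make in this field.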
 
resources.json configuration

{
  "schema" : "http://example.org/specification/v2.0.0",
  "metadata" : {
  },
  "global" : {
     "yarn.vcores": "1"
  },
  "components": {
    "slider-appmaster": {
    },
   "COORDINATOR": {
      "yarn.role.priority": "1",
      "yarn.component.instances": "1",
      "yarn.component.placement.policy": "1",
      "yarn.memory": "20000"
    },
    "WORKER": {
      "yarn.role.priority": "2",
      "yarn.component.instances": "20",
      "yarn.component.placement.policy": "1",
      "yarn.memory": "20000"
    }
  }
}
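
To see what this resources.json asks of the cluster, multiply each component's yarn.memory by its instance count. The sketch below does that for the values shown above; total_memory_mb is an illustrative helper, and the Slider application master's own container is left out of the sum because no memory is set for it here.

```python
def total_memory_mb(resources):
    """Sum yarn.memory x yarn.component.instances over components that set both."""
    total = 0
    for spec in resources.get("components", {}).values():
        if "yarn.memory" in spec and "yarn.component.instances" in spec:
            total += int(spec["yarn.memory"]) * int(spec["yarn.component.instances"])
    return total


resources = {
    "components": {
        "slider-appmaster": {},
        "COORDINATOR": {"yarn.component.instances": "1", "yarn.memory": "20000"},
        "WORKER": {"yarn.component.instances": "20", "yarn.memory": "20000"},
    }
}

print(total_memory_mb(resources))  # 21 containers x 20000 MB = 420000
```

Comparing this figure with the capacity of the target queue (root.presto in this setup) before running slider create gives an early hint as to whether all 20 workers can be scheduled.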

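3. Install Presto on YARN with Slider

Create the Slider directories in HDFS and grant access permissions, making sure the current user can access them. Also create the local Presto directories on every NodeManager.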

[hadoop@master1 slider]$ slider package --install --name presto --package presto-yarn-package-1.6-SNAPSHOT-0.184.zip  
2017-09-25 14:23:00,327 [main] INFO  client.SliderClient - Installing package file:/usr/local/apache/slider-0.91.0-incubating/presto-yarn-package-1.6-SNAPSHOT-0.184.zip to hdfs://common/user/hadoop/.slider/package/presto/presto-yarn-package-1.6-SNAPSHOT-0.184.zip (overwrite set to false)
2017-09-25 14:23:03,114 [main] INFO  tools.SliderUtils - Reading metainfo.xml of size 2425
2017-09-25 14:23:03,115 [main] INFO  client.SliderClient - Found XML metainfo file in package
2017-09-25 14:23:03,135 [main] INFO  client.SliderClient - Creating summary metainfo file
2017-09-25 14:23:03,153 [main] INFO  client.SliderClient - Set application.def in your app config JSON to .slider/package/presto/presto-yarn-package-1.6-SNAPSHOT-0.184.zip
2017-09-25 14:23:03,154 [main] INFO  util.ExitUtil - Exiting with status 0

4. Start Presto on YARN

[hadoop@master1 slider]$ slider create presto-query --template appConfig.json --resources resources.json        
2017-09-25 15:06:35,776 [main] INFO  agent.AgentClientProvider - Validating app definition hdfs://common/user/hadoop/.slider/package/presto/presto-yarn-package-1.6-SNAPSHOT-0.184.zip
2017-09-25 15:06:35,778 [main] INFO  agent.AgentUtils - Reading metainfo at hdfs://common/user/hadoop/.slider/package/presto/presto-yarn-package-1.6-SNAPSHOT-0.184.zip
2017-09-25 15:06:35,945 [main] INFO  agent.AgentUtils - Got metainfo from summary file
2017-09-25 15:06:35,985 [main] INFO  client.SliderClient - No credentials requested
2017-09-25 15:06:36,087 [main] INFO  agent.AgentUtils - Reading metainfo at hdfs://common/user/hadoop/.slider/package/presto/presto-yarn-package-1.6-SNAPSHOT-0.184.zip
2017-09-25 15:06:36,094 [main] INFO  agent.AgentUtils - Got metainfo from summary file
2017-09-25 15:06:36,127 [main] INFO  launch.AbstractLauncher - Setting yarn.resourcemanager.am.retry-count-window-ms to 300000
2017-09-25 15:06:36,127 [main] INFO  launch.AbstractLauncher - Log include patterns: 
2017-09-25 15:06:36,127 [main] INFO  launch.AbstractLauncher - Log exclude patterns: 
2017-09-25 15:06:36,461 [main] INFO  slideram.SliderAMClientProvider - Loading all dependencies for AM.
2017-09-25 15:06:36,462 [main] INFO  tools.SliderUtils - Loading all dependencies from /usr/local/apache/slider-0.91.0-incubating/lib
2017-09-25 15:06:40,709 [main] INFO  agent.AgentClientProvider - Automatically uploading the agent tarball at hdfs://common/user/hadoop/.slider/cluster/presto-query/tmp/application_1504914229457_101524/agent
2017-09-25 15:06:40,791 [main] INFO  agent.AgentClientProvider - Validating app definition hdfs://common/user/hadoop/.slider/package/presto/presto-yarn-package-1.6-SNAPSHOT-0.184.zip
2017-09-25 15:06:40,796 [main] INFO  tools.SliderUtils - For faster submission of apps, upload dependencies using cmd dependency --upload
2017-09-25 15:06:40,804 [main] INFO  client.SliderClient - Submitting application application_1504914229457_101524
2017-09-25 15:06:40,808 [main] INFO  launch.AppMasterLauncher - Submitting application to Resource Manager
2017-09-25 15:06:41,036 [main] INFO  impl.YarnClientImpl - Submitted application application_1504914229457_101524
2017-09-25 15:06:41,039 [main] INFO  util.ExitUtil - Exiting with status 0

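The application that was just started is visible in the YARN web UI.

[Figure 5: the running application in the YARN web UI]

The corresponding Presto application can also be seen in ZooKeeper.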


5. Basic Usage

(1) Find the coordinator_address in the YARN web UI

[Figure 6: coordinator_address in the YARN web UI]

(2) Use Presto from the command line

[Figure 7: Presto session on the command line]

Summary of Issues

Issue 1:

NetUtil.py:62 - SSLError: Failed to connect. Please check openssl library versions

Fix: upgrade openssl (openssl >= 1.0.1e-16).

(1) Check the current openssl version: rpm -qa | grep openssl. If the version is too old, upgrade it: yum -y upgrade openssl

(2) After upgrading openssl, restart the NodeManager.

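Issue 2: Presto has a JDK version requirement (Java 8, 64-bit, as listed in the environment requirements), so the JDK must be upgraded.

Issue 3: When building Slider against a specific CDH version, errors such as unresolved Maven dependencies or an unreachable central repository may occur.

Fix: work from the specific Maven error message, because build errors differ from one environment to another.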

Issue 4: If the JVM parameters configured in resouceConf.json are too small, larger SQL queries fail with errors.

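Issue 5: Slider's default parameters may not match the existing cluster environment, which makes some of them unusable and causes errors. For example, yarn.label.expression can only be used to place the coordinator and workers when YARN runs the CapacityScheduler; otherwise it raises an error. This cluster uses the Fair Scheduler, so the property is not supported.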
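
For clusters like this one, a small helper can strip yarn.label.expression from a resources.json structure before it is handed to Slider. This is a sketch under assumptions: the function name is invented, and the label values in the sample are placeholders rather than values taken from this deployment.

```python
import copy


def drop_label_expressions(resources):
    """Return a copy of a resources.json dict without yarn.label.expression keys."""
    cleaned = copy.deepcopy(resources)
    sections = [cleaned.get("global", {})]
    sections.extend(cleaned.get("components", {}).values())
    for section in sections:
        section.pop("yarn.label.expression", None)
    return cleaned


resources = {
    "global": {"yarn.vcores": "1"},
    "components": {
        # Label values below are placeholders for this sketch.
        "COORDINATOR": {"yarn.memory": "20000", "yarn.label.expression": "coordinator"},
        "WORKER": {"yarn.memory": "20000", "yarn.label.expression": "worker"},
    },
}

print(drop_label_expressions(resources)["components"])
```

The input dictionary is left untouched, so the original file contents can still be written back unchanged if the cluster later switches to the CapacityScheduler.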

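Issue 6: The default site.global.catalog entry in appConfig.json is missing many parameters, and you need to know Presto well to locate the cause quickly when errors occur. For example: connector.name=hive-hadoop2, hive.config.resources=/usr/local/hadoop/etc/hadoop/core-site.xml,/usr/local/hadoop/etc/hadoop/hdfs-site.xml. Setting connector.name to hive-cdh5 causes an error. The hive.config.resources entry is not present by default, and without it Presto cannot recognize the HDFS environment.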

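Issue 7: When starting more than one Presto application, pay attention to the related settings (for example site.global.data_dir and site.global.config_dir). Otherwise the new application conflicts with one started earlier and disrupts other services, because every application start first deletes the corresponding data and etc directories and then regenerates the configuration and data directories.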
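
A quick comparison of the "global" sections of two appConfig.json files can catch such collisions before the second application is created. The sketch below is illustrative only: the MUST_DIFFER list is my own selection of keys from the appConfig.json shown earlier, and the second configuration's values are invented for the example.

```python
# Settings assumed (for this sketch) to need distinct values per application.
MUST_DIFFER = [
    "site.global.data_dir",
    "site.global.config_dir",
    "site.global.presto_server_port",
]


def conflicting_keys(global_a, global_b):
    """Return the MUST_DIFFER keys that both configs set to the same value."""
    return [
        key
        for key in MUST_DIFFER
        if key in global_a and global_a.get(key) == global_b.get(key)
    ]


first = {
    "site.global.data_dir": "/data01/presto/data",
    "site.global.config_dir": "/data01/presto/etc",
    "site.global.presto_server_port": "18088",
}
# Hypothetical second application: new directories, but the port was forgotten.
second = {
    "site.global.data_dir": "/data02/presto/data",
    "site.global.config_dir": "/data02/presto/etc",
    "site.global.presto_server_port": "18088",
}

print(conflicting_keys(first, second))  # prints ['site.global.presto_server_port']
```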

Issue 8: The heap size set in site.global.jvm_args must be greater than the value of site.global.presto_query_max_memory_per_node; otherwise an OOM error is reported.

Issue 9: Writing the memory unit as g causes an error; it must be written as GB.
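
Issues 8 and 9 can both be checked before the application is submitted. The sketch below parses the -Xmx option out of site.global.jvm_args, parses the per-node limit while rejecting unit spellings such as g, and compares the two. Everything here is an assumption made for illustration: the function names are invented, the accepted unit list is my understanding of Presto's size notation, and sizes are treated as binary (1 GB = 2**30 bytes).

```python
import ast
import re

# Assumed multipliers: both Presto sizes and JVM suffixes treated as binary units.
PRESTO_UNITS = {"B": 1, "kB": 2**10, "MB": 2**20, "GB": 2**30, "TB": 2**40, "PB": 2**50}
JVM_UNITS = {"k": 2**10, "m": 2**20, "g": 2**30}


def presto_size_bytes(value):
    """Parse a size such as '20GB'; reject spellings like '20g'."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)(B|kB|MB|GB|TB|PB)", value)
    if match is None:
        raise ValueError(f"unsupported size {value!r}: use a unit such as GB")
    return int(float(match.group(1)) * PRESTO_UNITS[match.group(2)])


def jvm_heap_bytes(jvm_args_value):
    """Return the -Xmx heap size in bytes from a site.global.jvm_args string."""
    for arg in ast.literal_eval(jvm_args_value):
        match = re.fullmatch(r"-Xmx(\d+)([kKmMgG])", arg)
        if match:
            return int(match.group(1)) * JVM_UNITS[match.group(2).lower()]
    raise ValueError("no -Xmx option with a k/m/g suffix found")


def check_memory(app_global):
    """Return a list of problems found in the 'global' section of appConfig.json."""
    key = "site.global.presto_query_max_memory_per_node"
    try:
        per_node = presto_size_bytes(app_global[key])
    except ValueError as err:
        return [str(err)]
    if jvm_heap_bytes(app_global["site.global.jvm_args"]) <= per_node:
        return ["-Xmx must be larger than presto_query_max_memory_per_node"]
    return []


good = {
    "site.global.jvm_args": "['-server', '-Xmx40960M', '-XX:+UseG1GC']",
    "site.global.presto_query_max_memory_per_node": "20GB",
}
print(check_memory(good))  # prints []
print(check_memory({**good, "site.global.presto_query_max_memory_per_node": "20g"}))
print(check_memory({**good, "site.global.jvm_args": "['-Xmx16G']"}))
```

With the values from this post (a 40960M heap and a 20GB per-node limit) the check passes; the second and third calls show the two failure cases described above.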

Issue 10: ....

