Setting Up Apache Zeppelin on CDH

This article is based on CentOS 6.4, CDH 5.7.6, and Spark 1.6.0.

1. Environment preparation

Git 1.7.1, Maven 3.3.9, JDK 1.8
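Before building, it can help to confirm that the installed tools are at least as new as the versions listed above. The following is a minimal sketch: `version_ge` is a hypothetical helper name, and treating the listed versions as minimums is an assumption, since the article only states which versions were used.

```shell
#!/usr/bin/env bash
# Sketch: compare dotted version strings with GNU sort -V.
# version_ge is a hypothetical helper name, not part of any tool.

# Succeeds when version $1 is greater than or equal to version $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Example use (commented out so the snippet has no side effects):
# mvn_version=$(mvn -v | awk 'NR==1 {print $3}')
# version_ge "$mvn_version" 3.3.9 || echo "Maven is older than 3.3.9" >&2
```

`sort -V` comes from GNU coreutils, so the sketch assumes a Linux host such as the CentOS machine used here.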

2. Download the latest Zeppelin source

wget http://mirror.bit.edu.cn/apache/zeppelin/zeppelin-0.7.3/zeppelin-0.7.3.tgz
tar -zxvf zeppelin-0.7.3.tgz 
cd zeppelin-0.7.3

3. Build

mvn -X clean package -Pspark-1.6 -Dhadoop.version=2.6.0-cdh5.7.6 -Phadoop-2.6  -Pyarn -Ppyspark -Psparkr  -Pvendor-repo -DskipTests -Pbuild-distr

[INFO] Zeppelin ........................................... SUCCESS [  8.360 s]
[INFO] Zeppelin: Interpreter .............................. SUCCESS [  5.909 s]
[INFO] Zeppelin: Zengine .................................. SUCCESS [ 22.396 s]
[INFO] Zeppelin: Display system apis ...................... SUCCESS [ 10.373 s]
[INFO] Zeppelin: Spark dependencies ....................... SUCCESS [ 32.613 s]
[INFO] Zeppelin: Spark .................................... SUCCESS [ 18.004 s]
[INFO] Zeppelin: Markdown interpreter ..................... SUCCESS [  0.734 s]
[INFO] Zeppelin: Angular interpreter ...................... SUCCESS [  0.259 s]
[INFO] Zeppelin: Shell interpreter ........................ SUCCESS [  0.374 s]
[INFO] Zeppelin: Livy interpreter ......................... SUCCESS [02:06 min]
[INFO] Zeppelin: HBase interpreter ........................ SUCCESS [  2.358 s]
[INFO] Zeppelin: Apache Pig Interpreter ................... SUCCESS [  2.589 s]
[INFO] Zeppelin: PostgreSQL interpreter ................... SUCCESS [  0.371 s]
[INFO] Zeppelin: JDBC interpreter ......................... SUCCESS [  0.682 s]
[INFO] Zeppelin: File System Interpreters ................. SUCCESS [  0.650 s]
[INFO] Zeppelin: Flink .................................... SUCCESS [  4.925 s]
[INFO] Zeppelin: Apache Ignite interpreter ................ SUCCESS [ 21.882 s]
[INFO] Zeppelin: Kylin interpreter ........................ SUCCESS [  0.298 s]
[INFO] Zeppelin: Python interpreter ....................... SUCCESS [01:29 min]
[INFO] Zeppelin: Lens interpreter ......................... SUCCESS [  1.920 s]
[INFO] Zeppelin: Apache Cassandra interpreter ............. SUCCESS [ 35.499 s]
[INFO] Zeppelin: Elasticsearch interpreter ................ SUCCESS [  5.039 s]
[INFO] Zeppelin: BigQuery interpreter ..................... SUCCESS [  2.585 s]
[INFO] Zeppelin: Alluxio interpreter ...................... SUCCESS [  1.680 s]
[INFO] Zeppelin: Scio ..................................... SUCCESS [ 29.029 s]
[INFO] Zeppelin: web Application .......................... SUCCESS [07:18 min]
[INFO] Zeppelin: Server ................................... SUCCESS [ 46.044 s]
[INFO] Zeppelin: Packaging distribution ................... SUCCESS [ 52.455 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 16:00 min
[INFO] Finished at: 2018-03-25T12:10:07+08:00
[INFO] Final Memory: 332M/5705M
[INFO] ------------------------------------------------------------------------

For details on the build parameters, see: http://zeppelin.apache.org/docs/0.7.3/install/build.html

4. Deploy Zeppelin

tar -zxvf ~/zeppelin-0.7.3/zeppelin-distribution/target/zeppelin-0.7.3.tar.gz -C /opt/bigdata/
ln -s /opt/bigdata/zeppelin-0.7.3 /opt/bigdata/zeppelin
cd /opt/bigdata/zeppelin
bin/zeppelin-daemon.sh start

The UI is available at http://localhost:8080. The default port can be changed in zeppelin-site.xml.
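As an alternative to editing zeppelin-site.xml, Zeppelin also reads the port from the `ZEPPELIN_PORT` environment variable (the counterpart of the `zeppelin.server.port` property), which can be exported in conf/zeppelin-env.sh. The sketch below shows one way to set or update such a line without creating duplicates. `set_env_export` is a hypothetical helper name and 8090 is only an example port.

```shell
#!/usr/bin/env bash
# Sketch: set or update an "export NAME=value" line in an env file.
# set_env_export is a hypothetical helper; it relies on GNU sed -i.

set_env_export() {
  local file=$1 name=$2 value=$3
  touch "$file"
  if grep -q "^export ${name}=" "$file"; then
    # Replace the existing assignment in place.
    sed -i "s|^export ${name}=.*|export ${name}=${value}|" "$file"
  else
    # No assignment yet, so append one.
    printf 'export %s=%s\n' "$name" "$value" >> "$file"
  fi
}

# Example (8090 is an arbitrary port chosen for illustration):
# set_env_export conf/zeppelin-env.sh ZEPPELIN_PORT 8090
```

Zeppelin has to be restarted for the new port to take effect.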

5. Configure access to the Hive warehouse

Hive is accessed through JDBC, so first make sure HiveServer2 is running.
1. Copy the Hive driver jars into Zeppelin

cp ~/hive/lib/hive-exec-1.1.0-cdh5.7.6.jar /opt/bigdata/zeppelin/interpreter/jdbc/
cp ~/hive/lib/hive-jdbc-1.1.0-cdh5.7.6.jar /opt/bigdata/zeppelin/interpreter/jdbc/
cp ~/hive/lib/hive-metastore-1.1.0-cdh5.7.6.jar /opt/bigdata/zeppelin/interpreter/jdbc/
cp ~/hive/lib/hive-serde-1.1.0-cdh5.7.6.jar /opt/bigdata/zeppelin/interpreter/jdbc/
cp ~/hive/lib/hive-service-1.1.0-cdh5.7.6.jar /opt/bigdata/zeppelin/interpreter/jdbc/
cp ~/hadoop/lib/hadoop-common-2.6.0-cdh5.7.6.jar /opt/bigdata/zeppelin/interpreter/jdbc/
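A plain `cp` fails quietly in a long paste if one jar name is slightly off. The copy step can be wrapped so that missing files are reported, roughly as follows. `copy_jars` is a hypothetical helper name, and `ZEPPELIN_HOME` in the example is assumed to point at the Zeppelin install directory.

```shell
#!/usr/bin/env bash
# Sketch: copy a list of jars into a target directory and report any that are missing.
# copy_jars is a hypothetical helper name.

copy_jars() {
  local dest=$1 missing=0 jar
  shift
  mkdir -p "$dest"
  for jar in "$@"; do
    if [ -f "$jar" ]; then
      cp "$jar" "$dest/"
    else
      echo "missing: $jar" >&2
      missing=1
    fi
  done
  # Non-zero exit status when at least one jar was not found.
  return "$missing"
}

# Example (ZEPPELIN_HOME is an assumed variable for the install directory):
# copy_jars "$ZEPPELIN_HOME/interpreter/jdbc" \
#   ~/hive/lib/hive-jdbc-1.1.0-cdh5.7.6.jar \
#   ~/hadoop/lib/hadoop-common-2.6.0-cdh5.7.6.jar
```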

2. Restart Zeppelin

bin/zeppelin-daemon.sh restart

3. On the Interpreters page, edit the jdbc interpreter settings, then click the restart button.
4. Run a query to verify.

6. Configure Spark on YARN integration

Zeppelin currently supports local, yarn-client, standalone, and Mesos modes. The default is local mode.
1. Edit zeppelin-env.sh

cp conf/zeppelin-env.sh.template conf/zeppelin-env.sh
# then add the following lines to conf/zeppelin-env.sh
export MASTER=yarn-client
export HADOOP_CONF_DIR=[your_hadoop_conf_path]
export SPARK_HOME=[your_spark_home_path]
export SPARK_SUBMIT_OPTIONS="--conf spark.dynamicAllocation.minExecutors=10 --executor-memory 2G --driver-memory 2g --executor-cores 2"
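A wrong `HADOOP_CONF_DIR` or `SPARK_HOME` typically only shows up later as an interpreter error, so a quick check after editing the file can save a restart cycle. This is a minimal sketch, with `require_dirs` as a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch: confirm that the variables named as arguments are set and point to directories.
# require_dirs is a hypothetical helper name.

require_dirs() {
  local name status=0
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable called $name.
    if [ -z "${!name:-}" ] || [ ! -d "${!name}" ]; then
      echo "$name is unset or not a directory" >&2
      status=1
    fi
  done
  return "$status"
}

# Example:
# source conf/zeppelin-env.sh && require_dirs HADOOP_CONF_DIR SPARK_HOME
```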

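Restart Zeppelin, open the UI, set the Spark parameters in the interpreter settings, and click restart.
Verifying Spark at this point fails with an error.
The cause is that the hadoop-common jar under zeppelin/lib does not contain the method named in the error. Since the CDH environment is already configured, delete all Hadoop jars from that directory. (This is odd, given that the build targeted the CDH version.)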
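Removing jars from lib is hard to undo, so a more cautious variation is to move them into a backup directory instead of deleting them. The sketch below does that. `stash_jars` is a hypothetical helper name, and both the `hadoop-*.jar` pattern and `ZEPPELIN_HOME` in the example are assumptions about how the files and install directory are named.

```shell
#!/usr/bin/env bash
# Sketch: move jars matching a glob out of a lib directory into a backup directory.
# stash_jars is a hypothetical helper; moving instead of deleting is a variation
# on the step described in the article.

stash_jars() {
  local libdir=$1 pattern=$2 backup=$3 jar moved=0
  mkdir -p "$backup"
  for jar in "$libdir"/$pattern; do
    [ -e "$jar" ] || continue   # the glob matched nothing
    mv "$jar" "$backup/"
    moved=$((moved + 1))
  done
  # Print how many jars were moved.
  echo "$moved"
}

# Example (pattern and ZEPPELIN_HOME are assumptions):
# stash_jars "$ZEPPELIN_HOME/lib" 'hadoop-*.jar' "$ZEPPELIN_HOME/lib-backup"
```

If the interpreter still fails after the restart, the jars can be moved back from the backup directory.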

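Listing HDFS files then raises another error.
This one is caused by a Jackson version conflict: Zeppelin ships version 2.5.3, while Spark 1.6 uses 2.4. Replace the jackson-databind jar under zeppelin/lib with the matching version and restart.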

wget http://central.maven.org/maven2/com/fasterxml/jackson/core/jackson-databind/2.4.4/jackson-databind-2.4.4.jar 
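To see which Jackson versions are actually present before and after the swap, the jar file names can be inspected. The following sketch prints the versions of one artifact found in a directory. `jar_versions` is a hypothetical helper name, and it assumes the usual Maven `artifact-version.jar` file naming.

```shell
#!/usr/bin/env bash
# Sketch: print the versions of one artifact found in a directory of jars.
# jar_versions is a hypothetical helper; it assumes artifact-version.jar file names.

jar_versions() {
  local dir=$1 artifact=$2 jar base
  for jar in "$dir"/"$artifact"-[0-9]*.jar; do
    [ -e "$jar" ] || continue   # the glob matched nothing
    base=${jar##*/}             # strip the directory
    base=${base#"$artifact"-}   # strip the "artifact-" prefix
    echo "${base%.jar}"         # strip the ".jar" suffix
  done
}

# Example (ZEPPELIN_HOME is an assumed variable for the install directory):
# jar_versions "$ZEPPELIN_HOME/lib" jackson-databind
```

More than one line of output means two versions of the same artifact sit side by side, and the old one still needs to be removed.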

7. User permission configuration

Zeppelin relies mainly on Apache Shiro for user permission management.
1. Disable anonymous access: copy the zeppelin-site.xml template and set zeppelin.anonymous.allowed=false

cp conf/zeppelin-site.xml.template conf/zeppelin-site.xml
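The property can be changed by hand in an editor, or scripted. The sketch below flips the value with GNU sed. `set_site_property` is a hypothetical helper name, and it assumes the `<value>` element sits on the line directly after the matching `<name>` element, which should be checked against the actual file first.

```shell
#!/usr/bin/env bash
# Sketch: set the <value> that follows a given <name> line in a Hadoop-style XML config.
# set_site_property is a hypothetical helper. Assumptions: <value> is on the line right
# after <name>, and neither argument contains characters special to sed ("|", "&").

set_site_property() {
  local file=$1 name=$2 value=$3
  # On the line matching <name>, read the next line (n) and rewrite its <value>.
  sed -i "/<name>${name}<\/name>/{n;s|<value>.*</value>|<value>${value}</value>|;}" "$file"
}

# Example:
# set_site_property conf/zeppelin-site.xml zeppelin.anonymous.allowed false
```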

2. Enable Shiro by copying the shiro template file

cp conf/shiro.ini.template conf/shiro.ini

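Shiro provides permission control based on users, roles, and URLs, and can also delegate user permissions to a directory service. This article covers the user and role based approach.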

[users]
admin = admin, admin
zhangsan=123456,readonly
[roles]
readonly= *
admin = *
[urls]
/api/interpreter/** = authc, roles[admin]
/api/configurations/** = authc, roles[admin]
/api/credential/** = authc, roles[admin]
/** = authc
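Each line in `[users]` has the form `user = password, role1, role2`. To review which roles each account ends up with, without printing passwords, the section can be summarized with awk. This is a minimal sketch: `shiro_user_roles` is a hypothetical helper name, and it does not handle passwords that contain `=` or `,`.

```shell
#!/usr/bin/env bash
# Sketch: print "user role1,role2" for each entry in the [users] section of a shiro.ini.
# shiro_user_roles is a hypothetical helper; the password (first field after "=") is dropped.

shiro_user_roles() {
  awk '
    /^\[/ { in_users = ($0 ~ /^\[users\]/); next }   # track which section we are in
    in_users && /=/ && !/^[ \t]*#/ {
      split($0, kv, "=")
      user = kv[1]; gsub(/[ \t]/, "", user)
      n = split(kv[2], parts, ",")
      roles = ""
      for (i = 2; i <= n; i++) {                      # parts[1] is the password
        gsub(/[ \t]/, "", parts[i])
        roles = roles (roles == "" ? "" : ",") parts[i]
      }
      print user, roles
    }
  ' "$1"
}

# Example:
# shiro_user_roles conf/shiro.ini
```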

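3. Make Zeppelin access Hive as the currently logged-in user. Without this setting, access happens as the user that started the Zeppelin process.
See the official documentation: https://zeppelin.apache.org/docs/0.7.3/manual/userimpersonation.html
There are two ways to set this up: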

  1. Set up passwordless SSH login for each user.
  2. Set ZEPPELIN_IMPERSONATE_CMD. This article takes this approach. Edit zeppelin-env.sh (vim zeppelin-env.sh) and add:

    export ZEPPELIN_IMPERSONATE_CMD='sudo -H -u ${ZEPPELIN_IMPERSONATE_USER} bash -c '
    export ZEPPELIN_IMPERSONATE_SPARK_PROXY_USER=false

  3. Restart the Zeppelin server.
  4. Log in to the UI with an administrator account and enable Impersonate on the spark interpreter, following the settings of the sh interpreter.
  5. Create a Hive table to verify, then check the owner and permissions of the corresponding HDFS directory.

4. Set notebook permissions. Each user can set the permissions on their own notebooks.
The text box supports suggestions: typing part of a user name brings up matching entries.

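Finally, there is a Chinese translation of the Zeppelin documentation. The translation is not great, but it is recommended for anyone who needs it:
http://cwiki.apachecn.org/pages/viewpage.action?pageId=10030467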

8. Bugs encountered during use

1. Task progress is not displayed while a task is running.
WARN [2018-04-24 16:04:56,238] ({qtp745160567-15917} ServletHandler.java[doHandle]:620) -
javax.servlet.ServletException: Filtered request failed.
at org.apache.shiro.web.servlet.AbstractShiroFilter.doFilterInternal(AbstractShiroFilter.java:384)
at org.apache.shiro.web.servlet.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:125)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
at org.apache.zeppelin.server.CorsFilter.doFilter(CorsFilter.java:72)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.eclipse.jetty.server.Server.handle(Server.java:499)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.AbstractMethodError: javax.ws.rs.core.Response.getStatusInfo()Ljavax/ws/rs/core/Response$StatusType;
at javax.ws.rs.WebApplicationException.validate(WebApplicationException.java:186)
at javax.ws.rs.ClientErrorException.<init>(ClientErrorException.java:88)
at org.apache.cxf.jaxrs.utils.JAXRSUtils.findTargetMethod(JAXRSUtils.java:503)
at org.apache.cxf.jaxrs.interceptor.JAXRSInInterceptor.processRequest(JAXRSInInterceptor.java:198)
at org.apache.cxf.jaxrs.interceptor.JAXRSInInterceptor.handleMessage(JAXRSInInterceptor.java:90)
at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:272)
at org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121)
at org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:239)
at org.apache.cxf.transport.servlet.ServletController.invokeDestination(ServletController.java:248)
at org.apache.cxf.transport.servlet.ServletController.invoke(ServletController.java:222)
at org.apache.cxf.transport.servlet.ServletController.invoke(ServletController.java:153)
at org.apache.cxf.transport.servlet.CXFNonSpringServlet.invoke(CXFNonSpringServlet.java:167)
at org.apache.cxf.transport.servlet.AbstractHTTPServlet.handleRequest(AbstractHTTPServlet.java:286)
at org.apache.cxf.transport.servlet.AbstractHTTPServlet.doGet(AbstractHTTPServlet.java:211)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
at org.apache.cxf.transport.servlet.AbstractHTTPServlet.service(AbstractHTTPServlet.java:262)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:812)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1669)

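Solution:
javax.ws.rs.core.Response.getStatusInfo() is a JAX-RS 2 feature, while the CXF version in use targets JAX-RS 1. Replace the CXF-related dependencies.
Zeppelin 0.8.0 appears to have fixed this already. https://issues.apache.org/jira/browse/ZEPPELIN-903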
