Hue Installation and Deployment

1 Hue Introduction

1.1 About Hue

Hue is an open-source web UI for Apache Hadoop. It evolved from Cloudera Desktop and was contributed to the open-source community by Cloudera, and it is built on the Python web framework Django. With Hue you can interact with a Hadoop cluster from a browser-based web console to analyze and process data.

1.2 Features

- Session data, user authentication, and authorization are managed by a lightweight SQLite database by default; MySQL, PostgreSQL, or Oracle can be configured instead
- File Browser for accessing HDFS
- Hive editor for developing and running Hive queries
- Solr-based search applications with visualized data views and dashboards
- Interactive queries through Impala-based applications
- Spark editor and dashboard
- Pig editor with the ability to submit script jobs
- Oozie editor; Workflows, Coordinators, and Bundles can be submitted and monitored from a dashboard
- HBase browser for visualizing, querying, and modifying HBase tables
- Metastore browser for accessing Hive metadata and HCatalog
- Job browser for accessing MapReduce jobs (MR1/MR2-YARN)
- Job designer for creating MapReduce/Streaming/Java jobs
- Sqoop 2 editor and dashboard
- ZooKeeper browser and editor
- Query editors for MySQL, PostgreSQL, SQLite, and Oracle databases

1.3 Hadoop Cluster Service Layout

1.3.1 ZOOKEEPER & HDFS

IP address       Hostname     zookeeper   journalnode   namenode   zkfc   datanode
192.168.1.201    hadoop001    √                         √
192.168.1.202    hadoop002    √
192.168.1.203    hadoop003    √
192.168.1.204    hadoop004
192.168.1.205    hadoop005
192.168.1.206    hadoop006
192.168.1.207    hue

1.3.2 YARN

IP address       Hostname     resourcemanager   nodemanager   timeline   JobHistory
192.168.1.201    hadoop001    √                                          √
192.168.1.202    hadoop002
192.168.1.203    hadoop003    √
192.168.1.204    hadoop004
192.168.1.205    hadoop005
192.168.1.206    hadoop006
192.168.1.207    hue

1.3.3 HIVE & MYSQL

IP address       Hostname     metastore   hiveserver2   mariadb
192.168.1.201    hadoop001                √             √ (primary)
192.168.1.202    hadoop002
192.168.1.203    hadoop003                              √ (standby)
192.168.1.204    hadoop004
192.168.1.205    hadoop005
192.168.1.206    hadoop006
192.168.1.207    hue

1.3.4 HBASE

IP address       Hostname     master   regionserver
192.168.1.201    hadoop001
192.168.1.202    hadoop002
192.168.1.203    hadoop003
192.168.1.204    hadoop004
192.168.1.205    hadoop005
192.168.1.206    hadoop006
192.168.1.207    hue

1.4 Hue Deployment Status (192.168.1.207)

Component      Status
HIVE           √
HBASE
SPARKSQL       √
HDFS
ZOOKEEPER      √
DBQuery
Impala
HDFS
Job Browser    √
Oozie
Pig
Notebook       √

2 Environment and Dependencies

2.1 Operating System

Description: CentOS Linux release 7.2.1511 (Core)

Release: 7.2.1511

2.2 Configuring a Local-Network YUM Repository

Configure a local-network YUM repository according to your actual needs.

A reference configuration follows.

===============================================================================

Create the file local-yum.repo in the /etc/yum.repos.d directory:

[root@hue~]# vi /etc/yum.repos.d/local-yum.repo

The contents of the file are as follows:

[local_yum]

name=local_yum

baseurl=ftp://192.168.8.211/vg0_lv1/Packages/

enabled=1

gpgcheck=0

## enabled=1; to verify the repo was set up, run yum list and check that packages from the local repository appear.

## The /etc/yum.repos.d/ directory may contain the system's default repo files; they must be removed (back them up first) so that the custom repo file takes effect.

 

Next, delete every file in /etc/yum.repos.d except local-yum.repo:

[root@hue yum.repos.d]# ll

total 28

-rw-r--r--  1 root root   71 Aug 17 14:32 centos_211.repo

-rw-r--r--. 1 root root 1991 Oct 23  2014 CentOS-Base.repo

-rw-r--r--. 1 root root  647 Oct 23  2014 CentOS-Debuginfo.repo

-rw-r--r--. 1 root root  289 Oct 23  2014 CentOS-fasttrack.repo

-rw-r--r--. 1 root root  630 Oct 23  2014 CentOS-Media.repo

-rw-r--r--. 1 root root 5394 Oct 23  2014 CentOS-Vault.repo

[root@hue yum.repos.d]# rm -rf CentOS-*

[root@hue yum.repos.d]# ll

 

After local-yum.repo is in place, run the following commands to verify that the YUM repository is configured correctly:

yum clean all

yum makecache

yum list | wc -l   ## prints the number of available packages; here it is 6353

6353               ## the count will vary with different repo configurations

2.3 Firewall

Disable the firewall:

[root@hue~]# systemctl stop firewalld

[root@hue~]# systemctl disable firewalld

2.4 Official Dependencies

CentOS:

- Oracle's JDK
- ant
- asciidoc
- cyrus-sasl-devel
- cyrus-sasl-gssapi
- cyrus-sasl-plain
- gcc
- gcc-c++
- krb5-devel
- libffi-devel
- libtidy (for unit tests only)
- libxml2-devel
- libxslt-devel
- make
- mvn (from apache-maven package or maven3 tarball)
- mysql
- mysql-devel
- openldap-devel
- python-devel
- sqlite-devel
- openssl-devel (for version 7+)
- gmp-devel

2.5 Adding a User

- Add the user:

[root@hue~]# useradd hadoop

- Set the password:

[root@hue~]# passwd hadoop

Enter the password twice when prompted.
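A quick way to confirm the account exists (an extra check, not part of the original steps):

[root@hue~]# id hadoop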

3 Installation and Deployment

3.1 Cluster

- Install the Hadoop cluster on hadoop001 through hadoop006. See the "Provincial Big Data Platform Deployment and Operations Specification (Guizhou Template)".

3.2 The beh Software Package

- Copy the beh software package from any node of the Hadoop cluster to the hue host with scp and keep a copy there (see the sketch below).
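A minimal sketch of the copy, assuming the package sits under /opt/beh on hadoop001 and should land under /opt on the hue host (the user and paths here are assumptions; adjust to your environment):

[root@hue ~]# scp -r hadoop@192.168.1.201:/opt/beh /opt/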

3.3 Installing Dependencies

- Install the dependencies on the hue host:

[root@hue~]# yum -y install asciidoc cyrus-sasl-devel cyrus-sasl-gssapi cyrus-sasl-plain gcc gcc-c++  krb5-devel libffi-devel libtidy libxml2-devel libxslt-devel make mysql mysql-devel openldap-devel python-devel sqlite-devel openssl-devel gmp-devel

In particular, check that asciidoc and libffi-devel were installed correctly; if they are not available from your YUM repository, download and install them manually.
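A quick way to check (rpm -q reports "is not installed" for anything missing; this step is not in the original procedure):

[root@hue~]# rpm -q asciidoc libffi-devel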

3.3.1 JDK

[root@hue~]# vi /etc/profile

export JAVA_HOME=/opt/beh/core/jdk

export PATH=$JAVA_HOME/bin:$PATH

export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

[root@hue~]# source /etc/profile

[root@hue~]# java -version

3.3.2 ant

- Download and extract:

[root@hue~]# wget http://mirror.bit.edu.cn/apache/ant/binaries/apache-ant-1.9.7-bin.tar.gz

[root@hue~]# tar -zxvf apache-ant-1.9.7-bin.tar.gz -C /usr/local

- Switch to the /usr/local directory and rename the extracted folder:

[root@hue~]# cd /usr/local

[root@hue local]# mv apache-ant-1.9.7 ant

[root@hue local]# vi /etc/profile

export ANT_HOME=/usr/local/ant

export PATH=$ANT_HOME/bin:$PATH

[root@hue local]# source /etc/profile

[root@hue local]# ant -version

3.3.3 mvn

- Download and extract:

[root@hue~]# wget http://mirror.bit.edu.cn/apache/maven/maven-3/3.3.9/binaries/apache-maven-3.3.9-bin.tar.gz

[root@hue~]# tar -zxvf apache-maven-3.3.9-bin.tar.gz -C /usr/local

- Switch to the /usr/local directory and rename the extracted folder:

[root@hue~]# cd /usr/local

[root@hue local]# mv apache-maven-3.3.9 maven

[root@hue local]# vi /etc/profile

export M2_HOME=/usr/local/maven

export PATH=$M2_HOME/bin:$PATH

[root@hue local]# source /etc/profile

[root@hue local]# mvn -v

3.3.4 Upgrading pytz

[root@hue~]# easy_install --upgrade pytz

3.3.5 Upgrading cffi

[root@hue~]# easy_install --upgrade cffi
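If the easy_install command itself is missing, it is provided by the python-setuptools package on CentOS 7 (an extra step in case the minimal install lacks it):

[root@hue~]# yum -y install python-setuptools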

3.4 Building and Installing

3.4.1 Installation

- Download and upload:

Download the latest Hue release from http://gethue.com/category/release/; at the time of writing the latest version is hue-3.10.

- Extract it into the hue user's home directory and rename the folder:

[root@hue~]# tar -zxvf hue-3.10.0.tgz -C /home/hue/

[root@hue~]# cd /home/hue

[root@hue hue]# mv hue-3.10.0 hue

3.4.2 Building

[root@hue hue]# cd hue

[root@hue hue]# make apps

3.4.3 Testing

[root@hue hue]# build/env/bin/hue runserver

The Hue login page should be reachable at http://localhost:8000.
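To reach the test server from another machine, the development server can be bound to an explicit address and port (standard Django runserver syntax; the address below is this deployment's hue host):

[root@hue hue]# build/env/bin/hue runserver 192.168.1.207:8000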

 

3.4.4 Common Problems

Build fails with missing dependencies: install pip, install whichever dependency is missing with pip install xxxx, then run make clean and build again.

Timeouts: the connection to the download server is unstable; run make clean and build again.
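A rough sketch of the missing-dependency fix, assuming pip is installed from YUM (which may require the EPEL repository); python-ldap below is only an example of a module the build error might name:

[root@hue hue]# yum -y install python-pip
[root@hue hue]# pip install python-ldap     ## substitute the module named in the build error
[root@hue hue]# make clean
[root@hue hue]# make apps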

3.4.5 Permissions

[root@hue hue]# chown -R hadoop:hadoop hue/

4 Configuration

4.1 Configuring the Components and Starting the Required Services

Default configuration file: $HUE_HOME/desktop/conf/hue.ini

bin directory: $HUE_HOME/build/env/bin/

4.1.1 Web Server Listen Address

Under the [desktop] section of hue.ini:

  http_host=192.168.1.207

  http_port=8000

4.1.2 Running the Web Server as the hadoop User

  # Webserver runs as this user

   server_user=hadoop

   server_group=hadoop

 

  # This should be the Hue admin and proxy user

   default_user=hue

 

  # This should be the hadoop cluster admin

   default_hdfs_superuser=hadoop

4.1.3 Using a MySQL Database

4.1.3.1 Editing the Configuration File

Under the [desktop] -> [[database]] section of hue.ini:

# Note for MariaDB use the 'mysql' engine.

     engine=mysql

     host=hadoop001

     port=3306

     user=hue

     password=hue

    # Execute this script to produce the database password. This will be used when `password` is not set.

    ## password_script=/path/script

     name=hue

4.1.3.2 Creating the Hue Database and User in MySQL

[hadoop@hadoop001 ~]$ mysql -u root -p

- Create the database, create the user, and grant privileges:

create database hue;

grant all on hue.* to 'hue'@'%' identified by 'hue';

grant all on hue.* to 'hue'@'localhost' identified by 'hue';

grant all on hue.* to 'hue'@'hadoop001' identified by 'hue';
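It can help to flush privileges in the same mysql session and then confirm that the new account can connect (a quick check, not in the original steps):

flush privileges;

[hadoop@hadoop001 ~]$ mysql -u hue -phue -e 'show databases;'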

4.1.3.3 Initializing the Database

[hadoop@hue home]$ hue/hue/build/env/bin/hue syncdb

[hadoop@hue home]$ hue/hue/build/env/bin/hue migrate

 

4.1.4 HDFS

Configure HDFS to support the other features.

WebHDFS must be enabled in hdfs-site.xml, and the service restarted:

<property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
</property>

Then edit the Hue configuration file (the HDFS settings live under [hadoop] -> [[hdfs_clusters]] -> [[[default]]]):

[[[default]]]

      # Enter the filesystem uri

      fs_defaultfs=hdfs://beh

 

      # NameNode logical name.

      ## logical_name=

 

      # Use WebHdfs/HttpFs as the communication mechanism.

      # Domain should be the NameNode or HttpFs host.

      # Default port is 14000 for HttpFs.

       webhdfs_url=http://192.168.1.201:50070/webhdfs/v1

 

      # Change this if your HDFS cluster is Kerberos-secured

       security_enabled=true

      hadoop_conf_dir=/opt/beh/core/hadoop/etc/hadoop/conf

4.1.5 YARN

[[yarn_clusters]]

    [[[default]]]

 

      # Enter the host on which you are running the ResourceManager

       resourcemanager_host=192.168.1.201

 

      # The port where the ResourceManager IPC listens on

       resourcemanager_port=23140

 

      # Whether to submit jobs to this cluster

      submit_to=True

 

      # Resource Manager logical name (required for HA)

      ## logical_name=

 

      # Change this if your YARN cluster is Kerberos-secured

      ## security_enabled=false

 

      # URL of the ResourceManager API

      resourcemanager_api_url=http://hadoop001:23188

 

      # URL of the ProxyServer API

      proxy_api_url=http://hadoop001:8088

 

      # URL of the HistoryServer API

       history_server_api_url=http://hadoop001:19888

 

      # URL of the Spark History Server

      spark_history_server_url=http://hadoop001:18088

 

      # In secure mode (HTTPS), if SSL certificates from YARN Rest APIs

      # have to be verified against certificate authority

      ## ssl_cert_ca_verify=True


4.1.6 HA

# HA support by specifying multiple clusters.

    # Redefine different properties there.

    # e.g.

 

     [[[ha]]]

      # Resource Manager logical name (required for HA)

      logical_name=beh

 

      # Un-comment to enable

      submit_to=True

 

      # URL of the ResourceManager API

      resourcemanager_api_url=http://hadoop003:23188

 

4.1.7 Hive

Hive requires the HiveServer2 (Thrift) service to be running on hadoop001:

[hadoop@hadoop001 ~]$ hive_start hiveserver2 &

- Then edit the Hue configuration file:

[beeswax]

 

  # Host where HiveServer2 is running.

  # If Kerberos security is enabled, use fully-qualified domain name (FQDN).

  hive_server_host=192.168.1.201

 

  # Port where HiveServer2 Thrift server runs on.

  hive_server_port=10000

 

  # Hive configuration directory, where hive-site.xml is located

   hive_conf_dir=/opt/beh/core/hive/conf
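If the Hive editor cannot connect, it may be worth confirming that HiveServer2 is actually listening on the configured port (a quick check, not part of the original steps):

[hadoop@hadoop001 ~]$ netstat -tnlp | grep 10000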

4.1.8 HBase

HBase requires the Thrift server to be running on hadoop001:

[hadoop@hadoop001 ~]$ hbase-daemon.sh start thrift

- Then edit the Hue configuration file:

[hbase]

  # Comma-separated list of HBase Thrift servers for clusters in the format of '(name|host:port)'.

  # Use full hostname with security.

  # If using Kerberos we assume GSSAPI SASL, not PLAIN.

   hbase_clusters=(Cluster|192.168.1.201:9090)

  #hbase_clusters=(hadoop|192.168.1.201:9090)

  # HBase configuration directory, where hbase-site.xml is located.

   hbase_conf_dir=/opt/beh/core/hbase/conf

4.1.9 ZooKeeper

Only the ZooKeeper service needs to be running on the Hadoop cluster.

- Edit the Hue configuration file:

[zookeeper]

 

  [[clusters]]

 

    [[[default]]]

      # Zookeeper ensemble. Comma separated list of Host/Port.

      # e.g. localhost:2181,localhost:2182,localhost:2183

      host_ports=192.168.1.201:2181,192.168.1.202:2181,192.168.1.203:2181

 

      # The URL of the REST contrib service (required for znode browsing).

       rest_url=http://192.168.1.201:9998

 

      # Name of Kerberos principal when using security.

## principal_name=zookeeper
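The rest_url above points at the ZooKeeper REST contrib service on port 9998. One way to start it, assuming a ZooKeeper source tree is available (the path below is only an example), is from the contrib/rest module:

[hadoop@hadoop001 ~]$ cd zookeeper/src/contrib/rest     ## hypothetical location of the ZooKeeper source
[hadoop@hadoop001 rest]$ ant run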

4.1.10 Spark

Spark requires the Livy server, spark-jobserver, and the Spark Thrift server.

- Install and start the Livy server on the hue host:

[hadoop@hue ~]$ wget http://archive.cloudera.com/beta/livy/livy-server-0.2.0.zip

[hadoop@hue ~]$ unzip livy-server-0.2.0.zip

[hadoop@hue ~]$ cd livy-server-0.2.0

[hadoop@hue ~]$ ./bin/livy-server

- Start the Spark Thrift server on hadoop001:

[hadoop@hadoop001 ~]# cd /opt/beh/core/spark/sbin

[hadoop@hadoop001 sbin]# ./start-thriftserver.sh  --hiveconf hive.server2.thrift.port=10001

- Start spark-jobserver on hadoop001:

[hadoop@hadoop001 ~]# rpm -ivh https://dl.bintray.com/sbt/rpm/sbt-0.13.6.rpm

[hadoop@hadoop001 ~]# git clone https://github.com/ooyala/spark-jobserver.git

[hadoop@hadoop001 ~]# cd spark-jobserver

[hadoop@hadoop001 ~]# sbt

> re-start

- Edit the Hue configuration file:

[spark]

  # Host address of the Livy Server.

  livy_server_host=hue

 

  # Port of the Livy Server.

  livy_server_port=8998

 

  # Configure livy to start in local 'process' mode, or 'yarn' workers.

   livy_server_session_kind=process

 

  # If livy should use proxy users when submitting a job.

  ## livy_impersonation_enabled=true

 

  # Host of the Sql Server

  sql_server_host=192.168.1.201

 

  # Port of the Sql Server

   sql_server_port=10001

4.1.11 File Browser

Permission checking must be disabled in hdfs-site.xml, and the service restarted; otherwise errors similar to "cannot connect" are reported:

<property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
</property>

The following proxy-user settings must be added to core-site.xml, and the service restarted; otherwise errors similar to "cannot impersonate" may be reported:

<property>
    <name>hadoop.proxyuser.hue.hosts</name>
    <value>*</value>
</property>

<property>
    <name>hadoop.proxyuser.hue.groups</name>
    <value>*</value>
</property>
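As an alternative to a full restart, the proxy-user settings can usually be refreshed in place on a running cluster (standard Hadoop admin commands, not part of the original steps):

[hadoop@hadoop001 ~]$ hdfs dfsadmin -refreshSuperUserGroupsConfiguration
[hadoop@hadoop001 ~]$ yarn rmadmin -refreshSuperUserGroupsConfiguration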

Reposted from: https://my.oschina.net/u/2547078/blog/810319
