Hue Installation and Deployment

1 Hue Introduction

1.1 About Hue

Hue is an open-source web UI for Apache Hadoop. It evolved from Cloudera Desktop and was contributed to the open-source community by Cloudera, and it is built on the Python web framework Django. With Hue you can interact with a Hadoop cluster from a browser-based web console to analyze and process data.

1.2 Features

- Session data, user authentication, and authorization are managed by a lightweight SQLite database by default; MySQL, PostgreSQL, or Oracle can be configured instead
- File Browser for accessing HDFS
- Hive editor for developing and running Hive queries
- Solr-based search applications with visualized data views and dashboards
- Interactive queries through Impala-based applications
- Spark editor and dashboard
- Pig editor with the ability to submit script jobs
- Oozie editor; Workflows, Coordinators, and Bundles can be submitted and monitored from a dashboard
- HBase browser for visualizing, querying, and modifying HBase tables
- Metastore browser for accessing Hive metadata and HCatalog
- Job browser for accessing MapReduce jobs (MR1/MR2-YARN)
- Job designer for creating MapReduce/Streaming/Java jobs
- Sqoop 2 editor and dashboard
- ZooKeeper browser and editor
- Query editors for MySQL, PostgreSQL, SQLite, and Oracle databases

1.3 Hadoop Cluster Service Layout

1.3.1 ZOOKEEPER & HDFS

IP address       Hostname     zookeeper   journalnode   namenode   zkfc   datanode
192.168.1.201    hadoop001    √                         √
192.168.1.202    hadoop002    √
192.168.1.203    hadoop003    √
192.168.1.204    hadoop004
192.168.1.205    hadoop005
192.168.1.206    hadoop006
192.168.1.207    hue

1.3.2 YARN

IP address       Hostname     resourcemanager   nodemanager   timeline   JobHistory
192.168.1.201    hadoop001    √                                          √
192.168.1.202    hadoop002
192.168.1.203    hadoop003    √
192.168.1.204    hadoop004
192.168.1.205    hadoop005
192.168.1.206    hadoop006
192.168.1.207    hue

1.3.3 HIVE & MYSQL

IP address       Hostname     metastore   hiveserver2   mariadb
192.168.1.201    hadoop001                √             √ (primary)
192.168.1.202    hadoop002
192.168.1.203    hadoop003                              √ (standby)
192.168.1.204    hadoop004
192.168.1.205    hadoop005
192.168.1.206    hadoop006
192.168.1.207    hue

1.3.4 HBASE

IP address       Hostname     master   regionserver
192.168.1.201    hadoop001
192.168.1.202    hadoop002
192.168.1.203    hadoop003
192.168.1.204    hadoop004
192.168.1.205    hadoop005
192.168.1.206    hadoop006
192.168.1.207    hue

1.4 Hue Deployment Status (192.168.1.207)

Component      Status
HIVE           √
HBASE
SPARKSQL       √
HDFS
ZOOKEEPER      √
DBQuery
Impala
HDFS
Job Browser    √
Oozie
Pig
Notebook       √

2 Environment and Dependencies

2.1 Operating System

Description: CentOS Linux release 7.2.1511 (Core)

Release: 7.2.1511

2.2 Configuring a Local-Network YUM Repository

Configure a local-network YUM repository according to your actual needs.

A reference configuration follows.

===============================================================================

Create the file local-yum.repo in the /etc/yum.repos.d directory:

[root@hue~]# vi /etc/yum.repos.d/local-yum.repo

The contents of the file are as follows:

[local_yum]

name=local_yum

baseurl=ftp://192.168.8.211/vg0_lv1/Packages/

enabled=1

gpgcheck=0

## enabled=1; to verify the repo was set up, run yum list and check that packages from the local repository appear.

## The /etc/yum.repos.d/ directory may contain the system's default repo files; they must be removed (back them up first) so that the custom repo file takes effect.

 

Next, delete every file in /etc/yum.repos.d except local-yum.repo:

[root@hue yum.repos.d]# ll

total 28

-rw-r--r--  1 root root   71 Aug 17 14:32 centos_211.repo

-rw-r--r--. 1 root root 1991 Oct 23  2014 CentOS-Base.repo

-rw-r--r--. 1 root root  647 Oct 23  2014 CentOS-Debuginfo.repo

-rw-r--r--. 1 root root  289 Oct 23  2014 CentOS-fasttrack.repo

-rw-r--r--. 1 root root  630 Oct 23  2014 CentOS-Media.repo

-rw-r--r--. 1 root root 5394 Oct 23  2014 CentOS-Vault.repo

[root@hue yum.repos.d]# rm -rf CentOS-*

[root@hue yum.repos.d]# ll

 

After local-yum.repo is in place, run the following commands to verify that the YUM repository is configured correctly:

yum clean all

yum makecache

yum list | wc -l   ## prints the number of available packages; here it is 6353

6353               ## the count will vary with different repo configurations

2.3 Firewall

Disable the firewall:

[root@hue~]# systemctl stop firewalld

[root@hue~]# systemctl disable firewalld

2.4 Official Dependencies

CentOS:

- Oracle's JDK
- ant
- asciidoc
- cyrus-sasl-devel
- cyrus-sasl-gssapi
- cyrus-sasl-plain
- gcc
- gcc-c++
- krb5-devel
- libffi-devel
- libtidy (for unit tests only)
- libxml2-devel
- libxslt-devel
- make
- mvn (from apache-maven package or maven3 tarball)
- mysql
- mysql-devel
- openldap-devel
- python-devel
- sqlite-devel
- openssl-devel (for version 7+)
- gmp-devel

2.5 Adding a User

- Add the user:

[root@hue~]# useradd hadoop

- Set the password:

[root@hue~]# passwd hadoop

Enter the password twice when prompted.
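A quick way to confirm the account exists (an extra check, not part of the original steps):

[root@hue~]# id hadoop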

3 Installation and Deployment

3.1 Cluster

- Install the Hadoop cluster on hadoop001 through hadoop006. See the "Provincial Big Data Platform Deployment and Operations Specification (Guizhou Template)".

3.2 The beh Software Package

- Copy the beh software package from any node of the Hadoop cluster to the hue host with scp and keep a copy there (see the sketch below).
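A minimal sketch of the copy, assuming the package sits under /opt/beh on hadoop001 and should land under /opt on the hue host (the user and paths here are assumptions; adjust to your environment):

[root@hue ~]# scp -r hadoop@192.168.1.201:/opt/beh /opt/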

3.3 Installing Dependencies

- Install the dependencies on the hue host:

[root@hue~]# yum -y install asciidoc cyrus-sasl-devel cyrus-sasl-gssapi cyrus-sasl-plain gcc gcc-c++  krb5-devel libffi-devel libtidy libxml2-devel libxslt-devel make mysql mysql-devel openldap-devel python-devel sqlite-devel openssl-devel gmp-devel

In particular, check that asciidoc and libffi-devel were installed correctly; if they are not available from your YUM repository, download and install them manually.
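A quick way to check (rpm -q reports "is not installed" for anything missing; this step is not in the original procedure):

[root@hue~]# rpm -q asciidoc libffi-devel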

3.3.1 JDK

[root@hue~]# vi /etc/profile

export JAVA_HOME=/opt/beh/core/jdk

export PATH=$JAVA_HOME/bin:$PATH

export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

[root@hue~]# source /etc/profile

[root@hue~]# java -version

3.3.2 ant

- Download and extract:

[root@hue~]# wget http://mirror.bit.edu.cn/apache/ant/binaries/apache-ant-1.9.7-bin.tar.gz

[root@hue~]# tar -zxvf apache-ant-1.9.7-bin.tar.gz -C /usr/local

- Switch to the /usr/local directory and rename the extracted folder:

[root@hue~]# cd /usr/local

[root@hue local]# mv apache-ant-1.9.7 ant

[root@hue local]# vi /etc/profile

export ANT_HOME=/usr/local/ant

export PATH=$ANT_HOME/bin:$PATH

[root@hue local]# source /etc/profile

[root@hue local]# ant -version

3.3.3 mvn

- Download and extract:

[root@hue~]# wget http://mirror.bit.edu.cn/apache/maven/maven-3/3.3.9/binaries/apache-maven-3.3.9-bin.tar.gz

[root@hue~]# tar -zxvf apache-maven-3.3.9-bin.tar.gz -C /usr/local

- Switch to the /usr/local directory and rename the extracted folder:

[root@hue~]# cd /usr/local

[root@hue local]# mv apache-maven-3.3.9 maven

[root@hue local]# vi /etc/profile

export M2_HOME=/usr/local/maven

export PATH=$M2_HOME/bin:$PATH

[root@hue local]# source /etc/profile

[root@hue local]# mvn -v

3.3.4 Upgrading pytz

[root@hue~]# easy_install --upgrade pytz

3.3.5 Upgrading cffi

[root@hue~]# easy_install --upgrade cffi
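If the easy_install command itself is missing, it is provided by the python-setuptools package on CentOS 7 (an extra step in case the minimal install lacks it):

[root@hue~]# yum -y install python-setuptools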

3.4 Building and Installing

3.4.1 Installation

- Download and upload:

Download the latest Hue release from http://gethue.com/category/release/; at the time of writing the latest version is hue-3.10.

- Extract it into the hue user's home directory and rename the folder:

[root@hue~]# tar -zxvf hue-3.10.0.tgz -C /home/hue/

[root@hue~]# cd /home/hue

[root@hue hue]# mv hue-3.10.0 hue

3.4.2 Building

[root@hue hue]# cd hue

[root@hue hue]# make apps

3.4.3 Testing

[root@hue hue]# build/env/bin/hue runserver

The Hue login page should be reachable at http://localhost:8000.
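To reach the test server from another machine, the development server can be bound to an explicit address and port (standard Django runserver syntax; the address below is this deployment's hue host):

[root@hue hue]# build/env/bin/hue runserver 192.168.1.207:8000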

 

3.4.4 Common Problems

Build fails with missing dependencies: install pip, install whichever dependency is missing with pip install xxxx, then run make clean and build again.

Timeouts: the connection to the download server is unstable; run make clean and build again.
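A rough sketch of the missing-dependency fix, assuming pip is installed from YUM (which may require the EPEL repository); python-ldap below is only an example of a module the build error might name:

[root@hue hue]# yum -y install python-pip
[root@hue hue]# pip install python-ldap     ## substitute the module named in the build error
[root@hue hue]# make clean
[root@hue hue]# make apps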

3.4.5 Permissions

[root@hue hue]# chown -R hadoop:hadoop hue/

4 Configuration

4.1 Configuring the Components and Starting the Required Services

Default configuration file: $HUE_HOME/desktop/conf/hue.ini

bin directory: $HUE_HOME/build/env/bin/

4.1.1 Web Server Listen Address

Under the [desktop] section of hue.ini:

  http_host=192.168.1.207

  http_port=8000

4.1.2 Running the Web Server as the hadoop User

  # Webserver runs as this user

   server_user=hadoop

   server_group=hadoop

 

  # This should be the Hue admin and proxy user

   default_user=hue

 

  # This should be the hadoop cluster admin

   default_hdfs_superuser=hadoop

4.1.3 Using a MySQL Database

4.1.3.1 Editing the Configuration File

Under the [desktop] -> [[database]] section of hue.ini:

# Note for MariaDB use the 'mysql' engine.

     engine=mysql

     host=hadoop001

     port=3306

     user=hue

     password=hue

    # Execute this script to produce the database password. This will be used when `password` is not set.

    ## password_script=/path/script

     name=hue

4.1.3.2 Creating the Hue Database and User in MySQL

[hadoop@hadoop001 ~]$ mysql -u root -p

- Create the database, create the user, and grant privileges:

create database hue;

grant all on hue.* to 'hue'@'%' identified by 'hue';

grant all on hue.* to 'hue'@'localhost' identified by 'hue';

grant all on hue.* to 'hue'@'hadoop001' identified by 'hue';
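It can help to flush privileges in the same mysql session and then confirm that the new account can connect (a quick check, not in the original steps):

flush privileges;

[hadoop@hadoop001 ~]$ mysql -u hue -phue -e 'show databases;'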

4.1.3.3 Initializing the Database

[hadoop@hue home]$ hue/hue/build/env/bin/hue syncdb

[hadoop@hue home]$ hue/hue/build/env/bin/hue migrate

 

4.1.4 HDFS

Configure HDFS to support the other features.

WebHDFS must be enabled in hdfs-site.xml, and the service restarted:

<property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
</property>

Then edit the Hue configuration file (the HDFS settings live under [hadoop] -> [[hdfs_clusters]] -> [[[default]]]):

[[[default]]]

      # Enter the filesystem uri

      fs_defaultfs=hdfs://beh

 

      # NameNode logical name.

      ## logical_name=

 

      # Use WebHdfs/HttpFs as the communication mechanism.

      # Domain should be the NameNode or HttpFs host.

      # Default port is 14000 for HttpFs.

       webhdfs_url=http://192.168.1.201:50070/webhdfs/v1

 

      # Change this if your HDFS cluster is Kerberos-secured

       security_enabled=true

      hadoop_conf_dir=/opt/beh/core/hadoop/etc/hadoop/conf

4.1.5 YARN

[[yarn_clusters]]

    [[[default]]]

 

      # Enter the host on which you are running the ResourceManager

       resourcemanager_host=192.168.1.201

 

      # The port where the ResourceManager IPC listens on

       resourcemanager_port=23140

 

      # Whether to submit jobs to this cluster

      submit_to=True

 

      # Resource Manager logical name (required for HA)

      ## logical_name=

 

      # Change this if your YARN cluster is Kerberos-secured

      ## security_enabled=false

 

      # URL of the ResourceManager API

      resourcemanager_api_url=http://hadoop001:23188

 

      # URL of the ProxyServer API

      proxy_api_url=http://hadoop001:8088

 

      # URL of the HistoryServer API

       history_server_api_url=http://hadoop001:19888

 

      # URL of the Spark History Server

      spark_history_server_url=http://hadoop001:18088

 

      # In secure mode (HTTPS), if SSL certificates from YARN Rest APIs

      # have to be verified against certificate authority

      ## ssl_cert_ca_verify=True


4.1.6 HA

# HA support by specifying multiple clusters.

    # Redefine different properties there.

    # e.g.

 

     [[[ha]]]

      # Resource Manager logical name (required for HA)

      logical_name=beh

 

      # Un-comment to enable

      submit_to=True

 

      # URL of the ResourceManager API

      resourcemanager_api_url=http://hadoop003:23188

 

4.1.7 Hive

Hive requires the HiveServer2 (Thrift) service to be running on hadoop001:

[hadoop@hadoop001 ~]$ hive_start hiveserver2 &

- Then edit the Hue configuration file:

[beeswax]

 

  # Host where HiveServer2 is running.

  # If Kerberos security is enabled, use fully-qualified domain name (FQDN).

  hive_server_host=192.168.1.201

 

  # Port where HiveServer2 Thrift server runs on.

  hive_server_port=10000

 

  # Hive configuration directory, where hive-site.xml is located

   hive_conf_dir=/opt/beh/core/hive/conf
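If the Hive editor cannot connect, it may be worth confirming that HiveServer2 is actually listening on the configured port (a quick check, not part of the original steps):

[hadoop@hadoop001 ~]$ netstat -tnlp | grep 10000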

4.1.8 HBase

HBase requires the Thrift server to be running on hadoop001:

[hadoop@hadoop001 ~]$ hbase-daemon.sh start thrift

- Then edit the Hue configuration file:

[hbase]

  # Comma-separated list of HBase Thrift servers for clusters in the format of '(name|host:port)'.

  # Use full hostname with security.

  # If using Kerberos we assume GSSAPI SASL, not PLAIN.

   hbase_clusters=(Cluster|192.168.1.201:9090)

  #hbase_clusters=(hadoop|192.168.1.201:9090)

  # HBase configuration directory, where hbase-site.xml is located.

   hbase_conf_dir=/opt/beh/core/hbase/conf

4.1.9 ZooKeeper

Only the ZooKeeper service needs to be running on the Hadoop cluster.

- Edit the Hue configuration file:

[zookeeper]

 

  [[clusters]]

 

    [[[default]]]

      # Zookeeper ensemble. Comma separated list of Host/Port.

      # e.g. localhost:2181,localhost:2182,localhost:2183

      host_ports=192.168.1.201:2181,192.168.1.202:2181,192.168.1.203:2181

 

      # The URL of the REST contrib service (required for znode browsing).

       rest_url=http://192.168.1.201:9998

 

      # Name of Kerberos principal when using security.

## principal_name=zookeeper
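The rest_url above points at the ZooKeeper REST contrib service on port 9998. One way to start it, assuming a ZooKeeper source tree is available (the path below is only an example), is from the contrib/rest module:

[hadoop@hadoop001 ~]$ cd zookeeper/src/contrib/rest     ## hypothetical location of the ZooKeeper source
[hadoop@hadoop001 rest]$ ant run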

4.1.10 Spark

Spark requires the Livy server, spark-jobserver, and the Spark Thrift server.

- Install and start the Livy server on the hue host:

[hadoop@hue ~]$ wget http://archive.cloudera.com/beta/livy/livy-server-0.2.0.zip

[hadoop@hue ~]$ unzip livy-server-0.2.0.zip

[hadoop@hue ~]$ cd livy-server-0.2.0

[hadoop@hue ~]$ ./bin/livy-server

- Start the Spark Thrift server on hadoop001:

[hadoop@hadoop001 ~]# cd /opt/beh/core/spark/sbin

[hadoop@hadoop001 sbin]# ./start-thriftserver.sh  --hiveconf hive.server2.thrift.port=10001

- Start spark-jobserver on hadoop001:

[hadoop@hadoop001 ~]# rpm -ivh https://dl.bintray.com/sbt/rpm/sbt-0.13.6.rpm

[hadoop@hadoop001 ~]# git clone https://github.com/ooyala/spark-jobserver.git

[hadoop@hadoop001 ~]# cd spark-jobserver

[hadoop@hadoop001 ~]# sbt

> re-start

- Edit the Hue configuration file:

[spark]

  # Host address of the Livy Server.

  livy_server_host=hue

 

  # Port of the Livy Server.

  livy_server_port=8998

 

  # Configure livy to start in local 'process' mode, or 'yarn' workers.

   livy_server_session_kind=process

 

  # If livy should use proxy users when submitting a job.

  ## livy_impersonation_enabled=true

 

  # Host of the Sql Server

  sql_server_host=192.168.1.201

 

  # Port of the Sql Server

   sql_server_port=10001

4.1.11 File Browser

Permission checking must be disabled in hdfs-site.xml, and the service restarted; otherwise errors similar to "cannot connect" are reported:

<property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
</property>

The following proxy-user settings must be added to core-site.xml, and the service restarted; otherwise errors similar to "cannot impersonate" may be reported:

<property>
    <name>hadoop.proxyuser.hue.hosts</name>
    <value>*</value>
</property>

<property>
    <name>hadoop.proxyuser.hue.groups</name>
    <value>*</value>
</property>
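As an alternative to a full restart, the proxy-user settings can usually be refreshed in place on a running cluster (standard Hadoop admin commands, not part of the original steps):

[hadoop@hadoop001 ~]$ hdfs dfsadmin -refreshSuperUserGroupsConfiguration
[hadoop@hadoop001 ~]$ yarn rmadmin -refreshSuperUserGroupsConfiguration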

Reposted from: https://my.oschina.net/u/2547078/blog/810319
