1 Introduction
What is Hue?
Hue stands for Hadoop User Experience. Put simply, it is an open-source web UI for Apache Hadoop, built on the Python web framework Django. Through Hue you can interact with a Hadoop cluster from a browser-based web console to analyze and process data.
2 Installation and Deployment
2.1 Documentation
http://archive.cloudera.com/cdh5/cdh/5/hue-3.7.0-cdh5.3.0/manual.html
2.2 Installing Hue
1. Pre-installation preparation
Required software environment:
CentOS 7.6 + Python 2.7.5 + JDK 8 + Maven 3.3.9 + Ant 1.8.1 + Hue 3.7.0
Required cluster environment:
Hadoop + HBase + Hive + ZooKeeper + MySQL + Oozie
Configure the environment variables (e.g. in /etc/profile):
#JAVA_HOME
export JAVA_HOME=/opt/module/jdk1.8.0_144
export PATH=$PATH:$JAVA_HOME/bin
#MAVEN_HOME
export MAVEN_HOME=/opt/module/maven-3.3.9
export PATH=$PATH:$MAVEN_HOME/bin
#HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop-2.7.2
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
#HIVE_HOME
export HIVE_HOME=/opt/module/hive-1.2.1
export PATH=$PATH:$HIVE_HOME/bin
#HBASE_HOME
export HBASE_HOME=/opt/module/hbase-1.3.1
export PATH=$PATH:$HBASE_HOME/bin
#ANT_HOME
export ANT_HOME=/opt/module/ant-1.8.1
export PATH=$PATH:$ANT_HOME/bin
Reload the profile file (e.g. source /etc/profile), then verify that Maven and Ant are installed:
[djm@hadoop102 ~]$ mvn -version
Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-11T00:41:47+08:00)
Maven home: /opt/module/maven-3.3.9
Java version: 1.8.0_144, vendor: Oracle Corporation
Java home: /opt/module/jdk1.8.0_144/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "3.10.0-957.el7.x86_64", arch: "amd64", family: "unix"
[djm@hadoop102 ~]$ ant -v
Apache Ant version 1.8.1 compiled on April 30 2010
Trying the default build file: build.xml
Buildfile: build.xml does not exist!
Build failed
(The "Build failed" message here is expected: ant -v prints the version and then looks for a build.xml, which does not exist in the home directory. Ant itself is installed correctly.)
Install the dependency packages Hue needs (as root, or via sudo):
yum install asciidoc cyrus-sasl-devel cyrus-sasl-gssapi cyrus-sasl-plain gcc gcc-c++ krb5-devel libffi-devel libtidy libxml2-devel libxslt-devel make mysql mysql-devel openldap-devel python-devel sqlite-devel openssl-devel gmp-devel -y
Extract the archive:
[djm@hadoop102 ~]$ tar -zxvf hue-3.7.0-cdh5.3.6.tar -C /opt/module
Compile:
[djm@hadoop102 hue-3.7.0-cdh5.3.6]$ make apps
The following error came up during compilation:
OpenSSL/crypto/crl.c:6:23: error: static declaration of ‘X509_REVOKED_dup’ follows non-static declaration
 static X509_REVOKED * X509_REVOKED_dup(X509_REVOKED *orig) {
                       ^
In file included from /usr/include/openssl/ssl.h:156:0,
                 from OpenSSL/crypto/x509.h:17,
                 from OpenSSL/crypto/crypto.h:30,
                 from OpenSSL/crypto/crl.c:3:
/usr/include/openssl/x509.h:751:15: note: previous declaration of ‘X509_REVOKED_dup’ was here
 X509_REVOKED *X509_REVOKED_dup(X509_REVOKED *rev);
               ^
error: command 'gcc' failed with exit status 1
make[2]: *** [/opt/modules/hue-3.7.0-cdh5.3.6/desktop/core/build/pyopenssl/egg.stamp] Error 1
make[2]: Leaving directory /opt/modules/hue-3.7.0-cdh5.3.6/desktop/core
make[1]: *** [.recursive-env-install/core] Error 2
make[1]: Leaving directory /opt/modules/hue-3.7.0-cdh5.3.6/desktop
make: *** [desktop] Error 2
Workaround: delete lines 751 and 752 of /usr/include/openssl/x509.h (back the file up first), i.e. these two declarations:
X509_REVOKED *X509_REVOKED_dup(X509_REVOKED *rev);
X509_REQ *X509_REQ_dup(X509_REQ *req);
Then recompile.
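The two-line deletion can also be scripted rather than done by hand in an editor. A minimal sketch, demonstrated on a throwaway file here (the real target would be /usr/include/openssl/x509.h, lines 751-752, and you should keep a backup):

```python
import os
import tempfile
from pathlib import Path

def delete_lines(path, start, end):
    """Remove 1-indexed lines start..end (inclusive) from a file, in place."""
    p = Path(path)
    lines = p.read_text().splitlines(keepends=True)
    p.write_text("".join(lines[:start - 1] + lines[end:]))

# Demo on a scratch file rather than the real system header.
fd, tmp = tempfile.mkstemp()
os.close(fd)
Path(tmp).write_text("line1\nline2\nline3\nline4\n")
delete_lines(tmp, 2, 3)          # drop lines 2-3, as the fix drops 751-752
print(Path(tmp).read_text())
os.remove(tmp)
```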
Edit the hue.ini file:
[djm@hadoop102 hue-3.7.0-cdh5.3.6]$ vim desktop/conf/hue.ini
secret_key=jFE93j;2[290-eiw.KEiwN2s3['d;/.q[eIW^y#e=+Iei*@Mn
http_host=hadoop102
http_port=8888
time_zone=Asia/Shanghai
Start the service:
[djm@hadoop102 hue-3.7.0-cdh5.3.6]$ build/env/bin/supervisor
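Before starting the supervisor it can help to sanity-check that hue.ini parses and carries the values you expect. A small sketch using Python's configparser (it assumes these keys sit under the [desktop] section, as in the stock hue.ini; the inline string stands in for the real file, and note that hue.ini's nested [[sections]] are beyond plain configparser):

```python
from configparser import ConfigParser

# Stand-in for desktop/conf/hue.ini; in practice use cfg.read("desktop/conf/hue.ini").
HUE_INI = """\
[desktop]
http_host=hadoop102
http_port=8888
time_zone=Asia/Shanghai
"""

cfg = ConfigParser()
cfg.read_string(HUE_INI)
desktop = cfg["desktop"]
print(desktop["http_host"], desktop["http_port"], desktop["time_zone"])
```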
3 Integration with Other Frameworks
3.1 Hue and HDFS
Edit the hdfs-site.xml file:
<property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
</property>
Edit the core-site.xml file (the value * allows proxying from any host or group; tighten this for production):
<property>
    <name>hadoop.proxyuser.hue.hosts</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.hue.groups</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.djm.hosts</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.djm.groups</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.httpfs.hosts</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.httpfs.groups</name>
    <value>*</value>
</property>
Edit the httpfs-site.xml file:
<property>
    <name>httpfs.proxyuser.hue.hosts</name>
    <value>*</value>
</property>
<property>
    <name>httpfs.proxyuser.hue.groups</name>
    <value>*</value>
</property>
Distribute the updated configuration files to every node in the cluster.
Start the HttpFS service:
[djm@hadoop102 ~]$ /opt/module/hadoop-2.7.2/sbin/httpfs.sh start
Edit the hue.ini file (these settings live under the [hadoop] section):
[hadoop]

# Configuration for HDFS NameNode
# ------------------------------------------------------------------------
[[hdfs_clusters]]
# HA support by using HttpFs

[[[default]]]
# Enter the filesystem uri
fs_defaultfs=hdfs://hadoop102:9000

# NameNode logical name.
logical_name=

# Use WebHdfs/HttpFs as the communication mechanism.
# Domain should be the NameNode or HttpFs host.
# Default port is 14000 for HttpFs.
webhdfs_url=http://hadoop102:50070/webhdfs/v1

# Change this if your HDFS cluster is Kerberos-secured
security_enabled=false

# Default umask for file and directory creation, specified in an octal value.
umask=022

# Directory of the Hadoop configuration
# hadoop_conf_dir=$HADOOP_CONF_DIR when set or '/etc/hadoop/conf'
hadoop_conf_dir=/opt/module/hadoop-2.7.2/etc/hadoop
hadoop_hdfs_home=/opt/module/hadoop-2.7.2
hadoop_bin=/opt/module/hadoop-2.7.2/bin
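Hue talks to HDFS through REST calls built from the webhdfs_url base above. A hedged sketch of how such a request URL is composed (hostname and user are the ones used in this walkthrough; WebHDFS answers on the NameNode HTTP port 50070 here, HttpFS on 14000):

```python
def webhdfs_url(host, port, path, op, user):
    """Build a WebHDFS/HttpFs REST URL: /webhdfs/v1/<path>?op=<OP>&user.name=<user>."""
    return "http://{}:{}/webhdfs/v1{}?op={}&user.name={}".format(host, port, path, op, user)

# List a home directory via WebHDFS on the NameNode...
print(webhdfs_url("hadoop102", 50070, "/user/djm", "LISTSTATUS", "djm"))
# ...or issue the same call through HttpFs.
print(webhdfs_url("hadoop102", 14000, "/user/djm", "LISTSTATUS", "djm"))
```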
Restart the Hue service.
3.2 Hue and YARN
Edit the hue.ini file:
# Configuration for YARN (MR2)
# ------------------------------------------------------------------------
[[yarn_clusters]]

[[[default]]]
# Enter the host on which you are running the ResourceManager
resourcemanager_host=hadoop102

# The port where the ResourceManager IPC listens on
resourcemanager_port=8032

# Whether to submit jobs to this cluster
submit_to=True

# Resource Manager logical name (required for HA)
logical_name=

# Change this if your YARN cluster is Kerberos-secured
security_enabled=false

# URL of the ResourceManager API
resourcemanager_api_url=http://hadoop103:8088

# URL of the ProxyServer API
proxy_api_url=http://hadoop103:8088

# URL of the HistoryServer API
history_server_api_url=http://hadoop104:19888

# In secure mode (HTTPS), if SSL certificates from Resource Manager's
# Rest Server have to be verified against certificate authority
ssl_cert_ca_verify=False

# HA support by specifying multiple clusters
# e.g.
# [[[ha]]]
# Resource Manager logical name (required for HA)
# logical_name=my-rm-name
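The three *_api_url values above are plain REST base URLs; Hue appends YARN's /ws/v1/cluster paths to them. A small illustrative helper showing the endpoints involved (the function name is mine, not Hue's):

```python
def yarn_endpoint(base_url, resource):
    """Compose a ResourceManager REST endpoint under /ws/v1/cluster."""
    return "{}/ws/v1/cluster/{}".format(base_url.rstrip("/"), resource)

print(yarn_endpoint("http://hadoop103:8088", "info"))  # cluster info
print(yarn_endpoint("http://hadoop103:8088", "apps"))  # submitted applications
```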
Restart the Hue service.
3.3 Hue and Hive
Edit hive-site.xml:
<property>
    <name>hive.server2.thrift.port</name>
    <value>10000</value>
</property>
<property>
    <name>hive.server2.thrift.bind.host</name>
    <value>hadoop102</value>
</property>
<property>
    <name>hive.server2.long.polling.timeout</name>
    <value>5000</value>
</property>
<property>
    <name>hive.metastore.uris</name>
    <value>thrift://192.168.10.102:9083</value>
</property>
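The two Thrift endpoints configured above are what clients actually connect to: HiveServer2 at hadoop102:10000 (reachable via a jdbc:hive2:// URL from Beeline-style clients) and the metastore at thrift://192.168.10.102:9083. A tiny illustrative sketch of how the connect strings are formed (helper names are mine):

```python
def hs2_jdbc_url(host, port=10000, db="default"):
    """JDBC URL a Beeline-style client uses to reach HiveServer2."""
    return "jdbc:hive2://{}:{}/{}".format(host, port, db)

def metastore_uri(host, port=9083):
    """Thrift URI of the kind set in hive.metastore.uris."""
    return "thrift://{}:{}".format(host, port)

print(hs2_jdbc_url("hadoop102"))
print(metastore_uri("192.168.10.102"))
```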
Start the Hive services (the metastore service is needed because hive.metastore.uris is set):
[djm@hadoop102 hive-1.2.1]$ bin/hive --service metastore &
[djm@hadoop102 hive-1.2.1]$ bin/hive --service hiveserver2 &
Edit the hue.ini file:
[beeswax]

# Host where HiveServer2 is running.
# If Kerberos security is enabled, use fully-qualified domain name (FQDN).
hive_server_host=hadoop102

# Port where HiveServer2 Thrift server runs on.
hive_server_port=10000

# Hive configuration directory, where hive-site.xml is located
hive_conf_dir=/opt/module/hive-1.2.1/conf

# Timeout in seconds for thrift calls to Hive service
server_conn_timeout=120

# Choose whether Hue uses the GetLog() thrift call to retrieve Hive logs.
# If false, Hue will use the FetchResults() thrift call instead.
use_get_log_api=true

# Set a LIMIT clause when browsing a partitioned table.
# A positive value will be set as the LIMIT. If 0 or negative, do not set any limit.
browse_partitioned_table_limit=250

# A limit to the number of rows that can be downloaded from a query.
# A value of -1 means there will be no limit.
# A maximum of 65,000 is applied to XLS downloads.
download_row_limit=1000000

# Hue will try to close the Hive query when the user leaves the editor page.
# This will free all the query resources in HiveServer2, but also make its results inaccessible.
close_queries=false

# Thrift version to use when communicating with HiveServer2
thrift_version=5
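The comments around download_row_limit describe a small piece of arithmetic: -1 means unlimited, and XLS downloads are additionally capped at 65,000 rows. Sketching that rule (the function is illustrative, not Hue's own code):

```python
XLS_CAP = 65000  # hard cap applied to XLS downloads, per the config comment

def effective_download_limit(configured, fmt="csv"):
    """Apply the documented rules: -1 means no limit; XLS is capped at 65,000 rows."""
    if fmt == "xls":
        return XLS_CAP if configured == -1 else min(configured, XLS_CAP)
    return None if configured == -1 else configured

print(effective_download_limit(1000000))          # 1000000
print(effective_download_limit(1000000, "xls"))   # 65000
print(effective_download_limit(-1))               # None (unlimited)
```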
Restart the Hue service.
3.4 Hue and MySQL
Edit the hue.ini file:
[librdbms]
[[databases]]

[[[mysql]]]
# Name to show in the UI.
nice_name="My SQL DB"

# For MySQL and PostgreSQL, name is the name of the database.
# For Oracle, Name is instance of the Oracle server. For express edition
# this is 'xe' by default.
name=mysqldb

# Database backend to use. This can be:
# 1. mysql
# 2. postgresql
# 3. oracle
engine=mysql

# IP or hostname of the database to connect to.
host=hadoop102

# Port the database server is listening to. Defaults are:
# 1. MySQL: 3306
# 2. PostgreSQL: 5432
# 3. Oracle Express Edition: 1521
port=3306

# Username to authenticate with when connecting to the database.
user=root

# Password matching the username to authenticate with when
# connecting to the database.
password=123456

# Database options to send to the server when connecting.
# https://docs.djangoproject.com/en/1.4/ref/databases/
options={}
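The engine/port pairs listed in the comments above can be captured in one lookup, which is handy when wiring several databases into [[databases]]. A small sketch mirroring those documented defaults (the helper itself is mine):

```python
# Default ports per engine, as listed in the hue.ini comments.
DEFAULT_PORTS = {"mysql": 3306, "postgresql": 5432, "oracle": 1521}

def db_port(engine, port=None):
    """Return the explicit port if given, else the engine's documented default."""
    return port if port is not None else DEFAULT_PORTS[engine]

print(db_port("mysql"))        # 3306
print(db_port("mysql", 3307))  # 3307
```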
Restart the Hue service.
3.5 Hue and ZooKeeper
Edit the hue.ini file:
[zookeeper]

[[clusters]]

[[[default]]]
# Zookeeper ensemble. Comma separated list of Host/Port.
# e.g. localhost:2181,localhost:2182,localhost:2183
host_ports=hadoop102:2181,hadoop103:2181,hadoop104:2181

# The URL of the REST contrib service (required for znode browsing)
rest_url=http://localhost:9998
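host_ports is a comma-separated host:port list, the same shape ZooKeeper client connect strings use. A quick sketch of splitting it into (host, port) pairs, just to make the format concrete:

```python
def parse_ensemble(host_ports):
    """Split 'h1:p1,h2:p2,...' into a list of (host, port) tuples."""
    pairs = []
    for item in host_ports.split(","):
        host, port = item.strip().rsplit(":", 1)
        pairs.append((host, int(port)))
    return pairs

print(parse_ensemble("hadoop102:2181,hadoop103:2181,hadoop104:2181"))
```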
Restart the Hue service.
3.6 Hue and HBase
Start HBase and the HBase Thrift service:
[djm@hadoop102 hbase-1.3.1]$ bin/start-hbase.sh
[djm@hadoop102 hbase-1.3.1]$ bin/hbase-daemon.sh start thrift
Edit the hue.ini file:
[hbase]

# Comma-separated list of HBase Thrift servers for clusters in the format of '(name|host:port)'.
# Use full hostname with security.
hbase_clusters=(Cluster|hadoop102:9090)

# HBase configuration directory, where hbase-site.xml is located.
hbase_conf_dir=/opt/module/hbase-1.3.1/conf

# Hard limit of rows or columns per row fetched before truncating.
truncate_limit=500

# 'buffered' is the default of the HBase Thrift Server and supports security.
# 'framed' can be used to chunk up responses,
# which is useful when used in conjunction with the nonblocking server in Thrift.
thrift_transport=buffered
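hbase_clusters uses the '(name|host:port)' format called out in the comment. Parsing one entry makes the format concrete (the helper name is mine, for illustration only):

```python
import re

def parse_hbase_cluster(entry):
    """Parse one '(name|host:port)' entry into (name, host, port)."""
    m = re.fullmatch(r"\((?P<name>[^|]+)\|(?P<host>[^:]+):(?P<port>\d+)\)", entry.strip())
    if m is None:
        raise ValueError("expected '(name|host:port)', got %r" % entry)
    return m["name"], m["host"], int(m["port"])

print(parse_hbase_cluster("(Cluster|hadoop102:9090)"))  # ('Cluster', 'hadoop102', 9090)
```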
Restart the Hue service.
3.7 Hue and Oozie
Edit the hue.ini file:
[liboozie]

# The URL where the Oozie service runs on. This is required in order for
# users to submit jobs. Empty value disables the config check.
oozie_url=http://hadoop102:11000/oozie

# Requires FQDN in oozie_url if enabled
security_enabled=false

# Location on HDFS where the workflows/coordinator are deployed when submitted.
remote_deployement_dir=/user/djm/oozie-apps

###########################################################################
# Settings to configure the Oozie app
###########################################################################
[oozie]

# Location on local FS where the examples are stored.
local_data_dir=/opt/module/oozie-4.0.0-cdh5.3.6/examples

# Location on local FS where the data for the examples is stored.
sample_data_dir=/opt/module/oozie-4.0.0-cdh5.3.6/oozie-apps

# Location on HDFS where the oozie examples and workflows are stored.
remote_data_dir=/user/djm/oozie-apps

# Maximum of Oozie workflows or coordinators to retrieve in one API call.
oozie_jobs_count=100

# Use Cron format for defining the frequency of a Coordinator instead of the old frequency number/unit.
enable_cron_scheduling=true
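With enable_cron_scheduling=true, coordinator frequencies are written as standard 5-field cron expressions (minute, hour, day-of-month, month, day-of-week) instead of the old number/unit pairs. A rough structural check, purely illustrative:

```python
def looks_like_cron(expr):
    """Very rough check: five whitespace-separated cron fields."""
    return len(expr.split()) == 5

print(looks_like_cron("0 2 * * *"))   # True  (daily at 02:00)
print(looks_like_cron("5 minutes"))   # False (old number/unit style)
```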
Restart the Hue service.