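Getting Started with Hue

1 Introduction

What is Hue?

Hue stands for Hadoop User Experience. Put simply, it is an open-source web UI for Apache Hadoop, built on the Python web framework Django. With Hue you can interact with a Hadoop cluster from a web console in the browser to analyze and process data.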

2 Installation and Deployment

2.1 Documentation

http://archive.cloudera.com/cdh5/cdh/5/hue-3.7.0-cdh5.3.0/manual.html

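2.2 Installing Hue

1. Prerequisites

Required software:

CentOS 7.6 + Python 2.7.5 + JDK 8 + Maven 3.3.9 + Ant 1.8.1 + Hue 3.7.0

Required cluster services:

Hadoop + HBase + Hive + ZooKeeper + MySQL + Oozie

Configure the environment variables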

# JAVA_HOME
export JAVA_HOME=/opt/module/jdk1.8.0_144
export PATH=$PATH:$JAVA_HOME/bin

# MAVEN_HOME
export MAVEN_HOME=/opt/module/maven-3.3.9
export PATH=$PATH:$MAVEN_HOME/bin

# HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop-2.7.2
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

# HIVE_HOME
export HIVE_HOME=/opt/module/hive-1.2.1
export PATH=$PATH:$HIVE_HOME/bin

# HBASE_HOME
export HBASE_HOME=/opt/module/hbase-1.3.1
export PATH=$PATH:$HBASE_HOME/bin

# ANT_HOME
export ANT_HOME=/opt/module/ant-1.8.1
export PATH=$PATH:$ANT_HOME/bin
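A mistake in these exports (an unset variable, or a bin directory missing from PATH) tends to surface only later as a confusing build error. The following is a minimal Python 3 sketch of a sanity check; the function name `check_homes` and the list `EXPECTED_HOMES` are made up for this illustration and are not part of Hue or Hadoop.

```python
import os

# Variables the profile snippet is expected to export (taken from the list above).
EXPECTED_HOMES = ["JAVA_HOME", "MAVEN_HOME", "HADOOP_HOME",
                  "HIVE_HOME", "HBASE_HOME", "ANT_HOME"]


def check_homes(env, names=EXPECTED_HOMES):
    """Return a list of problems found in the environment mapping `env`."""
    path_entries = env.get("PATH", "").split(os.pathsep)
    problems = []
    for name in names:
        home = env.get(name)
        if not home:
            problems.append(name + " is not set")
        elif os.path.join(home, "bin") not in path_entries:
            problems.append(name + "/bin is not on PATH")
    return problems


if __name__ == "__main__":
    # Run after reloading the profile; no output means nothing was flagged.
    for problem in check_homes(os.environ):
        print(problem)
```

The check only compares strings, so it does not verify that the directories actually exist or contain the expected binaries.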

Reload the profile file, then check that Maven and Ant are installed correctly

[djm@hadoop102 ~]$ mvn -version
Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-11T00:41:47+08:00)
Maven home: /opt/module/maven-3.3.9
Java version: 1.8.0_144, vendor: Oracle Corporation
Java home: /opt/module/jdk1.8.0_144/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "3.10.0-957.el7.x86_64", arch: "amd64", family: "unix"
[djm@hadoop102 ~]$ ant -v
Apache Ant version 1.8.1 compiled on April 30 2010
Trying the default build file: build.xml
Buildfile: build.xml does not exist!
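Build failed

Note that ant -v turns on verbose mode rather than printing the version, so Ant goes on to look for build.xml and reports a failed build. The first line of the output still confirms the installation; ant -version prints the version and nothing else.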
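If you want to script this check, the version banners can be parsed directly. The sketch below (Python 3) assumes the first output line has the shape shown in the transcript above; `parse_tool_version` is a hypothetical helper name.

```python
import re

# Matches "Apache Maven 3.3.9 (...)" and "Apache Ant version 1.8.1 compiled on ...".
BANNER = re.compile(r"Apache (Maven|Ant)(?: version)? (\d+(?:\.\d+)*)")


def parse_tool_version(output):
    """Extract (tool, version) from the first line of the tool's version output."""
    lines = output.strip().splitlines()
    if not lines:
        raise ValueError("empty output")
    match = BANNER.match(lines[0])
    if match is None:
        raise ValueError("unrecognized version banner: " + lines[0])
    return match.group(1), match.group(2)
```

The output text could come from `subprocess.run(["mvn", "-version"], capture_output=True, text=True).stdout`, which keeps the parsing step separate from running the command.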

Install the packages that Hue depends on

 yum install asciidoc cyrus-sasl-devel cyrus-sasl-gssapi cyrus-sasl-plain gcc gcc-c++ krb5-devel libffi-devel libtidy libxml2-devel libxslt-devel make mysql mysql-devel openldap-devel python-devel sqlite-devel openssl-devel gmp-devel -y

Extract the archive

 [djm@hadoop102 ~]$ tar -zxvf hue-3.7.0-cdh5.3.6.tar -C /opt/module

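Build

[djm@hadoop102 ~]$ cd /opt/module/hue-3.7.0-cdh5.3.6
[djm@hadoop102 hue-3.7.0-cdh5.3.6]$ make apps

The build failed with the following error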

OpenSSL/crypto/crl.c:6:23: error: static declaration of ‘X509_REVOKED_dup’ follows non-static declaration
 static X509_REVOKED * X509_REVOKED_dup(X509_REVOKED *orig) {
                       ^
In file included from /usr/include/openssl/ssl.h:156:0,
                 from OpenSSL/crypto/x509.h:17,
                 from OpenSSL/crypto/crypto.h:30,
                 from OpenSSL/crypto/crl.c:3:
/usr/include/openssl/x509.h:751:15: note: previous declaration of ‘X509_REVOKED_dup’ was here
 X509_REVOKED *X509_REVOKED_dup(X509_REVOKED *rev);
               ^
error: command gcc failed with exit status 1
make[2]: [/opt/modules/hue-3.7.0-cdh5.3.6/desktop/core/build/pyopenssl/egg.stamp] Error 1
make[2]: Leaving directory /opt/modules/hue-3.7.0-cdh5.3.6/desktop/core
make[1]: [.recursive-env-install/core] Error 2
make[1]: Leaving directory /opt/modules/hue-3.7.0-cdh5.3.6/desktop
make: [desktop] Error 2

Fix: delete lines 751 and 752 of /usr/include/openssl/x509.h, which contain these two declarations

X509_REVOKED *X509_REVOKED_dup(X509_REVOKED *rev);

X509_REQ *X509_REQ_dup(X509_REQ *req);

Then run the build again
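Line numbers in x509.h differ between OpenSSL versions, so deleting by line number is fragile. As an alternative sketch (Python 3), the two prototypes can be matched by function name instead. `drop_prototypes` is a hypothetical helper, and the regular expression is a heuristic that only handles single-line prototypes like the two shown above.

```python
import re


def drop_prototypes(header_text, function_names):
    """Return header_text without one-line prototypes of the named functions."""
    patterns = [
        # return type, then the function name, an argument list, and a closing ';'
        re.compile(r"^[\w\s*]+\b" + re.escape(name) + r"\s*\([^;]*\)\s*;\s*$")
        for name in function_names
    ]
    kept = []
    for line in header_text.splitlines(True):  # True keeps the line endings
        if not any(p.match(line) for p in patterns):
            kept.append(line)
    return "".join(kept)
```

Since this touches a system header, it would be prudent to save a copy of the original file first and write the filtered text back only after comparing the two.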

Edit the hue.ini file

[djm@hadoop102 hue-3.7.0-cdh5.3.6]$ vim desktop/conf/hue.ini
secret_key=jFE93j;2[290-eiw.KEiwN2s3['d;/.q[eIW^y#e=+Iei*@Mn
http_host=hadoop102
http_port=8888
time_zone=Asia/Shanghai
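The secret_key shown above is a sample value; each installation should use its own random string. A minimal Python 3 sketch for generating one follows. The length of 50 and the restriction to letters and digits (to stay clear of characters that have meaning inside an ini file) are choices made for this example, not requirements stated by the source.

```python
import secrets
import string


def generate_secret_key(length=50):
    """Return a random key made of ASCII letters and digits."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


if __name__ == "__main__":
    # Paste the printed line into desktop/conf/hue.ini.
    print("secret_key=" + generate_secret_key())
```

The `secrets` module needs Python 3.6 or newer, so run this on a workstation rather than with the Python 2.7.5 interpreter listed in the prerequisites.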

Start the service

 [djm@hadoop102 hue-3.7.0-cdh5.3.6]$ build/env/bin/supervisor 
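Once the supervisor is running, Hue should accept connections on the http_host and http_port configured earlier (hadoop102 and 8888 in this walkthrough). The Python 3 sketch below checks only that the TCP port is open; `is_listening` is a hypothetical helper, and a successful connection does not prove that the web application itself is healthy.

```python
import socket


def is_listening(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

For example, `is_listening("hadoop102", 8888)` can be polled in a loop after starting the service. The same helper works for the other ports used later in this guide, such as 10000 for HiveServer2 or 9090 for the HBase Thrift server.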

3 Integrating with Other Frameworks

3.1 Hue and HDFS

Edit the hdfs-site.xml file

 
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>

Edit the core-site.xml file

 
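<!-- The property values did not survive in the source. The wildcard * (any host, any group) is assumed here; restrict it as your environment requires. -->
<property>
  <name>hadoop.proxyuser.hue.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hue.groups</name>
  <value>*</value>
</property>

<property>
  <name>hadoop.proxyuser.djm.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.djm.groups</name>
  <value>*</value>
</property>

<property>
  <name>hadoop.proxyuser.httpfs.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.httpfs.groups</name>
  <value>*</value>
</property>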
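Each proxy user needs the same pair of properties (hosts and groups), so the XML can be generated instead of typed by hand. The Python 3 sketch below uses only the standard library. The helper names are made up for this example, and the `*` defaults are an assumption: the source does not show the values it used.

```python
import xml.etree.ElementTree as ET


def proxyuser_properties(user, hosts="*", groups="*"):
    """Build the two <property> elements that configure `user` as a Hadoop proxy user."""
    elements = []
    for suffix, value in (("hosts", hosts), ("groups", groups)):
        prop = ET.Element("property")
        ET.SubElement(prop, "name").text = "hadoop.proxyuser.%s.%s" % (user, suffix)
        ET.SubElement(prop, "value").text = value
        elements.append(prop)
    return elements


def render(elements):
    """Serialize the elements, one <property> per line."""
    return "\n".join(ET.tostring(e, encoding="unicode") for e in elements)


if __name__ == "__main__":
    for user in ("hue", "djm", "httpfs"):
        print(render(proxyuser_properties(user)))
```

The printed elements go inside the `<configuration>` element of core-site.xml.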

Edit the httpfs-site.xml file

 
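<!-- The property values did not survive in the source. The wildcard * is assumed here. -->
<property>
  <name>httpfs.proxyuser.hue.hosts</name>
  <value>*</value>
</property>
<property>
  <name>httpfs.proxyuser.hue.groups</name>
  <value>*</value>
</property>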

Distribute the modified configuration files to the other nodes in the cluster

Start the HttpFS service

 [djm@hadoop102 ~]$ /opt/module/hadoop-2.7.2/sbin/httpfs.sh start

Edit the hue.ini file

# Configuration for HDFS NameNode
# ------------------------------------------------------------------------
[[hdfs_clusters]]
# HA support by using HttpFs

[[[default]]]
# Enter the filesystem uri
fs_defaultfs=hdfs://hadoop102:9000

# NameNode logical name.
logical_name=

# Use WebHdfs/HttpFs as the communication mechanism.
# Domain should be the NameNode or HttpFs host.
# Default port is 14000 for HttpFs.
webhdfs_url=http://hadoop102:50070/webhdfs/v1

# Change this if your HDFS cluster is Kerberos-secured
security_enabled=false

# Default umask for file and directory creation, specified in an octal value.
umask=022

# Directory of the Hadoop configuration
# hadoop_conf_dir=$HADOOP_CONF_DIR when set or '/etc/hadoop/conf'
hadoop_conf_dir=/opt/module/hadoop-2.7.2/etc/hadoop
hadoop_hdfs_home=/opt/module/hadoop-2.7.2
hadoop_bin=/opt/module/hadoop-2.7.2/bin

Restart the Hue service
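Hue talks to HDFS through the WebHDFS REST API at the configured webhdfs_url, so it helps to be able to build the same kind of request by hand when troubleshooting. The Python 3 sketch below composes such a URL; `webhdfs_request_url` is a hypothetical helper, while `op=LISTSTATUS` and `user.name` are standard WebHDFS query parameters.

```python
from urllib.parse import quote, urlencode


def webhdfs_request_url(base, path, op, user=None):
    """Compose a WebHDFS URL from the hue.ini webhdfs_url value and an absolute HDFS path."""
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute: " + path)
    params = {"op": op}
    if user:
        params["user.name"] = user  # simple (non-Kerberos) authentication
    return base.rstrip("/") + quote(path) + "?" + urlencode(params)


if __name__ == "__main__":
    print(webhdfs_request_url("http://hadoop102:50070/webhdfs/v1", "/user/djm", "LISTSTATUS", "djm"))
```

Opening the printed URL with curl or a browser should return a JSON directory listing if WebHDFS is enabled and the user is allowed to read the path.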

3.2 Hue and YARN

Edit the hue.ini file

# Configuration for YARN (MR2)
# ------------------------------------------------------------------------
[[yarn_clusters]]

[[[default]]]
# Enter the host on which you are running the ResourceManager
resourcemanager_host=hadoop102

# The port where the ResourceManager IPC listens on
resourcemanager_port=8032

# Whether to submit jobs to this cluster
submit_to=True

# Resource Manager logical name (required for HA)
logical_name=

# Change this if your YARN cluster is Kerberos-secured
security_enabled=false

# URL of the ResourceManager API
resourcemanager_api_url=http://hadoop103:8088

# URL of the ProxyServer API
proxy_api_url=http://hadoop103:8088

# URL of the HistoryServer API
history_server_api_url=http://hadoop104:19888

# In secure mode (HTTPS), if SSL certificates from Resource Manager's
# Rest Server have to be verified against certificate authority
ssl_cert_ca_verify=False

# HA support by specifying multiple clusters
# e.g.

# [[[ha]]]
# Resource Manager logical name (required for HA)
# logical_name=my-rm-name

Restart the Hue service
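The resourcemanager_api_url and history_server_api_url values can be checked independently of Hue, because both services expose a REST info endpoint. The Python 3 sketch below derives those endpoints and reads the state field from a ResourceManager response. The function names are made up for this example, and `fetch_json` needs network access to the cluster.

```python
import json
from urllib.request import urlopen


def yarn_info_urls(resourcemanager_api_url, history_server_api_url):
    """Map each service to its REST info endpoint."""
    return {
        "resourcemanager": resourcemanager_api_url.rstrip("/") + "/ws/v1/cluster/info",
        "historyserver": history_server_api_url.rstrip("/") + "/ws/v1/history/info",
    }


def resourcemanager_state(payload):
    """Pull the state field out of a parsed /ws/v1/cluster/info response."""
    return payload["clusterInfo"]["state"]


def fetch_json(url, timeout=5.0):
    """GET a URL and decode the JSON body."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A typical check would be `resourcemanager_state(fetch_json(yarn_info_urls(rm, hs)["resourcemanager"]))`, where `rm` and `hs` are the two URLs from hue.ini.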

3.3 Hue and Hive

Edit the hive-site.xml file



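<property>
  <name>hive.server2.thrift.port</name>
  <value>10000</value>
</property>

<property>
  <name>hive.server2.thrift.bind.host</name>
  <value>hadoop102</value>
</property>

<property>
  <name>hive.server2.long.polling.timeout</name>
  <value>5000</value>
</property>

<property>
  <name>hive.metastore.uris</name>
  <value>thrift://192.168.10.102:9083</value>
</property>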
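The host and port set here have to match hive_server_host and hive_server_port in hue.ini, so it can be handy to read them back from the XML. The Python 3 sketch below parses a Hadoop-style configuration document with the standard library; both helper names are hypothetical.

```python
import xml.etree.ElementTree as ET


def load_hadoop_style_config(xml_text):
    """Turn <property><name>..</name><value>..</value></property> entries into a dict."""
    root = ET.fromstring(xml_text)
    config = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name:
            config[name.strip()] = (prop.findtext("value") or "").strip()
    return config


def hiveserver2_endpoint(config):
    """Return (host, port) of the HiveServer2 Thrift service described by the config."""
    return (config["hive.server2.thrift.bind.host"],
            int(config["hive.server2.thrift.port"]))
```

For a file on disk, read it in binary mode and pass the bytes to `ET.fromstring`, or use `ET.parse(path).getroot()`, so that any encoding declaration in the file is honored.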

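Start the Hive services. Because hive.metastore.uris points at a remote metastore, start the metastore service before HiveServer2

[djm@hadoop102 hive-1.2.1]$ bin/hive --service metastore &
[djm@hadoop102 hive-1.2.1]$ bin/hive --service hiveserver2 &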

Edit the hue.ini file

[beeswax]
# Host where HiveServer2 is running.
# If Kerberos security is enabled, use fully-qualified domain name (FQDN).
hive_server_host=hadoop102

# Port where HiveServer2 Thrift server runs on.
hive_server_port=10000

# Hive configuration directory, where hive-site.xml is located
hive_conf_dir=/opt/module/hive-1.2.1/conf

# Timeout in seconds for thrift calls to Hive service
server_conn_timeout=120

# Choose whether Hue uses the GetLog() thrift call to retrieve Hive logs.
# If false, Hue will use the FetchResults() thrift call instead.
use_get_log_api=true

# Set a LIMIT clause when browsing a partitioned table.
# A positive value will be set as the LIMIT. If 0 or negative, do not set any limit.
browse_partitioned_table_limit=250

# A limit to the number of rows that can be downloaded from a query.
# A value of -1 means there will be no limit.
# A maximum of 65,000 is applied to XLS downloads.
download_row_limit=1000000

# Hue will try to close the Hive query when the user leaves the editor page.
# This will free all the query resources in HiveServer2, but also make its results inaccessible.
close_queries=false

# Thrift version to use when communicating with HiveServer2
thrift_version=5

Restart the Hue service

3.4 Hue and MySQL

Edit the hue.ini file (the [[[mysql]]] block sits under [librdbms] and [[databases]])

[[[mysql]]]
# Name to show in the UI.
nice_name="My SQL DB"

# For MySQL and PostgreSQL, name is the name of the database.
# For Oracle, Name is instance of the Oracle server. For express edition
# this is 'xe' by default.
name=mysqldb

# Database backend to use. This can be:
# 1. mysql
# 2. postgresql
# 3. oracle
engine=mysql

# IP or hostname of the database to connect to.
host=hadoop102

# Port the database server is listening to. Defaults are:
# 1. MySQL: 3306
# 2. PostgreSQL: 5432
# 3. Oracle Express Edition: 1521
port=3306

# Username to authenticate with when connecting to the database.
user=root

# Password matching the username to authenticate with when
# connecting to the database.
password=123456

# Database options to send to the server when connecting.
# https://docs.djangoproject.com/en/1.4/ref/databases/
options={}

Restart the Hue service

3.5 Hue and ZooKeeper

Edit the hue.ini file

[zookeeper]

[[clusters]]

[[[default]]]
# Zookeeper ensemble. Comma separated list of Host/Port.
# e.g. localhost:2181,localhost:2182,localhost:2183
host_ports=hadoop102:2181,hadoop103:2181,hadoop104:2181

# The URL of the REST contrib service (required for znode browsing)
rest_url=http://localhost:9998

Restart the Hue service
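hue.ini nests sections by repeating the brackets ([section], [[subsection]], [[[sub-subsection]]]), a layout that Python's standard configparser does not model. To inspect a fragment like the ZooKeeper one above from a script, a small reader can be sketched as follows (Python 3). This is an illustrative parser written for this guide, not the one Hue uses, and it ignores quoting and inline comments.

```python
import re

SECTION = re.compile(r"^(\[+)([^\[\]]+)(\]+)$")


def parse_nested_ini(text):
    """Parse bracket-nested sections into nested dicts."""
    root = {}
    stack = [root]  # stack[d] is the dict at nesting depth d
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or line.startswith(";"):
            continue
        match = SECTION.match(line)
        if match:
            depth = len(match.group(1))
            if depth != len(match.group(3)) or depth > len(stack):
                raise ValueError("malformed section header: " + line)
            del stack[depth:]  # climb back up to the parent of this section
            stack.append(stack[-1].setdefault(match.group(2).strip(), {}))
        elif "=" in line:
            key, value = line.split("=", 1)
            stack[-1][key.strip()] = value.strip()
        else:
            raise ValueError("cannot parse line: " + line)
    return root


def split_host_ports(value):
    """Turn 'h1:2181,h2:2181' into [('h1', 2181), ('h2', 2181)]."""
    pairs = []
    for item in value.split(","):
        host, _, port = item.strip().rpartition(":")
        pairs.append((host, int(port)))
    return pairs
```

With the fragment above, `parse_nested_ini(text)["zookeeper"]["clusters"]["default"]["host_ports"]` yields the ensemble string, which `split_host_ports` breaks into host and port pairs.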

3.6 Hue and HBase

Start HBase and the HBase Thrift service

[djm@hadoop102 hbase-1.3.1]$ bin/hbase-daemon.sh start thrift

Edit the hue.ini file

[hbase]
# Comma-separated list of HBase Thrift servers for clusters in the format of '(name|host:port)'.
# Use full hostname with security.
hbase_clusters=(Cluster|hadoop102:9090)

# HBase configuration directory, where hbase-site.xml is located.
hbase_conf_dir=/opt/module/hbase-1.3.1/conf

# Hard limit of rows or columns per row fetched before truncating.
truncate_limit = 500

# 'buffered' is the default of the HBase Thrift Server and supports security.
# 'framed' can be used to chunk up responses,
# which is useful when used in conjunction with the nonblocking server in Thrift.
thrift_transport=buffered

Restart the Hue service
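The hbase_clusters value packs a display name, host and port into the form '(name|host:port)', and a typo in it is easy to miss. The Python 3 sketch below splits such a value into its parts. The regular expression is written from the format description in the comment above, and `parse_hbase_clusters` is a hypothetical helper, not Hue's own parser.

```python
import re

# One "(name|host:port)" entry; entries are separated by commas.
CLUSTER = re.compile(r"\(\s*([^|()]+?)\s*\|\s*([^:()]+?)\s*:\s*(\d+)\s*\)")


def parse_hbase_clusters(value):
    """Return a list of (name, host, port) tuples from an hbase_clusters value."""
    clusters = [(name, host, int(port)) for name, host, port in CLUSTER.findall(value)]
    if not clusters:
        raise ValueError("no '(name|host:port)' entry found in: " + value)
    return clusters
```

The resulting host and port are the ones the HBase Thrift server started above must be listening on.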

3.7 Hue and Oozie

Edit the hue.ini file

[liboozie]
# The URL where the Oozie service runs on. This is required in order for
# users to submit jobs. Empty value disables the config check.
oozie_url=http://hadoop:11000/oozie

# Requires FQDN in oozie_url if enabled
security_enabled=false

# Location on HDFS where the workflows/coordinator are deployed when submitted.
remote_deployement_dir=/user/djm/oozie-apps


###########################################################################
# Settings to configure the Oozie app
###########################################################################

[oozie]
# Location on local FS where the examples are stored.
local_data_dir=/opt/module/oozie-4.0.0-cdh5.3.6/examples

# Location on local FS where the data for the examples is stored.
sample_data_dir=/opt/module/oozie-4.0.0-cdh5.3.6/oozie-apps

# Location on HDFS where the oozie examples and workflows are stored.
remote_data_dir=/user/djm/oozie-apps

# Maximum of Oozie workflows or coordinators to retrieve in one API call.
oozie_jobs_count=100

# Use Cron format for defining the frequency of a Coordinator instead of the old frequency number/unit.
enable_cron_scheduling=true

Restart the Hue service
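Before relying on Hue to submit workflows, it is worth confirming that the oozie_url is reachable. Oozie's web services API exposes a status endpoint under v1/admin/status that reports the system mode. The Python 3 sketch below builds that URL and reads the mode; the helper names are made up for this example, and `oozie_system_mode` needs network access to the Oozie server.

```python
import json
from urllib.request import urlopen


def oozie_status_url(oozie_url):
    """Derive the admin status endpoint from the oozie_url value in hue.ini."""
    return oozie_url.rstrip("/") + "/v1/admin/status"


def oozie_system_mode(oozie_url, timeout=5.0):
    """Query the status endpoint and return its systemMode field."""
    with urlopen(oozie_status_url(oozie_url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["systemMode"]
```

A healthy server is expected to report NORMAL; a connection error here usually points to a wrong host or port in oozie_url rather than to a Hue problem.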
