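Apache Superset is an open-source, modern, lightweight BI analysis tool. It connects to many data sources, offers a rich set of chart types, supports custom dashboards, and has a friendly, easy-to-use interface.
Apache Superset was formerly known as Caravel and was open-sourced by Airbnb, the well-known short-term rental company.
Because Superset connects to common big-data analysis tools such as Hive, Kylin, and Druid, and supports custom dashboards, it works well as the visualization tool for a data warehouse.
Superset official site: http://superset.apache.org/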
Component | Version |
---|---|
mysql | 5.6.24 |
hadoop | 2.7.2 |
hive | 1.2.1 |
zookeeper | 3.4.10 |
hbase | 1.3.1 |
kylin | 2.5.1 |
superset | 0.36.0 |
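The CentOS 7 installation itself is omitted. Create the user and user group zhouchen in advance.
Install MySQL in advance.
Install Hadoop in advance.
Install Hive in advance.
Install ZooKeeper in advance.
Install HBase in advance.
Install Kylin in advance.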
For releases earlier than CentOS 7:
1. Check the firewall status
service iptables status
2. Stop the firewall
service iptables stop
3. Start the firewall
service iptables start
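Superset is a web application written in Python and requires a Python 3.6 environment.
conda is an open-source package and environment manager. It can install different Python versions, along with packages and their dependencies, on the same machine, and switch between those Python environments. Anaconda bundles conda, Python, and a large set of preinstalled packages such as numpy and pandas; Miniconda bundles only conda and Python.
We do not need that many packages here, so we choose Miniconda.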
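1) Download Miniconda (Python 3 version)
Download URL: https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
2) Install Miniconda
(1) Run the following command and follow the prompts until the installation finishes.
[zhouchen@hadoop202 lib]$ bash Miniconda3-latest-Linux-x86_64.sh
(2) When the installer prompts for the installation location, you can specify the install path (this guide uses /opt/module/miniconda3, the value of CONDA_HOME in the next step).
(3) The installer prints a completion message when it is done.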
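3) Configure environment variables
Edit /etc/profile.d/my_env.sh (sudo vi /etc/profile.d/my_env.sh) so that it contains the following, then run source /etc/profile.d/my_env.sh or open a new terminal for the change to take effect.
#CONDA_HOME
export CONDA_HOME=/opt/module/miniconda3
export PATH=$CONDA_HOME/bin:$PATH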
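4) Disable automatic activation of the base environment
After Miniconda is installed, every new terminal activates its default base environment. The following command disables that behavior.
[zhouchen@hadoop202 minicondas]$ conda config --set auto_activate_base false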
1) Configure a conda mirror in China
[zhouchen@hadoop202 minicondas]$ conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
[zhouchen@hadoop202 minicondas]$ conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
[zhouchen@hadoop202 minicondas]$ conda config --set show_channel_urls yes
2) Create a Python 3.6 environment
[zhouchen@hadoop202 minicondas]$ conda create --name superset python=3.6
Note: common conda environment commands
Create an environment: conda create -n env_name
List all environments: conda info --envs
Remove an environment: conda remove -n env_name --all
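3) Activate the superset environment
[zhouchen@hadoop202 minicondas]$ source activate superset
Once activated, the shell prompt is prefixed with (superset).
Note: to leave the current environment
(superset)[zhouchen@hadoop202 minicondas]$ source deactivate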
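While the superset environment is active, one way to confirm that it provides the interpreter Superset needs is to ask Python itself. The following is only an illustrative sketch: the helper name meets_minimum is hypothetical, and the (3, 6) floor is taken from the Python 3.6 requirement stated earlier.

```python
import sys


def meets_minimum(version_info, minimum=(3, 6)):
    """Return True when the (major, minor) pair is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    # sys.prefix shows which environment the running interpreter belongs to,
    # for example a path ending in envs/superset when that environment is active.
    print("interpreter prefix:", sys.prefix)
    print("python version:", ".".join(map(str, sys.version_info[:3])))
    print("new enough for Superset:", meets_minimum(sys.version_info))
```

If the printed prefix does not point inside the superset environment, the environment is probably not active in the current shell.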
Before installing Superset, install the following dependencies.
[zhouchen@hadoop202 software]$ sudo yum install -y python-setuptools
[zhouchen@hadoop202 software]$ sudo yum install -y gcc gcc-c++ libffi-devel python-devel python-pip python-wheel openssl-devel cyrus-sasl-devel openldap-devel
1) Install (or upgrade) setuptools and pip
(superset)[zhouchen@hadoop202 software]$ pip install --upgrade setuptools pip -i https://pypi.douban.com/simple/
Note: pip is Python's package manager, comparable to yum on CentOS.
2) Install Superset
(superset)[zhouchen@hadoop202 software]$ pip install apache-superset -i https://pypi.douban.com/simple/
Note: -i specifies the package index mirror; a mirror in China is used here.
3) Initialize the Superset database
(superset)[zhouchen@hadoop202 software]$ superset db upgrade
4) Create an administrator user
(superset)[zhouchen@hadoop202 software]$ export FLASK_APP=superset
(superset)[zhouchen@hadoop202 software]$ flask fab create-admin
Note: Flask is a Python web framework, and Superset is built on it.
5) Initialize Superset
(superset)[zhouchen@hadoop202 software]$ superset init
6) Install kylinpy, the Kylin connector used by Superset
(superset)[zhouchen@hadoop202 software]$ pip install kylinpy -i https://pypi.douban.com/simple/
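1) Install gunicorn
(superset)[zhouchen@hadoop202 software]$ pip install gunicorn -i https://pypi.douban.com/simple/
Note: gunicorn is a Python web server, comparable to Tomcat in the Java world.
2) Start Superset
Step 1: Make sure the current conda environment is superset, that is, the prompt starts with (superset).
Step 2: Start it
(superset)[zhouchen@hadoop202 software]$ superset run -h hadoop202 -p 8787
or
[zhouchen@hadoop202 minicondas]$ bin/superset run -h hadoop202 -p 8787
or
(superset)[zhouchen@hadoop202 software]$ gunicorn --workers 5 --timeout 120 --bind hadoop202:8787 superset:app --daemon
Notes:
--workers: number of worker processes
--timeout: worker timeout in seconds; a worker that exceeds it is restarted automatically
--bind: address to bind to on this host, which is the address used to access Superset
--daemon: run in the background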
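The same options can also be kept in a gunicorn configuration file, which is plain Python. This is a minimal sketch, assuming the file is saved as gunicorn.conf.py (a file name chosen here, not part of the original setup) and passed with gunicorn -c gunicorn.conf.py superset:app; the values simply mirror the command line above.

```python
# gunicorn.conf.py: hypothetical file; values copied from the command line above.

# Address and port gunicorn listens on (same as --bind).
bind = "hadoop202:8787"

# Number of worker processes (same as --workers).
workers = 5

# Seconds a worker may stay silent before it is killed and restarted (same as --timeout).
timeout = 120

# Detach from the terminal and run in the background (same as --daemon).
daemon = True
```

Keeping the settings in a file makes the start command shorter and easier to reproduce across restarts.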
3) Stop Superset
Kill the gunicorn processes
ps -ef | awk '/gunicorn/ && !/awk/{print $2}' | xargs kill -9
Leave the superset environment
(superset)[zhouchen@hadoop202 software]$ source deactivate
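To make the filtering in the stop pipeline explicit, here is a rough Python equivalent of the awk step. It is a sketch only: the function names and the sample ps -ef lines are made up for illustration, and stop() defaults to SIGTERM rather than the kill -9 used above so that gunicorn gets a chance to shut down cleanly.

```python
import os
import signal


def gunicorn_pids(ps_lines):
    """Roughly mimic awk '/gunicorn/ && !/awk/{print $2}' on `ps -ef` lines."""
    pids = []
    for line in ps_lines:
        # Keep lines that mention gunicorn but are not the awk filter itself.
        if "gunicorn" in line and "awk" not in line:
            fields = line.split()
            # The second column of `ps -ef` is the PID.
            if len(fields) > 1 and fields[1].isdigit():
                pids.append(int(fields[1]))
    return pids


def stop(pids, sig=signal.SIGTERM):
    """Send `sig` to every PID in the list."""
    for pid in pids:
        os.kill(pid, sig)


# Made-up sample of `ps -ef` output, for illustration only.
sample = [
    "UID        PID  PPID  C STIME TTY          TIME CMD",
    "zhouchen  4101     1  0 10:00 ?        00:00:01 gunicorn: master [superset:app]",
    "zhouchen  4105  4101  0 10:00 ?        00:00:03 gunicorn: worker [superset:app]",
    "zhouchen  4200  3900  0 10:05 pts/0    00:00:00 awk /gunicorn/ && !/awk/{print $2}",
]
print(gunicorn_pids(sample))  # [4101, 4105]
```

The !/awk/ condition matters because the awk process's own command line contains the word gunicorn and would otherwise be selected too.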
4) Log in to Superset
Open http://hadoop202:8787 and log in with the administrator account created in step 4 of section 2.3.2.
(superset)[zhouchen@hadoop202 software]$ pip install mysqlclient
Note: each data source requires its own driver package; the official Superset documentation lists them.
[zhouchen@hadoop202 minicondas]$ vim envs/superset/lib/python3.6/site-packages/superset/config.py
# Change it as follows:
# The SQLAlchemy connection string.
#SQLALCHEMY_DATABASE_URI = "sqlite:///" + os.path.join(DATA_DIR, "superset.db")
SQLALCHEMY_DATABASE_URI = 'mysql://root:000000@hadoop202:3306/zhongxin_test'
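As an alternative to editing config.py inside site-packages, Superset can also pick up overrides from a superset_config.py module found on PYTHONPATH, which survives package upgrades. A minimal sketch, assuming that approach and reusing the connection values shown above; the DB_* names are illustrative helpers, not Superset settings.

```python
# superset_config.py: place it in a directory that is on PYTHONPATH.
from urllib.parse import quote_plus

# Connection details copied from the example above.
DB_USER = "root"
DB_PASSWORD = "000000"
DB_HOST = "hadoop202"
DB_PORT = 3306
DB_NAME = "zhongxin_test"

# quote_plus keeps passwords containing characters such as @ or / from breaking the URI.
SQLALCHEMY_DATABASE_URI = "mysql://{}:{}@{}:{}/{}".format(
    DB_USER, quote_plus(DB_PASSWORD), DB_HOST, DB_PORT, DB_NAME
)
```

With these values the resulting string is identical to the hard-coded URI above.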
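1) Database configuration
Step 1: Click Sources/Databases
Step 2: Click +
Step 3: Fill in Database and SQLAlchemy URI
Note: the SQLAlchemy URI format is mysql://user:password@IP/database_name
If the connection test fails here, be sure to adjust the MySQL configuration as described in section 3.1.2.
Step 4: Click Test Connection; the message "Seems Ok!" means the connection succeeded.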
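Before pasting a URI into the form, it can help to check that it splits into the parts the format expects. The sketch below uses only the standard library; describe_uri is a hypothetical helper name, and the sample URI is the one used for the metadata database earlier in this guide.

```python
from urllib.parse import urlsplit


def describe_uri(uri):
    """Split a SQLAlchemy-style URI into the parts named in the format above."""
    parts = urlsplit(uri)
    return {
        "dialect": parts.scheme,
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,  # None when the URI omits the port
        "database": parts.path.lstrip("/"),
    }


info = describe_uri("mysql://root:000000@hadoop202:3306/zhongxin_test")
print(info)
```

A missing user, host, or database in the output usually points to a typo such as a misplaced @ or /.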
2) Table configuration
Step 1: Click Sources/Tables
6) Configure the chart according to the on-screen instructions
7) Click "Run Query"
8) Save the chart and add it to a dashboard
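Start the Hadoop cluster, the ZooKeeper cluster, Hive, the HBase cluster, and then Kylin.
Check the Kylin web page: http://hadoop102:7070/kylin/query
Default login: ADMIN/KYLIN
1. The Database name can be anything.
2. SQLAlchemy URI format: kylin://ADMIN:KYLIN@ip:7070/kylin/api?project=KylinProjectName
ip is the IP of the server where Kylin is installed; project is the project built in Kylin.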
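The Kylin URI can be assembled programmatically so that the project name and credentials are escaped correctly. A minimal sketch, assuming the format given above; kylin_uri is a hypothetical helper, and the defaults are the default Kylin login and port mentioned in this section.

```python
from urllib.parse import quote, urlencode


def kylin_uri(host, project, user="ADMIN", password="KYLIN", port=7070):
    """Build kylin://user:password@host:port/kylin/api?project=<project>."""
    query = urlencode({"project": project})
    return "kylin://{}:{}@{}:{}/kylin/api?{}".format(
        quote(user, safe=""), quote(password, safe=""), host, port, query
    )


print(kylin_uri("hadoop102", "KylinProjectName"))
# kylin://ADMIN:KYLIN@hadoop102:7070/kylin/api?project=KylinProjectName
```

Replace hadoop102 and KylinProjectName with the actual Kylin host and project.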
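Adding Kylin visualizations to Superset involves a somewhat troublesome installation. The official site returned 404 and was unreachable, and installing through mirrors in China was slow and often failed, so installing through conda is the practical route.
Configuring Superset to visualize Kylin is simple in itself, but the upstream and downstream configuration chain is long, mainly on the Kylin side, where the project, model, tables, and so on must be set up first.
One more point: Superset supports many visualization types, but a chart is built on a single table; multi-table joins are not supported for visualization.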
Error:
==========================[QUERY]===============================
Got an error in pre_add for DEFAULT.dwd_payment_info
Traceback (most recent call last):
File "/opt/module/miniconda3/envs/superset/lib/python3.7/site-packages/kylinpy/sqla_dialect.py", line 129, in get_columns
columns = conn.connection.connection.get_table_source(table_name, schema).columns
File "/opt/module/miniconda3/envs/superset/lib/python3.7/site-packages/kylinpy/kylinpy.py", line 114, in get_table_source
return TableSource(name, schema, self.service.tables_and_columns().get(fullname))
File "/opt/module/miniconda3/envs/superset/lib/python3.7/site-packages/kylinpy/datasource/table_source.py", line 17, in __init__
raise NoSuchTableError
kylinpy.exceptions.NoSuchTableError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/module/miniconda3/envs/superset/lib/python3.7/site-packages/superset/connectors/sqla/views.py", line 389, in pre_add
table.get_sqla_table_object()
File "/opt/module/miniconda3/envs/superset/lib/python3.7/site-packages/superset/connectors/sqla/models.py", line 1074, in get_sqla_table_object
return self.database.get_table(self.table_name, schema=self.schema)
File "/opt/module/miniconda3/envs/superset/lib/python3.7/site-packages/superset/models/core.py", line 580, in get_table
autoload_with=self.get_sqla_engine(),
File "<string>", line 2, in __new__
File "/opt/module/miniconda3/envs/superset/lib/python3.7/site-packages/sqlalchemy/util/deprecations.py", line 139, in warned
return fn(*args, **kwargs)
File "/opt/module/miniconda3/envs/superset/lib/python3.7/site-packages/sqlalchemy/sql/schema.py", line 559, in __new__
metadata._remove_table(name, schema)
File "/opt/module/miniconda3/envs/superset/lib/python3.7/site-packages/sqlalchemy/util/langhelpers.py", line 69, in __exit__
exc_value, with_traceback=exc_tb,
File "/opt/module/miniconda3/envs/superset/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 178, in raise_
raise exception
File "/opt/module/miniconda3/envs/superset/lib/python3.7/site-packages/sqlalchemy/sql/schema.py", line 554, in __new__
table._init(name, metadata, *args, **kw)
File "/opt/module/miniconda3/envs/superset/lib/python3.7/site-packages/sqlalchemy/sql/schema.py", line 648, in _init
resolve_fks=resolve_fks,
File "/opt/module/miniconda3/envs/superset/lib/python3.7/site-packages/sqlalchemy/sql/schema.py", line 672, in _autoload
_extend_on=_extend_on,
File "/opt/module/miniconda3/envs/superset/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 2215, in run_callable
return conn.run_callable(callable_, *args, **kwargs)
File "/opt/module/miniconda3/envs/superset/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1654, in run_callable
return callable_(self, *args, **kwargs)
File "/opt/module/miniconda3/envs/superset/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 470, in reflecttable
table, include_columns, exclude_columns, resolve_fks, **opts
File "/opt/module/miniconda3/envs/superset/lib/python3.7/site-packages/sqlalchemy/engine/reflection.py", line 666, in reflecttable
table_name, schema, **table.dialect_kwargs
File "/opt/module/miniconda3/envs/superset/lib/python3.7/site-packages/sqlalchemy/engine/reflection.py", line 392, in get_columns
self.bind, table_name, schema, info_cache=self.info_cache, **kw
File "/opt/module/miniconda3/envs/superset/lib/python3.7/site-packages/kylinpy/sqla_dialect.py", line 135, in get_columns
raise sqlalchemy.exc.NoSuchTableError
sqlalchemy.exc.NoSuchTableError: ()
ERROR:superset.connectors.sqla.views:Got an error in pre_add for DEFAULT.dwd_payment_info
Traceback (most recent call last):
File "/opt/module/miniconda3/envs/superset/lib/python3.7/site-packages/kylinpy/sqla_dialect.py", line 129, in get_columns
columns = conn.connection.connection.get_table_source(table_name, schema).columns
File "/opt/module/miniconda3/envs/superset/lib/python3.7/site-packages/kylinpy/kylinpy.py", line 114, in get_table_source
return TableSource(name, schema, self.service.tables_and_columns().get(fullname))
File "/opt/module/miniconda3/envs/superset/lib/python3.7/site-packages/kylinpy/datasource/table_source.py", line 17, in __init__
raise NoSuchTableError
kylinpy.exceptions.NoSuchTableError
Solution:
When adding a Kylin table in Superset, watch out for a major pitfall: the table name must be uppercase, and the Schema field must be set to the Kylin project name.
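The rule can be captured in a tiny helper that produces the two values to enter in the add-table form. This is an illustrative sketch only: kylin_table_form is a hypothetical name, KylinProjectName is the placeholder project from the URI format above, and dwd_payment_info is the table from the error log.

```python
def kylin_table_form(table_name, project):
    """Return the Schema and Table Name values to enter for a Kylin table."""
    return {
        # The Schema field takes the Kylin project, not the Hive database.
        "schema": project,
        # Kylin table names must be entered in uppercase.
        "table_name": table_name.strip().upper(),
    }


print(kylin_table_form("dwd_payment_info", "KylinProjectName"))
# {'schema': 'KylinProjectName', 'table_name': 'DWD_PAYMENT_INFO'}
```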