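Note: the official Superset documentation states that Windows is not supported. I started out on Windows and later ran into errors when configuring thumbnails.
OS: Ubuntu 20.04
Superset: 1.3.2
These are the problems I ran into during installation, together with the material I collected while searching. Some of the fixes may not suit your setup, but I hope they help anyone who needs them.
Here are the points I noted during installation: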
pip install Pillow -i https://pypi.douban.com/simple
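# Without xlrd, the "Upload Excel" entry does not appear in the Data menu
pip install xlrd
# Without mysqlclient, connecting to MySQL raises an error; this cost me a long debugging session when I switched the metadata database to MySQL
pip install mysqlclient
pip install pymysql
# I recommend editing config.py before the actual installation and switching the metadata database to MySQL; the project intends to drop the current default SQLite store later on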
SQLALCHEMY_DATABASE_URI = 'mysql://root:123456@localhost/superset_meta?charset=utf8'
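A password containing characters such as @ or / will break a connection URI written by hand. The following is a minimal sketch of assembling the same URI with percent-encoded credentials; build_metadata_uri is a hypothetical helper of mine, not part of Superset, and the credentials are the placeholder values from the line above.

```python
from urllib.parse import quote_plus


def build_metadata_uri(user, password, host, database,
                       driver="mysql", charset="utf8"):
    """Assemble a SQLAlchemy URI, percent-encoding the credentials."""
    return (
        f"{driver}://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}/{database}?charset={charset}"
    )


# Placeholder credentials mirroring the example configuration line.
SQLALCHEMY_DATABASE_URI = build_metadata_uri(
    "root", "123456", "localhost", "superset_meta"
)
```

With a simple password the result is identical to the hand-written URI; only special characters are changed.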
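If superset load_examples fails during installation
with urllib.error.HTTPError: HTTP Error 429: too many requests:
This is a workaround I found online:
In the terminal from step 3 above, run python -m http.server
7. Locate helper.py inside the superset package
Mine is at ~/桌面/superset_venv/lib/python3.9/site-packages/superset/examples/helper.py
8. Change the URL it reads from to the local address we just started
BASE_URL = "http://192.168.140.72:8000/"
9. Run superset load_examples again
and it should work.
I adapted this from what I did earlier on Windows; it should be much the same.
Note: this was compiled from web searches.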
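What python -m http.server does in this workaround can be sketched in Python as follows. This is an illustration under assumptions: it presumes the example data files are already present in the served directory, and serve_directory, base_url, and fetch are hypothetical helper names rather than anything in Superset.

```python
import functools
import http.server
import threading
import urllib.request


def serve_directory(directory, host="127.0.0.1", port=0):
    """Serve `directory` over HTTP in a background thread (port 0 picks a free port)."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(directory)
    )
    server = http.server.ThreadingHTTPServer((host, port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def base_url(server):
    """Build the kind of value that is assigned to BASE_URL in helper.py."""
    host, port = server.server_address[:2]
    return f"http://{host}:{port}/"


def fetch(url):
    """Download one file from the local server, bypassing any configured proxy."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url) as response:
        return response.read()
```

Fetching one file this way before re-running superset load_examples is a quick check that the local address in BASE_URL is reachable.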
# Flask-WTF flag for CSRF: protects forms against CSRF attacks
WTF_CSRF_ENABLED = True
ENABLE_PROXY_FIX = True # When Superset runs behind a load balancer (nginx or ELB), gunicorn has to be told which X-Forwarded-* headers can be trusted; set --forwarded-allow-ips to the list of trusted IP addresses.
DEFAULT_FEATURE_FLAGS = {
# allow dashboard to use sub-domains to send chart request
# you also need ENABLE_CORS and
# SUPERSET_WEBSERVER_DOMAINS for list of domains
"ALLOW_DASHBOARD_DOMAIN_SHARDING": True, # allow domain sharding for dashboards
# Experimental feature introducing a client (browser) cache
"CLIENT_CACHE": False,
"DISABLE_DATASET_SOURCE_EDIT": False, # disable dataset source editing (read-only mode)
# TODO CWJ
"DYNAMIC_PLUGINS": True, # dynamic plugins
# "DYNAMIC_PLUGINS": False,
# For some security concerns, you may need to enforce CSRF protection on
# all query request to explore_json endpoint. In Superset, we use
# flask-csrf to add csrf protection
# for all POST requests, but this protection doesn't apply to GET method.
# When ENABLE_EXPLORE_JSON_CSRF_PROTECTION is set to true, your users cannot
# make GET request to explore_json. explore_json accepts both GET and POST request.
# See PR 7935 for more details.
"ENABLE_EXPLORE_JSON_CSRF_PROTECTION": False,
"ENABLE_TEMPLATE_PROCESSING": False,
"ENABLE_TEMPLATE_REMOVE_FILTERS": False,
"KV_STORE": False,
# When this feature is enabled, nested types in Presto will be
# expanded into extra columns and/or arrays. This is experimental,
# and doesn't work with all nested types.
"PRESTO_EXPAND_DATA": False,
# Exposes API endpoint to compute thumbnails
# TODO CWJ
"THUMBNAILS": True, # enable thumbnails
# "THUMBNAILS": False,
"DASHBOARD_CACHE": False, # dashboard query cache
"REMOVE_SLICE_LEVEL_LABEL_COLORS": False,
"SHARE_QUERIES_VIA_KV_STORE": False,
"TAGGING_SYSTEM": False, # tagging feature
"SQLLAB_BACKEND_PERSISTENCE": False,
"LISTVIEWS_DEFAULT_CARD_VIEW": False, # card view in list pages
# Enables the replacement React views for all the FAB views (list, edit, show) with
# designs introduced in https://github.com/apache/superset/issues/8976
# (SIP-34). This is a work in progress so not all features available in FAB have
# been implemented.
"ENABLE_REACT_CRUD_VIEWS": True,
# When True, this flag allows display of HTML tags in Markdown components
"DISPLAY_MARKDOWN_HTML": True,
# When True, this escapes HTML (rather than rendering it) in Markdown components
"ESCAPE_MARKDOWN_HTML": False,
# TODO CWJ
"DASHBOARD_NATIVE_FILTERS": True, # enable native (cascading) filters
"DASHBOARD_CROSS_FILTERS": True, # enable cross filters
"DASHBOARD_NATIVE_FILTERS_SET": True, # filter sets
# "DASHBOARD_NATIVE_FILTERS": False,
# "DASHBOARD_CROSS_FILTERS": False,
# "DASHBOARD_NATIVE_FILTERS_SET": False,
"DASHBOARD_FILTERS_EXPERIMENTAL": False,
"GLOBAL_ASYNC_QUERIES": False,
# TODO CWJ
"VERSIONED_EXPORT": True, # enable the import feature
# "VERSIONED_EXPORT": False,
# Note that: RowLevelSecurityFilter is only given by default to the Admin role
# and the Admin Role does have the all_datasources security permission.
# But, if users create a specific role with access to RowLevelSecurityFilter MVC
# and a custom datasource access, the table dropdown will not be correctly filtered
# by that custom datasource access. So we are assuming a default security config,
# a custom security config could potentially give access to setting filters on
# tables that users do not have access to.
"ROW_LEVEL_SECURITY": True, # row-level security
# Enables Alerts and reports new implementation
# TODO CWJ
# "ALERT_REPORTS": False,
"ALERT_REPORTS": True, # alerts and reports
# Enable experimental feature to search for other dashboards
"OMNIBAR": False, # dropdown search and keyboard commands
"DASHBOARD_RBAC": False,
"ENABLE_EXPLORE_DRAG_AND_DROP": False,
# Enabling ALERTS_ATTACH_REPORTS, the system sends email and slack message
# with screenshot and link
# Disables ALERTS_ATTACH_REPORTS, the system DOES NOT generate screenshot
# for report with type 'alert' and sends email and slack message with only link;
# for report with type 'report' still send with email and slack message with
# screenshot and link
"ALERTS_ATTACH_REPORTS": True,
# FORCE_DATABASE_CONNECTIONS_SSL is deprecated.
"FORCE_DATABASE_CONNECTIONS_SSL": False,
# Enabling ENFORCE_DB_ENCRYPTION_UI forces all database connections to be
# encrypted before being saved into superset metastore.
"ENFORCE_DB_ENCRYPTION_UI": False,
# Allow users to export full CSV of table viz type.
# This could cause the server to run out of memory or compute.
"ALLOW_FULL_CSV_EXPORT": False,
"UX_BETA": False,
}
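Rather than editing the defaults in config.py, overrides can be placed in FEATURE_FLAGS in superset_config.py. The sketch below shows the merge as a plain dictionary update, which is my understanding of the behavior and not code copied from Superset; effective_feature_flags is a hypothetical helper and the two dictionaries are shortened examples.

```python
# Shortened stand-ins for the two dictionaries discussed in this post.
DEFAULT_FEATURE_FLAGS = {
    "THUMBNAILS": False,
    "ALERT_REPORTS": False,
    "CLIENT_CACHE": False,
}
FEATURE_FLAGS = {
    "THUMBNAILS": True,
    "ALERT_REPORTS": True,
}


def effective_feature_flags(defaults, overrides):
    """Return the defaults with every override applied, leaving both inputs untouched."""
    merged = dict(defaults)
    merged.update(overrides)
    return merged


flags = effective_feature_flags(DEFAULT_FEATURE_FLAGS, FEATURE_FLAGS)
```

Flags that are not overridden keep their default value, so FEATURE_FLAGS only needs to list the ones being changed.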
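# Notes on the individual flags
"ALLOW_DASHBOARD_DOMAIN_SHARDING": True, # allow domain sharding for dashboards
# Chrome allows at most 6 open connections per domain at a time. When a dashboard has more than 6 slices, many fetch requests queue up waiting for the next available socket. The PR behind this flag allows domain sharding for Superset, and the feature is enabled only through configuration (by default Superset does not allow cross-domain requests)
"CLIENT_CACHE": True, # enable the client (browser) cache
"DISABLE_DATASET_SOURCE_EDIT": True, # make the Source tab of the dataset editor read-only
# Default: everyone can edit the source
"ENABLE_EXPLORE_JSON_CSRF_PROTECTION": False, # GET on explore_json is allowed only via this feature flag
# When the ENABLE_EXPLORE_JSON_CSRF_PROTECTION feature flag is set to True, GET on the explore_json endpoint is disabled: the backend does no work and responds with 405 MethodNotAllowed, while POST is still allowed
"ENABLE_TEMPLATE_PROCESSING": True, # enable template processing
"DASHBOARD_CACHE": True, # enable dashboard caching
"TAGGING_SYSTEM": True, # when enabled, owners can tag charts
# Tags can be added to charts
# Tags can be added to dashboards
# Note that tags autocomplete. Users who are not owners see the tags as read-only. Clicking a tag takes the user to a page listing the content with that tag
"SQLLAB_BACKEND_PERSISTENCE": False, # backend persistence for SQL Lab
# With SQLLAB_BACKEND_PERSISTENCE enabled, a payload containing the user's entire query history used to be sent to the browser. A PR changed this so that only queries associated with existing query editors are sent.
# When the SQLLAB_BACKEND_PERSISTENCE feature flag is set to False, user data is kept in local storage. A later change added role information to the bootstrap user object, but with the flag off the local-storage data ended up overwriting the bootstrap data, so the roles were lost and the page broke. The fix removes user data from local storage so that it cannot overwrite the bootstrap data.
"LISTVIEWS_DEFAULT_CARD_VIEW": True, # default to the card view
"ENABLE_REACT_CRUD_VIEWS": True, # enable the replacement React views for all FAB views (list, edit, show)
# Enables the replacement React views for all FAB views (list, edit, show), using the designs introduced in https://github.com/apache/superset/issues/8976 (SIP-34). This is a work in progress, so not every feature available in FAB has been implemented
"DISPLAY_MARKDOWN_HTML": True, # when enabled, HTML tags in Markdown components are displayed
# These feature flags (with defaults matching the existing behavior) make it possible to (a) escape or display HTML code, or (b) hide the output of HTML tags
"ESCAPE_MARKDOWN_HTML": True, # when enabled, HTML is escaped rather than rendered
# HTML is escaped when ESCAPE_MARKDOWN_HTML is enabled, and hidden when DISPLAY_MARKDOWN_HTML is disabled.
"DASHBOARD_NATIVE_FILTERS": True, # enable native (cascading) filters
"DASHBOARD_CROSS_FILTERS": True, # enable cross filters
"DASHBOARD_NATIVE_FILTERS_SET": True, # dashboard filter sets
"GLOBAL_ASYNC_QUERIES": True, # global asynchronous queries
"VERSIONED_EXPORT": True, # versioned export
# With versioned export you can choose whether to overwrite imported elements (dashboards, charts, datasets, databases). However, the option applies only at the level being imported (for example, it overwrites the dashboard but not the updated charts). Updating a dashboard usually touches a combination of datasets, charts, and the dashboard itself (new fields, fields in charts, more room in the dashboard for other fields), so an overwrite option per level would be useful. For now you have to make 3 separate exports containing only the modified elements (so filter the charts/datasets before exporting) and then re-import them one at a time.
"ROW_LEVEL_SECURITY": True, # show the row-level security menu
"ALERT_REPORTS": True, # show the alerts and reports menu
"OMNIBAR": True, # enable dropdown search and keyboard commands
# Press cmd/ctrl + k to bring it up
# For details of the change, see https://github.com/apache/superset/pull/16168
"DASHBOARD_RBAC": False, # role-based access control for dashboards
"ENABLE_EXPLORE_DRAG_AND_DROP": False, # proof of concept for a drag-and-drop query panel; scope limited to the Group by control; off by default
"ALERTS_ATTACH_REPORTS": True, # allow alerts to send attachments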
TALISMAN_ENABLED: Talisman
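The effect of the 6-connections-per-domain limit mentioned under ALLOW_DASHBOARD_DOMAIN_SHARDING can be estimated with simple arithmetic. This is a rough sketch that assumes every chart request takes about the same time and that requests proceed in full batches; request_rounds is a hypothetical helper, not a Superset function.

```python
import math

CONNECTIONS_PER_DOMAIN = 6  # Chrome's per-domain limit quoted in the notes above


def request_rounds(chart_count, domain_count=1,
                   per_domain=CONNECTIONS_PER_DOMAIN):
    """Estimate how many sequential batches are needed to fetch every chart."""
    if chart_count <= 0:
        return 0
    return math.ceil(chart_count / (per_domain * domain_count))


single_domain = request_rounds(20)                  # 20 charts, one domain
three_domains = request_rounds(20, domain_count=3)  # 20 charts, three shards
```

Under these assumptions, a 20-chart dashboard needs four batches on one domain and two batches when the requests are spread over three domains.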
Chart drill-down
An implementation approach for versions after 1.3.0 (https://github.com/askstylo/superset/pull/1)
My superset_config.py:
from superset.typing import CacheConfig
# Metadata database configuration
SQLALCHEMY_DATABASE_URI = 'mysql://root:123456@localhost/superset_meta?charset=utf8'
# Chinese localization
BABEL_DEFAULT_LOCALE = "zh"
# Superset specific config
ROW_LIMIT = 5000
# SUPERSET_WEBSERVER_PORT = 8088
# Flask App Builder configuration
# Your App secret key
SECRET_KEY = '\2\1thisismyscretkey\1\2\e\y\y\h'
# Flask-WTF flag for CSRF
# Protect forms against CSRF attacks with Flask-WTF
WTF_CSRF_ENABLED = True
# Add endpoints that need to be exempt from CSRF protection
WTF_CSRF_EXEMPT_LIST = []
# A CSRF token that expires in 1 year
WTF_CSRF_TIME_LIMIT = 60 * 60 * 24 * 365
# Set this API key to enable Mapbox visualizations
MAPBOX_API_KEY = ''
# JWT secret for global async queries
GLOBAL_ASYNC_QUERIES_JWT_SECRET = "test-secret-change-me-new-key-added-later"
# Feature flags
FEATURE_FLAGS = {
# Thumbnails
"THUMBNAILS": True,
"THUMBNAILS_SQLA_LISTENERS": True,
# Dashboard cache
"DASHBOARD_CACHE": True,
# Global async queries
"GLOBAL_ASYNC_QUERIES": True,
# Dynamic plugins
"DYNAMIC_PLUGINS": True,
# Cross filtering
"DASHBOARD_NATIVE_FILTERS": True, # enable native (cascading) filters
"DASHBOARD_CROSS_FILTERS": True, # enable cross filters
"DASHBOARD_NATIVE_FILTERS_SET": True, # filter sets
# Enable import (and export?) via versioned export
"VERSIONED_EXPORT": True,
# Alerts and reports
"ALERT_REPORTS": True,
}
# Cache configuration
CACHE_CONFIG = {
'CACHE_TYPE': 'redis',
'CACHE_DEFAULT_TIMEOUT': 60 * 60 * 24,
'CACHE_KEY_PREFIX': 'superset_',
'CACHE_REDIS_HOST': 'localhost',
'CACHE_REDIS_PORT': 6379,
'CACHE_REDIS_DB': 0,
'CACHE_REDIS_URL': 'redis://localhost:6379/0'
}
DATA_CACHE_CONFIG = {
'CACHE_TYPE': 'redis',
'CACHE_DEFAULT_TIMEOUT': 60 * 60 * 24, # 1 day default (in secs)
'CACHE_KEY_PREFIX': 'superset_results',
'CACHE_REDIS_URL': 'redis://localhost:6379/0',
}
# Async selenium thumbnail task will use the following user
# Thumbnails
THUMBNAIL_SELENIUM_USER = "admin"
THUMBNAIL_CACHE_CONFIG: CacheConfig = {
'CACHE_TYPE': 'redis',
'CACHE_DEFAULT_TIMEOUT': 24 * 60 * 60 * 7,
'CACHE_KEY_PREFIX': 'thumbnail_',
'CACHE_NO_NULL_WARNING': True,
'CACHE_REDIS_URL': 'redis://localhost:6379/0'
}
# Celery configuration
class CeleryConfig(object):
    BROKER_URL = "redis://localhost:6379/0"
    CELERY_IMPORTS = ("superset.sql_lab", "superset.tasks", "superset.tasks.thumbnails",)
    CELERY_RESULT_BACKEND = "redis://localhost:6379/0"
    CELERYD_PREFETCH_MULTIPLIER = 10
    CELERY_ACKS_LATE = True
CELERY_CONFIG = CeleryConfig
# Fixes the SQL asynchronous query error
from cachelib.redis import RedisCache
RESULTS_BACKEND = RedisCache(
host='localhost', port=6379, key_prefix='superset_results')
# Required for thumbnails
WEBDRIVER_TYPE = "chrome"
# for older versions this was EMAIL_REPORTS_WEBDRIVER = "chrome"
WEBDRIVER_OPTION_ARGS = [
"--force-device-scale-factor=2.0",
"--high-dpi-support=2.0",
"--headless",
"--disable-gpu",
"--disable-dev-shm-usage",
"--no-sandbox",
"--disable-setuid-sandbox",
"--disable-extensions",
]
# The base URL to query for accessing the user interface
WEBDRIVER_BASEURL = "http://localhost:8001/"
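The SECRET_KEY in the configuration above is a fixed string, and a random value is a safer choice for anything beyond a local test. A minimal sketch of generating one with the standard library follows; generate_secret_key is a hypothetical helper and the 42-byte length is simply an assumption of mine, not a Superset requirement.

```python
import base64
import os


def generate_secret_key(num_bytes=42):
    """Return a random key as base64 text, suitable for pasting into superset_config.py."""
    return base64.b64encode(os.urandom(num_bytes)).decode("ascii")


# Generate once, then store the printed value as SECRET_KEY;
# regenerating it on every start would invalidate existing sessions.
print(generate_secret_key())
```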
Note for thumbnails:
pip install selenium
Install chromedriver
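Before starting the Celery worker, it can save time to confirm that both thumbnail prerequisites are actually in place. This is a small sketch using only the standard library; find_webdriver and thumbnail_prerequisites are hypothetical helper names, and it assumes chromedriver is expected on PATH.

```python
import importlib.util
import shutil


def find_webdriver(executable="chromedriver"):
    """Return the full path of the driver binary if it is on PATH, else None."""
    return shutil.which(executable)


def thumbnail_prerequisites(executable="chromedriver"):
    """Collect human-readable problems that would prevent thumbnail rendering."""
    problems = []
    if importlib.util.find_spec("selenium") is None:
        problems.append("selenium is not installed (pip install selenium)")
    if find_webdriver(executable) is None:
        problems.append(f"{executable} was not found on PATH")
    return problems


for problem in thumbnail_prerequisites():
    print(problem)
```

An empty result means both pieces were found; it does not verify that the chromedriver version matches the installed Chrome.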
This was done on Windows; it should be similar on Ubuntu. I simply copied my earlier Windows project over to Ubuntu.
Open Lib/site-packages/superset/config.py under the virtual environment and set the default locale:
BABEL_DEFAULT_LOCALE = "zh"
In the Anaconda3\install\envs\superset_env\Lib\site-packages\superset folder, run:
(superset_env) D:\Anaconda3\install\envs\superset_env\Lib\site-packages\superset>pybabel compile -d translations
After restarting Superset, most of the interface is in Chinese, but not all of it.
To cover the rest, edit the po file under D:\Anaconda3\install\envs\superset_env\Lib\site-packages\flask_appbuilder\translations\zh\LC_MESSAGES, then run this command from the flask_appbuilder folder:
pybabel compile -d translations
Copy the permissions of the Gamma role into a new role named Public,
then add the permission all datasource access on all_datasource_access,
and that is all.
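Of these settings:
To show the filter bar on the left of a dashboard and allow cross-filtering, set the following three flags to True
# TODO CWJ
"DASHBOARD_NATIVE_FILTERS": True, # enable native (cascading) filters
"DASHBOARD_CROSS_FILTERS": True, # enable cross filters
"DASHBOARD_NATIVE_FILTERS_SET": True, # filter sets
Note that dashboard cross-filtering is controlled by DASHBOARD_CROSS_FILTERS. Once it is set, you still have to tick EMIT DASHBOARD CROSS FILTERS on every chart that should take part.
Not every chart type has this option: pie charts, tables, and radar charts do; I did not check the others.
Only charts that have the option can cross-filter.
For what cross-filtering looks like, see the screenshots at https://www.cnblogs.com/datawalkman/p/15131350.html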
Log timestamps are off by 8 hours
In superset\models\core.py
change dttm:
# TODO CWJ
# dttm = Column(DateTime, default=datetime.utcnow)
dttm = Column(DateTime, default=datetime.now)
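The 8-hour gap comes from storing UTC wall-clock time on a server whose local time is ahead of UTC. The sketch below reproduces the difference for one fixed instant; the UTC+8 offset is my assumption based on the size of the gap, and naive_utc and naive_local are hypothetical helper names.

```python
from datetime import datetime, timedelta, timezone

UTC_PLUS_8 = timezone(timedelta(hours=8))  # assumed server time zone


def naive_utc(instant):
    """What a default of datetime.utcnow would store for this instant."""
    return instant.astimezone(timezone.utc).replace(tzinfo=None)


def naive_local(instant, tz=UTC_PLUS_8):
    """What a default of datetime.now would store on a server in `tz`."""
    return instant.astimezone(tz).replace(tzinfo=None)


instant = datetime(2021, 12, 1, 4, 0, tzinfo=timezone.utc)
gap = naive_local(instant) - naive_utc(instant)
```

Switching the column default to local time makes the log match the wall clock, at the cost of the stored values no longer being in UTC.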
pip install flower
celery --app=superset.tasks.celery_app:app worker --pool=prefork -O fair -c 4
celery --app=superset.tasks.celery_app:app flower
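Error: Results backend needed for asynchronous queries is not configured.
I had already configured Celery and Redis when setting up thumbnails, so this error was puzzling.
Fix:
First enable asynchronous query execution for the MySQL database in the web UI,
then add the following after the Celery configuration:
from cachelib.redis import RedisCache
RESULTS_BACKEND = RedisCache(
host='localhost', port=6379, key_prefix='superset_results')
# If global async queries are enabled, set the secret again so that it is at least 32 characters long
"GLOBAL_ASYNC_QUERIES": True,
# JWT secret for global async queries
GLOBAL_ASYNC_QUERIES_JWT_SECRET = "test-secret-change-me-new-key-added-later"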
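Since a secret that is too short stops global async queries from starting, a quick length check before restarting can help. This is a minimal sketch; check_async_query_secret is a hypothetical helper of mine, and the 32-byte minimum is the limit as I understand it.

```python
MIN_JWT_SECRET_LENGTH = 32  # minimum length in bytes, as I understand the requirement


def check_async_query_secret(secret, minimum=MIN_JWT_SECRET_LENGTH):
    """Return the secret unchanged, or raise ValueError if it is too short."""
    length = len(secret.encode("utf-8"))
    if length < minimum:
        raise ValueError(
            f"JWT secret is {length} bytes long; at least {minimum} are needed"
        )
    return secret


# The placeholder secret from the configuration above passes the check.
GLOBAL_ASYNC_QUERIES_JWT_SECRET = check_async_query_secret(
    "test-secret-change-me-new-key-added-later"
)
```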