Great Expectations


    • 1.Introduction
      • 1.1 Introduction
      • 1.2 SLACK
      • 1.3 Integrations
      • 1.4 What does Great Expectations not do
    • 2.Demo File
      • 2.1 Connect DB
      • 2.2 create expectation
    • 3.RDB
      • 3.1 Connect DB
      • 3.2 create expectation
      • 3.3 Run on Jupyter Notebook
    • 4.Waken


Great Expectations website: https://greatexpectations.io/
Great Expectations documentation: https://docs.greatexpectations.io/docs/
Great Expectations GitHub: https://github.com/great-expectations/great_expectations



1.Introduction

1.1 Introduction

  • Three things in all: validating, documenting, and profiling data.
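To make the validation part concrete, here is a minimal plain-Python sketch of what an expectation amounts to: a named, declarative check that returns a structured result. It is an illustration only and does not use the Great Expectations library. The function name borrows the library's `expect_column_values_to_not_be_null` naming, while the row format, the result keys, and the sample data are assumptions made for this sketch.

```python
from typing import Any, Dict, List


def expect_column_values_to_not_be_null(
    rows: List[Dict[str, Any]], column: str
) -> Dict[str, Any]:
    """Check that no row has None in `column` (illustrative, not the library code)."""
    # Collect the positions of rows whose value is missing or None.
    unexpected = [i for i, row in enumerate(rows) if row.get(column) is None]
    return {
        "success": not unexpected,
        "unexpected_count": len(unexpected),
        "unexpected_index_list": unexpected,
    }


# Hypothetical sample rows, not taken from the original post.
rows = [
    {"vendor_id": 1, "passenger_count": 2},
    {"vendor_id": 2, "passenger_count": None},
]
result = expect_column_values_to_not_be_null(rows, "passenger_count")
print(result)
```

The real library adds the other two parts on top of checks like this: it renders the results as documentation and can profile a dataset to suggest expectations.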

1.2 SLACK

  • You can ask questions in the Slack community.

1.3 Integrations

Integration                Notes
Pandas                     Great for in-memory machine learning pipelines!
Spark                      Good for really big data.
Postgres                   Leading open source database
BigQuery                   Google serverless massive-scale SQL analytics platform
Databricks                 Managed Spark Analytics Platform
MySQL                      Leading open source database
AWS Redshift               Cloud-based data warehouse
AWS S3                     Cloud-based blob storage
Snowflake                  Cloud-based data warehouse
Apache Airflow             An open source orchestration engine
Prefect                    An open source workflow management system
Other SQL Relational DBs   Most RDBMS are supported via SQLAlchemy
Jupyter Notebooks          The best way to build Expectations
Slack                      Get automatic data quality notifications!

1.4 What does Great Expectations not do

[Image: what Great Expectations does not do]

2.Demo File

  • Install great_expectations
    sudo pip3 install great_expectations
  • Check the install path
    sudo python3 -m site
    /usr/local/python3/bin
  • Check the version
    /usr/local/python3/bin/great_expectations --version
  • Initialize the project
    /usr/local/python3/bin/great_expectations init
  • Create a symlink
    sudo ln -s /usr/local/python3/bin/great_expectations /usr/bin/great_expectations
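The path and version checks above can also be scripted. The following is a minimal sketch, assuming the CLI is either on `PATH` or in `/usr/local/python3/bin` as in this post. The helper names `find_cli` and `cli_version` are hypothetical, chosen only for this example.

```python
import os
import shutil
import subprocess
from typing import Optional, Sequence


def find_cli(
    name: str = "great_expectations",
    extra_dirs: Sequence[str] = ("/usr/local/python3/bin",),
) -> Optional[str]:
    """Return the full path of the CLI, or None if it cannot be found."""
    # Look on the regular PATH first, then in the extra directories.
    found = shutil.which(name)
    if found is None and extra_dirs:
        found = shutil.which(name, path=os.pathsep.join(extra_dirs))
    return found


def cli_version(cli_path: str) -> str:
    """Run `<cli> --version` and return whatever it prints."""
    done = subprocess.run(
        [cli_path, "--version"], capture_output=True, text=True, check=True
    )
    return (done.stdout or done.stderr).strip()


cli = find_cli()
if cli is None:
    print("great_expectations not found; check the install path")
else:
    print(cli, "->", cli_version(cli))
```

If the lookup only succeeds through the extra directory, that is a sign the symlink step is still needed for the plain `great_expectations` command to work.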

2.1 Connect DB

  • Create a new datasource
    /usr/local/python3/bin/great_expectations datasource new --no-jupyter
    enter option 1
    enter option 1

2.2 create expectation

  • Open another terminal window and continue
    /usr/local/python3/bin/great_expectations suite new
  • Select
    enter option 3
    enter option 1
  • Enter the suite name
    Name the new Expectation Suite [yellow_tripdata_sample_2019-01.csv.warning]: getting_started_expectation_suite_taxi.demo
  • Open Jupyter Notebook
    Open the URL printed in the terminal to enter Jupyter Notebook.
  • Run validation
    /usr/local/python3/bin/great_expectations checkpoint new my_checkpoint
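Once the checkpoint exists, validation can be triggered from a script. The sketch below assumes the CLI's `checkpoint run` subcommand is available in your version and that the script is started from the project directory. Treating exit status 0 as "validation passed" is also an assumption to verify against your installation. The helper names and the injectable `runner` parameter are hypothetical, added so the logic can be exercised without the CLI installed.

```python
import subprocess
from typing import Callable, List


def checkpoint_run_command(name: str, cli: str = "great_expectations") -> List[str]:
    """Build the argument list for `great_expectations checkpoint run <name>`."""
    return [cli, "checkpoint", "run", name]


def run_checkpoint(
    name: str,
    cli: str = "great_expectations",
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> bool:
    """Run the checkpoint and report whether the CLI exited with status 0."""
    completed = runner(checkpoint_run_command(name, cli))
    return completed.returncode == 0


# Only print the command here; call run_checkpoint("my_checkpoint") to execute it.
print(" ".join(checkpoint_run_command("my_checkpoint")))
```

Wrapping the call this way makes it easy to hook the checkpoint into a scheduler or a CI job that reacts to the boolean result.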

3.RDB

3.1 Connect DB

  • Install great_expectations
    sudo pip3 install great_expectations
  • Check the install path
    sudo python3 -m site
    /usr/local/python3/bin
  • Check the version
    /usr/local/python3/bin/great_expectations --version
  • Initialize the project
    /usr/local/python3/bin/great_expectations init
  • Connect to the DB
    /usr/local/python3/bin/great_expectations datasource new --no-jupyter
    enter option 2
    enter option 1 (I am using MySQL)
  • I am using python3, so this has to be installed manually
    sudo pip3 install psycopg2-binary
  • Rerun the previous step
    /usr/local/python3/bin/great_expectations datasource new --no-jupyter
  • Follow the prompt to continue
    jupyter notebook /home/os-nan.zhao/great_expectations/uncommitted/datasource_new.ipynb --allow-root --ip 0.0.0.0
  • Open the address printed in the terminal in a browser
  • Enter the token, then enter a new password
    datahub@123
  • Open datasource_new.ipynb and install the database drivers it needs
    sudo pip3 install pymysql
    sudo pip3 install pymssql
# ---- Step 3 ----
host = "YOUR_HOST"
port = "3306"  # MySQL default; use "1433" for SQL Server
username = "YOUR_USERNAME"
password = "YOUR_PASSWORD"
database = "YOUR_DATABASE"
schema_name = "YOUR_SCHEMA"
# Must match the port above: "mysql+pymysql" for 3306, "mssql+pymssql" for 1433
drivername = "mysql+pymysql"


# ---- Step 4 ----
# datasource_name is assumed to be defined in an earlier cell of the notebook
example_yaml = f"""
name: {datasource_name}
class_name: Datasource
execution_engine:
  class_name: SqlAlchemyExecutionEngine
  credentials:
    host: {host}
    port: '{port}'
    username: {username}
    password: {password}
    database: {database}
    schema_name: {schema_name}
    drivername: {drivername}
data_connectors:
  default_runtime_data_connector_name:
    class_name: RuntimeDataConnector
    batch_identifiers:
      - default_identifier_name
  default_inferred_data_connector_name:
    class_name: InferredAssetSqlDataConnector
    include_schema_name: True"""
print(example_yaml)
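Because the port and the `drivername` have to agree, it can help to derive one from the other. The sketch below assumes the two default ports used in this post (3306 for MySQL, 1433 for SQL Server). Picking the driver from the port is only a convenience for those defaults, not something Great Expectations does itself. The helpers `build_credentials` and `sqlalchemy_url` are hypothetical. The URL layout follows SQLAlchemy's standard `drivername://username:password@host:port/database` form, with the username and password percent-encoded so that characters such as `@` do not break it.

```python
from typing import Dict, Union
from urllib.parse import quote_plus

# Default port -> SQLAlchemy driver name, for the two databases used in this post.
DRIVER_BY_PORT = {
    "3306": "mysql+pymysql",  # MySQL via pymysql
    "1433": "mssql+pymssql",  # SQL Server via pymssql
}


def build_credentials(
    host: str, port: Union[str, int], username: str, password: str, database: str
) -> Dict[str, str]:
    """Return a credentials dict whose drivername matches the given port."""
    port = str(port)
    if port not in DRIVER_BY_PORT:
        raise ValueError(f"no driver known for port {port}; set drivername by hand")
    return {
        "host": host,
        "port": port,
        "username": username,
        "password": password,
        "database": database,
        "drivername": DRIVER_BY_PORT[port],
    }


def sqlalchemy_url(creds: Dict[str, str]) -> str:
    """Render the credentials as a SQLAlchemy-style URL with an escaped login."""
    return (
        f"{creds['drivername']}://{quote_plus(creds['username'])}:"
        f"{quote_plus(creds['password'])}@{creds['host']}:{creds['port']}"
        f"/{creds['database']}"
    )


# Placeholder values for illustration only.
creds = build_credentials("db.example.com", 3306, "user", "p@ss", "demo")
print(sqlalchemy_url(creds))
```

A non-default port raises an error instead of guessing, so a mismatched driver is caught before the datasource is saved.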

3.2 create expectation

  • Open another terminal window and continue
    /usr/local/python3/bin/great_expectations suite new
  • Select 2
    enter option 2
  • Index of the table for which you want to create the suite
    enter option 10
  • Enter the suite name
    demo01
  • Annoyingly, port 8889 was not open
    On this datahub host it is hard to open every port in advance
  • Edit
    /usr/local/python3/bin/great_expectations suite edit --no-jupyter
    jupyter notebook /great_expectations/uncommitted/edit_.ipynb --allow-root --ip 0.0.0.0
  • Run
    /usr/local/python3/bin/great_expectations checkpoint new --no-jupyter
  • Next
    jupyter notebook /great_expectations/uncommitted/edit_checkpoint_.ipynb --allow-root --ip 0.0.0.0
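The port problem noted above (8889 not being open) can be checked before launching a notebook. This is a minimal sketch using only the standard library. The helper name `port_is_open` is hypothetical, and the host and ports in the loop are placeholders to replace with the server address and the ports Jupyter actually reports.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# 127.0.0.1 is a placeholder; use the server's address when checking from a client.
for port in (8888, 8889):
    state = "reachable" if port_is_open("127.0.0.1", port) else "not reachable"
    print(f"port {port}: {state}")
```

The check only succeeds while something is listening, so run it from the client machine after the notebook server has started. A failure then points at the firewall rather than at Jupyter.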

3.3 Run on Jupyter Notebook

4.Waken

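         A person who sees the essence of a thing in one second and a person who spends half a lifetime without seeing it clearly are naturally destined for different fates.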