Detailed installation walkthrough: DataX and datax-web for Greenplum

1. Installing DataX for Greenplum

Download
https://github.com/HashDataInc/DataX

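Prerequisites

Install Maven
Download option 1: https://maven.apache.org/download.cgi

This guide uses version 3.5.4. Download the binary archive; it can be used as soon as it is unpacked.
Download option 2: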

wget https://mirrors.cnnic.cn/apache/maven/maven-3/3.5.4/binaries/apache-maven-3.5.4-bin.tar.gz --no-check-certificate

2) Unpack and install the Maven package

tar -xf apache-maven-3.5.4-bin.tar.gz 
mv apache-maven-3.5.4 /usr/local/maven
ln -s /usr/local/maven/bin/mvn  /usr/bin/mvn    # When used together with Jenkins, Jenkins looks for the mvn command under /usr/bin/ and reports an error if it is not there
ll /usr/local/maven/
ll /usr/bin/mvn

3) Configure environment variables

echo " ">>/etc/profile
echo "# Made for mvn env by zhaoshuai on $(date +%F)">>/etc/profile
echo 'export MAVEN_HOME=/usr/local/maven'>>/etc/profile
echo 'export PATH=$MAVEN_HOME/bin:$PATH'>>/etc/profile
tail -4 /etc/profile
source /etc/profile
echo $PATH

4) Check the installed mvn version

which mvn
mvn -version

Maven is now installed.
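The version check in step 4 can also be scripted. This is a minimal sketch, assuming the first line printed by mvn -version has the form "Apache Maven X.Y.Z ..."; the function name parse_maven_version is made up for this example.

```python
import re
import shutil
import subprocess


def parse_maven_version(output):
    """Return the Maven version as a tuple of ints, or None if it is not found."""
    match = re.search(r"Apache Maven (\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


if __name__ == "__main__":
    # Only call mvn when it is actually on PATH.
    if shutil.which("mvn") is None:
        print("mvn not found on PATH")
    else:
        result = subprocess.run(["mvn", "-version"], capture_output=True, text=True)
        print(parse_maven_version(result.stdout))
```

Returning a tuple makes comparisons such as parse_maven_version(text) >= (3, 5, 4) straightforward.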

Build DataX from source

Directory layout. Note that this is the source tree, so it looks different from the binary release.

adswriter                      elasticsearchwriter  hbase094xwriter    hdfsreader         mongodbreader  odpswriter    otsstreamreader                   postgresqlreader  rpm              txtfilereader
common                         ftpreader            hbase11xreader     hdfswriter         mongodbwriter  oraclereader  otswriter                         postgresqlwriter  sqlserverreader  txtfilewriter
core                           ftpwriter            hbase11xsqlwriter  images             mysqlreader    oraclewriter  package.xml                       rdbmsreader       sqlserverwriter  userGuid.md
datax-opensource-dingding.png  gpdbjsonwriter       hbase11xwriter     introduction.md    mysqlwriter    ossreader     plugin-rdbms-util                 rdbmswriter       streamreader
drdsreader                     gpdbwriter           hbasereader        license.txt        ocswriter      osswriter     plugin-unstructured-storage-util  README            streamwriter
drdswriter                     hbase094xreader      hbasewriter        mongodbjsonreader  odpsreader     otsreader     pom.xml                           README.md         transformer

Build and package:

mvn -U clean package assembly:assembly -Dmaven.test.skip=true

Final output:

[WARNING] Assembly file: /app/DataX/target/datax-v1.0.4-hashdata is not a regular file (it may be a directory). It cannot be attached to the project build for installation or deployment.
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] 
[INFO] datax-all v1.0.4-hashdata .......................... SUCCESS [ 11.003 s]
[INFO] datax-common ....................................... SUCCESS [01:33 min]
[INFO] datax-transformer .................................. SUCCESS [ 47.629 s]
[INFO] datax-core ......................................... SUCCESS [ 26.107 s]
[INFO] plugin-rdbms-util .................................. SUCCESS [  8.208 s]
[INFO] mysqlreader ........................................ SUCCESS [  0.990 s]
[INFO] sqlserverreader .................................... SUCCESS [  3.124 s]
[INFO] streamreader ....................................... SUCCESS [  5.794 s]
[INFO] mysqlwriter ........................................ SUCCESS [  0.730 s]
[INFO] streamwriter ....................................... SUCCESS [  0.582 s]
[INFO] sqlserverwriter .................................... SUCCESS [  0.715 s]
[INFO] gpdbwriter ......................................... SUCCESS [  2.225 s]
[INFO] plugin-unstructured-storage-util v1.0.4-hashdata ... SUCCESS [01:39 min]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 05:11 min
[INFO] Finished at: 2020-12-24T11:01:03+08:00
[INFO] ------------------------------------------------------------------------
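When the build log is long, the Reactor Summary can be checked with a short script. The sketch below assumes summary lines look like the ones shown above (module name, optional version, a run of dots, then a status word); reactor_status is a hypothetical helper, not part of Maven or DataX.

```python
import re

# Matches lines such as "[INFO] gpdbwriter ........ SUCCESS [  2.225 s]".
MODULE_LINE = re.compile(r"^\[INFO\] (\S+)(?: \S+)? \.+ (SUCCESS|FAILURE|SKIPPED)\b")


def reactor_status(log_text):
    """Map each module named in a Maven Reactor Summary to its reported status."""
    status = {}
    for line in log_text.splitlines():
        match = MODULE_LINE.match(line.strip())
        if match:
            status[match.group(1)] = match.group(2)
    return status
```

A build can then be treated as good when every value in the returned dictionary is SUCCESS.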
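Locate the output directory

After a successful build, the DataX package is located at {DataX_source_code_home}/target/datax-v1.0.4-hashdata/datax/, with the following structure:

This differs from the official documentation. The actual location is reported in the build output after packaging succeeds, so look for it there.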

[root@ares datax]# ls /app/DataX/target/datax-v1.0.4-hashdata
datax
[root@ares datax]# ls /app/DataX/target/datax-v1.0.4-hashdata/datax/
bin  conf  job  lib  plugin  script  tmp
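Because the versioned directory name under target/ is easy to miss, the launcher can be located by pattern instead. A minimal sketch, assuming the layout shown above (target/<something>/datax/bin/datax.py); find_datax_launchers is a name invented for this example.

```python
from pathlib import Path


def find_datax_launchers(source_home):
    """Return every datax.py found at target/*/datax/bin/ under source_home."""
    target = Path(source_home) / "target"
    return sorted(target.glob("*/datax/bin/datax.py"))
```

For the build in this walkthrough, find_datax_launchers("/app/DataX") would be expected to list the datax.py under datax-v1.0.4-hashdata.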
Self-check

Self-check command: python {YOUR_DATAX_HOME}/bin/datax.py {YOUR_DATAX_HOME}/job/job.json

python /app/DataX/target/datax-v1.0.4-hashdata/datax/bin/datax.py /app/datax/job/job.json

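The job.json used here comes from the one-click install package; the only goal is to smoke-test the build. It is not clear what the job.json bundled with this source build is meant for.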

2020-12-24 11:13:58.859 [main] WARN  Engine - prioriy set to 0, because NumberFormatException, the value is: null
2020-12-24 11:13:58.862 [main] INFO  PerfTrace - PerfTrace traceId=job_-1, isEnable=false, priority=0
2020-12-24 11:13:58.862 [main] INFO  JobContainer - DataX jobContainer starts job.
2020-12-24 11:13:58.865 [main] INFO  JobContainer - Set jobId = 0
2020-12-24 11:13:58.890 [job-0] INFO  JobContainer - jobContainer starts to do prepare ...
2020-12-24 11:13:58.890 [job-0] INFO  JobContainer - DataX Reader.Job [streamreader] do prepare work .
2020-12-24 11:13:58.891 [job-0] INFO  JobContainer - DataX Writer.Job [streamwriter] do prepare work .
2020-12-24 11:13:58.891 [job-0] INFO  JobContainer - jobContainer starts to do split ...
2020-12-24 11:13:58.893 [job-0] INFO  JobContainer - Job set Max-Byte-Speed to 10485760 bytes.
2020-12-24 11:13:58.894 [job-0] INFO  JobContainer - DataX Reader.Job [streamreader] splits to [1] tasks.
2020-12-24 11:13:58.895 [job-0] INFO  JobContainer - DataX Writer.Job [streamwriter] splits to [1] tasks.
2020-12-24 11:13:58.924 [job-0] INFO  JobContainer - jobContainer starts to do schedule ...
2020-12-24 11:13:58.930 [job-0] INFO  JobContainer - Scheduler starts [1] taskGroups.
2020-12-24 11:13:58.933 [job-0] INFO  JobContainer - Running by standalone Mode.
2020-12-24 11:13:58.944 [taskGroup-0] INFO  TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.
2020-12-24 11:13:58.950 [taskGroup-0] INFO  Channel - Channel set byte_speed_limit to -1, No bps activated.
2020-12-24 11:13:58.950 [taskGroup-0] INFO  Channel - Channel set record_speed_limit to -1, No tps activated.
2020-12-24 11:13:58.966 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
2020-12-24 11:13:59.067 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[102]ms
2020-12-24 11:13:59.068 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] completed it's tasks.
2020-12-24 11:14:08.958 [job-0] INFO  StandAloneJobContainerCommunicator - Total 100000 records, 2600000 bytes | Speed 253.91KB/s, 10000 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.051s |  All Task WaitReaderTime 0.065s | Percentage 100.00%
2020-12-24 11:14:08.958 [job-0] INFO  AbstractScheduler - Scheduler accomplished all tasks.
2020-12-24 11:14:08.959 [job-0] INFO  JobContainer - DataX Writer.Job [streamwriter] do post work.
2020-12-24 11:14:08.960 [job-0] INFO  JobContainer - DataX Reader.Job [streamreader] do post work.
2020-12-24 11:14:08.960 [job-0] INFO  JobContainer - DataX jobId [0] completed successfully.
2020-12-24 11:14:08.961 [job-0] INFO  HookInvoker - No hook invoked, because base dir not exists or is a file: /app/DataX/target/datax-v1.0.4-hashdata/datax/hook
2020-12-24 11:14:08.963 [job-0] INFO  JobContainer - 
         [total cpu info] => 
                averageCpu                     | maxDeltaCpu                    | minDeltaCpu                    
                -1.00%                         | -1.00%                         | -1.00%
                        

         [total gc info] => 
                 NAME                 | totalGCCount       | maxDeltaGCCount    | minDeltaGCCount    | totalGCTime        | maxDeltaGCTime     | minDeltaGCTime     
                 PS MarkSweep         | 0                  | 0                  | 0                  | 0.000s             | 0.000s             | 0.000s             
                 PS Scavenge          | 0                  | 0                  | 0                  | 0.000s             | 0.000s             | 0.000s             

2020-12-24 11:14:08.964 [job-0] INFO  JobContainer - PerfTrace not enable!
2020-12-24 11:14:08.964 [job-0] INFO  StandAloneJobContainerCommunicator - Total 100000 records, 2600000 bytes | Speed 253.91KB/s, 10000 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.051s |  All Task WaitReaderTime 0.065s | Percentage 100.00%
2020-12-24 11:14:08.965 [job-0] INFO  JobContainer - 
任务启动时刻                    : 2020-12-24 11:13:58
任务结束时刻                    : 2020-12-24 11:14:08
任务总计耗时                    :                 10s
任务平均流量                    :          253.91KB/s
记录写入速度                    :          10000rec/s
读出记录总数                    :              100000
读写失败总数                    :                   0
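To confirm a run like this from a script, the totals can be pulled out of the StandAloneJobContainerCommunicator line. This is a sketch under the assumption that the line keeps the "Total N records, N bytes | ... | Error N records, N bytes" shape seen above; parse_summary is a hypothetical helper.

```python
import re

SUMMARY = re.compile(
    r"Total (\d+) records, (\d+) bytes \|.*?Error (\d+) records, (\d+) bytes"
)


def parse_summary(line):
    """Extract record and byte totals from a DataX progress line, or return None."""
    match = SUMMARY.search(line)
    if match is None:
        return None
    records, size, error_records, error_bytes = (int(g) for g in match.groups())
    return {
        "records": records,
        "bytes": size,
        "error_records": error_records,
        "error_bytes": error_bytes,
    }
```

A run with error_records equal to 0 corresponds to the zero failure count reported in the final summary.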

The bundled job.json looks like this:

{
    "job": {
        "setting": {
            "speed": {
                "byte": 1048576
            }
        },
        "content": [
            {
                "reader": {
                    "name": "sqlserverreader",
                    "parameter": {
                        //  database username
                        "username": "ReadOnly01",
                        //  database password
                        "password": "1qaz!QAZ",
                        "column": [
                            "id"
                        ],
                        // "splitPk": "db_id",
                        "connection": [
                            {
                                "table": [
                                    "table"
                                ],
                                "jdbcUrl": [
                                    "jdbc:sqlserver://192.168.0.65;DatabaseName=MyCost_Erp352"
                                ]
                            }
                        ]
                    }
                },
                "writer": {
                    "name": "sqlserverwriter",
                    "parameter": {
                        "username": "root",
                        "password": "root",
                        "column": [
                            "db_id",
                            "db_type",
                            "db_ip",
                            "db_port",
                            "db_role",
                            "db_name",
                            "db_username",
                            "db_password",
                            "db_modify_time",
                            "db_modify_user",
                            "db_description",
                            "db_tddl_info"
                        ],
                        "connection": [
                            {
                                "table": [
                                    "db_info_for_writer"
                                ],
                                "jdbcUrl": "jdbc:sqlserver://[HOST_NAME]:PORT;DatabaseName=[DATABASE_NAME]"
                            }
                        ],
                        "preSql": [
                            "delete from @table where db_id = -1;"
                        ],
                        "postSql": [
                            "update @table set db_modify_time = now() where db_id = 1;"
                        ]
                    }
                }
            }
        ]
    }
}
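A job file like this can be checked before it is handed to datax.py. Python's json module rejects // comments, so the sketch below first drops lines that contain only a comment; that is an assumption about how the file is written, and load_job and plugin_names are names invented here.

```python
import json


def load_job(text):
    """Parse a DataX job file, ignoring lines that contain only a // comment."""
    kept = [line for line in text.splitlines() if not line.lstrip().startswith("//")]
    return json.loads("\n".join(kept))


def plugin_names(job):
    """Return a (reader name, writer name) pair for each entry in job.content."""
    return [
        (entry["reader"]["name"], entry["writer"]["name"])
        for entry in job["job"]["content"]
    ]
```

Only whole-line comments are removed, so a // inside a string value such as a jdbcUrl is left untouched. A missing closing brace shows up as a json.JSONDecodeError.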

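2. Installing datax-web

Project repository:

https://github.com/WeiYe-Jing/datax-web

I used the one-click deployment package:

https://pan.baidu.com/s/13yoqhGpD00I82K4lOYtQhg (extraction code: cpsk)

After unpacking, the directory looks like this: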

[root@ares app]# ls datax-web-2.1.2  
bin  modules  packages  README.md  userGuid.md

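The following is reproduced in full.

Deployment

1) Unpack the installation package

In the installation directory of your choice, unpack the package: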

tar -zxvf datax-web-{VERSION}.tar.gz

2) Run the one-click install script

Change into the unpacked directory and find install.sh under bin. For an interactive installation, run it directly:

./bin/install.sh

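In interactive mode, the script asks for confirmation before unpacking each module's package archive and before calling each configure script. Follow the prompts to see whether installation succeeded; if it did not, you can simply try again. To skip the confirmations and install non-interactively, run: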

./bin/install.sh --force

3) Initialize the database

If the mysql command is installed on your server, the install script shows the following prompts:

Scan out mysql command, so begin to initalize the database
Do you want to initalize database with sql: [{INSTALL_PATH}/bin/db/datax-web.sql]? (Y/N)y
Please input the db host(default: 127.0.0.1): 
Please input the db port(default: 3306): 
Please input the db username(default: root): 
Please input the db password(default: ): 
Please input the db name(default: exchangis)

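Enter the database host, port, username, password, and database name as prompted; in most cases this completes the initialization quickly. If the mysql command is not installed on the server, run the bin/db/datax-web.sql script manually and then edit the configuration file: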

vi ./modules/datax-admin/conf/bootstrap.properties
#Database
#DB_HOST=
#DB_PORT=
#DB_USERNAME=
#DB_PASSWORD=
#DB_DATABASE=

Set each value to match your environment.
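It is easy to leave one of the DB_ lines commented out. The following sketch reports which of the five keys listed above are still missing from the text of bootstrap.properties; the helper names are hypothetical and the parser only handles simple KEY=VALUE lines.

```python
REQUIRED_KEYS = ("DB_HOST", "DB_PORT", "DB_USERNAME", "DB_PASSWORD", "DB_DATABASE")


def read_properties(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def missing_db_keys(text):
    """List required DB_ keys that are absent or still commented out."""
    props = read_properties(text)
    return [key for key in REQUIRED_KEYS if key not in props]
```

An empty value counts as present here, since the installer itself allows an empty default password.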

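4) Configuration

Once installation is complete,

configure the mail service in /modules/datax-admin/bin/env.properties under the project directory (this step can be skipped):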

MAIL_USERNAME=""
MAIL_PASSWORD=""

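This file also contains some default settings, such as server.port; see the file for details.

In /modules/datax-executor/bin/env.properties under the project directory, set PYTHON_PATH. This is essential! In my case the value is: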
/app/DataX/target/datax-v1.0.4-hashdata/datax/bin/datax.py

vim /app/datax-web-2.1.2/modules/datax-executor/bin/env.properties

vi ./modules/{module_name}/bin/env.properties

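### Path of the Python script that launches DataX
PYTHON_PATH=

### Keep this the same as the datax-admin service port; the default is 9527, so it can be ignored if the datax-admin port was not changed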
DATAX_ADMIN_PORT=

This file also contains some default settings, such as executor.port, json.path, and data.path; see the file for details.
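Setting PYTHON_PATH can be done without opening an editor. A minimal sketch that rewrites one KEY= line in the text of env.properties; set_property is a made-up helper, and it assumes the key starts at the beginning of its line as in the excerpt above.

```python
def set_property(text, key, value):
    """Return text with the KEY= line set to value, appending the line if absent."""
    lines = text.splitlines()
    prefix = key + "="
    for index, line in enumerate(lines):
        if line.startswith(prefix):
            lines[index] = prefix + value
            break
    else:
        # No existing line for this key, so add one at the end.
        lines.append(prefix + value)
    return "\n".join(lines) + "\n"
```

For this walkthrough the value would be /app/DataX/target/datax-v1.0.4-hashdata/datax/bin/datax.py.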

5) Start the services

- Start all services with one command
./bin/start-all.sh

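Some modules may fail to start or hang along the way; in that case, exit and run the script again. To change the port of a particular module's service: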

vi ./modules/{module_name}/bin/env.properties

Find the SERVER_PORT entry and change its value. You can also start a single module's service on its own:

./bin/start.sh -m {module_name}
- Stop all services with one command
./bin/stop-all.sh

You can also stop a single module's service on its own:

./bin/stop.sh -m {module_name}

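6) Check the services (important!)

On Linux, run the jps command and check for the DataXAdminApplication and DataXExecutorApplication processes. If both are present, the project is running.

If startup fails, check the startup logs: modules/datax-admin/bin/console.out or modules/datax-executor/bin/console.out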
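The jps check can be automated. The sketch below takes the text printed by jps, assuming each line has the form "<pid> <main class>", and reports which of the two expected processes are absent; missing_services is a hypothetical name.

```python
EXPECTED = ("DataXAdminApplication", "DataXExecutorApplication")


def missing_services(jps_output):
    """Return the expected datax-web processes that do not appear in jps output."""
    running = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            running.add(parts[1])
    return [name for name in EXPECTED if name not in running]
```

An empty list means both services were found; anything else points at the console.out file to read first.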

Tip: the scripts use bash-specific syntax; invoking them with sh may cause unexpected errors.

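7) Run

Once deployed, open http://ip:port/index.html in a browser to reach the main page (ip is the IP of the server where datax-admin is deployed, and port is the port datax-admin is configured to run on).

Log in with username admin and password 123456 to access the system.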

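8) Runtime logs

After deployment, logs are written under modules/{module_name}/data/applogs (you can choose a different location by changing logpath in application.yml). Use these logs to follow how the project actually started.

If the executor starts before admin, its connection fails and the log reports a "connection refused" error. Normally you start admin first and then the executor. The executor reconnects after 30 seconds; if that succeeds, the exception can be ignored.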
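One way to avoid the refused connection is to wait until the admin port accepts TCP connections before starting the executor. This is only a sketch of that idea using the standard socket module; wait_for_port is not part of datax-web, and the port to probe is whatever datax-admin is configured with.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not accepting connections yet: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A start script could call wait_for_port("127.0.0.1", 9527) after launching datax-admin and only then launch datax-executor.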

When opening datax-web, remember to append /index.html:

http://172.18.1.25:9527/index.html

Without it you get an error:

http://192.168.10.227:9527/

Whitelabel Error Page

This application has no explicit mapping for /error, so you are seeing this as a fallback.
Thu Dec 24 11:36:39 CST 2020
There was an unexpected error (type=Forbidden, status=403).
Access Denied
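The difference between the two URLs can be verified from a script. A small sketch using urllib from the standard library: admin_url and http_status are names invented here, and 9527 is used as the default port only because it is the default mentioned earlier.

```python
import urllib.error
import urllib.request


def admin_url(host, port=9527):
    """Build the datax-web entry URL, including the required /index.html."""
    return "http://{}:{}/index.html".format(host, port)


def http_status(url, timeout=5.0):
    """Return the HTTP status code for url; connection errors are not caught."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # Error responses such as the 403 above arrive as HTTPError.
        return error.code
```

Against the deployment above, the bare root URL would be expected to give 403, as in the error page shown.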

Still unfinished: the mail settings.

Installing datax-web from source (not the one-click deployment)

Directory layout

[root@ares datax-web-master]# ls /app/datax-web-master
bin  datax-admin  datax-assembly  datax-core  datax-executor  datax-rpc  doc  LICENSE  pom.xml  README.md  userGuid.md

Run the build. It takes a long time, depending on network speed.
mvn clean install 
[INFO] Building tar : /app/datax-web-master/build/datax-web-2.1.2.tar.gz
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] 
[INFO] datax-web 2.1.2 .................................... SUCCESS [ 20.613 s]
[INFO] datax-rpc .......................................... SUCCESS [06:09 min]
[INFO] datax-core ......................................... SUCCESS [06:23 min]
[INFO] datax-admin ........................................ SUCCESS [44:46 min]
[INFO] datax-executor ..................................... SUCCESS [ 21.653 s]
[INFO] datax-assembly 2.1.2 ............................... SUCCESS [ 13.877 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 58:34 min
[INFO] Finished at: 2020-12-24T15:06:53+08:00
[INFO] ------------------------------------------------------------------------

1. Linux deployment

See the Linux deployment steps above.

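2. Development environment deployment (or see the Debug document)

2.1 Create the database

Run the datax_web.sql file under bin/db (note that the upgrade statements for older versions specify a database name).

2.2 Modify the project configuration

1. Edit resources/application.yml under datax_admin

# Data source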
  datasource:
    username: root
    password: root
    url: jdbc:mysql://localhost:3306/datax_web?serverTimezone=Asia/Shanghai&useLegacyDatetimeCode=false&useSSL=false&nullNamePatternMatchesAll=true&useUnicode=true&characterEncoding=UTF-8
    driver-class-name: com.mysql.jdbc.Driver

Adjust the data source settings; only MySQL is supported at present.

# Configure mybatis-plus SQL logging
logging:
  level:
    com.wugui.datax.admin.mapper: error
  path: ./data/applogs/admin

Change the log path (path).

  # datax-web email
  mail:
    host: smtp.qq.com
    port: 25
    username: [email protected]
    password: xxx
    properties:
      mail:
        smtp:
          auth: true
          starttls:
            enable: true
            required: true
        socketFactory:
          class: javax.net.ssl.SSLSocketFactory

Adjust the mail settings (leave them unchanged if mail is not needed).

2. Edit resources/application.yml under datax_executor

# log config
logging:
  config: classpath:logback.xml
  path: ./data/applogs/executor/jobhandler

Change the log path (path).

datax:
  job:
    admin:
      ### datax-web admin address
      addresses: http://127.0.0.1:8080
    executor:
      appname: datax-executor
      ip:
      port: 9999
      ### job log path
      logpath: ./data/applogs/executor/jobhandler
      ### job log retention days
      logretentiondays: 30
  executor:
    jsonpath: /Users/mac/data/applogs

  pypath: /Users/mac/tools/datax/bin/datax.py

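Adjust the datax.job settings:

  • admin.addresses: the address where datax_admin is deployed. If the scheduling center is deployed as a cluster with several addresses, separate them with commas. The executor uses this address for executor heartbeat registration and task result callbacks.
  • executor.appname: the executor's AppName, the unique identifier of each executor cluster and the basis for grouping executors during heartbeat registration.
  • executor.ip: empty by default, which means the IP is detected automatically. On a machine with several network interfaces, a specific IP can be set manually. The IP is not bound to the host and is used only for communication; the address is used for executor registration and for the scheduling center to request and trigger tasks.
  • executor.port: the executor server port, 9999 by default. When several executors run on one machine, give each a different port.
  • executor.logpath: disk path where executor run logs are stored; read and write permission on this path is required.
  • executor.logretentiondays: number of days executor log files are kept; expired logs are cleaned up automatically. The limit takes effect only when the value is 3 or greater; otherwise (for example -1) automatic cleanup is disabled.
  • executor.jsonpath: path where temporary DataX JSON files are saved.
  • pypath: path of the DataX launch script, for example xxx/datax/bin/datax.py
    If the DataX environment variable (DATAX_HOME) is set on the system, logpath, jsonpath, and pypath can be left unset; log files and temporary JSON files are then stored under the path given by that variable.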
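Two of the rules above are easy to get wrong: the comma-separated admin.addresses value and the threshold of 3 for logretentiondays. The sketch below only restates those two rules as code; both function names are invented for this example and are not datax-web APIs.

```python
def split_admin_addresses(value):
    """Split a comma-separated admin.addresses value into a list of URLs."""
    return [item.strip() for item in value.split(",") if item.strip()]


def retention_enabled(days):
    """Automatic log cleanup only applies when the value is 3 or greater."""
    return days >= 3
```

With the sample configuration, split_admin_addresses("http://127.0.0.1:8080") yields a single address and retention_enabled(30) is True.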

IV. Start the project

1. Local IDEA development environment

  • 1. Run DataXAdminApplication under datax_admin
  • 2. Run DataXExecutorApplication under datax_executor

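Once admin has started successfully, the log prints three addresses: two API documentation URLs and one front-end page URL.

V. Startup succeeded

After a successful start, open the page (default administrator username: admin, password: 123456)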
http://localhost:8080/index.html#/dashboard


I could never get the mail settings to work, possibly because of a formatting problem, so I gave up on them.
