6. NiFi Comprehensive Application Scenario: Offline Synchronization of MySQL Data to HDFS

Apache NiFi Series

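1. NiFi 1.9.2: introduction, standalone deployment and basic verification
2. NiFi application example: using GetFile and PutFile
3. Introduction to NiFi processors, common FlowFile attributes, templates, and viewing runtime status information
4. Cluster deployment and verification, monitoring, and node management
5. NiFi FileFlow example and NiFi template example
6. NiFi application scenario: offline synchronization of MySQL data to HDFS
7. NiFi comprehensive application scenario: converting JSON data queried from MySQL to txt and storing it in HDFS
8. NiFi comprehensive application scenario: monitoring the MySQL binlog with NiFi for real-time synchronization to Hive
9. NiFi comprehensive application scenario: configuring Kafka data synchronization with NiFi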


Table of Contents

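  • Apache NiFi Series
  • I. Implementation Flow
    • 1. Templates
      • 1) Template 1
      • 2) Template 2
    • 2. Processor Flow
      • 1) Template 1 processing flow
      • 2) Template 2 processing flow
  • II. Processor Descriptions
    • 1. QueryDatabaseTable
      • 1) Description
      • 2) Property configuration
    • 2. ConvertAvroToJSON
      • 1) Description
      • 2) Property configuration
    • 3. SplitJson
      • 1) Description
      • 2) Property configuration
    • 4. PutHDFS
      • 1) Description
      • 2) Property configuration
  • III. Operations
    • 1. Create a process group
    • 2. Create and configure QueryDatabaseTable
    • 3. Create and configure the MySQL connection pool
      • 1) Create
      • 2) Configure
      • 3) Enable the connection pool
    • 4. Create and configure ConvertAvroToJSON
      • 1) Create and configure ConvertAvroToJSON
      • 2) Connect
      • 3) Consume data with load balancing
    • 5. Create and configure SplitJson
    • 6. Create and configure PutHDFS
  • IV. Verification
    • 1. Start QueryDatabaseTable and inspect the data in the queue
    • 2. Start ConvertAvroToJSON and inspect the data in the queue
    • 3. Start SplitJson and inspect the data in the queue
    • 4. Start PutHDFS and inspect the data the processor receives and emits
    • 5. Inspect the data in HDFS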


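This article explains how to synchronize MySQL data into HDFS and how to verify the result. Before reading it, it helps to read the introduction to templates in the earlier articles of this series.
Prerequisites: the MySQL database already contains data, and the Hadoop, NiFi, Hive and Hue environments are already set up. If Hue is not available, verify the results directly in HDFS.
The article has four parts: the implementation flow, an introduction to the processors used, the operations in NiFi, and verification of the results.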

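I. Implementation Flow

1. Templates

1) Template 1

This template may raise errors (explained in the verification section), and its behavior may differ between environments.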


<template encoding-version="1.2">
    <description>Imports data from MySQL into HDFS using LZO compression.
The output contains duplicate data.</description>
    <groupId>2f7d3766-0186-1000-0000-00006e07b64a</groupId>
    <name>MysqlToHDFSByLzo</name>
    <snippet>
        <connections>
            <id>8bacaebe-bce0-31e8-0000-000000000000</id>
            <parentGroupId>60d38136-211b-3d16-0000-000000000000</parentGroupId>
            <backPressureDataSizeThreshold>1 GB</backPressureDataSizeThreshold>
            <backPressureObjectThreshold>10000</backPressureObjectThreshold>
            <destination>
                <groupId>60d38136-211b-3d16-0000-000000000000</groupId>
                <id>4cb1eb1d-ca3a-34e0-0000-000000000000</id>
                <type>PROCESSOR</type>
            </destination>
            <flowFileExpiration>0 sec</flowFileExpiration>
            <labelIndex>1</labelIndex>
            <loadBalanceCompression>COMPRESS_ATTRIBUTES_AND_CONTENT</loadBalanceCompression>
            <loadBalancePartitionAttribute></loadBalancePartitionAttribute>
            <loadBalanceStatus>LOAD_BALANCE_INACTIVE</loadBalanceStatus>
            <loadBalanceStrategy>ROUND_ROBIN</loadBalanceStrategy>
            <name>Q_C</name>
            <selectedRelationships>success</selectedRelationships>
            <source>
                <groupId>60d38136-211b-3d16-0000-000000000000</groupId>
                <id>c16280cc-6d1d-355c-0000-000000000000</id>
                <type>PROCESSOR</type>
            </source>
            <zIndex>0</zIndex>
        </connections>
        <connections>
            <id>ce7dcdb2-bcd9-38a8-0000-000000000000</id>
            <parentGroupId>60d38136-211b-3d16-0000-000000000000</parentGroupId>
            <backPressureDataSizeThreshold>1 GB</backPressureDataSizeThreshold>
            <backPressureObjectThreshold>10000</backPressureObjectThreshold>
            <destination>
                <groupId>60d38136-211b-3d16-0000-000000000000</groupId>
                <id>26c8401a-8807-3771-0000-000000000000</id>
                <type>PROCESSOR</type>
            </destination>
            <flowFileExpiration>0 sec</flowFileExpiration>
            <labelIndex>1</labelIndex>
            <loadBalanceCompression>COMPRESS_ATTRIBUTES_AND_CONTENT</loadBalanceCompression>
            <loadBalancePartitionAttribute></loadBalancePartitionAttribute>
            <loadBalanceStatus>LOAD_BALANCE_INACTIVE</loadBalanceStatus>
            <loadBalanceStrategy>ROUND_ROBIN</loadBalanceStrategy>
            <name>S_P</name>
            <selectedRelationships>split</selectedRelationships>
            <source>
                <groupId>60d38136-211b-3d16-0000-000000000000</groupId>
                <id>20f76bcb-e978-3263-0000-000000000000</id>
                <type>PROCESSOR</type>
            </source>
            <zIndex>0</zIndex>
        </connections>
        <connections>
            <id>f5322759-8583-3753-0000-000000000000</id>
            <parentGroupId>60d38136-211b-3d16-0000-000000000000</parentGroupId>
            <backPressureDataSizeThreshold>1 GB</backPressureDataSizeThreshold>
            <backPressureObjectThreshold>10000</backPressureObjectThreshold>
            <destination>
                <groupId>60d38136-211b-3d16-0000-000000000000</groupId>
                <id>20f76bcb-e978-3263-0000-000000000000</id>
                <type>PROCESSOR</type>
            </destination>
            <flowFileExpiration>0 sec</flowFileExpiration>
            <labelIndex>1</labelIndex>
            <loadBalanceCompression>DO_NOT_COMPRESS</loadBalanceCompression>
            <loadBalancePartitionAttribute></loadBalancePartitionAttribute>
            <loadBalanceStatus>LOAD_BALANCE_NOT_CONFIGURED</loadBalanceStatus>
            <loadBalanceStrategy>DO_NOT_LOAD_BALANCE</loadBalanceStrategy>
            <name>C_S</name>
            <selectedRelationships>success</selectedRelationships>
            <source>
                <groupId>60d38136-211b-3d16-0000-000000000000</groupId>
                <id>4cb1eb1d-ca3a-34e0-0000-000000000000</id>
                <type>PROCESSOR</type>
            </source>
            <zIndex>0</zIndex>
        </connections>
        <controllerServices>
            <id>55bee1a0-0b0c-3a63-0000-000000000000</id>
            <parentGroupId>60d38136-211b-3d16-0000-000000000000</parentGroupId>
            <bundle>
                <artifact>nifi-dbcp-service-nar</artifact>
                <group>org.apache.nifi</group>
                <version>1.9.2</version>
            </bundle>
            <comments></comments>
            <descriptors>
                <entry>
                    <key>Database Connection URL</key>
                    <value>
                        <name>Database Connection URL</name>
                    </value>
                </entry>
                <entry>
                    <key>Database Driver Class Name</key>
                    <value>
                        <name>Database Driver Class Name</name>
                    </value>
                </entry>
                <entry>
                    <key>database-driver-locations</key>
                    <value>
                        <name>database-driver-locations</name>
                    </value>
                </entry>
                <entry>
                    <key>kerberos-credentials-service</key>
                    <value>
                        <identifiesControllerService>org.apache.nifi.kerberos.KerberosCredentialsService</identifiesControllerService>
                        <name>kerberos-credentials-service</name>
                    </value>
                </entry>
                <entry>
                    <key>Database User</key>
                    <value>
                        <name>Database User</name>
                    </value>
                </entry>
                <entry>
                    <key>Password</key>
                    <value>
                        <name>Password</name>
                    </value>
                </entry>
                <entry>
                    <key>Max Wait Time</key>
                    <value>
                        <name>Max Wait Time</name>
                    </value>
                </entry>
                <entry>
                    <key>Max Total Connections</key>
                    <value>
                        <name>Max Total Connections</name>
                    </value>
                </entry>
                <entry>
                    <key>Validation-query</key>
                    <value>
                        <name>Validation-query</name>
                    </value>
                </entry>
                <entry>
                    <key>dbcp-min-idle-conns</key>
                    <value>
                        <name>dbcp-min-idle-conns</name>
                    </value>
                </entry>
                <entry>
                    <key>dbcp-max-idle-conns</key>
                    <value>
                        <name>dbcp-max-idle-conns</name>
                    </value>
                </entry>
                <entry>
                    <key>dbcp-max-conn-lifetime</key>
                    <value>
                        <name>dbcp-max-conn-lifetime</name>
                    </value>
                </entry>
                <entry>
                    <key>dbcp-time-between-eviction-runs</key>
                    <value>
                        <name>dbcp-time-between-eviction-runs</name>
                    </value>
                </entry>
                <entry>
                    <key>dbcp-min-evictable-idle-time</key>
                    <value>
                        <name>dbcp-min-evictable-idle-time</name>
                    </value>
                </entry>
                <entry>
                    <key>dbcp-soft-min-evictable-idle-time</key>
                    <value>
                        <name>dbcp-soft-min-evictable-idle-time</name>
                    </value>
                </entry>
            </descriptors>
            <name>MySQL_ConnectionPool</name>
            <persistsState>false</persistsState>
            <properties>
                <entry>
                    <key>Database Connection URL</key>
                    <value>jdbc:mysql://192.168.10.44:3306/test?characterEncoding=UTF-8&amp;useSSL=false&amp;allowPublicKeyRetrieval=true</value>
                </entry>
                <entry>
                    <key>Database Driver Class Name</key>
                    <value>com.mysql.jdbc.Driver</value>
                </entry>
                <entry>
                    <key>database-driver-locations</key>
                    <value>/usr/local/bigdata/testdata/mysql-connector-java-5.1.44.jar</value>
                </entry>
                <entry>
                    <key>kerberos-credentials-service</key>
                </entry>
                <entry>
                    <key>Database User</key>
                    <value>root</value>
                </entry>
                <entry>
                    <key>Password</key>
                </entry>
                <entry>
                    <key>Max Wait Time</key>
                </entry>
                <entry>
                    <key>Max Total Connections</key>
                </entry>
                <entry>
                    <key>Validation-query</key>
                </entry>
                <entry>
                    <key>dbcp-min-idle-conns</key>
                </entry>
                <entry>
                    <key>dbcp-max-idle-conns</key>
                </entry>
                <entry>
                    <key>dbcp-max-conn-lifetime</key>
                </entry>
                <entry>
                    <key>dbcp-time-between-eviction-runs</key>
                </entry>
                <entry>
                    <key>dbcp-min-evictable-idle-time</key>
                </entry>
                <entry>
                    <key>dbcp-soft-min-evictable-idle-time</key>
                </entry>
            </properties>
            <state>ENABLED</state>
            <type>org.apache.nifi.dbcp.DBCPConnectionPool</type>
        </controllerServices>
        <processors>
            <id>20f76bcb-e978-3263-0000-000000000000</id>
            <parentGroupId>60d38136-211b-3d16-0000-000000000000</parentGroupId>
            <position>
                <x>4.0</x>
                <y>413.5</y>
            </position>
            <bundle>
                <artifact>nifi-standard-nar</artifact>
                <group>org.apache.nifi</group>
                <version>1.9.2</version>
            </bundle>
            <config>
                <bulletinLevel>WARN</bulletinLevel>
                <comments></comments>
                <concurrentlySchedulableTaskCount>1</concurrentlySchedulableTaskCount>
                <descriptors>
                    <entry>
                        <key>JsonPath Expression</key>
                        <value>
                            <name>JsonPath Expression</name>
                        </value>
                    </entry>
                    <entry>
                        <key>Null Value Representation</key>
                        <value>
                            <name>Null Value Representation</name>
                        </value>
                    </entry>
                </descriptors>
                <executionNode>ALL</executionNode>
                <lossTolerant>false</lossTolerant>
                <penaltyDuration>30 sec</penaltyDuration>
                <properties>
                    <entry>
                        <key>JsonPath Expression</key>
                        <value>$.*</value>
                    </entry>
                    <entry>
                        <key>Null Value Representation</key>
                        <value>empty string</value>
                    </entry>
                </properties>
                <runDurationMillis>0</runDurationMillis>
                <schedulingPeriod>0 sec</schedulingPeriod>
                <schedulingStrategy>TIMER_DRIVEN</schedulingStrategy>
                <yieldDuration>1 sec</yieldDuration>
            </config>
            <executionNodeRestricted>false</executionNodeRestricted>
            <name>SplitJson_Demo</name>
            <relationships>
                <autoTerminate>true</autoTerminate>
                <name>failure</name>
            </relationships>
            <relationships>
                <autoTerminate>true</autoTerminate>
                <name>original</name>
            </relationships>
            <relationships>
                <autoTerminate>false</autoTerminate>
                <name>split</name>
            </relationships>
            <state>STOPPED</state>
            <style/>
            <type>org.apache.nifi.processors.standard.SplitJson</type>
        </processors>
        <processors>
            <id>26c8401a-8807-3771-0000-000000000000</id>
            <parentGroupId>60d38136-211b-3d16-0000-000000000000</parentGroupId>
            <position>
                <x>3.0</x>
                <y>624.5</y>
            </position>
            <bundle>
                <artifact>nifi-hadoop-nar</artifact>
                <group>org.apache.nifi</group>
                <version>1.9.2</version>
            </bundle>
            <config>
                <bulletinLevel>WARN</bulletinLevel>
                <comments></comments>
                <concurrentlySchedulableTaskCount>1</concurrentlySchedulableTaskCount>
                <descriptors>
                    <entry>
                        <key>Hadoop Configuration Resources</key>
                        <value>
                            <name>Hadoop Configuration Resources</name>
                        </value>
                    </entry>
                    <entry>
                        <key>kerberos-credentials-service</key>
                        <value>
                            <identifiesControllerService>org.apache.nifi.kerberos.KerberosCredentialsService</identifiesControllerService>
                            <name>kerberos-credentials-service</name>
                        </value>
                    </entry>
                    <entry>
                        <key>Kerberos Principal</key>
                        <value>
                            <name>Kerberos Principal</name>
                        </value>
                    </entry>
                    <entry>
                        <key>Kerberos Keytab</key>
                        <value>
                            <name>Kerberos Keytab</name>
                        </value>
                    </entry>
                    <entry>
                        <key>Kerberos Relogin Period</key>
                        <value>
                            <name>Kerberos Relogin Period</name>
                        </value>
                    </entry>
                    <entry>
                        <key>Additional Classpath Resources</key>
                        <value>
                            <name>Additional Classpath Resources</name>
                        </value>
                    </entry>
                    <entry>
                        <key>Directory</key>
                        <value>
                            <name>Directory</name>
                        </value>
                    </entry>
                    <entry>
                        <key>Conflict Resolution Strategy</key>
                        <value>
                            <name>Conflict Resolution Strategy</name>
                        </value>
                    </entry>
                    <entry>
                        <key>Block Size</key>
                        <value>
                            <name>Block Size</name>
                        </value>
                    </entry>
                    <entry>
                        <key>IO Buffer Size</key>
                        <value>
                            <name>IO Buffer Size</name>
                        </value>
                    </entry>
                    <entry>
                        <key>Replication</key>
                        <value>
                            <name>Replication</name>
                        </value>
                    </entry>
                    <entry>
                        <key>Permissions umask</key>
                        <value>
                            <name>Permissions umask</name>
                        </value>
                    </entry>
                    <entry>
                        <key>Remote Owner</key>
                        <value>
                            <name>Remote Owner</name>
                        </value>
                    </entry>
                    <entry>
                        <key>Remote Group</key>
                        <value>
                            <name>Remote Group</name>
                        </value>
                    </entry>
                    <entry>
                        <key>Compression codec</key>
                        <value>
                            <name>Compression codec</name>
                        </value>
                    </entry>
                </descriptors>
                <executionNode>ALL</executionNode>
                <lossTolerant>false</lossTolerant>
                <penaltyDuration>30 sec</penaltyDuration>
                <properties>
                    <entry>
                        <key>Hadoop Configuration Resources</key>
                        <value>/usr/local/bigdata/hadoop-3.1.4/etc/hadoop/hdfs-site.xml,/usr/local/bigdata/hadoop-3.1.4/etc/hadoop/core-site.xml</value>
                    </entry>
                    <entry>
                        <key>kerberos-credentials-service</key>
                    </entry>
                    <entry>
                        <key>Kerberos Principal</key>
                    </entry>
                    <entry>
                        <key>Kerberos Keytab</key>
                    </entry>
                    <entry>
                        <key>Kerberos Relogin Period</key>
                        <value>4 hours</value>
                    </entry>
                    <entry>
                        <key>Additional Classpath Resources</key>
                        <value>/usr/local/bigdata/testdata/hadoop-lzo-0.4.21-SNAPSHOT.jar</value>
                    </entry>
                    <entry>
                        <key>Directory</key>
                        <value>/user/hive/warehouse/test.db/user</value>
                    </entry>
                    <entry>
                        <key>Conflict Resolution Strategy</key>
                        <value>append</value>
                    </entry>
                    <entry>
                        <key>Block Size</key>
                    </entry>
                    <entry>
                        <key>IO Buffer Size</key>
                    </entry>
                    <entry>
                        <key>Replication</key>
                    </entry>
                    <entry>
                        <key>Permissions umask</key>
                    </entry>
                    <entry>
                        <key>Remote Owner</key>
                    </entry>
                    <entry>
                        <key>Remote Group</key>
                    </entry>
                    <entry>
                        <key>Compression codec</key>
                        <value>LZO</value>
                    </entry>
                </properties>
                <runDurationMillis>0</runDurationMillis>
                <schedulingPeriod>0 sec</schedulingPeriod>
                <schedulingStrategy>TIMER_DRIVEN</schedulingStrategy>
                <yieldDuration>1 sec</yieldDuration>
            </config>
            <executionNodeRestricted>false</executionNodeRestricted>
            <name>PutHDFS_Demo</name>
            <relationships>
                <autoTerminate>true</autoTerminate>
                <name>failure</name>
            </relationships>
            <relationships>
                <autoTerminate>true</autoTerminate>
                <name>success</name>
            </relationships>
            <state>STOPPED</state>
            <style/>
            <type>org.apache.nifi.processors.hadoop.PutHDFS</type>
        </processors>
        <processors>
            <id>4cb1eb1d-ca3a-34e0-0000-000000000000</id>
            <parentGroupId>60d38136-211b-3d16-0000-000000000000</parentGroupId>
            <position>
                <x>0.0</x>
                <y>206.5</y>
            </position>
            <bundle>
                <artifact>nifi-avro-nar</artifact>
                <group>org.apache.nifi</group>
                <version>1.9.2</version>
            </bundle>
            <config>
                <bulletinLevel>WARN</bulletinLevel>
                <comments></comments>
                <concurrentlySchedulableTaskCount>1</concurrentlySchedulableTaskCount>
                <descriptors>
                    <entry>
                        <key>JSON container options</key>
                        <value>
                            <name>JSON container options</name>
                        </value>
                    </entry>
                    <entry>
                        <key>Wrap Single Record</key>
                        <value>
                            <name>Wrap Single Record</name>
                        </value>
                    </entry>
                    <entry>
                        <key>Avro schema</key>
                        <value>
                            <name>Avro schema</name>
                        </value>
                    </entry>
                </descriptors>
                <executionNode>ALL</executionNode>
                <lossTolerant>false</lossTolerant>
                <penaltyDuration>30 sec</penaltyDuration>
                <properties>
                    <entry>
                        <key>JSON container options</key>
                        <value>array</value>
                    </entry>
                    <entry>
                        <key>Wrap Single Record</key>
                        <value>true</value>
                    </entry>
                    <entry>
                        <key>Avro schema</key>
                    </entry>
                </properties>
                <runDurationMillis>0</runDurationMillis>
                <schedulingPeriod>0 sec</schedulingPeriod>
                <schedulingStrategy>TIMER_DRIVEN</schedulingStrategy>
                <yieldDuration>1 sec</yieldDuration>
            </config>
            <executionNodeRestricted>false</executionNodeRestricted>
            <name>ConvertAvroToJSON_Demo</name>
            <relationships>
                <autoTerminate>true</autoTerminate>
                <name>failure</name>
            </relationships>
            <relationships>
                <autoTerminate>false</autoTerminate>
                <name>success</name>
            </relationships>
            <state>STOPPED</state>
            <style/>
            <type>org.apache.nifi.processors.avro.ConvertAvroToJSON</type>
        </processors>
        <processors>
            <id>c16280cc-6d1d-355c-0000-000000000000</id>
            <parentGroupId>60d38136-211b-3d16-0000-000000000000</parentGroupId>
            <position>
                <x>9.0</x>
                <y>0.0</y>
            </position>
            <bundle>
                <artifact>nifi-standard-nar</artifact>
                <group>org.apache.nifi</group>
                <version>1.9.2</version>
            </bundle>
            <config>
                <bulletinLevel>WARN</bulletinLevel>
                <comments></comments>
                <concurrentlySchedulableTaskCount>1</concurrentlySchedulableTaskCount>
                <descriptors>
                    <entry>
                        <key>Database Connection Pooling Service</key>
                        <value>
                            <identifiesControllerService>org.apache.nifi.dbcp.DBCPService</identifiesControllerService>
                            <name>Database Connection Pooling Service</name>
                        </value>
                    </entry>
                    <entry>
                        <key>db-fetch-db-type</key>
                        <value>
                            <name>db-fetch-db-type</name>
                        </value>
                    </entry>
                    <entry>
                        <key>Table Name</key>
                        <value>
                            <name>Table Name</name>
                        </value>
                    </entry>
                    <entry>
                        <key>Columns to Return</key>
                        <value>
                            <name>Columns to Return</name>
                        </value>
                    </entry>
                    <entry>
                        <key>db-fetch-where-clause</key>
                        <value>
                            <name>db-fetch-where-clause</name>
                        </value>
                    </entry>
                    <entry>
                        <key>db-fetch-sql-query</key>
                        <value>
                            <name>db-fetch-sql-query</name>
                        </value>
                    </entry>
                    <entry>
                        <key>Maximum-value Columns</key>
                        <value>
                            <name>Maximum-value Columns</name>
                        </value>
                    </entry>
                    <entry>
                        <key>Max Wait Time</key>
                        <value>
                            <name>Max Wait Time</name>
                        </value>
                    </entry>
                    <entry>
                        <key>Fetch Size</key>
                        <value>
                            <name>Fetch Size</name>
                        </value>
                    </entry>
                    <entry>
                        <key>qdbt-max-rows</key>
                        <value>
                            <name>qdbt-max-rows</name>
                        </value>
                    </entry>
                    <entry>
                        <key>qdbt-output-batch-size</key>
                        <value>
                            <name>qdbt-output-batch-size</name>
                        </value>
                    </entry>
                    <entry>
                        <key>qdbt-max-frags</key>
                        <value>
                            <name>qdbt-max-frags</name>
                        </value>
                    </entry>
                    <entry>
                        <key>dbf-normalize</key>
                        <value>
                            <name>dbf-normalize</name>
                        </value>
                    </entry>
                    <entry>
                        <key>transaction-isolation-level</key>
                        <value>
                            <name>transaction-isolation-level</name>
                        </value>
                    </entry>
                    <entry>
                        <key>dbf-user-logical-types</key>
                        <value>
                            <name>dbf-user-logical-types</name>
                        </value>
                    </entry>
                    <entry>
                        <key>dbf-default-precision</key>
                        <value>
                            <name>dbf-default-precision</name>
                        </value>
                    </entry>
                    <entry>
                        <key>dbf-default-scale</key>
                        <value>
                            <name>dbf-default-scale</name>
                        </value>
                    </entry>
                </descriptors>
                <executionNode>PRIMARY</executionNode>
                <lossTolerant>false</lossTolerant>
                <penaltyDuration>30 sec</penaltyDuration>
                <properties>
                    <entry>
                        <key>Database Connection Pooling Service</key>
                        <value>55bee1a0-0b0c-3a63-0000-000000000000</value>
                    </entry>
                    <entry>
                        <key>db-fetch-db-type</key>
                        <value>MySQL</value>
                    </entry>
                    <entry>
                        <key>Table Name</key>
                        <value>user</value>
                    </entry>
                    <entry>
                        <key>Columns to Return</key>
                    </entry>
                    <entry>
                        <key>db-fetch-where-clause</key>
                    </entry>
                    <entry>
                        <key>db-fetch-sql-query</key>
                        <value>select * from user</value>
                    </entry>
                    <entry>
                        <key>Maximum-value Columns</key>
                    </entry>
                    <entry>
                        <key>Max Wait Time</key>
                        <value>0 seconds</value>
                    </entry>
                    <entry>
                        <key>Fetch Size</key>
                        <value>0</value>
                    </entry>
                    <entry>
                        <key>qdbt-max-rows</key>
                        <value>0</value>
                    </entry>
                    <entry>
                        <key>qdbt-output-batch-size</key>
                        <value>0</value>
                    </entry>
                    <entry>
                        <key>qdbt-max-frags</key>
                        <value>0</value>
                    </entry>
                    <entry>
                        <key>dbf-normalize</key>
                        <value>false</value>
                    </entry>
                    <entry>
                        <key>transaction-isolation-level</key>
                    </entry>
                    <entry>
                        <key>dbf-user-logical-types</key>
                        <value>false</value>
                    </entry>
                    <entry>
                        <key>dbf-default-precision</key>
                        <value>10</value>
                    </entry>
                    <entry>
                        <key>dbf-default-scale</key>
                        <value>0</value>
                    </entry>
                </properties>
                <runDurationMillis>0</runDurationMillis>
                <schedulingPeriod>0 sec</schedulingPeriod>
                <schedulingStrategy>TIMER_DRIVEN</schedulingStrategy>
                <yieldDuration>1 sec</yieldDuration>
            </config>
            <executionNodeRestricted>true</executionNodeRestricted>
            <name>QueryDatabaseTable_demo</name>
            <relationships>
                <autoTerminate>false</autoTerminate>
                <name>success</name>
            </relationships>
            <state>STOPPED</state>
            <style/>
            <type>org.apache.nifi.processors.standard.QueryDatabaseTable</type>
        </processors>
    </snippet>
    <timestamp>02/08/2023 08:45:41 GMT</timestamp>
</template>
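The template description says the output contains duplicate data. One plausible cause is an assumption of this edit, not the author's diagnosis. QueryDatabaseTable_demo leaves "Maximum-value Columns" empty, so the processor keeps no state and every run selects the whole user table again. With a Run Schedule of "0 sec" it runs repeatedly, so the same rows keep landing in /user/hive/warehouse/test.db/user.

The Python sketch below is a simplified model of that behavior, not NiFi's code. The function names and sample rows are hypothetical.

```python
def query_database_table(table, state, max_value_column=None):
    """Simplified model of one QueryDatabaseTable run (not NiFi's implementation)."""
    if not max_value_column:
        # No Maximum-value Columns: nothing is remembered, so every run returns all rows.
        return list(table)
    last_max = state.get(max_value_column)
    rows = [r for r in table if last_max is None or r[max_value_column] > last_max]
    if rows:
        # Remember the largest value seen so the next run only picks up newer rows.
        state[max_value_column] = max(r[max_value_column] for r in rows)
    return rows


def simulate_runs(runs, max_value_column=None):
    """Return the ids of all rows written to the target directory over several runs."""
    table = [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]  # hypothetical rows
    state, written = {}, []
    for run in range(runs):
        if run == 1:
            table.append({"id": 3, "name": "carol"})  # a row inserted between runs
        written.extend(query_database_table(table, state, max_value_column))
    return [row["id"] for row in written]


print(simulate_runs(3))        # -> [1, 2, 1, 2, 3, 1, 2, 3]
print(simulate_runs(3, "id"))  # -> [1, 2, 3]
```

The usual way to make QueryDatabaseTable incremental is to set "Maximum-value Columns" to a column whose values only grow, such as an auto-increment id, if the user table has one.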

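In Template 1, ConvertAvroToJSON_Demo uses "JSON container options" = array and "Wrap Single Record" = true. SplitJson_Demo uses the JsonPath Expression "$.*". The Python sketch below is a simplified model of that pair, not NiFi's code, and the function names and sample user rows are hypothetical.

It shows why wrapping matters. If a single record were emitted as a bare JSON object, "$.*" would select that object's field values instead of the record, so one row would be split into one FlowFile per column.

```python
import json


def convert_avro_to_json(records, wrap_single_record=True):
    """Simplified model of ConvertAvroToJSON with JSON container options = array."""
    if len(records) == 1 and not wrap_single_record:
        return json.dumps(records[0])  # a lone record comes out as a bare object
    return json.dumps(records)  # otherwise the records form one JSON array


def split_json_all_children(document):
    """Simplified model of SplitJson with JsonPath Expression '$.*'."""
    root = json.loads(document)
    # '$.*' selects every child of the root: array elements, or an object's values.
    children = root if isinstance(root, list) else list(root.values())
    return [json.dumps(child) for child in children]


rows = [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]  # hypothetical user rows
for piece in split_json_all_children(convert_avro_to_json(rows)):
    print(piece)  # one JSON object per row, i.e. one FlowFile per row

single = [{"id": 1, "name": "alice"}]
print(len(split_json_all_children(convert_avro_to_json(single))))  # 1 split: the row
print(len(split_json_all_children(convert_avro_to_json(single, wrap_single_record=False))))  # 2 splits: its fields
```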
2) Template 2

This template adds a ControlRate processor and a logging processor; no errors were found during testing.
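ControlRate limits how much data it passes to the next processor within each time window. Its "Rate Control Criteria", "Maximum Rate" and "Time Duration" properties choose what is counted, the limit, and the window length. Template 2's actual ControlRate settings are not shown in this part.

The Python sketch below is a simplified fixed-window model of the "flowfile count" criterion only. The class and method names are hypothetical, and NiFi's own implementation may track the window differently.

```python
from dataclasses import dataclass


@dataclass
class FixedWindowRateLimiter:
    """Simplified model of ControlRate with Rate Control Criteria = flowfile count."""
    max_rate: int          # plays the role of 'Maximum Rate': FlowFiles per window
    time_duration: float   # plays the role of 'Time Duration', here in seconds
    window_start: float = 0.0
    passed_in_window: int = 0

    def try_pass(self, now: float) -> bool:
        """Return True if a FlowFile arriving at time 'now' may be transferred."""
        if now - self.window_start >= self.time_duration:
            # The current window has ended: start a new one and reset the count.
            self.window_start, self.passed_in_window = now, 0
        if self.passed_in_window < self.max_rate:
            self.passed_in_window += 1
            return True
        return False  # over the limit: the FlowFile waits in the queue for a later window


limiter = FixedWindowRateLimiter(max_rate=2, time_duration=1.0)
print([limiter.try_pass(t) for t in (0.0, 0.1, 0.2, 1.0, 1.1, 1.2)])
```

With max_rate=2 and time_duration=1.0, the FlowFiles arriving at 0.0 s and 0.1 s pass. The one at 0.2 s is held back until the next window opens at 1.0 s.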


<template encoding-version="1.2">
    <description></description>
    <groupId>2f7d3766-0186-1000-0000-00006e07b64a</groupId>
    <name>MysqlToHDFSByLzo2</name>
    <snippet>
        <connections>
            <id>25c778c6-63df-3672-0000-000000000000</id>
            <parentGroupId>60d38136-211b-3d16-0000-000000000000</parentGroupId>
            <backPressureDataSizeThreshold>1 GB</backPressureDataSizeThreshold>
            <backPressureObjectThreshold>10000</backPressureObjectThreshold>
            <destination>
                <groupId>60d38136-211b-3d16-0000-000000000000</groupId>
                <id>203e8481-e4c7-3340-0000-000000000000</id>
                <type>PROCESSOR</type>
            </destination>
            <flowFileExpiration>0 sec</flowFileExpiration>
            <labelIndex>1</labelIndex>
            <loadBalanceCompression>DO_NOT_COMPRESS</loadBalanceCompression>
            <loadBalancePartitionAttribute></loadBalancePartitionAttribute>
            <loadBalanceStatus>LOAD_BALANCE_NOT_CONFIGURED</loadBalanceStatus>
            <loadBalanceStrategy>DO_NOT_LOAD_BALANCE</loadBalanceStrategy>
            <name></name>
            <selectedRelationships>failure</selectedRelationships>
            <source>
                <groupId>60d38136-211b-3d16-0000-000000000000</groupId>
                <id>4cb1eb1d-ca3a-34e0-0000-000000000000</id>
                <type>PROCESSOR</type>
            </source>
            <zIndex>0</zIndex>
        </connections>
        <connections>
            <id>59e154ce-8ca9-329f-0000-000000000000</id>
            <parentGroupId>60d38136-211b-3d16-0000-000000000000</parentGroupId>
            <backPressureDataSizeThreshold>1 GB</backPressureDataSizeThreshold>
            <backPressureObjectThreshold>10000</backPressureObjectThreshold>
            <destination>
                <groupId>60d38136-211b-3d16-0000-000000000000</groupId>
                <id>26c8401a-8807-3771-0000-000000000000</id>
                <type>PROCESSOR</type>
            </destination>
            <flowFileExpiration>0 sec</flowFileExpiration>
            <labelIndex>1</labelIndex>
            <loadBalanceCompression>COMPRESS_ATTRIBUTES_AND_CONTENT</loadBalanceCompression>
            <loadBalancePartitionAttribute></loadBalancePartitionAttribute>
            <loadBalanceStatus>LOAD_BALANCE_INACTIVE</loadBalanceStatus>
            <loadBalanceStrategy>ROUND_ROBIN</loadBalanceStrategy>
            <name></name>
            <selectedRelationships>success</selectedRelationships>
            <source>
                <groupId>60d38136-211b-3d16-0000-000000000000</groupId>
                <id>1b9fd194-4cdb-369f-0000-000000000000</id>
                <type>PROCESSOR</type>
            </source>
            <zIndex>0</zIndex>
        </connections>
        <connections>
            <id>60539d1e-e7f5-396c-0000-000000000000</id>
            <parentGroupId>60d38136-211b-3d16-0000-000000000000</parentGroupId>
            <backPressureDataSizeThreshold>1 GB</backPressureDataSizeThreshold>
            <backPressureObjectThreshold>10000</backPressureObjectThreshold>
            <destination>
                <groupId>60d38136-211b-3d16-0000-000000000000</groupId>
                <id>203e8481-e4c7-3340-0000-000000000000</id>
                <type>PROCESSOR</type>
            </destination>
            <flowFileExpiration>0 sec</flowFileExpiration>
            <labelIndex>1</labelIndex>
            <loadBalanceCompression>DO_NOT_COMPRESS</loadBalanceCompression>
            <loadBalancePartitionAttribute></loadBalancePartitionAttribute>
            <loadBalanceStatus>LOAD_BALANCE_NOT_CONFIGURED</loadBalanceStatus>
            <loadBalanceStrategy>DO_NOT_LOAD_BALANCE</loadBalanceStrategy>
            <name></name>
            <selectedRelationships>failure</selectedRelationships>
            <source>
                <groupId>60d38136-211b-3d16-0000-000000000000</groupId>
                <id>26c8401a-8807-3771-0000-000000000000</id>
                <type>PROCESSOR</type>
            </source>
            <zIndex>0</zIndex>
        </connections>
        <connections>
            <id>6e3859ca-2a0d-3560-0000-000000000000</id>
            <parentGroupId>60d38136-211b-3d16-0000-000000000000</parentGroupId>
            <backPressureDataSizeThreshold>1 GB</backPressureDataSizeThreshold>
            <backPressureObjectThreshold>10000</backPressureObjectThreshold>
            <destination>
                <groupId>60d38136-211b-3d16-0000-000000000000</groupId>
                <id>1b9fd194-4cdb-369f-0000-000000000000</id>
                <type>PROCESSOR</type>
            </destination>
            <flowFileExpiration>0 sec</flowFileExpiration>
            <labelIndex>1</labelIndex>
            <loadBalanceCompression>COMPRESS_ATTRIBUTES_AND_CONTENT</loadBalanceCompression>
            <loadBalancePartitionAttribute></loadBalancePartitionAttribute>
            <loadBalanceStatus>LOAD_BALANCE_INACTIVE</loadBalanceStatus>
            <loadBalanceStrategy>ROUND_ROBIN</loadBalanceStrategy>
            <name>S_C</name>
            <selectedRelationships>split</selectedRelationships>
            <source>
                <groupId>60d38136-211b-3d16-0000-000000000000</groupId>
                <id>20f76bcb-e978-3263-0000-000000000000</id>
                <type>PROCESSOR</type>
            </source>
            <zIndex>0</zIndex>
        </connections>
        <connections>
            <id>7b343e88-ab1a-30ee-0000-000000000000</id>
            <parentGroupId>60d38136-211b-3d16-0000-000000000000</parentGroupId>
            <backPressureDataSizeThreshold>1 GB</backPressureDataSizeThreshold>
            <backPressureObjectThreshold>10000</backPressureObjectThreshold>
            <destination>
                <groupId>60d38136-211b-3d16-0000-000000000000</groupId>
                <id>203e8481-e4c7-3340-0000-000000000000</id>
                <type>PROCESSOR</type>
            </destination>
            <flowFileExpiration>0 sec</flowFileExpiration>
            <labelIndex>1</labelIndex>
            <loadBalanceCompression>DO_NOT_COMPRESS</loadBalanceCompression>
            <loadBalancePartitionAttribute></loadBalancePartitionAttribute>
            <loadBalanceStatus>LOAD_BALANCE_NOT_CONFIGURED</loadBalanceStatus>
            <loadBalanceStrategy>DO_NOT_LOAD_BALANCE</loadBalanceStrategy>
            <name></name>
            <selectedRelationships>success</selectedRelationships>
            <source>
                <groupId>60d38136-211b-3d16-0000-000000000000</groupId>
                <id>c16280cc-6d1d-355c-0000-000000000000</id>
                <type>PROCESSOR</type>
            </source>
            <zIndex>0</zIndex>
        </connections>
        <connections>
            <id>8bacaebe-bce0-31e8-0000-000000000000</id>
            <parentGroupId>60d38136-211b-3d16-0000-000000000000</parentGroupId>
            <backPressureDataSizeThreshold>1 GB</backPressureDataSizeThreshold>
            <backPressureObjectThreshold>10000</backPressureObjectThreshold>
            <destination>
                <groupId>60d38136-211b-3d16-0000-000000000000</groupId>
                <id>4cb1eb1d-ca3a-34e0-0000-000000000000</id>
                <type>PROCESSOR</type>
            </destination>
            <flowFileExpiration>0 sec</flowFileExpiration>
            <labelIndex>1</labelIndex>
            <loadBalanceCompression>COMPRESS_ATTRIBUTES_AND_CONTENT</loadBalanceCompression>
            <loadBalancePartitionAttribute></loadBalancePartitionAttribute>
            <loadBalanceStatus>LOAD_BALANCE_INACTIVE</loadBalanceStatus>
            <loadBalanceStrategy>ROUND_ROBIN</loadBalanceStrategy>
            <name>Q_C</name>
            <selectedRelationships>success</selectedRelationships>
            <source>
                <groupId>60d38136-211b-3d16-0000-000000000000</groupId>
                <id>c16280cc-6d1d-355c-0000-000000000000</id>
                <type>PROCESSOR</type>
            </source>
            <zIndex>0</zIndex>
        </connections>
        <connections>
            <id>ee0fcd22-6c7c-3edc-0000-000000000000</id>
            <parentGroupId>60d38136-211b-3d16-0000-000000000000</parentGroupId>
            <backPressureDataSizeThreshold>1 GB</backPressureDataSizeThreshold>
            <backPressureObjectThreshold>10000</backPressureObjectThreshold>
            <destination>
                <groupId>60d38136-211b-3d16-0000-000000000000</groupId>
                <id>203e8481-e4c7-3340-0000-000000000000</id>
                <type>PROCESSOR</type>
            </destination>
            <flowFileExpiration>0 sec</flowFileExpiration>
            <labelIndex>1</labelIndex>
            <loadBalanceCompression>COMPRESS_ATTRIBUTES_AND_CONTENT</loadBalanceCompression>
            <loadBalancePartitionAttribute></loadBalancePartitionAttribute>
            <loadBalanceStatus>LOAD_BALANCE_INACTIVE</loadBalanceStatus>
            <loadBalanceStrategy>ROUND_ROBIN</loadBalanceStrategy>
            <name></name>
            <selectedRelationships>failure</selectedRelationships>
            <source>
                <groupId>60d38136-211b-3d16-0000-000000000000</groupId>
                <id>1b9fd194-4cdb-369f-0000-000000000000</id>
                <type>PROCESSOR</type>
            </source>
            <zIndex>0</zIndex>
        </connections>
        <connections>
            <id>f4577d45-be28-3c83-0000-000000000000</id>
            <parentGroupId>60d38136-211b-3d16-0000-000000000000</parentGroupId>
            <backPressureDataSizeThreshold>1 GB</backPressureDataSizeThreshold>
            <backPressureObjectThreshold>10000</backPressureObjectThreshold>
            <destination>
                <groupId>60d38136-211b-3d16-0000-000000000000</groupId>
                <id>203e8481-e4c7-3340-0000-000000000000</id>
                <type>PROCESSOR</type>
            </destination>
            <flowFileExpiration>0 sec</flowFileExpiration>
            <labelIndex>1</labelIndex>
            <loadBalanceCompression>DO_NOT_COMPRESS</loadBalanceCompression>
            <loadBalancePartitionAttribute></loadBalancePartitionAttribute>
            <loadBalanceStatus>LOAD_BALANCE_NOT_CONFIGURED</loadBalanceStatus>
            <loadBalanceStrategy>DO_NOT_LOAD_BALANCE</loadBalanceStrategy>
            <name></name>
            <selectedRelationships>failure</selectedRelationships>
            <source>
                <groupId>60d38136-211b-3d16-0000-000000000000</groupId>
                <id>20f76bcb-e978-3263-0000-000000000000</id>
                <type>PROCESSOR</type>
            </source>
            <zIndex>0</zIndex>
        </connections>
        <connections>
            <id>f5322759-8583-3753-0000-000000000000</id>
            <parentGroupId>60d38136-211b-3d16-0000-000000000000</parentGroupId>
            <backPressureDataSizeThreshold>1 GB</backPressureDataSizeThreshold>
            <backPressureObjectThreshold>10000</backPressureObjectThreshold>
            <destination>
                <groupId>60d38136-211b-3d16-0000-000000000000</groupId>
                <id>20f76bcb-e978-3263-0000-000000000000</id>
                <type>PROCESSOR</type>
            </destination>
            <flowFileExpiration>0 sec</flowFileExpiration>
            <labelIndex>1</labelIndex>
            <loadBalanceCompression>COMPRESS_ATTRIBUTES_AND_CONTENT</loadBalanceCompression>
            <loadBalancePartitionAttribute></loadBalancePartitionAttribute>
            <loadBalanceStatus>LOAD_BALANCE_INACTIVE</loadBalanceStatus>
            <loadBalanceStrategy>ROUND_ROBIN</loadBalanceStrategy>
            <name>C_S</name>
            <selectedRelationships>success</selectedRelationships>
            <source>
                <groupId>60d38136-211b-3d16-0000-000000000000</groupId>
                <id>4cb1eb1d-ca3a-34e0-0000-000000000000</id>
                <type>PROCESSOR</type>
            </source>
            <zIndex>0</zIndex>
        </connections>
        <controllerServices>
            <id>55bee1a0-0b0c-3a63-0000-000000000000</id>
            <parentGroupId>60d38136-211b-3d16-0000-000000000000</parentGroupId>
            <bundle>
                <artifact>nifi-dbcp-service-nar</artifact>
                <group>org.apache.nifi</group>
                <version>1.9.2</version>
            </bundle>
            <comments></comments>
            <descriptors>
                <entry>
                    <key>Database Connection URL</key>
                    <value>
                        <name>Database Connection URL</name>
                    </value>
                </entry>
                <entry>
                    <key>Database Driver Class Name</key>
                    <value>
                        <name>Database Driver Class Name</name>
                    </value>
                </entry>
                <entry>
                    <key>database-driver-locations</key>
                    <value>
                        <name>database-driver-locations</name>
                    </value>
                </entry>
                <entry>
                    <key>kerberos-credentials-service</key>
                    <value>
                        <identifiesControllerService>org.apache.nifi.kerberos.KerberosCredentialsService</identifiesControllerService>
                        <name>kerberos-credentials-service</name>
                    </value>
                </entry>
                <entry>
                    <key>Database User</key>
                    <value>
                        <name>Database User</name>
                    </value>
                </entry>
                <entry>
                    <key>Password</key>
                    <value>
                        <name>Password</name>
                    </value>
                </entry>
                <entry>
                    <key>Max Wait Time</key>
                    <value>
                        <name>Max Wait Time</name>
                    </value>
                </entry>
                <entry>
                    <key>Max Total Connections</key>
                    <value>
                        <name>Max Total Connections</name>
                    </value>
                </entry>
                <entry>
                    <key>Validation-query</key>
                    <value>
                        <name>Validation-query</name>
                    </value>
                </entry>
                <entry>
                    <key>dbcp-min-idle-conns</key>
                    <value>
                        <name>dbcp-min-idle-conns</name>
                    </value>
                </entry>
                <entry>
                    <key>dbcp-max-idle-conns</key>
                    <value>
                        <name>dbcp-max-idle-conns</name>
                    </value>
                </entry>
                <entry>
                    <key>dbcp-max-conn-lifetime</key>
                    <value>
                        <name>dbcp-max-conn-lifetime</name>
                    </value>
                </entry>
                <entry>
                    <key>dbcp-time-between-eviction-runs</key>
                    <value>
                        <name>dbcp-time-between-eviction-runs</name>
                    </value>
                </entry>
                <entry>
                    <key>dbcp-min-evictable-idle-time</key>
                    <value>
                        <name>dbcp-min-evictable-idle-time</name>
                    </value>
                </entry>
                <entry>
                    <key>dbcp-soft-min-evictable-idle-time</key>
                    <value>
                        <name>dbcp-soft-min-evictable-idle-time</name>
                    </value>
                </entry>
            </descriptors>
            <name>MySQL_ConnectionPool</name>
            <persistsState>false</persistsState>
            <properties>
                <entry>
                    <key>Database Connection URL</key>
                    <value>jdbc:mysql://192.168.10.44:3306/test?characterEncoding=UTF-8&amp;useSSL=false&amp;allowPublicKeyRetrieval=true</value>
                </entry>
                <entry>
                    <key>Database Driver Class Name</key>
                    <value>com.mysql.jdbc.Driver</value>
                </entry>
                <entry>
                    <key>database-driver-locations</key>
                    <value>/usr/local/bigdata/testdata/mysql-connector-java-5.1.44.jar</value>
                </entry>
                <entry>
                    <key>kerberos-credentials-service</key>
                </entry>
                <entry>
                    <key>Database User</key>
                    <value>root</value>
                </entry>
                <entry>
                    <key>Password</key>
                </entry>
                <entry>
                    <key>Max Wait Time</key>
                    <value>500 millis</value>
                </entry>
                <entry>
                    <key>Max Total Connections</key>
                    <value>8</value>
                </entry>
                <entry>
                    <key>Validation-query</key>
                </entry>
                <entry>
                    <key>dbcp-min-idle-conns</key>
                    <value>0</value>
                </entry>
                <entry>
                    <key>dbcp-max-idle-conns</key>
                    <value>8</value>
                </entry>
                <entry>
                    <key>dbcp-max-conn-lifetime</key>
                    <value>-1</value>
                </entry>
                <entry>
                    <key>dbcp-time-between-eviction-runs</key>
                    <value>-1</value>
                </entry>
                <entry>
                    <key>dbcp-min-evictable-idle-time</key>
                    <value>30 mins</value>
                </entry>
                <entry>
                    <key>dbcp-soft-min-evictable-idle-time</key>
                    <value>-1</value>
                </entry>
            </properties>
            <state>ENABLED</state>
            <type>org.apache.nifi.dbcp.DBCPConnectionPool</type>
        </controllerServices>
        <processors>
            <id>1b9fd194-4cdb-369f-0000-000000000000</id>
            <parentGroupId>60d38136-211b-3d16-0000-000000000000</parentGroupId>
            <position>
                <x>2.974225266934127</x>
                <y>627.7810694387299</y>
            </position>
            <bundle>
                <artifact>nifi-standard-nar</artifact>
                <group>org.apache.nifi</group>
                <version>1.9.2</version>
            </bundle>
            <config>
                <bulletinLevel>WARN</bulletinLevel>
                <comments></comments>
                <concurrentlySchedulableTaskCount>1</concurrentlySchedulableTaskCount>
                <descriptors>
                    <entry>
                        <key>Rate Control Criteria</key>
                        <value>
                            <name>Rate Control Criteria</name>
                        </value>
                    </entry>
                    <entry>
                        <key>Maximum Rate</key>
                        <value>
                            <name>Maximum Rate</name>
                        </value>
                    </entry>
                    <entry>
                        <key>Rate Controlled Attribute</key>
                        <value>
                            <name>Rate Controlled Attribute</name>
                        </value>
                    </entry>
                    <entry>
                        <key>Time Duration</key>
                        <value>
                            <name>Time Duration</name>
                        </value>
                    </entry>
                    <entry>
                        <key>Grouping Attribute</key>
                        <value>
                            <name>Grouping Attribute</name>
                        </value>
                    </entry>
                </descriptors>
                <executionNode>ALL</executionNode>
                <lossTolerant>false</lossTolerant>
                <penaltyDuration>30 sec</penaltyDuration>
                <properties>
                    <entry>
                        <key>Rate Control Criteria</key>
                        <value>flowfile count</value>
                    </entry>
                    <entry>
                        <key>Maximum Rate</key>
                        <value>100000</value>
                    </entry>
                    <entry>
                        <key>Rate Controlled Attribute</key>
                    </entry>
                    <entry>
                        <key>Time Duration</key>
                        <value>1 min</value>
                    </entry>
                    <entry>
                        <key>Grouping Attribute</key>
                    </entry>
                </properties>
                <runDurationMillis>0</runDurationMillis>
                <schedulingPeriod>0 sec</schedulingPeriod>
                <schedulingStrategy>TIMER_DRIVEN</schedulingStrategy>
                <yieldDuration>1 sec</yieldDuration>
            </config>
            <executionNodeRestricted>false</executionNodeRestricted>
            <name>ControlRate_demo</name>
            <relationships>
                <autoTerminate>false</autoTerminate>
                <name>failure</name>
            </relationships>
            <relationships>
                <autoTerminate>false</autoTerminate>
                <name>success</name>
            </relationships>
            <state>STOPPED</state>
            <style/>
            <type>org.apache.nifi.processors.standard.ControlRate</type>
        </processors>
        <processors>
            <id>203e8481-e4c7-3340-0000-000000000000</id>
            <parentGroupId>60d38136-211b-3d16-0000-000000000000</parentGroupId>
            <position>
                <x>712.1617915050342</x>
                <y>435.16275513999926</y>
            </position>
            <bundle>
                <artifact>nifi-standard-nar</artifact>
                <group>org.apache.nifi</group>
                <version>1.9.2</version>
            </bundle>
            <config>
                <bulletinLevel>WARN</bulletinLevel>
                <comments></comments>
                <concurrentlySchedulableTaskCount>1</concurrentlySchedulableTaskCount>
                <descriptors>
                    <entry>
                        <key>Log Level</key>
                        <value>
                            <name>Log Level</name>
                        </value>
                    </entry>
                    <entry>
                        <key>Log Payload</key>
                        <value>
                            <name>Log Payload</name>
                        </value>
                    </entry>
                    <entry>
                        <key>Attributes to Log</key>
                        <value>
                            <name>Attributes to Log</name>
                        </value>
                    </entry>
                    <entry>
                        <key>attributes-to-log-regex</key>
                        <value>
                            <name>attributes-to-log-regex</name>
                        </value>
                    </entry>
                    <entry>
                        <key>Attributes to Ignore</key>
                        <value>
                            <name>Attributes to Ignore</name>
                        </value>
                    </entry>
                    <entry>
                        <key>attributes-to-ignore-regex</key>
                        <value>
                            <name>attributes-to-ignore-regex</name>
                        </value>
                    </entry>
                    <entry>
                        <key>Log prefix</key>
                        <value>
                            <name>Log prefix</name>
                        </value>
                    </entry>
                    <entry>
                        <key>character-set</key>
                        <value>
                            <name>character-set</name>
                        </value>
                    </entry>
                </descriptors>
                <executionNode>ALL</executionNode>
                <lossTolerant>false</lossTolerant>
                <penaltyDuration>30 sec</penaltyDuration>
                <properties>
                    <entry>
                        <key>Log Level</key>
                        <value>info</value>
                    </entry>
                    <entry>
                        <key>Log Payload</key>
                        <value>false</value>
                    </entry>
                    <entry>
                        <key>Attributes to Log</key>
                    </entry>
                    <entry>
                        <key>attributes-to-log-regex</key>
                        <value>.*</value>
                    </entry>
                    <entry>
                        <key>Attributes to Ignore</key>
                    </entry>
                    <entry>
                        <key>attributes-to-ignore-regex</key>
                    </entry>
                    <entry>
                        <key>Log prefix</key>
                    </entry>
                    <entry>
                        <key>character-set</key>
                        <value>UTF-8</value>
                    </entry>
                </properties>
                <runDurationMillis>0</runDurationMillis>
                <schedulingPeriod>0 sec</schedulingPeriod>
                <schedulingStrategy>TIMER_DRIVEN</schedulingStrategy>
                <yieldDuration>1 sec</yieldDuration>
            </config>
            <executionNodeRestricted>false</executionNodeRestricted>
            <name>LogAttribute——demo</name>
            <relationships>
                <autoTerminate>true</autoTerminate>
                <name>success</name>
            </relationships>
            <state>STOPPED</state>
            <style/>
            <type>org.apache.nifi.processors.standard.LogAttribute</type>
        </processors>
        <processors>
            <id>20f76bcb-e978-3263-0000-000000000000</id>
            <parentGroupId>60d38136-211b-3d16-0000-000000000000</parentGroupId>
            <position>
                <x>1.783660888671875</x>
                <y>408.520751953125</y>
            </position>
            <bundle>
                <artifact>nifi-standard-nar</artifact>
                <group>org.apache.nifi</group>
                <version>1.9.2</version>
            </bundle>
            <config>
                <bulletinLevel>WARN</bulletinLevel>
                <comments></comments>
                <concurrentlySchedulableTaskCount>3</concurrentlySchedulableTaskCount>
                <descriptors>
                    <entry>
                        <key>JsonPath Expression</key>
                        <value>
                            <name>JsonPath Expression</name>
                        </value>
                    </entry>
                    <entry>
                        <key>Null Value Representation</key>
                        <value>
                            <name>Null Value Representation</name>
                        </value>
                    </entry>
                </descriptors>
                <executionNode>ALL</executionNode>
                <lossTolerant>false</lossTolerant>
                <penaltyDuration>30 sec</penaltyDuration>
                <properties>
                    <entry>
                        <key>JsonPath Expression</key>
                        <value>$.*</value>
                    </entry>
                    <entry>
                        <key>Null Value Representation</key>
                        <value>empty string</value>
                    </entry>
                </properties>
                <runDurationMillis>0</runDurationMillis>
                <schedulingPeriod>0 sec</schedulingPeriod>
                <schedulingStrategy>TIMER_DRIVEN</schedulingStrategy>
                <yieldDuration>1 sec</yieldDuration>
            </config>
            <executionNodeRestricted>false</executionNodeRestricted>
            <name>SplitJson_Demo</name>
            <relationships>
                <autoTerminate>false</autoTerminate>
                <name>failure</name>
            </relationships>
            <relationships>
                <autoTerminate>true</autoTerminate>
                <name>original</name>
            </relationships>
            <relationships>
                <autoTerminate>false</autoTerminate>
                <name>split</name>
            </relationships>
            <state>STOPPED</state>
            <style/>
            <type>org.apache.nifi.processors.standard.SplitJson</type>
        </processors>
        <processors>
            <id>26c8401a-8807-3771-0000-000000000000</id>
            <parentGroupId>60d38136-211b-3d16-0000-000000000000</parentGroupId>
            <position>
                <x>0.0</x>
                <y>825.9684448242188</y>
            </position>
            <bundle>
                <artifact>nifi-hadoop-nar</artifact>
                <group>org.apache.nifi</group>
                <version>1.9.2</version>
            </bundle>
            <config>
                <bulletinLevel>WARN</bulletinLevel>
                <comments></comments>
                <concurrentlySchedulableTaskCount>3</concurrentlySchedulableTaskCount>
                <descriptors>
                    <entry>
                        <key>Hadoop Configuration Resources</key>
                        <value>
                            <name>Hadoop Configuration Resources</name>
                        </value>
                    </entry>
                    <entry>
                        <key>kerberos-credentials-service</key>
                        <value>
                            <identifiesControllerService>org.apache.nifi.kerberos.KerberosCredentialsService</identifiesControllerService>
                            <name>kerberos-credentials-service</name>
                        </value>
                    </entry>
                    <entry>
                        <key>Kerberos Principal</key>
                        <value>
                            <name>Kerberos Principal</name>
                        </value>
                    </entry>
                    <entry>
                        <key>Kerberos Keytab</key>
                        <value>
                            <name>Kerberos Keytab</name>
                        </value>
                    </entry>
                    <entry>
                        <key>Kerberos Relogin Period</key>
                        <value>
                            <name>Kerberos Relogin Period</name>
                        </value>
                    </entry>
                    <entry>
                        <key>Additional Classpath Resources</key>
                        <value>
                            <name>Additional Classpath Resources</name>
                        </value>
                    </entry>
                    <entry>
                        <key>Directory</key>
                        <value>
                            <name>Directory</name>
                        </value>
                    </entry>
                    <entry>
                        <key>Conflict Resolution Strategy</key>
                        <value>
                            <name>Conflict Resolution Strategy</name>
                        </value>
                    </entry>
                    <entry>
                        <key>Block Size</key>
                        <value>
                            <name>Block Size</name>
                        </value>
                    </entry>
                    <entry>
                        <key>IO Buffer Size</key>
                        <value>
                            <name>IO Buffer Size</name>
                        </value>
                    </entry>
                    <entry>
                        <key>Replication</key>
                        <value>
                            <name>Replication</name>
                        </value>
                    </entry>
                    <entry>
                        <key>Permissions umask</key>
                        <value>
                            <name>Permissions umask</name>
                        </value>
                    </entry>
                    <entry>
                        <key>Remote Owner</key>
                        <value>
                            <name>Remote Owner</name>
                        </value>
                    </entry>
                    <entry>
                        <key>Remote Group</key>
                        <value>
                            <name>Remote Group</name>
                        </value>
                    </entry>
                    <entry>
                        <key>Compression codec</key>
                        <value>
                            <name>Compression codec</name>
                        </value>
                    </entry>
                </descriptors>
                <executionNode>ALLexecutionNode>
                <lossTolerant>falselossTolerant>
                <penaltyDuration>30 secpenaltyDuration>
                <properties>
                    <entry>
                        <key>Hadoop Configuration Resourceskey>
                        <value>/usr/local/bigdata/hadoop-3.1.4/etc/hadoop/hdfs-site.xml,/usr/local/bigdata/hadoop-3.1.4/etc/hadoop/core-site.xmlvalue>
                    entry>
                    <entry>
                        <key>kerberos-credentials-servicekey>
                    entry>
                    <entry>
                        <key>Kerberos Principalkey>
                    entry>
                    <entry>
                        <key>Kerberos Keytabkey>
                    entry>
                    <entry>
                        <key>Kerberos Relogin Periodkey>
                        <value>4 hoursvalue>
                    entry>
                    <entry>
                        <key>Additional Classpath Resourceskey>
                        <value>/usr/local/bigdata/testdata/hadoop-lzo-0.4.21-SNAPSHOT.jarvalue>
                    entry>
                    <entry>
                        <key>Directorykey>
                        <value>/user/hive/warehouse/test.db/testuservalue>
                    entry>
                    <entry>
                        <key>Conflict Resolution Strategykey>
                        <value>appendvalue>
                    entry>
                    <entry>
                        <key>Block Sizekey>
                    entry>
                    <entry>
                        <key>IO Buffer Sizekey>
                    entry>
                    <entry>
                        <key>Replicationkey>
                    entry>
                    <entry>
                        <key>Permissions umaskkey>
                    entry>
                    <entry>
                        <key>Remote Ownerkey>
                    entry>
                    <entry>
                        <key>Remote Groupkey>
                    entry>
                    <entry>
                        <key>Compression codeckey>
                        <value>LZOvalue>
                    entry>
                properties>
                <runDurationMillis>0runDurationMillis>
                <schedulingPeriod>0 secschedulingPeriod>
                <schedulingStrategy>TIMER_DRIVENschedulingStrategy>
                <yieldDuration>1 secyieldDuration>
            config>
            <executionNodeRestricted>falseexecutionNodeRestricted>
            <name>PutHDFS_Demoname>
            <relationships>
                <autoTerminate>falseautoTerminate>
                <name>failurename>
            relationships>
            <relationships>
                <autoTerminate>trueautoTerminate>
                <name>successname>
            relationships>
            <state>STOPPEDstate>
            <style/>
            <type>org.apache.nifi.processors.hadoop.PutHDFStype>
        processors>
        <processors>
            <id>4cb1eb1d-ca3a-34e0-0000-000000000000</id>
            <parentGroupId>60d38136-211b-3d16-0000-000000000000</parentGroupId>
            <position>
                <x>5.04095458984375</x>
                <y>203.5</y>
            </position>
            <bundle>
                <artifact>nifi-avro-nar</artifact>
                <group>org.apache.nifi</group>
                <version>1.9.2</version>
            </bundle>
            <config>
                <bulletinLevel>WARN</bulletinLevel>
                <comments></comments>
                <concurrentlySchedulableTaskCount>1</concurrentlySchedulableTaskCount>
                <descriptors>
                    <entry>
                        <key>JSON container options</key>
                        <value>
                            <name>JSON container options</name>
                        </value>
                    </entry>
                    <entry>
                        <key>Wrap Single Record</key>
                        <value>
                            <name>Wrap Single Record</name>
                        </value>
                    </entry>
                    <entry>
                        <key>Avro schema</key>
                        <value>
                            <name>Avro schema</name>
                        </value>
                    </entry>
                </descriptors>
                <executionNode>ALL</executionNode>
                <lossTolerant>false</lossTolerant>
                <penaltyDuration>30 sec</penaltyDuration>
                <properties>
                    <entry>
                        <key>JSON container options</key>
                        <value>array</value>
                    </entry>
                    <entry>
                        <key>Wrap Single Record</key>
                        <value>true</value>
                    </entry>
                    <entry>
                        <key>Avro schema</key>
                    </entry>
                </properties>
                <runDurationMillis>0</runDurationMillis>
                <schedulingPeriod>0 sec</schedulingPeriod>
                <schedulingStrategy>TIMER_DRIVEN</schedulingStrategy>
                <yieldDuration>1 sec</yieldDuration>
            </config>
            <executionNodeRestricted>false</executionNodeRestricted>
            <name>ConvertAvroToJSON_Demo</name>
            <relationships>
                <autoTerminate>false</autoTerminate>
                <name>failure</name>
            </relationships>
            <relationships>
                <autoTerminate>false</autoTerminate>
                <name>success</name>
            </relationships>
            <state>STOPPED</state>
            <style/>
            <type>org.apache.nifi.processors.avro.ConvertAvroToJSON</type>
        </processors>
        <processors>
            <id>c16280cc-6d1d-355c-0000-000000000000</id>
            <parentGroupId>60d38136-211b-3d16-0000-000000000000</parentGroupId>
            <position>
                <x>4.04095458984375</x>
                <y>0.0</y>
            </position>
            <bundle>
                <artifact>nifi-standard-nar</artifact>
                <group>org.apache.nifi</group>
                <version>1.9.2</version>
            </bundle>
            <config>
                <bulletinLevel>WARN</bulletinLevel>
                <comments></comments>
                <concurrentlySchedulableTaskCount>1</concurrentlySchedulableTaskCount>
                <descriptors>
                    <entry>
                        <key>Database Connection Pooling Service</key>
                        <value>
                            <identifiesControllerService>org.apache.nifi.dbcp.DBCPService</identifiesControllerService>
                            <name>Database Connection Pooling Service</name>
                        </value>
                    </entry>
                    <entry>
                        <key>db-fetch-db-type</key>
                        <value>
                            <name>db-fetch-db-type</name>
                        </value>
                    </entry>
                    <entry>
                        <key>Table Name</key>
                        <value>
                            <name>Table Name</name>
                        </value>
                    </entry>
                    <entry>
                        <key>Columns to Return</key>
                        <value>
                            <name>Columns to Return</name>
                        </value>
                    </entry>
                    <entry>
                        <key>db-fetch-where-clause</key>
                        <value>
                            <name>db-fetch-where-clause</name>
                        </value>
                    </entry>
                    <entry>
                        <key>db-fetch-sql-query</key>
                        <value>
                            <name>db-fetch-sql-query</name>
                        </value>
                    </entry>
                    <entry>
                        <key>Maximum-value Columns</key>
                        <value>
                            <name>Maximum-value Columns</name>
                        </value>
                    </entry>
                    <entry>
                        <key>Max Wait Time</key>
                        <value>
                            <name>Max Wait Time</name>
                        </value>
                    </entry>
                    <entry>
                        <key>Fetch Size</key>
                        <value>
                            <name>Fetch Size</name>
                        </value>
                    </entry>
                    <entry>
                        <key>qdbt-max-rows</key>
                        <value>
                            <name>qdbt-max-rows</name>
                        </value>
                    </entry>
                    <entry>
                        <key>qdbt-output-batch-size</key>
                        <value>
                            <name>qdbt-output-batch-size</name>
                        </value>
                    </entry>
                    <entry>
                        <key>qdbt-max-frags</key>
                        <value>
                            <name>qdbt-max-frags</name>
                        </value>
                    </entry>
                    <entry>
                        <key>dbf-normalize</key>
                        <value>
                            <name>dbf-normalize</name>
                        </value>
                    </entry>
                    <entry>
                        <key>transaction-isolation-level</key>
                        <value>
                            <name>transaction-isolation-level</name>
                        </value>
                    </entry>
                    <entry>
                        <key>dbf-user-logical-types</key>
                        <value>
                            <name>dbf-user-logical-types</name>
                        </value>
                    </entry>
                    <entry>
                        <key>dbf-default-precision</key>
                        <value>
                            <name>dbf-default-precision</name>
                        </value>
                    </entry>
                    <entry>
                        <key>dbf-default-scale</key>
                        <value>
                            <name>dbf-default-scale</name>
                        </value>
                    </entry>
                </descriptors>
                <executionNode>PRIMARY</executionNode>
                <lossTolerant>false</lossTolerant>
                <penaltyDuration>30 sec</penaltyDuration>
                <properties>
                    <entry>
                        <key>Database Connection Pooling Service</key>
                        <value>55bee1a0-0b0c-3a63-0000-000000000000</value>
                    </entry>
                    <entry>
                        <key>db-fetch-db-type</key>
                        <value>MySQL</value>
                    </entry>
                    <entry>
                        <key>Table Name</key>
                        <value>dx_user</value>
                    </entry>
                    <entry>
                        <key>Columns to Return</key>
                    </entry>
                    <entry>
                        <key>db-fetch-where-clause</key>
                    </entry>
                    <entry>
                        <key>db-fetch-sql-query</key>
                        <value>select * from dx_user </value>
                    </entry>
                    <entry>
                        <key>Maximum-value Columns</key>
                    </entry>
                    <entry>
                        <key>Max Wait Time</key>
                        <value>0 seconds</value>
                    </entry>
                    <entry>
                        <key>Fetch Size</key>
                        <value>0</value>
                    </entry>
                    <entry>
                        <key>qdbt-max-rows</key>
                        <value>0</value>
                    </entry>
                    <entry>
                        <key>qdbt-output-batch-size</key>
                        <value>0</value>
                    </entry>
                    <entry>
                        <key>qdbt-max-frags</key>
                        <value>0</value>
                    </entry>
                    <entry>
                        <key>dbf-normalize</key>
                        <value>false</value>
                    </entry>
                    <entry>
                        <key>transaction-isolation-level</key>
                    </entry>
                    <entry>
                        <key>dbf-user-logical-types</key>
                        <value>false</value>
                    </entry>
                    <entry>
                        <key>dbf-default-precision</key>
                        <value>10</value>
                    </entry>
                    <entry>
                        <key>dbf-default-scale</key>
                        <value>0</value>
                    </entry>
                </properties>
                <runDurationMillis>0</runDurationMillis>
                <schedulingPeriod>86400 sec</schedulingPeriod>
                <schedulingStrategy>TIMER_DRIVEN</schedulingStrategy>
                <yieldDuration>1 sec</yieldDuration>
            </config>
            <executionNodeRestricted>true</executionNodeRestricted>
            <name>QueryDatabaseTable_demo</name>
            <relationships>
                <autoTerminate>false</autoTerminate>
                <name>success</name>
            </relationships>
            <state>STOPPED</state>
            <style/>
            <type>org.apache.nifi.processors.standard.QueryDatabaseTable</type>
        </processors>
    </snippet>
    <timestamp>02/09/2023 05:48:36 GMT</timestamp>
</template>

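2. Processor Flow

1) Template 1 processing flow

QueryDatabaseTable --> ConvertAvroToJSON --> SplitJson --> PutHDFS

  • QueryDatabaseTable reads the data from MySQL.
  • ConvertAvroToJSON converts the data into human-readable JSON.
  • SplitJson splits the JSON array into individual records.
  • PutHDFS writes all records to HDFS.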

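2) Template 2 processing flow

QueryDatabaseTable --> ConvertAvroToJSON --> SplitJson --> ControlRate --> PutHDFS

  • QueryDatabaseTable reads the data from MySQL.
  • ConvertAvroToJSON converts the data into human-readable JSON.
  • SplitJson splits the JSON array into individual records.
  • ControlRate limits the rate at which records are passed on to PutHDFS (see the APPEND_FILE exception in the verification section).
  • PutHDFS writes all records to HDFS.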
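ControlRate is configured through properties such as Rate Control Criteria, Maximum Rate and Time Duration. As a rough mental model only (not NiFi's implementation), assuming the "flowfile count" criterion, it behaves like a fixed-window limiter: at most `max_rate` FlowFiles are released per window and the rest stay queued. The class name `FixedWindowLimiter` and its methods are hypothetical.

```python
from collections import deque


class FixedWindowLimiter:
    """Hypothetical model of ControlRate with a 'flowfile count' criterion:
    release at most max_rate items per window of window_secs seconds."""

    def __init__(self, max_rate: int, window_secs: float) -> None:
        self.max_rate = max_rate
        self.window_secs = window_secs
        self.window_start = 0.0
        self.released_in_window = 0
        self.queue: deque = deque()

    def offer(self, item) -> None:
        # Items wait here, like FlowFiles in ControlRate's incoming connection.
        self.queue.append(item)

    def release(self, now: float) -> list:
        # Open a new window once the current one has fully elapsed.
        if now - self.window_start >= self.window_secs:
            self.window_start = now
            self.released_in_window = 0
        out = []
        while self.queue and self.released_in_window < self.max_rate:
            out.append(self.queue.popleft())
            self.released_in_window += 1
        return out


if __name__ == "__main__":
    limiter = FixedWindowLimiter(max_rate=2, window_secs=1.0)
    for i in range(5):
        limiter.offer(f"record-{i}")
    print(limiter.release(now=0.0))  # ['record-0', 'record-1']
    print(limiter.release(now=0.5))  # [] because the window is already full
    print(limiter.release(now=1.0))  # ['record-2', 'record-3']
```

Throttling this way spreads the writes to PutHDFS over time instead of delivering a burst of split records at once.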

II. Processor Descriptions

This section introduces the processors used in this example.

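1. QueryDatabaseTable

1) Description

Generates a SQL SELECT query, or uses a provided statement, and executes it to fetch all rows whose values in the specified Maximum-value Column(s) are larger than the previously seen maxima. Query results are converted to Avro format. Several properties support Expression Language, but incoming connections are not allowed. The Variable Registry can be used to provide values for any property containing Expression Language. If FlowFile attributes are needed to drive these queries, use the GenerateTableFetch and/or ExecuteSQL processors instead. Results are streamed, so arbitrarily large result sets are supported. With the standard scheduling methods, this processor can run on a timer or a cron expression. It is intended to run on the primary node only.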
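The Maximum-value Columns mechanism is what prevents re-reading old rows. The sketch below models it with Python's built-in `sqlite3` standing in for MySQL; the table name `dx_user` comes from the template, while the `id`/`name` columns, the `state` dict and `fetch_new_rows` are hypothetical. NiFi stores this state itself; note that the template above leaves Maximum-value Columns empty, so in that setup there is no state and every run reads the whole table.

```python
import sqlite3


def fetch_new_rows(conn: sqlite3.Connection, table: str, max_col: str, state: dict) -> list:
    """Model of QueryDatabaseTable's incremental fetch: return only rows whose
    max_col value exceeds the largest value seen so far, then advance state.
    Table and column names are interpolated directly, so they must be trusted."""
    last = state.get(max_col)
    if last is None:
        # First run: no stored maximum yet, so every row qualifies.
        cur = conn.execute(f"SELECT * FROM {table} ORDER BY {max_col}")
    else:
        cur = conn.execute(
            f"SELECT * FROM {table} WHERE {max_col} > ? ORDER BY {max_col}", (last,)
        )
    rows = cur.fetchall()
    if rows:
        # Locate max_col in the result set and remember the new maximum.
        idx = [d[0] for d in cur.description].index(max_col)
        state[max_col] = max(row[idx] for row in rows)
    return rows


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dx_user (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO dx_user VALUES (?, ?)", [(1, "a"), (2, "b")])
    state: dict = {}
    print(fetch_new_rows(conn, "dx_user", "id", state))  # first run: both rows
    print(fetch_new_rows(conn, "dx_user", "id", state))  # nothing new
    conn.execute("INSERT INTO dx_user VALUES (3, 'c')")
    print(fetch_new_rows(conn, "dx_user", "id", state))  # only the new row
```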

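2) Property configuration

The list below shows every property with its default value and whether it supports the NiFi Expression Language.
[Figure 1: QueryDatabaseTable properties]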

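2. ConvertAvroToJSON

1) Description

Converts a binary Avro record into a JSON object. This processor provides a direct mapping of Avro fields to JSON fields, so the resulting JSON has the same hierarchical structure as the Avro document. Note that the Avro schema information is lost, because this is not a conversion from binary Avro to JSON-formatted Avro. The output JSON is encoded in UTF-8. If an incoming FlowFile contains a stream of multiple Avro records, the resulting FlowFile contains either a JSON array holding all of the records or a sequence of JSON objects, depending on the JSON container options property. If an incoming FlowFile contains no records, an empty JSON object is output. Empty or single-record Avro FlowFile inputs can optionally be wrapped in a container, as dictated by the Wrap Single Record property.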
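To make the two properties set in the template concrete (JSON container options = array, Wrap Single Record = true), here is a simplified model of how they shape the output, based only on the description above. Decoding Avro needs a third-party library, so the records are assumed to be already-decoded Python dicts; `records_to_json` is hypothetical, and the newline separator for the "none" container is an assumption.

```python
import json


def records_to_json(records: list, container: str = "array", wrap_single: bool = False) -> str:
    """Simplified model of ConvertAvroToJSON output shaping.
    container: "array" or "none"; wrap_single mirrors 'Wrap Single Record'."""
    use_array = container == "array"
    wrap = use_array and wrap_single  # wrapping only applies with a container
    if not records:
        # No records: an empty JSON object, or an empty container when wrapping.
        return "[]" if wrap else "{}"
    if len(records) == 1 and not wrap:
        # A single record is emitted as a bare object unless wrapping is requested.
        return json.dumps(records[0])
    if use_array:
        return json.dumps(records)
    # container "none": a sequence of JSON objects (newline-separated here, an assumption).
    return "\n".join(json.dumps(r) for r in records)


if __name__ == "__main__":
    rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
    print(records_to_json(rows, container="array", wrap_single=True))
    # [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
```

Because the array form is what SplitJson later splits with `$.*`, keeping the container set to array matters for this flow.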

2) Property configuration

The list below shows the properties and their default values.
[Figure: ConvertAvroToJSON properties]

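3. SplitJson

1) Description

This processor splits a JSON array into multiple separate FlowFiles, using a JsonPath expression to specify the array element to split on. Each generated FlowFile contains one element of the specified array and is transferred to the "split" relationship, while the original file is transferred to the "original" relationship. If the specified JsonPath is not found, or does not evaluate to an array element, the original file is routed to "failure" and no files are generated.
Using this processor requires familiarity with the JsonPath expression language.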
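The flow in this article sets JsonPath Expression to `$.*`, which in JsonPath selects every element of a top-level array (or every member value of an object). A minimal sketch of that split using only the standard library; `split_json` is hypothetical and does not model the "original" relationship.

```python
import json


def split_json(content: str) -> tuple:
    """Model of SplitJson with JsonPath Expression "$.*".
    Returns ("split", fragments) on success, or ("failure", []) when the
    input is not JSON or "$.*" matches nothing that can be split."""
    try:
        doc = json.loads(content)
    except json.JSONDecodeError:
        return "failure", []
    if isinstance(doc, list):
        elements = doc  # "$.*" on an array selects every element
    elif isinstance(doc, dict):
        elements = list(doc.values())  # on an object it selects every member value
    else:
        return "failure", []  # scalars have no children to select
    # Each element becomes the content of its own FlowFile.
    return "split", [json.dumps(e) for e in elements]


if __name__ == "__main__":
    rel, parts = split_json('[{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]')
    print(rel)  # split
    for part in parts:
        print(part)  # one JSON object per fragment
```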

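2) Property configuration

The list below shows each property's default value (if any) and whether it supports Expression Language.
[Figure: SplitJson properties]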

4. PutHDFS

1) Description

Writes FlowFile data to the Hadoop Distributed File System (HDFS).
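PutHDFS decides what to do when the target file already exists through its Conflict Resolution Strategy (replace, ignore, fail or append); the template uses append. The sketch below mimics those four choices on the local filesystem with `pathlib` so the effect is easy to see; `put_file` and the file name `testuser.json` are hypothetical, and nothing here talks to HDFS. With append, FlowFiles that arrive with the same filename accumulate in one file.

```python
import tempfile
from pathlib import Path

STRATEGIES = {"replace", "ignore", "fail", "append"}


def put_file(directory: Path, filename: str, data: bytes, strategy: str) -> str:
    """Local-filesystem model of PutHDFS's Conflict Resolution Strategy.
    Returns the relationship the FlowFile would be routed to."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy}")
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / filename
    if target.exists():
        if strategy == "ignore":
            return "success"  # existing file is left untouched
        if strategy == "fail":
            return "failure"
        if strategy == "append":
            with target.open("ab") as f:
                f.write(data)
            return "success"
    # New file, or an existing one overwritten under "replace".
    target.write_bytes(data)
    return "success"


if __name__ == "__main__":
    out_dir = Path(tempfile.mkdtemp())
    for record in (b'{"id": 1}\n', b'{"id": 2}\n'):
        put_file(out_dir, "testuser.json", record, strategy="append")
    print((out_dir / "testuser.json").read_text())
```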

2) Property configuration

The list below shows all properties with their default values and whether each supports the NiFi Expression Language.
[Figure 2: PutHDFS properties]

III. Operations

1. Create a process group

[Figure 3: creating the process group]

2. Create and configure QueryDatabaseTable

[Figure 4: QueryDatabaseTable configuration]

3. Create and configure the MySQL connection pool

1) Create

2) Configure

[Figure 5: connection pool configuration]

Database Connection URL = jdbc:mysql://192.168.10.44:3306/test?characterEncoding=UTF-8&useSSL=false&allowPublicKeyRetrieval=true
Database Driver Class Name = com.mysql.jdbc.Driver
# The jar below must be uploaded to the NiFi server in advance
Database Driver Location(s) = /usr/local/bigdata/imply-3.0.4/dist/druid/extensions/mysql-metadata-storage/mysql-connector-java-5.1.44.jar
Database User = root
Password = 8888888
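Before entering these values into the connection pool, a small helper like the hypothetical one below can sanity-check them. It parses the JDBC URL with the standard library and picks the driver class from the Connector/J major version in the jar name: the 5.1.x line ships `com.mysql.jdbc.Driver` (matching the 5.1.44 jar above), while newer lines use `com.mysql.cj.jdbc.Driver`. Both function names are illustrative, not part of NiFi.

```python
import re
from urllib.parse import parse_qs, urlparse


def parse_mysql_jdbc_url(url: str) -> dict:
    """Split a jdbc:mysql:// URL into host, port, database and options."""
    if not url.startswith("jdbc:mysql://"):
        raise ValueError(f"not a MySQL JDBC URL: {url}")
    parsed = urlparse(url[len("jdbc:"):])  # parse "mysql://host:port/db?..."
    return {
        "host": parsed.hostname,
        "port": parsed.port or 3306,  # 3306 is MySQL's default port
        "database": parsed.path.lstrip("/"),
        "options": {k: v[0] for k, v in parse_qs(parsed.query).items()},
    }


def driver_class_for(jar_name: str) -> str:
    """Pick the Connector/J driver class from the jar's major version:
    5.x ships com.mysql.jdbc.Driver; 6.x and later use com.mysql.cj.jdbc.Driver."""
    match = re.search(r"mysql-connector-(?:java|j)-(\d+)\.", jar_name)
    if match is None:
        raise ValueError(f"cannot read a Connector/J version from {jar_name}")
    return "com.mysql.jdbc.Driver" if int(match.group(1)) < 6 else "com.mysql.cj.jdbc.Driver"


if __name__ == "__main__":
    url = ("jdbc:mysql://192.168.10.44:3306/test?characterEncoding=UTF-8"
           "&useSSL=false&allowPublicKeyRetrieval=true")
    print(parse_mysql_jdbc_url(url))
    print(driver_class_for("mysql-connector-java-5.1.44.jar"))
```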

3) Enable the connection pool

[Figure 6: enabling the connection pool]

The pool can be enabled even when its parameters are wrong; the reason is unclear.

4. Create and configure ConvertAvroToJSON

The data that QueryDatabaseTable outputs (as with ExecuteSQL) is in Avro format, so it must first be converted to JSON.

1) Create and configure ConvertAvroToJSON

[Figure 7: ConvertAvroToJSON configuration]

2) Connect

[Figure 8: connecting QueryDatabaseTable to ConvertAvroToJSON]

3) Consume data with load balancing

[Figure 9: load-balancing settings on the connection]

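5. Create and configure SplitJson

The output of the previous step is a single unit made up of multiple records, so it must be split into individual records.
Drag a SplitJson processor onto the canvas, then draw a connection from ConvertAvroToJSON to SplitJson with the success relationship.
Configure SplitJson: on the Properties tab, set JsonPath Expression to $.*
[Figure 10: SplitJson connection]
[Figure 11: SplitJson properties]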

6. Create and configure PutHDFS

[Figure 12: adding PutHDFS]

[Figure 13: PutHDFS properties]

Hadoop Configuration Resources = /export/download/config/hdfs-site.xml,/export/download/config/core-site.xml
Directory = /user/hive/warehouse/nifi_test.db/user_info_nifi
Conflict Resolution Strategy = append
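The template sets PutHDFS's Compression codec to LZO, which only works if the cluster knows the hadoop-lzo codec. One quick check is to look at the `io.compression.codecs` property in the core-site.xml listed under Hadoop Configuration Resources. The sketch below does that with `xml.etree.ElementTree`; the function names are hypothetical, and it only confirms the codec is registered, not that the native LZO libraries are installed.

```python
import xml.etree.ElementTree as ET
from typing import Union


def configured_codecs(core_site_xml: Union[str, bytes]) -> list:
    """Return the codec classes listed under io.compression.codecs (or [])."""
    root = ET.fromstring(core_site_xml)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == "io.compression.codecs":
            value = prop.findtext("value") or ""
            return [c.strip() for c in value.split(",") if c.strip()]
    return []


def lzo_registered(core_site_xml: Union[str, bytes]) -> bool:
    """True if hadoop-lzo's LzoCodec is registered with Hadoop."""
    return "com.hadoop.compression.lzo.LzoCodec" in configured_codecs(core_site_xml)


if __name__ == "__main__":
    sample = """<configuration>
      <property>
        <name>io.compression.codecs</name>
        <value>org.apache.hadoop.io.compress.GzipCodec,com.hadoop.compression.lzo.LzoCodec</value>
      </property>
    </configuration>"""
    print(lzo_registered(sample))
```

To check a real file, pass its bytes, e.g. `Path("/export/download/config/core-site.xml").read_bytes()`.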

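Set the scheduling options of the QueryDatabaseTable processor as needed. The default run interval is 0 seconds, meaning the SQL statement is executed continuously; because no Maximum-value Columns are configured here, every execution re-reads the whole table, producing large amounts of duplicate data from MySQL. If you only need to import the result of a single SQL query into HDFS, set the interval to a large value (the template above uses 86400 sec, i.e. once per day) and stop the processor manually once the run has finished; if the import must run periodically, set an appropriate interval instead.

NiFi cannot determine on its own when each processor has finished its task, so this has to be controlled manually.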

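IV. Verification

1. Start QueryDatabaseTable and inspect the data in the queue

[Figure 14: queued data after QueryDatabaseTable]

2. Start ConvertAvroToJSON and inspect the data in the queue

[Figure 15: queued data after ConvertAvroToJSON]

3. Start SplitJson and inspect the data in the queue

[Figure 16: queued data after SplitJson]

4. Start PutHDFS and check the data the processor received and sent

[Figure 17: PutHDFS input and output statistics]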

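  • If the compression codec configured in PutHDFS differs from the one used by Hadoop, make them consistent; if NiFi does not ship the corresponding jar, set Additional Classpath Resources to the location of that jar.
  • Check whether the NiFi deployment user matches the HDFS user; if not, make them consistent. In practice this usually means adjusting the owner or permissions of the HDFS files.
    With the configuration above, the following exception may occur: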
Caused by: org.apache.hadoop.ipc.RemoteException: 
Failed to APPEND_FILE /user/hive/warehouse/test.db/testuser/06b034cf-f4a0-49f1-9742-7b6d74ce024b.lzo_deflate for DFSClient_NONMAPREDUCE_2099184430_144 on 192.168.10.41 
because this file lease is currently owned by DFSClient_NONMAPREDUCE_-1635697973_57 on 192.168.10.42

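According to related material, the fix is to add a ControlRate processor and set a maximum rate; see template 2 for details. The message itself shows the cause: HDFS allows only one writer (the lease holder) per file, and here the client on 192.168.10.41 tried to append to a file whose lease was held by the client on 192.168.10.42.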
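The single-writer rule behind this error can be illustrated with a toy model (not HDFS code): the first client that opens a file for writing holds its lease until it closes it, and an append from any other client is rejected. `LeaseManager`, `LeaseConflict` and the file name `part.lzo_deflate` are hypothetical; the node labels only echo the two hosts in the log. Anything that keeps two nodes from appending to the same file at the same moment, such as throttling with ControlRate, reduces the chance of this conflict.

```python
class LeaseConflict(Exception):
    """Raised when a client appends to a file whose lease another client holds."""


class LeaseManager:
    """Toy model of the HDFS single-writer rule (not HDFS code): the first
    client to open a file for writing holds its lease until it closes it."""

    def __init__(self) -> None:
        self.holders: dict = {}

    def append(self, path: str, client: str) -> None:
        holder = self.holders.get(path)
        if holder is not None and holder != client:
            raise LeaseConflict(
                f"Failed to APPEND_FILE {path} for {client}: "
                f"lease is currently owned by {holder}"
            )
        # Either the file is free or this client already holds the lease.
        self.holders[path] = client

    def close(self, path: str, client: str) -> None:
        # Only the current holder can release the lease.
        if self.holders.get(path) == client:
            del self.holders[path]


if __name__ == "__main__":
    path = "/user/hive/warehouse/test.db/testuser/part.lzo_deflate"
    leases = LeaseManager()
    leases.append(path, "node-42")
    try:
        leases.append(path, "node-41")  # concurrent append from another node
    except LeaseConflict as err:
        print(err)
```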

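5. View the data in HDFS

Viewing the table through Hue requires that the table has already been created in Hive. This step therefore assumes the data has been synced into Hive and that Hue is working; otherwise, inspect the file contents directly with Hadoop commands.
[Figure 18: querying the table in Hue]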
