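1. NiFi 1.9.2 introduction, standalone deployment, and basic verification
2. NiFi usage example: GetFile and PutFile
3. NiFi processors, common FlowFile attributes, templates, and viewing runtime status
4. Cluster deployment and verification, monitoring, and node management
5. NiFi FlowFile examples and NiFi template examples
6. NiFi use case: offline synchronization of MySQL data to HDFS
7. NiFi combined use case: converting JSON data queried from MySQL to txt and storing it in HDFS
8. NiFi combined use case: monitoring the MySQL binlog with NiFi for real-time synchronization to Hive
9. NiFi combined use case: configuring Kafka data synchronization with NiFi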
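This article shows how to synchronize MySQL data to HDFS and how to verify the result. Before reading it, it is best to read the earlier articles in this series that introduce templates.
This article assumes that the MySQL environment already contains data and that Hadoop, NiFi, Hive, and Hue are already set up. If Hue is not available, verify the result directly in HDFS.
The article has four parts: the implementation flow, an introduction to the processors used, the operations in NiFi, and verification of the results.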
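For orientation, the flow in this article chains QueryDatabaseTable, ConvertAvroToJSON (container option `array`), SplitJson (JsonPath expression `$.*`), and PutHDFS. The following is a minimal Python sketch of how the data changes shape between the middle two steps. It only mimics the shapes and is not NiFi's implementation; the sample rows and column names are hypothetical.

```python
import json

# Hypothetical rows standing in for the result of "select * from user".
rows = [
    {"id": 1, "name": "alice"},
    {"id": 2, "name": "bob"},
]


def to_json_array(records):
    """Mimic the output shape of ConvertAvroToJSON with the 'array' container:
    all records serialized as one JSON array."""
    return json.dumps(records)


def split_json_array(payload):
    """Mimic SplitJson with $.* on a top-level array:
    one JSON document per array element."""
    return [json.dumps(element) for element in json.loads(payload)]


array_payload = to_json_array(rows)
fragments = split_json_array(array_payload)

# Each fragment corresponds to one FlowFile handed to the next processor.
for fragment in fragments:
    print(fragment)
```

Each printed line stands for one record-sized FlowFile, which is the unit that PutHDFS then writes to the target directory.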
This template may raise exceptions (described in the verification part); behavior may differ between environments.
<template encoding-version="1.2">
<description>Imports data from MySQL into HDFS using LZO compression.
The output contains duplicate data.</description>
<groupId>2f7d3766-0186-1000-0000-00006e07b64a</groupId>
<name>MysqlToHDFSByLzo</name>
<snippet>
<connections>
<id>8bacaebe-bce0-31e8-0000-000000000000</id>
<parentGroupId>60d38136-211b-3d16-0000-000000000000</parentGroupId>
<backPressureDataSizeThreshold>1 GB</backPressureDataSizeThreshold>
<backPressureObjectThreshold>10000</backPressureObjectThreshold>
<destination>
<groupId>60d38136-211b-3d16-0000-000000000000</groupId>
<id>4cb1eb1d-ca3a-34e0-0000-000000000000</id>
<type>PROCESSOR</type>
</destination>
<flowFileExpiration>0 sec</flowFileExpiration>
<labelIndex>1</labelIndex>
<loadBalanceCompression>COMPRESS_ATTRIBUTES_AND_CONTENT</loadBalanceCompression>
<loadBalancePartitionAttribute></loadBalancePartitionAttribute>
<loadBalanceStatus>LOAD_BALANCE_INACTIVE</loadBalanceStatus>
<loadBalanceStrategy>ROUND_ROBIN</loadBalanceStrategy>
<name>Q_C</name>
<selectedRelationships>success</selectedRelationships>
<source>
<groupId>60d38136-211b-3d16-0000-000000000000</groupId>
<id>c16280cc-6d1d-355c-0000-000000000000</id>
<type>PROCESSOR</type>
</source>
<zIndex>0</zIndex>
</connections>
<connections>
<id>ce7dcdb2-bcd9-38a8-0000-000000000000</id>
<parentGroupId>60d38136-211b-3d16-0000-000000000000</parentGroupId>
<backPressureDataSizeThreshold>1 GB</backPressureDataSizeThreshold>
<backPressureObjectThreshold>10000</backPressureObjectThreshold>
<destination>
<groupId>60d38136-211b-3d16-0000-000000000000</groupId>
<id>26c8401a-8807-3771-0000-000000000000</id>
<type>PROCESSOR</type>
</destination>
<flowFileExpiration>0 sec</flowFileExpiration>
<labelIndex>1</labelIndex>
<loadBalanceCompression>COMPRESS_ATTRIBUTES_AND_CONTENT</loadBalanceCompression>
<loadBalancePartitionAttribute></loadBalancePartitionAttribute>
<loadBalanceStatus>LOAD_BALANCE_INACTIVE</loadBalanceStatus>
<loadBalanceStrategy>ROUND_ROBIN</loadBalanceStrategy>
<name>S_P</name>
<selectedRelationships>split</selectedRelationships>
<source>
<groupId>60d38136-211b-3d16-0000-000000000000</groupId>
<id>20f76bcb-e978-3263-0000-000000000000</id>
<type>PROCESSOR</type>
</source>
<zIndex>0</zIndex>
</connections>
<connections>
<id>f5322759-8583-3753-0000-000000000000</id>
<parentGroupId>60d38136-211b-3d16-0000-000000000000</parentGroupId>
<backPressureDataSizeThreshold>1 GB</backPressureDataSizeThreshold>
<backPressureObjectThreshold>10000</backPressureObjectThreshold>
<destination>
<groupId>60d38136-211b-3d16-0000-000000000000</groupId>
<id>20f76bcb-e978-3263-0000-000000000000</id>
<type>PROCESSOR</type>
</destination>
<flowFileExpiration>0 sec</flowFileExpiration>
<labelIndex>1</labelIndex>
<loadBalanceCompression>DO_NOT_COMPRESS</loadBalanceCompression>
<loadBalancePartitionAttribute></loadBalancePartitionAttribute>
<loadBalanceStatus>LOAD_BALANCE_NOT_CONFIGURED</loadBalanceStatus>
<loadBalanceStrategy>DO_NOT_LOAD_BALANCE</loadBalanceStrategy>
<name>C_S</name>
<selectedRelationships>success</selectedRelationships>
<source>
<groupId>60d38136-211b-3d16-0000-000000000000</groupId>
<id>4cb1eb1d-ca3a-34e0-0000-000000000000</id>
<type>PROCESSOR</type>
</source>
<zIndex>0</zIndex>
</connections>
<controllerServices>
<id>55bee1a0-0b0c-3a63-0000-000000000000</id>
<parentGroupId>60d38136-211b-3d16-0000-000000000000</parentGroupId>
<bundle>
<artifact>nifi-dbcp-service-nar</artifact>
<group>org.apache.nifi</group>
<version>1.9.2</version>
</bundle>
<comments></comments>
<descriptors>
<entry>
<key>Database Connection URL</key>
<value>
<name>Database Connection URL</name>
</value>
</entry>
<entry>
<key>Database Driver Class Name</key>
<value>
<name>Database Driver Class Name</name>
</value>
</entry>
<entry>
<key>database-driver-locations</key>
<value>
<name>database-driver-locations</name>
</value>
</entry>
<entry>
<key>kerberos-credentials-service</key>
<value>
<identifiesControllerService>org.apache.nifi.kerberos.KerberosCredentialsService</identifiesControllerService>
<name>kerberos-credentials-service</name>
</value>
</entry>
<entry>
<key>Database User</key>
<value>
<name>Database User</name>
</value>
</entry>
<entry>
<key>Password</key>
<value>
<name>Password</name>
</value>
</entry>
<entry>
<key>Max Wait Time</key>
<value>
<name>Max Wait Time</name>
</value>
</entry>
<entry>
<key>Max Total Connections</key>
<value>
<name>Max Total Connections</name>
</value>
</entry>
<entry>
<key>Validation-query</key>
<value>
<name>Validation-query</name>
</value>
</entry>
<entry>
<key>dbcp-min-idle-conns</key>
<value>
<name>dbcp-min-idle-conns</name>
</value>
</entry>
<entry>
<key>dbcp-max-idle-conns</key>
<value>
<name>dbcp-max-idle-conns</name>
</value>
</entry>
<entry>
<key>dbcp-max-conn-lifetime</key>
<value>
<name>dbcp-max-conn-lifetime</name>
</value>
</entry>
<entry>
<key>dbcp-time-between-eviction-runs</key>
<value>
<name>dbcp-time-between-eviction-runs</name>
</value>
</entry>
<entry>
<key>dbcp-min-evictable-idle-time</key>
<value>
<name>dbcp-min-evictable-idle-time</name>
</value>
</entry>
<entry>
<key>dbcp-soft-min-evictable-idle-time</key>
<value>
<name>dbcp-soft-min-evictable-idle-time</name>
</value>
</entry>
</descriptors>
<name>MySQL_ConnectionPool</name>
<persistsState>false</persistsState>
<properties>
<entry>
<key>Database Connection URL</key>
<value>jdbc:mysql://192.168.10.44:3306/test?characterEncoding=UTF-8&amp;useSSL=false&amp;allowPublicKeyRetrieval=true</value>
</entry>
<entry>
<key>Database Driver Class Name</key>
<value>com.mysql.jdbc.Driver</value>
</entry>
<entry>
<key>database-driver-locations</key>
<value>/usr/local/bigdata/testdata/mysql-connector-java-5.1.44.jar</value>
</entry>
<entry>
<key>kerberos-credentials-service</key>
</entry>
<entry>
<key>Database User</key>
<value>root</value>
</entry>
<entry>
<key>Password</key>
</entry>
<entry>
<key>Max Wait Time</key>
</entry>
<entry>
<key>Max Total Connections</key>
</entry>
<entry>
<key>Validation-query</key>
</entry>
<entry>
<key>dbcp-min-idle-conns</key>
</entry>
<entry>
<key>dbcp-max-idle-conns</key>
</entry>
<entry>
<key>dbcp-max-conn-lifetime</key>
</entry>
<entry>
<key>dbcp-time-between-eviction-runs</key>
</entry>
<entry>
<key>dbcp-min-evictable-idle-time</key>
</entry>
<entry>
<key>dbcp-soft-min-evictable-idle-time</key>
</entry>
</properties>
<state>ENABLED</state>
<type>org.apache.nifi.dbcp.DBCPConnectionPool</type>
</controllerServices>
<processors>
<id>20f76bcb-e978-3263-0000-000000000000</id>
<parentGroupId>60d38136-211b-3d16-0000-000000000000</parentGroupId>
<position>
<x>4.0</x>
<y>413.5</y>
</position>
<bundle>
<artifact>nifi-standard-nar</artifact>
<group>org.apache.nifi</group>
<version>1.9.2</version>
</bundle>
<config>
<bulletinLevel>WARN</bulletinLevel>
<comments></comments>
<concurrentlySchedulableTaskCount>1</concurrentlySchedulableTaskCount>
<descriptors>
<entry>
<key>JsonPath Expression</key>
<value>
<name>JsonPath Expression</name>
</value>
</entry>
<entry>
<key>Null Value Representation</key>
<value>
<name>Null Value Representation</name>
</value>
</entry>
</descriptors>
<executionNode>ALL</executionNode>
<lossTolerant>false</lossTolerant>
<penaltyDuration>30 sec</penaltyDuration>
<properties>
<entry>
<key>JsonPath Expression</key>
<value>$.*</value>
</entry>
<entry>
<key>Null Value Representation</key>
<value>empty string</value>
</entry>
</properties>
<runDurationMillis>0</runDurationMillis>
<schedulingPeriod>0 sec</schedulingPeriod>
<schedulingStrategy>TIMER_DRIVEN</schedulingStrategy>
<yieldDuration>1 sec</yieldDuration>
</config>
<executionNodeRestricted>false</executionNodeRestricted>
<name>SplitJson_Demo</name>
<relationships>
<autoTerminate>true</autoTerminate>
<name>failure</name>
</relationships>
<relationships>
<autoTerminate>true</autoTerminate>
<name>original</name>
</relationships>
<relationships>
<autoTerminate>false</autoTerminate>
<name>split</name>
</relationships>
<state>STOPPED</state>
<style/>
<type>org.apache.nifi.processors.standard.SplitJson</type>
</processors>
<processors>
<id>26c8401a-8807-3771-0000-000000000000</id>
<parentGroupId>60d38136-211b-3d16-0000-000000000000</parentGroupId>
<position>
<x>3.0</x>
<y>624.5</y>
</position>
<bundle>
<artifact>nifi-hadoop-nar</artifact>
<group>org.apache.nifi</group>
<version>1.9.2</version>
</bundle>
<config>
<bulletinLevel>WARN</bulletinLevel>
<comments></comments>
<concurrentlySchedulableTaskCount>1</concurrentlySchedulableTaskCount>
<descriptors>
<entry>
<key>Hadoop Configuration Resources</key>
<value>
<name>Hadoop Configuration Resources</name>
</value>
</entry>
<entry>
<key>kerberos-credentials-service</key>
<value>
<identifiesControllerService>org.apache.nifi.kerberos.KerberosCredentialsService</identifiesControllerService>
<name>kerberos-credentials-service</name>
</value>
</entry>
<entry>
<key>Kerberos Principal</key>
<value>
<name>Kerberos Principal</name>
</value>
</entry>
<entry>
<key>Kerberos Keytab</key>
<value>
<name>Kerberos Keytab</name>
</value>
</entry>
<entry>
<key>Kerberos Relogin Period</key>
<value>
<name>Kerberos Relogin Period</name>
</value>
</entry>
<entry>
<key>Additional Classpath Resources</key>
<value>
<name>Additional Classpath Resources</name>
</value>
</entry>
<entry>
<key>Directory</key>
<value>
<name>Directory</name>
</value>
</entry>
<entry>
<key>Conflict Resolution Strategy</key>
<value>
<name>Conflict Resolution Strategy</name>
</value>
</entry>
<entry>
<key>Block Size</key>
<value>
<name>Block Size</name>
</value>
</entry>
<entry>
<key>IO Buffer Size</key>
<value>
<name>IO Buffer Size</name>
</value>
</entry>
<entry>
<key>Replication</key>
<value>
<name>Replication</name>
</value>
</entry>
<entry>
<key>Permissions umask</key>
<value>
<name>Permissions umask</name>
</value>
</entry>
<entry>
<key>Remote Owner</key>
<value>
<name>Remote Owner</name>
</value>
</entry>
<entry>
<key>Remote Group</key>
<value>
<name>Remote Group</name>
</value>
</entry>
<entry>
<key>Compression codec</key>
<value>
<name>Compression codec</name>
</value>
</entry>
</descriptors>
<executionNode>ALL</executionNode>
<lossTolerant>false</lossTolerant>
<penaltyDuration>30 sec</penaltyDuration>
<properties>
<entry>
<key>Hadoop Configuration Resources</key>
<value>/usr/local/bigdata/hadoop-3.1.4/etc/hadoop/hdfs-site.xml,/usr/local/bigdata/hadoop-3.1.4/etc/hadoop/core-site.xml</value>
</entry>
<entry>
<key>kerberos-credentials-service</key>
</entry>
<entry>
<key>Kerberos Principal</key>
</entry>
<entry>
<key>Kerberos Keytab</key>
</entry>
<entry>
<key>Kerberos Relogin Period</key>
<value>4 hours</value>
</entry>
<entry>
<key>Additional Classpath Resources</key>
<value>/usr/local/bigdata/testdata/hadoop-lzo-0.4.21-SNAPSHOT.jar</value>
</entry>
<entry>
<key>Directory</key>
<value>/user/hive/warehouse/test.db/user</value>
</entry>
<entry>
<key>Conflict Resolution Strategy</key>
<value>append</value>
</entry>
<entry>
<key>Block Size</key>
</entry>
<entry>
<key>IO Buffer Size</key>
</entry>
<entry>
<key>Replication</key>
</entry>
<entry>
<key>Permissions umask</key>
</entry>
<entry>
<key>Remote Owner</key>
</entry>
<entry>
<key>Remote Group</key>
</entry>
<entry>
<key>Compression codec</key>
<value>LZO</value>
</entry>
</properties>
<runDurationMillis>0</runDurationMillis>
<schedulingPeriod>0 sec</schedulingPeriod>
<schedulingStrategy>TIMER_DRIVEN</schedulingStrategy>
<yieldDuration>1 sec</yieldDuration>
</config>
<executionNodeRestricted>false</executionNodeRestricted>
<name>PutHDFS_Demo</name>
<relationships>
<autoTerminate>true</autoTerminate>
<name>failure</name>
</relationships>
<relationships>
<autoTerminate>true</autoTerminate>
<name>success</name>
</relationships>
<state>STOPPED</state>
<style/>
<type>org.apache.nifi.processors.hadoop.PutHDFS</type>
</processors>
<processors>
<id>4cb1eb1d-ca3a-34e0-0000-000000000000</id>
<parentGroupId>60d38136-211b-3d16-0000-000000000000</parentGroupId>
<position>
<x>0.0</x>
<y>206.5</y>
</position>
<bundle>
<artifact>nifi-avro-nar</artifact>
<group>org.apache.nifi</group>
<version>1.9.2</version>
</bundle>
<config>
<bulletinLevel>WARN</bulletinLevel>
<comments></comments>
<concurrentlySchedulableTaskCount>1</concurrentlySchedulableTaskCount>
<descriptors>
<entry>
<key>JSON container options</key>
<value>
<name>JSON container options</name>
</value>
</entry>
<entry>
<key>Wrap Single Record</key>
<value>
<name>Wrap Single Record</name>
</value>
</entry>
<entry>
<key>Avro schema</key>
<value>
<name>Avro schema</name>
</value>
</entry>
</descriptors>
<executionNode>ALL</executionNode>
<lossTolerant>false</lossTolerant>
<penaltyDuration>30 sec</penaltyDuration>
<properties>
<entry>
<key>JSON container options</key>
<value>array</value>
</entry>
<entry>
<key>Wrap Single Record</key>
<value>true</value>
</entry>
<entry>
<key>Avro schema</key>
</entry>
</properties>
<runDurationMillis>0</runDurationMillis>
<schedulingPeriod>0 sec</schedulingPeriod>
<schedulingStrategy>TIMER_DRIVEN</schedulingStrategy>
<yieldDuration>1 sec</yieldDuration>
</config>
<executionNodeRestricted>false</executionNodeRestricted>
<name>ConvertAvroToJSON_Demo</name>
<relationships>
<autoTerminate>true</autoTerminate>
<name>failure</name>
</relationships>
<relationships>
<autoTerminate>false</autoTerminate>
<name>success</name>
</relationships>
<state>STOPPED</state>
<style/>
<type>org.apache.nifi.processors.avro.ConvertAvroToJSON</type>
</processors>
<processors>
<id>c16280cc-6d1d-355c-0000-000000000000</id>
<parentGroupId>60d38136-211b-3d16-0000-000000000000</parentGroupId>
<position>
<x>9.0</x>
<y>0.0</y>
</position>
<bundle>
<artifact>nifi-standard-nar</artifact>
<group>org.apache.nifi</group>
<version>1.9.2</version>
</bundle>
<config>
<bulletinLevel>WARN</bulletinLevel>
<comments></comments>
<concurrentlySchedulableTaskCount>1</concurrentlySchedulableTaskCount>
<descriptors>
<entry>
<key>Database Connection Pooling Service</key>
<value>
<identifiesControllerService>org.apache.nifi.dbcp.DBCPService</identifiesControllerService>
<name>Database Connection Pooling Service</name>
</value>
</entry>
<entry>
<key>db-fetch-db-type</key>
<value>
<name>db-fetch-db-type</name>
</value>
</entry>
<entry>
<key>Table Name</key>
<value>
<name>Table Name</name>
</value>
</entry>
<entry>
<key>Columns to Return</key>
<value>
<name>Columns to Return</name>
</value>
</entry>
<entry>
<key>db-fetch-where-clause</key>
<value>
<name>db-fetch-where-clause</name>
</value>
</entry>
<entry>
<key>db-fetch-sql-query</key>
<value>
<name>db-fetch-sql-query</name>
</value>
</entry>
<entry>
<key>Maximum-value Columns</key>
<value>
<name>Maximum-value Columns</name>
</value>
</entry>
<entry>
<key>Max Wait Time</key>
<value>
<name>Max Wait Time</name>
</value>
</entry>
<entry>
<key>Fetch Size</key>
<value>
<name>Fetch Size</name>
</value>
</entry>
<entry>
<key>qdbt-max-rows</key>
<value>
<name>qdbt-max-rows</name>
</value>
</entry>
<entry>
<key>qdbt-output-batch-size</key>
<value>
<name>qdbt-output-batch-size</name>
</value>
</entry>
<entry>
<key>qdbt-max-frags</key>
<value>
<name>qdbt-max-frags</name>
</value>
</entry>
<entry>
<key>dbf-normalize</key>
<value>
<name>dbf-normalize</name>
</value>
</entry>
<entry>
<key>transaction-isolation-level</key>
<value>
<name>transaction-isolation-level</name>
</value>
</entry>
<entry>
<key>dbf-user-logical-types</key>
<value>
<name>dbf-user-logical-types</name>
</value>
</entry>
<entry>
<key>dbf-default-precision</key>
<value>
<name>dbf-default-precision</name>
</value>
</entry>
<entry>
<key>dbf-default-scale</key>
<value>
<name>dbf-default-scale</name>
</value>
</entry>
</descriptors>
<executionNode>PRIMARY</executionNode>
<lossTolerant>false</lossTolerant>
<penaltyDuration>30 sec</penaltyDuration>
<properties>
<entry>
<key>Database Connection Pooling Service</key>
<value>55bee1a0-0b0c-3a63-0000-000000000000</value>
</entry>
<entry>
<key>db-fetch-db-type</key>
<value>MySQL</value>
</entry>
<entry>
<key>Table Name</key>
<value>user</value>
</entry>
<entry>
<key>Columns to Return</key>
</entry>
<entry>
<key>db-fetch-where-clause</key>
</entry>
<entry>
<key>db-fetch-sql-query</key>
<value>select * from user</value>
</entry>
<entry>
<key>Maximum-value Columns</key>
</entry>
<entry>
<key>Max Wait Time</key>
<value>0 seconds</value>
</entry>
<entry>
<key>Fetch Size</key>
<value>0</value>
</entry>
<entry>
<key>qdbt-max-rows</key>
<value>0</value>
</entry>
<entry>
<key>qdbt-output-batch-size</key>
<value>0</value>
</entry>
<entry>
<key>qdbt-max-frags</key>
<value>0</value>
</entry>
<entry>
<key>dbf-normalize</key>
<value>false</value>
</entry>
<entry>
<key>transaction-isolation-level</key>
</entry>
<entry>
<key>dbf-user-logical-types</key>
<value>false</value>
</entry>
<entry>
<key>dbf-default-precision</key>
<value>10</value>
</entry>
<entry>
<key>dbf-default-scale</key>
<value>0</value>
</entry>
</properties>
<runDurationMillis>0</runDurationMillis>
<schedulingPeriod>0 sec</schedulingPeriod>
<schedulingStrategy>TIMER_DRIVEN</schedulingStrategy>
<yieldDuration>1 sec</yieldDuration>
</config>
<executionNodeRestricted>true</executionNodeRestricted>
<name>QueryDatabaseTable_demo</name>
<relationships>
<autoTerminate>false</autoTerminate>
<name>success</name>
</relationships>
<state>STOPPED</state>
<style/>
<type>org.apache.nifi.processors.standard.QueryDatabaseTable</type>
</processors>
</snippet>
<timestamp>02/08/2023 08:45:41 GMT</timestamp>
</template>
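To check how the processors in an exported template are wired together, the XML can be read with a few lines of Python. The sketch below is a minimal example, not part of NiFi: it assumes a well-formed template with the element names seen in this export (`snippet`, `processors`, `connections`, `source`, `destination`, `selectedRelationships`). The helper name `summarize_template` and the shortened `SAMPLE` document are hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed excerpt using IDs and names from the template above.
SAMPLE = """
<template encoding-version="1.2">
  <name>MysqlToHDFSByLzo</name>
  <snippet>
    <connections>
      <id>8bacaebe-bce0-31e8-0000-000000000000</id>
      <destination>
        <id>4cb1eb1d-ca3a-34e0-0000-000000000000</id>
        <type>PROCESSOR</type>
      </destination>
      <name>Q_C</name>
      <selectedRelationships>success</selectedRelationships>
      <source>
        <id>c16280cc-6d1d-355c-0000-000000000000</id>
        <type>PROCESSOR</type>
      </source>
    </connections>
    <processors>
      <id>4cb1eb1d-ca3a-34e0-0000-000000000000</id>
      <name>ConvertAvroToJSON_Demo</name>
      <type>org.apache.nifi.processors.avro.ConvertAvroToJSON</type>
    </processors>
    <processors>
      <id>c16280cc-6d1d-355c-0000-000000000000</id>
      <name>QueryDatabaseTable_demo</name>
      <type>org.apache.nifi.processors.standard.QueryDatabaseTable</type>
    </processors>
  </snippet>
</template>
"""


def summarize_template(xml_text):
    """Return ({processor id: name}, [(source id, relationship, destination id)])."""
    snippet = ET.fromstring(xml_text.strip()).find("snippet")
    names = {p.findtext("id"): p.findtext("name") for p in snippet.findall("processors")}
    edges = [
        (c.findtext("source/id"), c.findtext("selectedRelationships"), c.findtext("destination/id"))
        for c in snippet.findall("connections")
    ]
    return names, edges


processors, edges = summarize_template(SAMPLE)
for source_id, relationship, destination_id in edges:
    # Print each connection as "source --relationship--> destination".
    print(f"{processors[source_id]} --{relationship}--> {processors[destination_id]}")
```

Run against a full export, the same loop prints one line per connection, which makes it easy to confirm the processor order before importing the template.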
This version adds a ControlRate processor and a logging processor; no exceptions were found in testing.
<template encoding-version="1.2">
<description></description>
<groupId>2f7d3766-0186-1000-0000-00006e07b64a</groupId>
<name>MysqlToHDFSByLzo2</name>
<snippet>
<connections>
<id>25c778c6-63df-3672-0000-000000000000</id>
<parentGroupId>60d38136-211b-3d16-0000-000000000000</parentGroupId>
<backPressureDataSizeThreshold>1 GB</backPressureDataSizeThreshold>
<backPressureObjectThreshold>10000</backPressureObjectThreshold>
<destination>
<groupId>60d38136-211b-3d16-0000-000000000000</groupId>
<id>203e8481-e4c7-3340-0000-000000000000</id>
<type>PROCESSOR</type>
</destination>
<flowFileExpiration>0 sec</flowFileExpiration>
<labelIndex>1</labelIndex>
<loadBalanceCompression>DO_NOT_COMPRESS</loadBalanceCompression>
<loadBalancePartitionAttribute></loadBalancePartitionAttribute>
<loadBalanceStatus>LOAD_BALANCE_NOT_CONFIGURED</loadBalanceStatus>
<loadBalanceStrategy>DO_NOT_LOAD_BALANCE</loadBalanceStrategy>
<name></name>
<selectedRelationships>failure</selectedRelationships>
<source>
<groupId>60d38136-211b-3d16-0000-000000000000</groupId>
<id>4cb1eb1d-ca3a-34e0-0000-000000000000</id>
<type>PROCESSOR</type>
</source>
<zIndex>0</zIndex>
</connections>
<connections>
<id>59e154ce-8ca9-329f-0000-000000000000</id>
<parentGroupId>60d38136-211b-3d16-0000-000000000000</parentGroupId>
<backPressureDataSizeThreshold>1 GB</backPressureDataSizeThreshold>
<backPressureObjectThreshold>10000</backPressureObjectThreshold>
<destination>
<groupId>60d38136-211b-3d16-0000-000000000000</groupId>
<id>26c8401a-8807-3771-0000-000000000000</id>
<type>PROCESSOR</type>
</destination>
<flowFileExpiration>0 sec</flowFileExpiration>
<labelIndex>1</labelIndex>
<loadBalanceCompression>COMPRESS_ATTRIBUTES_AND_CONTENT</loadBalanceCompression>
<loadBalancePartitionAttribute></loadBalancePartitionAttribute>
<loadBalanceStatus>LOAD_BALANCE_INACTIVE</loadBalanceStatus>
<loadBalanceStrategy>ROUND_ROBIN</loadBalanceStrategy>
<name></name>
<selectedRelationships>success</selectedRelationships>
<source>
<groupId>60d38136-211b-3d16-0000-000000000000</groupId>
<id>1b9fd194-4cdb-369f-0000-000000000000</id>
<type>PROCESSOR</type>
</source>
<zIndex>0</zIndex>
</connections>
<connections>
<id>60539d1e-e7f5-396c-0000-000000000000</id>
<parentGroupId>60d38136-211b-3d16-0000-000000000000</parentGroupId>
<backPressureDataSizeThreshold>1 GB</backPressureDataSizeThreshold>
<backPressureObjectThreshold>10000</backPressureObjectThreshold>
<destination>
<groupId>60d38136-211b-3d16-0000-000000000000</groupId>
<id>203e8481-e4c7-3340-0000-000000000000</id>
<type>PROCESSOR</type>
</destination>
<flowFileExpiration>0 sec</flowFileExpiration>
<labelIndex>1</labelIndex>
<loadBalanceCompression>DO_NOT_COMPRESS</loadBalanceCompression>
<loadBalancePartitionAttribute></loadBalancePartitionAttribute>
<loadBalanceStatus>LOAD_BALANCE_NOT_CONFIGURED</loadBalanceStatus>
<loadBalanceStrategy>DO_NOT_LOAD_BALANCE</loadBalanceStrategy>
<name></name>
<selectedRelationships>failure</selectedRelationships>
<source>
<groupId>60d38136-211b-3d16-0000-000000000000</groupId>
<id>26c8401a-8807-3771-0000-000000000000</id>
<type>PROCESSOR</type>
</source>
<zIndex>0</zIndex>
</connections>
<connections>
<id>6e3859ca-2a0d-3560-0000-000000000000</id>
<parentGroupId>60d38136-211b-3d16-0000-000000000000</parentGroupId>
<backPressureDataSizeThreshold>1 GB</backPressureDataSizeThreshold>
<backPressureObjectThreshold>10000</backPressureObjectThreshold>
<destination>
<groupId>60d38136-211b-3d16-0000-000000000000</groupId>
<id>1b9fd194-4cdb-369f-0000-000000000000</id>
<type>PROCESSOR</type>
</destination>
<flowFileExpiration>0 sec</flowFileExpiration>
<labelIndex>1</labelIndex>
<loadBalanceCompression>COMPRESS_ATTRIBUTES_AND_CONTENT</loadBalanceCompression>
<loadBalancePartitionAttribute></loadBalancePartitionAttribute>
<loadBalanceStatus>LOAD_BALANCE_INACTIVE</loadBalanceStatus>
<loadBalanceStrategy>ROUND_ROBIN</loadBalanceStrategy>
<name>S_C</name>
<selectedRelationships>split</selectedRelationships>
<source>
<groupId>60d38136-211b-3d16-0000-000000000000</groupId>
<id>20f76bcb-e978-3263-0000-000000000000</id>
<type>PROCESSOR</type>
</source>
<zIndex>0</zIndex>
</connections>
<connections>
<id>7b343e88-ab1a-30ee-0000-000000000000</id>
<parentGroupId>60d38136-211b-3d16-0000-000000000000</parentGroupId>
<backPressureDataSizeThreshold>1 GB</backPressureDataSizeThreshold>
<backPressureObjectThreshold>10000</backPressureObjectThreshold>
<destination>
<groupId>60d38136-211b-3d16-0000-000000000000</groupId>
<id>203e8481-e4c7-3340-0000-000000000000</id>
<type>PROCESSOR</type>
</destination>
<flowFileExpiration>0 sec</flowFileExpiration>
<labelIndex>1</labelIndex>
<loadBalanceCompression>DO_NOT_COMPRESS</loadBalanceCompression>
<loadBalancePartitionAttribute></loadBalancePartitionAttribute>
<loadBalanceStatus>LOAD_BALANCE_NOT_CONFIGURED</loadBalanceStatus>
<loadBalanceStrategy>DO_NOT_LOAD_BALANCE</loadBalanceStrategy>
<name></name>
<selectedRelationships>success</selectedRelationships>
<source>
<groupId>60d38136-211b-3d16-0000-000000000000</groupId>
<id>c16280cc-6d1d-355c-0000-000000000000</id>
<type>PROCESSOR</type>
</source>
<zIndex>0</zIndex>
</connections>
<connections>
<id>8bacaebe-bce0-31e8-0000-000000000000</id>
<parentGroupId>60d38136-211b-3d16-0000-000000000000</parentGroupId>
<backPressureDataSizeThreshold>1 GB</backPressureDataSizeThreshold>
<backPressureObjectThreshold>10000</backPressureObjectThreshold>
<destination>
<groupId>60d38136-211b-3d16-0000-000000000000</groupId>
<id>4cb1eb1d-ca3a-34e0-0000-000000000000</id>
<type>PROCESSOR</type>
</destination>
<flowFileExpiration>0 sec</flowFileExpiration>
<labelIndex>1</labelIndex>
<loadBalanceCompression>COMPRESS_ATTRIBUTES_AND_CONTENT</loadBalanceCompression>
<loadBalancePartitionAttribute></loadBalancePartitionAttribute>
<loadBalanceStatus>LOAD_BALANCE_INACTIVE</loadBalanceStatus>
<loadBalanceStrategy>ROUND_ROBIN</loadBalanceStrategy>
<name>Q_C</name>
<selectedRelationships>success</selectedRelationships>
<source>
<groupId>60d38136-211b-3d16-0000-000000000000</groupId>
<id>c16280cc-6d1d-355c-0000-000000000000</id>
<type>PROCESSOR</type>
</source>
<zIndex>0</zIndex>
</connections>
<connections>
<id>ee0fcd22-6c7c-3edc-0000-000000000000</id>
<parentGroupId>60d38136-211b-3d16-0000-000000000000</parentGroupId>
<backPressureDataSizeThreshold>1 GB</backPressureDataSizeThreshold>
<backPressureObjectThreshold>10000</backPressureObjectThreshold>
<destination>
<groupId>60d38136-211b-3d16-0000-000000000000</groupId>
<id>203e8481-e4c7-3340-0000-000000000000</id>
<type>PROCESSOR</type>
</destination>
<flowFileExpiration>0 sec</flowFileExpiration>
<labelIndex>1</labelIndex>
<loadBalanceCompression>COMPRESS_ATTRIBUTES_AND_CONTENT</loadBalanceCompression>
<loadBalancePartitionAttribute></loadBalancePartitionAttribute>
<loadBalanceStatus>LOAD_BALANCE_INACTIVE</loadBalanceStatus>
<loadBalanceStrategy>ROUND_ROBIN</loadBalanceStrategy>
<name></name>
<selectedRelationships>failure</selectedRelationships>
<source>
<groupId>60d38136-211b-3d16-0000-000000000000</groupId>
<id>1b9fd194-4cdb-369f-0000-000000000000</id>
<type>PROCESSOR</type>
</source>
<zIndex>0</zIndex>
</connections>
<connections>
<id>f4577d45-be28-3c83-0000-000000000000</id>
<parentGroupId>60d38136-211b-3d16-0000-000000000000</parentGroupId>
<backPressureDataSizeThreshold>1 GB</backPressureDataSizeThreshold>
<backPressureObjectThreshold>10000</backPressureObjectThreshold>
<destination>
<groupId>60d38136-211b-3d16-0000-000000000000</groupId>
<id>203e8481-e4c7-3340-0000-000000000000</id>
<type>PROCESSOR</type>
</destination>
<flowFileExpiration>0 sec</flowFileExpiration>
<labelIndex>1</labelIndex>
<loadBalanceCompression>DO_NOT_COMPRESS</loadBalanceCompression>
<loadBalancePartitionAttribute></loadBalancePartitionAttribute>
<loadBalanceStatus>LOAD_BALANCE_NOT_CONFIGURED</loadBalanceStatus>
<loadBalanceStrategy>DO_NOT_LOAD_BALANCE</loadBalanceStrategy>
<name></name>
<selectedRelationships>failure</selectedRelationships>
<source>
<groupId>60d38136-211b-3d16-0000-000000000000</groupId>
<id>20f76bcb-e978-3263-0000-000000000000</id>
<type>PROCESSOR</type>
</source>
<zIndex>0</zIndex>
</connections>
<connections>
<id>f5322759-8583-3753-0000-000000000000</id>
<parentGroupId>60d38136-211b-3d16-0000-000000000000</parentGroupId>
<backPressureDataSizeThreshold>1 GB</backPressureDataSizeThreshold>
<backPressureObjectThreshold>10000</backPressureObjectThreshold>
<destination>
<groupId>60d38136-211b-3d16-0000-000000000000</groupId>
<id>20f76bcb-e978-3263-0000-000000000000</id>
<type>PROCESSOR</type>
</destination>
<flowFileExpiration>0 sec</flowFileExpiration>
<labelIndex>1</labelIndex>
<loadBalanceCompression>COMPRESS_ATTRIBUTES_AND_CONTENT</loadBalanceCompression>
<loadBalancePartitionAttribute></loadBalancePartitionAttribute>
<loadBalanceStatus>LOAD_BALANCE_INACTIVE</loadBalanceStatus>
<loadBalanceStrategy>ROUND_ROBIN</loadBalanceStrategy>
<name>C_S</name>
<selectedRelationships>success</selectedRelationships>
<source>
<groupId>60d38136-211b-3d16-0000-000000000000</groupId>
<id>4cb1eb1d-ca3a-34e0-0000-000000000000</id>
<type>PROCESSOR</type>
</source>
<zIndex>0</zIndex>
</connections>
<controllerServices>
<id>55bee1a0-0b0c-3a63-0000-000000000000</id>
<parentGroupId>60d38136-211b-3d16-0000-000000000000</parentGroupId>
<bundle>
<artifact>nifi-dbcp-service-nar</artifact>
<group>org.apache.nifi</group>
<version>1.9.2</version>
</bundle>
<comments></comments>
<descriptors>
<entry>
<key>Database Connection URL</key>
<value>
<name>Database Connection URL</name>
</value>
</entry>
<entry>
<key>Database Driver Class Name</key>
<value>
<name>Database Driver Class Name</name>
</value>
</entry>
<entry>
<key>database-driver-locations</key>
<value>
<name>database-driver-locations</name>
</value>
</entry>
<entry>
<key>kerberos-credentials-service</key>
<value>
<identifiesControllerService>org.apache.nifi.kerberos.KerberosCredentialsService</identifiesControllerService>
<name>kerberos-credentials-service</name>
</value>
</entry>
<entry>
<key>Database User</key>
<value>
<name>Database User</name>
</value>
</entry>
<entry>
<key>Password</key>
<value>
<name>Password</name>
</value>
</entry>
<entry>
<key>Max Wait Time</key>
<value>
<name>Max Wait Time</name>
</value>
</entry>
<entry>
<key>Max Total Connections</key>
<value>
<name>Max Total Connections</name>
</value>
</entry>
<entry>
<key>Validation-query</key>
<value>
<name>Validation-query</name>
</value>
</entry>
<entry>
<key>dbcp-min-idle-conns</key>
<value>
<name>dbcp-min-idle-conns</name>
</value>
</entry>
<entry>
<key>dbcp-max-idle-conns</key>
<value>
<name>dbcp-max-idle-conns</name>
</value>
</entry>
<entry>
<key>dbcp-max-conn-lifetime</key>
<value>
<name>dbcp-max-conn-lifetime</name>
</value>
</entry>
<entry>
<key>dbcp-time-between-eviction-runs</key>
<value>
<name>dbcp-time-between-eviction-runs</name>
</value>
</entry>
<entry>
<key>dbcp-min-evictable-idle-time</key>
<value>
<name>dbcp-min-evictable-idle-time</name>
</value>
</entry>
<entry>
<key>dbcp-soft-min-evictable-idle-time</key>
<value>
<name>dbcp-soft-min-evictable-idle-time</name>
</value>
</entry>
</descriptors>
<name>MySQL_ConnectionPool</name>
<persistsState>false</persistsState>
<properties>
<entry>
<key>Database Connection URL</key>
<value>jdbc:mysql://192.168.10.44:3306/test?characterEncoding=UTF-8&amp;useSSL=false&amp;allowPublicKeyRetrieval=true</value>
</entry>
<entry>
<key>Database Driver Class Name</key>
<value>com.mysql.jdbc.Driver</value>
</entry>
<entry>
<key>database-driver-locations</key>
<value>/usr/local/bigdata/testdata/mysql-connector-java-5.1.44.jar</value>
</entry>
<entry>
<key>kerberos-credentials-service</key>
</entry>
<entry>
<key>Database User</key>
<value>root</value>
</entry>
<entry>
<key>Password</key>
</entry>
<entry>
<key>Max Wait Time</key>
<value>500 millis</value>
</entry>
<entry>
<key>Max Total Connections</key>
<value>8</value>
</entry>
<entry>
<key>Validation-query</key>
</entry>
<entry>
<key>dbcp-min-idle-conns</key>
<value>0</value>
</entry>
<entry>
<key>dbcp-max-idle-conns</key>
<value>8</value>
</entry>
<entry>
<key>dbcp-max-conn-lifetime</key>
<value>-1</value>
</entry>
<entry>
<key>dbcp-time-between-eviction-runs</key>
<value>-1</value>
</entry>
<entry>
<key>dbcp-min-evictable-idle-time</key>
<value>30 mins</value>
</entry>
<entry>
<key>dbcp-soft-min-evictable-idle-time</key>
<value>-1</value>
</entry>
</properties>
<state>ENABLED</state>
<type>org.apache.nifi.dbcp.DBCPConnectionPool</type>
</controllerServices>
<processors>
<id>1b9fd194-4cdb-369f-0000-000000000000</id>
<parentGroupId>60d38136-211b-3d16-0000-000000000000</parentGroupId>
<position>
<x>2.974225266934127</x>
<y>627.7810694387299</y>
</position>
<bundle>
<artifact>nifi-standard-nar</artifact>
<group>org.apache.nifi</group>
<version>1.9.2</version>
</bundle>
<config>
<bulletinLevel>WARN</bulletinLevel>
<comments></comments>
<concurrentlySchedulableTaskCount>1</concurrentlySchedulableTaskCount>
<descriptors>
<entry>
<key>Rate Control Criteria</key>
<value>
<name>Rate Control Criteria</name>
</value>
</entry>
<entry>
<key>Maximum Rate</key>
<value>
<name>Maximum Rate</name>
</value>
</entry>
<entry>
<key>Rate Controlled Attribute</key>
<value>
<name>Rate Controlled Attribute</name>
</value>
</entry>
<entry>
<key>Time Duration</key>
<value>
<name>Time Duration</name>
</value>
</entry>
<entry>
<key>Grouping Attribute</key>
<value>
<name>Grouping Attribute</name>
</value>
</entry>
</descriptors>
<executionNode>ALL</executionNode>
<lossTolerant>false</lossTolerant>
<penaltyDuration>30 sec</penaltyDuration>
<properties>
<entry>
<key>Rate Control Criteria</key>
<value>flowfile count</value>
</entry>
<entry>
<key>Maximum Rate</key>
<value>100000</value>
</entry>
<entry>
<key>Rate Controlled Attribute</key>
</entry>
<entry>
<key>Time Duration</key>
<value>1 min</value>
</entry>
<entry>
<key>Grouping Attribute</key>
</entry>
</properties>
<runDurationMillis>0</runDurationMillis>
<schedulingPeriod>0 sec</schedulingPeriod>
<schedulingStrategy>TIMER_DRIVEN</schedulingStrategy>
<yieldDuration>1 sec</yieldDuration>
</config>
<executionNodeRestricted>false</executionNodeRestricted>
<name>ControlRate_demo</name>
<relationships>
<autoTerminate>false</autoTerminate>
<name>failure</name>
</relationships>
<relationships>
<autoTerminate>false</autoTerminate>
<name>success</name>
</relationships>
<state>STOPPED</state>
<style/>
<type>org.apache.nifi.processors.standard.ControlRate</type>
</processors>
<processors>
<id>203e8481-e4c7-3340-0000-000000000000</id>
<parentGroupId>60d38136-211b-3d16-0000-000000000000</parentGroupId>
<position>
<x>712.1617915050342</x>
<y>435.16275513999926</y>
</position>
<bundle>
<artifact>nifi-standard-nar</artifact>
<group>org.apache.nifi</group>
<version>1.9.2</version>
</bundle>
<config>
<bulletinLevel>WARN</bulletinLevel>
<comments></comments>
<concurrentlySchedulableTaskCount>1</concurrentlySchedulableTaskCount>
<descriptors>
<entry>
<key>Log Level</key>
<value>
<name>Log Level</name>
</value>
</entry>
<entry>
<key>Log Payload</key>
<value>
<name>Log Payload</name>
</value>
</entry>
<entry>
<key>Attributes to Log</key>
<value>
<name>Attributes to Log</name>
</value>
</entry>
<entry>
<key>attributes-to-log-regex</key>
<value>
<name>attributes-to-log-regex</name>
</value>
</entry>
<entry>
<key>Attributes to Ignore</key>
<value>
<name>Attributes to Ignore</name>
</value>
</entry>
<entry>
<key>attributes-to-ignore-regex</key>
<value>
<name>attributes-to-ignore-regex</name>
</value>
</entry>
<entry>
<key>Log prefix</key>
<value>
<name>Log prefix</name>
</value>
</entry>
<entry>
<key>character-set</key>
<value>
<name>character-set</name>
</value>
</entry>
</descriptors>
<executionNode>ALL</executionNode>
<lossTolerant>false</lossTolerant>
<penaltyDuration>30 sec</penaltyDuration>
<properties>
<entry>
<key>Log Level</key>
<value>info</value>
</entry>
<entry>
<key>Log Payload</key>
<value>false</value>
</entry>
<entry>
<key>Attributes to Log</key>
</entry>
<entry>
<key>attributes-to-log-regex</key>
<value>.*</value>
</entry>
<entry>
<key>Attributes to Ignore</key>
</entry>
<entry>
<key>attributes-to-ignore-regex</key>
</entry>
<entry>
<key>Log prefix</key>
</entry>
<entry>
<key>character-set</key>
<value>UTF-8</value>
</entry>
</properties>
<runDurationMillis>0</runDurationMillis>
<schedulingPeriod>0 sec</schedulingPeriod>
<schedulingStrategy>TIMER_DRIVEN</schedulingStrategy>
<yieldDuration>1 sec</yieldDuration>
</config>
<executionNodeRestricted>false</executionNodeRestricted>
<name>LogAttribute_demo</name>
<relationships>
<autoTerminate>true</autoTerminate>
<name>success</name>
</relationships>
<state>STOPPED</state>
<style/>
<type>org.apache.nifi.processors.standard.LogAttribute</type>
</processors>
<processors>
<id>20f76bcb-e978-3263-0000-000000000000</id>
<parentGroupId>60d38136-211b-3d16-0000-000000000000</parentGroupId>
<position>
<x>1.783660888671875</x>
<y>408.520751953125</y>
</position>
<bundle>
<artifact>nifi-standard-nar</artifact>
<group>org.apache.nifi</group>
<version>1.9.2</version>
</bundle>
<config>
<bulletinLevel>WARN</bulletinLevel>
<comments></comments>
<concurrentlySchedulableTaskCount>3</concurrentlySchedulableTaskCount>
<descriptors>
<entry>
<key>JsonPath Expression</key>
<value>
<name>JsonPath Expression</name>
</value>
</entry>
<entry>
<key>Null Value Representation</key>
<value>
<name>Null Value Representation</name>
</value>
</entry>
</descriptors>
<executionNode>ALL</executionNode>
<lossTolerant>false</lossTolerant>
<penaltyDuration>30 sec</penaltyDuration>
<properties>
<entry>
<key>JsonPath Expression</key>
<value>$.*</value>
</entry>
<entry>
<key>Null Value Representation</key>
<value>empty string</value>
</entry>
</properties>
<runDurationMillis>0</runDurationMillis>
<schedulingPeriod>0 sec</schedulingPeriod>
<schedulingStrategy>TIMER_DRIVEN</schedulingStrategy>
<yieldDuration>1 sec</yieldDuration>
</config>
<executionNodeRestricted>false</executionNodeRestricted>
<name>SplitJson_Demo</name>
<relationships>
<autoTerminate>false</autoTerminate>
<name>failure</name>
</relationships>
<relationships>
<autoTerminate>true</autoTerminate>
<name>original</name>
</relationships>
<relationships>
<autoTerminate>false</autoTerminate>
<name>split</name>
</relationships>
<state>STOPPED</state>
<style/>
<type>org.apache.nifi.processors.standard.SplitJson</type>
</processors>
<processors>
<id>26c8401a-8807-3771-0000-000000000000</id>
<parentGroupId>60d38136-211b-3d16-0000-000000000000</parentGroupId>
<position>
<x>0.0</x>
<y>825.9684448242188</y>
</position>
<bundle>
<artifact>nifi-hadoop-nar</artifact>
<group>org.apache.nifi</group>
<version>1.9.2</version>
</bundle>
<config>
<bulletinLevel>WARN</bulletinLevel>
<comments></comments>
<concurrentlySchedulableTaskCount>3</concurrentlySchedulableTaskCount>
<descriptors>
<entry>
<key>Hadoop Configuration Resources</key>
<value>
<name>Hadoop Configuration Resources</name>
</value>
</entry>
<entry>
<key>kerberos-credentials-service</key>
<value>
<identifiesControllerService>org.apache.nifi.kerberos.KerberosCredentialsService</identifiesControllerService>
<name>kerberos-credentials-service</name>
</value>
</entry>
<entry>
<key>Kerberos Principal</key>
<value>
<name>Kerberos Principal</name>
</value>
</entry>
<entry>
<key>Kerberos Keytab</key>
<value>
<name>Kerberos Keytab</name>
</value>
</entry>
<entry>
<key>Kerberos Relogin Period</key>
<value>
<name>Kerberos Relogin Period</name>
</value>
</entry>
<entry>
<key>Additional Classpath Resources</key>
<value>
<name>Additional Classpath Resources</name>
</value>
</entry>
<entry>
<key>Directory</key>
<value>
<name>Directory</name>
</value>
</entry>
<entry>
<key>Conflict Resolution Strategy</key>
<value>
<name>Conflict Resolution Strategy</name>
</value>
</entry>
<entry>
<key>Block Size</key>
<value>
<name>Block Size</name>
</value>
</entry>
<entry>
<key>IO Buffer Size</key>
<value>
<name>IO Buffer Size</name>
</value>
</entry>
<entry>
<key>Replication</key>
<value>
<name>Replication</name>
</value>
</entry>
<entry>
<key>Permissions umask</key>
<value>
<name>Permissions umask</name>
</value>
</entry>
<entry>
<key>Remote Owner</key>
<value>
<name>Remote Owner</name>
</value>
</entry>
<entry>
<key>Remote Group</key>
<value>
<name>Remote Group</name>
</value>
</entry>
<entry>
<key>Compression codec</key>
<value>
<name>Compression codec</name>
</value>
</entry>
</descriptors>
<executionNode>ALL</executionNode>
<lossTolerant>false</lossTolerant>
<penaltyDuration>30 sec</penaltyDuration>
<properties>
<entry>
<key>Hadoop Configuration Resources</key>
<value>/usr/local/bigdata/hadoop-3.1.4/etc/hadoop/hdfs-site.xml,/usr/local/bigdata/hadoop-3.1.4/etc/hadoop/core-site.xml</value>
</entry>
<entry>
<key>kerberos-credentials-service</key>
</entry>
<entry>
<key>Kerberos Principal</key>
</entry>
<entry>
<key>Kerberos Keytab</key>
</entry>
<entry>
<key>Kerberos Relogin Period</key>
<value>4 hours</value>
</entry>
<entry>
<key>Additional Classpath Resources</key>
<value>/usr/local/bigdata/testdata/hadoop-lzo-0.4.21-SNAPSHOT.jar</value>
</entry>
<entry>
<key>Directory</key>
<value>/user/hive/warehouse/test.db/testuser</value>
</entry>
<entry>
<key>Conflict Resolution Strategy</key>
<value>append</value>
</entry>
<entry>
<key>Block Size</key>
</entry>
<entry>
<key>IO Buffer Size</key>
</entry>
<entry>
<key>Replication</key>
</entry>
<entry>
<key>Permissions umask</key>
</entry>
<entry>
<key>Remote Owner</key>
</entry>
<entry>
<key>Remote Group</key>
</entry>
<entry>
<key>Compression codec</key>
<value>LZO</value>
</entry>
</properties>
<runDurationMillis>0</runDurationMillis>
<schedulingPeriod>0 sec</schedulingPeriod>
<schedulingStrategy>TIMER_DRIVEN</schedulingStrategy>
<yieldDuration>1 sec</yieldDuration>
</config>
<executionNodeRestricted>false</executionNodeRestricted>
<name>PutHDFS_Demo</name>
<relationships>
<autoTerminate>false</autoTerminate>
<name>failure</name>
</relationships>
<relationships>
<autoTerminate>true</autoTerminate>
<name>success</name>
</relationships>
<state>STOPPED</state>
<style/>
<type>org.apache.nifi.processors.hadoop.PutHDFS</type>
</processors>
<processors>
<id>4cb1eb1d-ca3a-34e0-0000-000000000000</id>
<parentGroupId>60d38136-211b-3d16-0000-000000000000</parentGroupId>
<position>
<x>5.04095458984375</x>
<y>203.5</y>
</position>
<bundle>
<artifact>nifi-avro-nar</artifact>
<group>org.apache.nifi</group>
<version>1.9.2</version>
</bundle>
<config>
<bulletinLevel>WARN</bulletinLevel>
<comments></comments>
<concurrentlySchedulableTaskCount>1</concurrentlySchedulableTaskCount>
<descriptors>
<entry>
<key>JSON container options</key>
<value>
<name>JSON container options</name>
</value>
</entry>
<entry>
<key>Wrap Single Record</key>
<value>
<name>Wrap Single Record</name>
</value>
</entry>
<entry>
<key>Avro schema</key>
<value>
<name>Avro schema</name>
</value>
</entry>
</descriptors>
<executionNode>ALL</executionNode>
<lossTolerant>false</lossTolerant>
<penaltyDuration>30 sec</penaltyDuration>
<properties>
<entry>
<key>JSON container options</key>
<value>array</value>
</entry>
<entry>
<key>Wrap Single Record</key>
<value>true</value>
</entry>
<entry>
<key>Avro schema</key>
</entry>
</properties>
<runDurationMillis>0</runDurationMillis>
<schedulingPeriod>0 sec</schedulingPeriod>
<schedulingStrategy>TIMER_DRIVEN</schedulingStrategy>
<yieldDuration>1 sec</yieldDuration>
</config>
<executionNodeRestricted>false</executionNodeRestricted>
<name>ConvertAvroToJSON_Demo</name>
<relationships>
<autoTerminate>false</autoTerminate>
<name>failure</name>
</relationships>
<relationships>
<autoTerminate>false</autoTerminate>
<name>success</name>
</relationships>
<state>STOPPED</state>
<style/>
<type>org.apache.nifi.processors.avro.ConvertAvroToJSON</type>
</processors>
<processors>
<id>c16280cc-6d1d-355c-0000-000000000000</id>
<parentGroupId>60d38136-211b-3d16-0000-000000000000</parentGroupId>
<position>
<x>4.04095458984375</x>
<y>0.0</y>
</position>
<bundle>
<artifact>nifi-standard-nar</artifact>
<group>org.apache.nifi</group>
<version>1.9.2</version>
</bundle>
<config>
<bulletinLevel>WARN</bulletinLevel>
<comments></comments>
<concurrentlySchedulableTaskCount>1</concurrentlySchedulableTaskCount>
<descriptors>
<entry>
<key>Database Connection Pooling Service</key>
<value>
<identifiesControllerService>org.apache.nifi.dbcp.DBCPService</identifiesControllerService>
<name>Database Connection Pooling Service</name>
</value>
</entry>
<entry>
<key>db-fetch-db-type</key>
<value>
<name>db-fetch-db-type</name>
</value>
</entry>
<entry>
<key>Table Name</key>
<value>
<name>Table Name</name>
</value>
</entry>
<entry>
<key>Columns to Return</key>
<value>
<name>Columns to Return</name>
</value>
</entry>
<entry>
<key>db-fetch-where-clause</key>
<value>
<name>db-fetch-where-clause</name>
</value>
</entry>
<entry>
<key>db-fetch-sql-query</key>
<value>
<name>db-fetch-sql-query</name>
</value>
</entry>
<entry>
<key>Maximum-value Columns</key>
<value>
<name>Maximum-value Columns</name>
</value>
</entry>
<entry>
<key>Max Wait Time</key>
<value>
<name>Max Wait Time</name>
</value>
</entry>
<entry>
<key>Fetch Size</key>
<value>
<name>Fetch Size</name>
</value>
</entry>
<entry>
<key>qdbt-max-rows</key>
<value>
<name>qdbt-max-rows</name>
</value>
</entry>
<entry>
<key>qdbt-output-batch-size</key>
<value>
<name>qdbt-output-batch-size</name>
</value>
</entry>
<entry>
<key>qdbt-max-frags</key>
<value>
<name>qdbt-max-frags</name>
</value>
</entry>
<entry>
<key>dbf-normalize</key>
<value>
<name>dbf-normalize</name>
</value>
</entry>
<entry>
<key>transaction-isolation-level</key>
<value>
<name>transaction-isolation-level</name>
</value>
</entry>
<entry>
<key>dbf-user-logical-types</key>
<value>
<name>dbf-user-logical-types</name>
</value>
</entry>
<entry>
<key>dbf-default-precision</key>
<value>
<name>dbf-default-precision</name>
</value>
</entry>
<entry>
<key>dbf-default-scale</key>
<value>
<name>dbf-default-scale</name>
</value>
</entry>
</descriptors>
<executionNode>PRIMARY</executionNode>
<lossTolerant>false</lossTolerant>
<penaltyDuration>30 sec</penaltyDuration>
<properties>
<entry>
<key>Database Connection Pooling Service</key>
<value>55bee1a0-0b0c-3a63-0000-000000000000</value>
</entry>
<entry>
<key>db-fetch-db-type</key>
<value>MySQL</value>
</entry>
<entry>
<key>Table Name</key>
<value>dx_user</value>
</entry>
<entry>
<key>Columns to Return</key>
</entry>
<entry>
<key>db-fetch-where-clause</key>
</entry>
<entry>
<key>db-fetch-sql-query</key>
<value>select * from dx_user </value>
</entry>
<entry>
<key>Maximum-value Columns</key>
</entry>
<entry>
<key>Max Wait Time</key>
<value>0 seconds</value>
</entry>
<entry>
<key>Fetch Size</key>
<value>0</value>
</entry>
<entry>
<key>qdbt-max-rows</key>
<value>0</value>
</entry>
<entry>
<key>qdbt-output-batch-size</key>
<value>0</value>
</entry>
<entry>
<key>qdbt-max-frags</key>
<value>0</value>
</entry>
<entry>
<key>dbf-normalize</key>
<value>false</value>
</entry>
<entry>
<key>transaction-isolation-level</key>
</entry>
<entry>
<key>dbf-user-logical-types</key>
<value>false</value>
</entry>
<entry>
<key>dbf-default-precision</key>
<value>10</value>
</entry>
<entry>
<key>dbf-default-scale</key>
<value>0</value>
</entry>
</properties>
<runDurationMillis>0</runDurationMillis>
<schedulingPeriod>86400 sec</schedulingPeriod>
<schedulingStrategy>TIMER_DRIVEN</schedulingStrategy>
<yieldDuration>1 sec</yieldDuration>
</config>
<executionNodeRestricted>true</executionNodeRestricted>
<name>QueryDatabaseTable_demo</name>
<relationships>
<autoTerminate>false</autoTerminate>
<name>success</name>
</relationships>
<state>STOPPED</state>
<style/>
<type>org.apache.nifi.processors.standard.QueryDatabaseTable</type>
</processors>
</snippet>
<timestamp>02/09/2023 05:48:36 GMT</timestamp>
</template>
Template 1: QueryDatabaseTable -> ConvertAvroToJSON -> SplitJson -> PutHDFS
Template 2: QueryDatabaseTable -> ConvertAvroToJSON -> SplitJson -> ControlRate -> PutHDFS
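This section introduces the processors used in this example.
QueryDatabaseTable generates a SQL SELECT query, or uses a supplied statement, and executes it to fetch all rows whose values in the specified "maximum-value" columns are greater than the previously seen maximum. The query result is converted to Avro format. Several properties support the Expression Language, but incoming connections are not allowed. The Variable Registry can be used to supply values for any property that contains Expression Language. If FlowFile attributes are needed to run these queries, the GenerateTableFetch and/or ExecuteSQL processors can be used for that purpose. The processor uses streaming, so arbitrarily large result sets are supported. It can be scheduled to run on a timer or a cron expression using the standard scheduling methods. It runs on the primary node only.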
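The "maximum-value column" behaviour described above can be sketched outside NiFi. The following is a minimal illustration, not NiFi's implementation: it uses an in-memory SQLite table in place of MySQL, and the column `id` and the helper `fetch_new_rows` are hypothetical names chosen for the example. Note that the template above leaves Maximum-value Columns empty, in which case every run re-reads the whole table.

```python
import sqlite3


def fetch_new_rows(conn, table, max_col, state):
    """Return rows whose max_col is greater than the largest value seen so far.

    `state` plays the role of the processor's stored state between runs.
    Table and column names are interpolated directly, so this sketch is only
    suitable for trusted, hard-coded identifiers.
    """
    last_seen = state.get(max_col)
    if last_seen is None:
        # First run: no maximum recorded yet, so read everything.
        cur = conn.execute(f"SELECT * FROM {table} ORDER BY {max_col}")
    else:
        cur = conn.execute(
            f"SELECT * FROM {table} WHERE {max_col} > ? ORDER BY {max_col}",
            (last_seen,),
        )
    columns = [d[0] for d in cur.description]
    rows = [dict(zip(columns, r)) for r in cur.fetchall()]
    if rows:
        # Remember the new maximum for the next run.
        state[max_col] = rows[-1][max_col]
    return rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dx_user (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO dx_user VALUES (?, ?)", [(1, "a"), (2, "b")])

state = {}
first = fetch_new_rows(conn, "dx_user", "id", state)   # both existing rows
second = fetch_new_rows(conn, "dx_user", "id", state)  # nothing new
conn.execute("INSERT INTO dx_user VALUES (3, 'c')")
third = fetch_new_rows(conn, "dx_user", "id", state)   # only the new row
```

With a maximum-value column set, repeated runs return only new rows; without one, each scheduled run would return the full table again, which is one source of the duplicate data mentioned in the template description.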
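The list below gives each property's default value and whether the property supports the NiFi Expression Language.
ConvertAvroToJSON converts a binary Avro record into a JSON object. The processor maps Avro fields directly to JSON fields, so the resulting JSON has the same hierarchical structure as the Avro document. Note that the Avro schema information is lost, because this is not a conversion from binary Avro to JSON-formatted Avro. The output JSON is UTF-8 encoded. If the incoming FlowFile contains a stream of multiple Avro records, the resulting FlowFile contains either a JSON array holding all of the records or a sequence of JSON objects. If the incoming FlowFile contains no records, the output is an empty JSON object. Empty or single-record Avro input can optionally be wrapped in a container, as controlled by the "Wrap Single Record" property.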
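The container options described above can be modelled in a few lines. This is a simplified sketch that starts from already-decoded records (plain dictionaries) rather than binary Avro, and `records_to_json` is a hypothetical helper, not a NiFi API. How an empty input is rendered when wrapping is enabled is an assumption of this model.

```python
import json


def records_to_json(records, container="array", wrap_single=False):
    """Render decoded records the way the description above suggests.

    container="array": several records become one JSON array.
    container="none":  several records become one JSON object per line.
    wrap_single:       also wrap zero or one record in an array
                       (only meaningful when container="array").
    """
    use_array = container == "array"
    # An input without records is rendered as an empty JSON object.
    encoded = [json.dumps(r, ensure_ascii=False) for r in records] or ["{}"]
    if use_array and (len(records) > 1 or wrap_single):
        return "[" + ",".join(encoded) + "]"
    return "\n".join(encoded)


rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
as_array = records_to_json(rows, container="array")
as_lines = records_to_json(rows, container="none")
single_plain = records_to_json(rows[:1], container="array", wrap_single=False)
single_wrapped = records_to_json(rows[:1], container="array", wrap_single=True)
```

The template sets JSON container options to array and Wrap Single Record to true, so the next processor always receives an array, even for a single-row result.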
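SplitJson splits a JSON array into multiple separate FlowFiles, using a JsonPath expression to select the array elements. Each generated FlowFile consists of one element of the specified array and is transferred to the "split" relationship; the original file is transferred to "original". If the specified JsonPath is not found, or does not evaluate to array elements, the original file is routed to "failure" and no files are generated.
Using this processor requires familiarity with the JsonPath expression language.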
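As a rough illustration of the routing just described, the sketch below handles only the expression `$.*` used in this example (all children of the root). It is a model built on Python's `json` module, not a JsonPath engine, and `split_json` is a hypothetical helper; the handling of a scalar root is an assumption.

```python
import json


def split_json(content):
    """Model of splitting with the JsonPath expression `$.*`.

    Returns a dict mapping relationship names to lists of payloads.
    """
    routes = {"split": [], "original": [], "failure": []}
    try:
        doc = json.loads(content)
    except ValueError:
        # Unparseable input: nothing is generated, the input goes to failure.
        routes["failure"].append(content)
        return routes
    if isinstance(doc, list):
        elements = doc                 # `$.*` on an array: every element
    elif isinstance(doc, dict):
        elements = list(doc.values())  # `$.*` on an object: every value
    else:
        routes["failure"].append(content)
        return routes
    routes["split"] = [json.dumps(e, ensure_ascii=False) for e in elements]
    routes["original"].append(content)
    return routes


routes = split_json('[{"id": 1}, {"id": 2}]')
bad = split_json("not json")
```

Each entry in `routes["split"]` corresponds to one FlowFile holding a single record, which is what PutHDFS later appends to the target file.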
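The list below gives each property's default value (where one exists) and whether the property supports the Expression Language.
PutHDFS writes FlowFile data to the Hadoop Distributed File System (HDFS).
The list below gives every property with its default value and whether the property supports the NiFi Expression Language.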
Database Connection URL = jdbc:mysql://192.168.10.44:3306/test?characterEncoding=UTF-8&useSSL=false&allowPublicKeyRetrieval=true
Database Driver Class Name = com.mysql.jdbc.Driver
# The JAR below must be uploaded to the NiFi server beforehand
Database Driver Location(s) = /usr/local/bigdata/imply-3.0.4/dist/druid/extensions/mysql-metadata-storage/mysql-connector-java-5.1.44.jar
Database User = root
Password = 8888888
Even if these parameters are misconfigured, the controller service can still be started; the reason is not known.
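Because a wrong setting does not stop the service from starting, it can help to sanity-check the connection URL before pasting it into the controller service. The sketch below only parses the URL with the standard library; it does not contact the database, and `parse_mysql_jdbc_url` is a hypothetical helper, not part of NiFi or the MySQL driver.

```python
from urllib.parse import parse_qs, urlsplit


def parse_mysql_jdbc_url(jdbc_url):
    """Split a jdbc:mysql:// URL into host, port, database and parameters."""
    if not jdbc_url.startswith("jdbc:mysql://"):
        raise ValueError("not a MySQL JDBC URL: " + jdbc_url)
    # Drop the leading "jdbc:" so the rest parses like an ordinary URL.
    parts = urlsplit(jdbc_url[len("jdbc:"):])
    database = parts.path.lstrip("/")
    if not parts.hostname or not database:
        raise ValueError("host and database name are both required")
    return {
        "host": parts.hostname,
        "port": parts.port or 3306,  # 3306 is the MySQL default port
        "database": database,
        "params": {k: v[0] for k, v in parse_qs(parts.query).items()},
    }


info = parse_mysql_jdbc_url(
    "jdbc:mysql://192.168.10.44:3306/test"
    "?characterEncoding=UTF-8&useSSL=false&allowPublicKeyRetrieval=true"
)
```

A typo in the scheme, a missing database name, or a non-numeric port raises `ValueError` here instead of surfacing only when the first query runs.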
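QueryDatabaseTable, like ExecuteSQL, outputs data in Avro format, so the data must first be converted to JSON.
The output of the previous step is a single unit made up of many records; it needs to be split into individual records.
Drag a SplitJson processor onto the canvas, then draw a connection from ConvertAvroToJSON to SplitJson with the relationship success.
Configure SplitJson: on the Properties tab, set JsonPath Expression to $.*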
Hadoop Configuration Resources = /export/download/config/hdfs-site.xml,/export/download/config/core-site.xml
Directory = /user/hive/warehouse/nifi_test.db/user_info_nifi
Conflict Resolution Strategy = append
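Set the scheduling options of the QueryDatabaseTable processor as needed. The default run schedule is 0 seconds, which means the SQL statement is executed continuously and a large amount of duplicate data is read from MySQL. If you only need to import the result of a single SQL query into HDFS, set this value to something large (the template above uses 86400 sec) and stop the processor manually once the run has finished; if you need periodic execution, set a suitable interval.
NiFi does not itself control when each processor has finished its task; this has to be managed manually.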
Caused by: org.apache.hadoop.ipc.RemoteException:
Failed to APPEND_FILE /user/hive/warehouse/test.db/testuser/06b034cf-f4a0-49f1-9742-7b6d74ce024b.lzo_deflate for DFSClient_NONMAPREDUCE_2099184430_144 on 192.168.10.41
because this file lease is currently owned by DFSClient_NONMAPREDUCE_-1635697973_57 on 192.168.10.42
According to the material consulted, a ControlRate processor needs to be added to cap the maximum rate. See template 2 for details.
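The error above shows two DFS clients on different hosts contending for the lease on the same file while appending. Throttling the flow in front of PutHDFS reduces how many appends are attempted at once. The sketch below is a minimal fixed-window throttle to illustrate the idea of "at most N items per time window"; it is not ControlRate's implementation, and the class and function names are hypothetical.

```python
class FixedWindowThrottle:
    """Let at most `max_items` pass in each window of `window_seconds`."""

    def __init__(self, max_items, window_seconds):
        self.max_items = max_items
        self.window_seconds = window_seconds
        self.window_start = None
        self.released = 0

    def allow(self, now):
        # Open a new window when none exists or the current one has expired.
        if self.window_start is None or now - self.window_start >= self.window_seconds:
            self.window_start = now
            self.released = 0
        if self.released < self.max_items:
            self.released += 1
            return True
        return False


def throttle(arrival_times, max_items, window_seconds):
    """Partition arrival timestamps into passed and held items."""
    gate = FixedWindowThrottle(max_items, window_seconds)
    passed, held = [], []
    for t in arrival_times:
        (passed if gate.allow(t) else held).append(t)
    return passed, held


# Five items arrive within 1.1 seconds; the limit is one item per second.
passed, held = throttle([0.0, 0.1, 0.2, 1.0, 1.1], max_items=1, window_seconds=1.0)
```

In this model held items are simply reported; in a real flow they would stay queued in the upstream connection until a later window, so no data is dropped. Whether throttling alone removes the lease conflict may also depend on how many concurrent tasks and nodes run PutHDFS.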
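Viewing the table in Hue requires that the table has already been created in Hive. This verification step assumes that the data has been synchronized to Hive and that the Hue environment is working; otherwise, the file contents can be inspected directly with hadoop commands.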
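For the fallback without Hue, the check can be scripted around the `hdfs dfs` command line. This is a sketch under assumptions: the directory is the one configured in the template's PutHDFS processor, the `hdfs` binary is expected on the PATH of the machine running the script, and both helper functions are hypothetical.

```python
import shutil
import subprocess


def hdfs_check_commands(directory):
    """Build the commands that list the target directory and count its files."""
    return [
        ["hdfs", "dfs", "-ls", directory],
        ["hdfs", "dfs", "-count", directory],
    ]


def run_if_available(commands):
    """Run the commands when the hdfs client is installed, else return None."""
    if shutil.which("hdfs") is None:
        return None
    return [
        subprocess.run(cmd, capture_output=True, text=True, check=False).stdout
        for cmd in commands
    ]


commands = hdfs_check_commands("/user/hive/warehouse/test.db/testuser")
outputs = run_if_available(commands)  # None on a machine without Hadoop
```

Listing the directory confirms that PutHDFS wrote files there; reading the LZO-compressed contents additionally requires the LZO codec to be configured for the Hadoop client.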