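Requirements
1. Run a NiFi flow once a day that syncs yesterday's Confluence activity (who wrote which page title) into a new table.
2. Set up a Superset dashboard that shows each person's daily contributions over the last two weeks.
Related data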
confluence.CONTENT
CREATE TABLE `CONTENT` (
`CONTENTID` bigint(20) NOT NULL ,
`HIBERNATEVERSION` int(11) NOT NULL DEFAULT 0 ,
`CONTENTTYPE` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL ,
`TITLE` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NULL DEFAULT NULL ,
`LOWERTITLE` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NULL DEFAULT NULL ,
`VERSION` int(11) NULL DEFAULT NULL ,
`CREATOR` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NULL DEFAULT NULL ,
`CREATIONDATE` datetime NULL DEFAULT NULL ,
`LASTMODIFIER` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NULL DEFAULT NULL ,
`LASTMODDATE` datetime NULL DEFAULT NULL ,
`VERSIONCOMMENT` mediumtext CHARACTER SET utf8 COLLATE utf8_bin NULL ,
`PREVVER` bigint(20) NULL DEFAULT NULL ,
`CONTENT_STATUS` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NULL DEFAULT NULL ,
`PAGEID` bigint(20) NULL DEFAULT NULL ,
`SPACEID` bigint(20) NULL DEFAULT NULL ,
`CHILD_POSITION` int(11) NULL DEFAULT NULL ,
`PARENTID` bigint(20) NULL DEFAULT NULL ,
`MESSAGEID` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NULL DEFAULT NULL ,
`PLUGINKEY` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NULL DEFAULT NULL ,
`PLUGINVER` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NULL DEFAULT NULL ,
`PARENTCCID` bigint(20) NULL DEFAULT NULL ,
`DRAFTPAGEID` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NULL DEFAULT NULL ,
`DRAFTSPACEKEY` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NULL DEFAULT NULL ,
`DRAFTTYPE` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NULL DEFAULT NULL ,
`DRAFTPAGEVERSION` int(11) NULL DEFAULT NULL ,
`PARENTCOMMENTID` bigint(20) NULL DEFAULT NULL ,
`USERNAME` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NULL DEFAULT NULL ,
PRIMARY KEY (`CONTENTID`)
)
ENGINE=InnoDB
DEFAULT CHARACTER SET=utf8 COLLATE=utf8_bin
ROW_FORMAT=DYNAMIC
;
confluence.user_mapping
CREATE TABLE `user_mapping` (
`user_key` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL ,
`username` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL ,
`lower_username` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NULL DEFAULT NULL ,
PRIMARY KEY (`user_key`),
UNIQUE INDEX `unq_lwr_username` (`lower_username`) USING BTREE
)
ENGINE=InnoDB
DEFAULT CHARACTER SET=utf8 COLLATE=utf8_bin
ROW_FORMAT=DYNAMIC
;
nifi_db.Commitments
CREATE TABLE `Commitments` (
`ContentId` bigint(20) NOT NULL COMMENT 'Content ID' ,
`WeekOfYear` varchar(255) CHARACTER SET latin1 COLLATE latin1_swedish_ci NULL DEFAULT NULL COMMENT 'Year and week number' ,
`Title` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NULL DEFAULT NULL COMMENT 'Content title' ,
`Modifier` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NULL DEFAULT NULL COMMENT 'Last modifier' ,
`LastModDate` datetime NOT NULL COMMENT 'Last modification time' ,
`Creator` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NULL DEFAULT NULL COMMENT 'Creator' ,
`CreateDate` datetime NULL DEFAULT NULL COMMENT 'Creation time' ,
PRIMARY KEY (`ContentId`, `LastModDate`)
)
ENGINE=InnoDB
DEFAULT CHARACTER SET=latin1 COLLATE=latin1_swedish_ci
ROW_FORMAT=DYNAMIC
;
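For the Superset dashboard in requirement 2, a dataset query over this table could look like the sketch below. It assumes that one row in Commitments counts as one contribution and that "the last two weeks" means the 14 days up to today. The aliases ContributionDate and Contributions are illustrative and not part of the original setup.

```sql
-- Sketch of a Superset dataset query: contributions per person per day.
-- Assumption: each row in Commitments is one contribution.
SELECT
    DATE(LastModDate) AS ContributionDate,  -- strip the time part to group by day
    Modifier,
    COUNT(*)          AS Contributions
FROM Commitments
WHERE LastModDate >= CURRENT_DATE() - INTERVAL 14 DAY
GROUP BY DATE(LastModDate), Modifier
ORDER BY ContributionDate, Modifier;
```

In Superset this could back a bar chart or pivot table with ContributionDate on one axis and Modifier as the series.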
Configuring the Controller Services
DBCPForConfluence (DBCPConnectionPool)
PROPERTIES
Property | Value |
---|---|
Database Connection URL | jdbc:mysql://47.96.97.244:3306/confluence?useUnicode=true&characterEncoding=utf8 |
Database Driver Class Name | com.mysql.jdbc.Driver |
Database Driver Location(s) | /usr/share/java/mysql-connector-java.jar |
Database User | root |
Password | *** |
DBCPForNiFi_db (DBCPConnectionPool)
PROPERTIES
Property | Value |
---|---|
Database Connection URL | jdbc:mysql://gateway001:3306/nifi_db?useUnicode=true&characterEncoding=utf8 |
Database Driver Class Name | com.mysql.jdbc.Driver |
Database Driver Location(s) | /usr/share/java/mysql-connector-java.jar |
Database User | root |
Password | *** |
Full import of historical contribution records
Fetching the data (ExecuteSQL)
SCHEDULING
Property | Value |
---|---|
Scheduling Strategy | Timer driven |
Run Schedule | 1 days |
Execution | Primary node |
Concurrent Tasks | 1 |
PROPERTIES
Property | Value |
---|---|
Database Connection Pooling Service | DBCPForConfluence (see the Controller Service above) |
SQL select query | see the query below |
Max Wait Time | 0 seconds |
Normalize Table/Column Names | false |
Use Avro Logical Types | false |
Default Decimal Precision | 10 |
Default Decimal Scale | 0 |
SELECT
a.CONTENTID,
a.TITLE,
b.USERNAME AS CREATOR,
c.USERNAME AS MODIFIER,
a.CREATIONDATE AS CREATEDATE,
a.LASTMODDATE,
YEARWEEK(a.LASTMODDATE) AS WeekOfYear
FROM
CONTENT a
LEFT JOIN user_mapping b ON a.CREATOR = b.user_key
LEFT JOIN user_mapping c ON a.LASTMODIFIER = c.user_key
WHERE
a.CONTENTTYPE = 'PAGE'
AND a.spaceid = 98306
AND a.title IS NOT NULL
AND a.PARENTID IS NOT NULL
ORDER BY
WeekOfYear DESC,
modifier DESC,
LASTMODDATE;
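The WeekOfYear column comes from MySQL's YEARWEEK(), which returns the year and week number as a single integer of the form YYYYWW. A quick way to see what the query will produce is to run it on a few dates. The dates below are arbitrary samples, and the single-argument form uses mode 0 (weeks start on Sunday).

```sql
-- Arbitrary sample dates, used only to inspect the YYYYWW values YEARWEEK() yields.
SELECT
    YEARWEEK('2019-01-01')    AS NewYearsDay,   -- may belong to the last week of the previous year
    YEARWEEK('2019-06-15')    AS MidYear,       -- mode 0: weeks start on Sunday
    YEARWEEK('2019-06-15', 1) AS MidYearMonday; -- mode 1: weeks start on Monday
```

Because Commitments.WeekOfYear is a varchar column, the integer ends up stored as text.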
Format conversion (ConvertAvroToJSON)
Add this processor as-is; the default configuration is sufficient.
SQL generation (ConvertJSONToSQL)
PROPERTIES
Property | Value |
---|---|
JDBC Connection Pool | DBCPForNiFi_db (see above) |
Statement Type | INSERT |
Table Name | Commitments |
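With Statement Type set to INSERT, ConvertJSONToSQL emits one parameterized statement per JSON record and carries the values in the sql.args.N.type and sql.args.N.value FlowFile attributes. For the columns selected by the ExecuteSQL query, the generated statement should look roughly like the one below. The exact column order and identifier quoting depend on the NiFi version and on the incoming JSON, so treat this as an illustration rather than captured output.

```sql
-- Approximate shape of the statement generated for one record.
-- Each ? is bound by PutSQL from the sql.args.1 ... sql.args.7 attributes.
INSERT INTO Commitments
    (ContentId, Title, Creator, Modifier, CreateDate, LastModDate, WeekOfYear)
VALUES
    (?, ?, ?, ?, ?, ?, ?);
```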
Writing to the database (PutSQL)
PROPERTIES
Property | Value |
---|---|
JDBC Connection Pool | DBCPForNiFi_db |
SQL Statement | see below |
Support Fragmented Transactions | true |
Transaction Timeout | No value set |
Batch Size | 100 |
Obtain Generated Keys | false |
Rollback On Failure | true |
SQL Statement
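Note:
This property may be left empty.
When it is empty, PutSQL executes the SQL supplied by the ConvertJSONToSQL processor.
When it is set, the SQL from ConvertJSONToSQL is ignored and only its data (the parameter values) is used.
For this use case, leaving it empty is recommended. If it is set, a statement of the following form can be used: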
REPLACE INTO Commitments (
ContentId,
Title,
Creator,
Modifier,
CreateDate,
LastModDate,
WeekOfYear
)
VALUES
(?, ?, ?, ?, ?, ?, ?)
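Since the primary key of Commitments is (ContentId, LastModDate), REPLACE INTO overwrites a row only when both values match; a later edit of the same page carries a new LastModDate and therefore adds a row. The sketch below shows this on a scratch copy of the table. The name Commitments_demo and all inserted values are made up for the demonstration.

```sql
-- Scratch copy so the demonstration does not touch real data.
-- Assumes it is run in the schema that contains Commitments (nifi_db).
CREATE TEMPORARY TABLE Commitments_demo LIKE Commitments;

REPLACE INTO Commitments_demo
    (ContentId, Title, Creator, Modifier, CreateDate, LastModDate, WeekOfYear)
VALUES
    (1, 'Draft', 'alice', 'alice', '2019-01-07 09:00:00', '2019-01-07 09:00:00', '201901');

-- Same ContentId and LastModDate: the existing row is replaced.
REPLACE INTO Commitments_demo
    (ContentId, Title, Creator, Modifier, CreateDate, LastModDate, WeekOfYear)
VALUES
    (1, 'Draft v2', 'alice', 'alice', '2019-01-07 09:00:00', '2019-01-07 09:00:00', '201901');

-- Same ContentId, later LastModDate: a second row is added.
REPLACE INTO Commitments_demo
    (ContentId, Title, Creator, Modifier, CreateDate, LastModDate, WeekOfYear)
VALUES
    (1, 'Draft v2', 'alice', 'bob', '2019-01-07 09:00:00', '2019-01-08 10:00:00', '201901');

-- Two rows remain, one per distinct (ContentId, LastModDate) pair.
SELECT ContentId, Title, Modifier, LastModDate
FROM Commitments_demo
ORDER BY LastModDate;
```

This is what makes re-running an import with REPLACE INTO safe, whereas a plain INSERT would fail on rows that were already loaded.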
The overall flow is: ExecuteSQL -> ConvertAvroToJSON -> ConvertJSONToSQL -> PutSQL.
Scheduled (daily) import of contribution records
Fetching the data (ExecuteSQL)
SCHEDULING
Property | Value |
---|---|
Scheduling Strategy | Timer driven |
Run Schedule | 1 days |
Execution | Primary node |
Concurrent Tasks | 1 |
PROPERTIES
Property | Value |
---|---|
Database Connection Pooling Service | DBCPForConfluence (see the Controller Service above) |
SQL select query | see the query below |
Max Wait Time | 0 seconds |
Normalize Table/Column Names | false |
Use Avro Logical Types | false |
Default Decimal Precision | 10 |
Default Decimal Scale | 0 |
SELECT
a.CONTENTID,
a.TITLE,
b.USERNAME AS CREATOR,
c.USERNAME AS MODIFIER,
a.CREATIONDATE AS CREATEDATE,
a.LASTMODDATE,
YEARWEEK(a.LASTMODDATE) AS WeekOfYear
FROM
CONTENT a
LEFT JOIN user_mapping b ON a.CREATOR = b.user_key
LEFT JOIN user_mapping c ON a.LASTMODIFIER = c.user_key
WHERE
a.CONTENTTYPE = 'PAGE'
AND a.spaceid = 98306
AND a.title IS NOT NULL
AND a.PARENTID IS NOT NULL
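-- Yesterday only, per the requirement: half-open range on the modification time.
-- (WEEK() alone ignores the year and would match the same week number of every year.)
AND a.LASTMODDATE >= CURRENT_DATE() - INTERVAL 1 DAY
AND a.LASTMODDATE < CURRENT_DATE()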
ORDER BY
WeekOfYear DESC,
modifier DESC,
LASTMODDATE;
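Configure the downstream processors exactly as for the full import (see above).
The overall flow is the same: ExecuteSQL -> ConvertAvroToJSON -> ConvertJSONToSQL -> PutSQL.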
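After a daily run, a quick check against the target table can confirm that yesterday's activity arrived. The query below is a sketch that assumes "yesterday" is defined by LastModDate in the database server's time zone; PageCount is an illustrative alias.

```sql
-- Rows synced for yesterday, per modifier.
SELECT
    Modifier,
    COUNT(*) AS PageCount
FROM Commitments
WHERE LastModDate >= CURRENT_DATE() - INTERVAL 1 DAY
  AND LastModDate <  CURRENT_DATE()
GROUP BY Modifier
ORDER BY PageCount DESC;
```

An empty result on a day when pages were edited would suggest checking the ExecuteSQL filter or the PutSQL failure relationship.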