A couple of days ago I wrote an article on performance optimization, describing the step-by-step optimization of one feature and some ideas for taking it further. I've been on a business trip since then, and with free evenings a colleague and I went ahead and implemented those ideas, only to find that the earlier reasoning was wrong. That is what this post is about.
For the data import feature (N records), a key step is spatially matching each record against dozens of regions (R of them) to find the region it falls in. Last time we walked through three versions of this optimization.
The proposed fourth version was to bulk-insert the data first, have the regions batch-match the points, and then bulk-update each record's region. The reasoning: V3 performs N spatial matches, each against the R regions, whereas V4 performs N/B × R spatial matches, each over a batch of B points (with N >> R and B > R). From this we estimated the time would be cut at least in half, perhaps even by an order of magnitude.
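To put rough numbers on that prediction (the region count and batch size below are illustrative assumptions, not figures from the project): take N = 20,000 imported points, R = 30 regions, and a batch size of B = 1,000.

V3: N = 20,000 matching calls, each searching the R = 30 regions for a single point.
V4: N/B × R = (20,000 / 1,000) × 30 = 600 batched matching operations, each handling B = 1,000 points against one region.

On paper the number of spatial calls shrinks by more than an order of magnitude, which is where the optimistic estimate came from.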
@Override
public void transformCSV(Path filePath) {
    // wipe out previously imported lightning data
    flushDb();
    // try-with-resources so the reader is always closed
    try (BufferedReader reader = new BufferedReader(
            new InputStreamReader(new FileInputStream(filePath.toFile())))) {
        CSVParser parser = CSVFormat.DEFAULT.parse(reader);
        Iterator<CSVRecord> iterator = parser.iterator();
        // skip the header row
        if (iterator.hasNext()) {
            iterator.next();
        }
        List<List<String>> csvList = new ArrayList<>();
        while (iterator.hasNext()) {
            CSVRecord record = iterator.next();
            // collect the columns of the current row
            List<String> value = new ArrayList<>();
            for (int j = 0; j < record.size(); j++) {
                value.add(record.get(j));
            }
            // once a full batch has accumulated, process it before adding the next row
            if (csvList.size() >= BATCH_SIZE) {
                thunderBoltDataFilter(csvList);
                csvList.clear();
            }
            csvList.add(value);
        }
        // process the final, possibly partial batch
        thunderBoltDataFilter(csvList);
    } catch (FileNotFoundException e) {
        log.error("临时文件未找到", e);
    } catch (Exception e) {
        log.error("未定义异常", e);
        flushDb();
    }
}
private void flushDb() {
    remove(new QueryWrapper<>());
}
private List<OriginalThunderbolt> loadDataFromCSV(List<List<String>> csvList) {
    List<OriginalThunderbolt> originalThunderboltList = new ArrayList<>();
    for (List<String> csvString : csvList) {
        OriginalThunderbolt originalThunderbolt = new OriginalThunderbolt();
        originalThunderbolt.setId(SnowFlakeIdGenerator.getInstance().nextId());
        originalThunderbolt.setCode(csvString.get(0));
        if (StringUtils.isNotEmpty(csvString.get(1))) {
            Long time = TimeUtils.stringToDateLong(csvString.get(1));
            originalThunderbolt.setTime(time);
        }
        originalThunderbolt.setType(csvString.get(2));
        originalThunderbolt.setHeight(Double.valueOf(csvString.get(3)));
        originalThunderbolt.setStrength(Double.valueOf(csvString.get(4)));
        originalThunderbolt.setLatitude(Double.valueOf(csvString.get(5)));
        originalThunderbolt.setLongitude(Double.valueOf(csvString.get(6)));
        originalThunderbolt.setProvinces(csvString.get(7));
        originalThunderbolt.setCities(csvString.get(8));
        originalThunderbolt.setCounties(csvString.get(9));
        originalThunderbolt.setLocationMode(csvString.get(10));
        originalThunderbolt.setSteepness(csvString.get(11));
        originalThunderbolt.setDeviation(csvString.get(12));
        originalThunderbolt.setLocatorNumber(csvString.get(13));
        originalThunderbolt.setStatus("未同步");
        originalThunderbolt.setCreateTime(System.currentTimeMillis());
        originalThunderbolt.setUpdateTime(System.currentTimeMillis());
        originalThunderbolt.setDeleted(false);
        originalThunderboltList.add(originalThunderbolt);
    }
    return originalThunderboltList;
}
private QueryWrapper<Thunderbolt> buildWrapper(ThunderboltPageModel thunderboltPageModel) {
    QueryWrapper<Thunderbolt> wrapper = new QueryWrapper<>();
    if (StringUtils.isNotEmpty(thunderboltPageModel.getForestryBureauName())) {
        wrapper.likeRight("forestry_bureau", thunderboltPageModel.getForestryBureauName());
    }
    // NOTE: the original code checked getForestryBureauName() again here, which looks like a
    // copy-paste slip; getForestFarmName() is assumed to be the intended getter for forest_farm.
    if (StringUtils.isNotEmpty(thunderboltPageModel.getForestFarmName())) {
        wrapper.likeRight("forest_farm", thunderboltPageModel.getForestFarmName());
    }
    if (StringUtils.isNotEmpty(thunderboltPageModel.getCode())) {
        wrapper.likeRight("code", thunderboltPageModel.getCode());
    }
    if (StringUtils.isNotEmpty(thunderboltPageModel.getType())) {
        wrapper.eq("type", thunderboltPageModel.getType());
    }
    if (StringUtils.isNotEmpty(thunderboltPageModel.getLocatorNumber())) {
        wrapper.likeRight("locator_number", thunderboltPageModel.getLocatorNumber());
    }
    if (StringUtils.isNotEmpty(thunderboltPageModel.getStatus())) {
        wrapper.eq("status", thunderboltPageModel.getStatus());
    }
    if (thunderboltPageModel.getDiscoverStartTime() != null) {
        wrapper.ge("time", thunderboltPageModel.getDiscoverStartTime());
    }
    if (thunderboltPageModel.getDiscoverEndTime() != null) {
        wrapper.le("time", thunderboltPageModel.getDiscoverEndTime());
    }
    wrapper.orderByDesc("update_time");
    wrapper.eq(DELETED_COLUMN, DEFAULT_STATUS);
    return wrapper;
}
private void thunderBoltDataFilter(List<List<String>> csvList) {
    List<OriginalThunderbolt> originalThunderbolts = loadDataFromCSV(csvList);
    List<Point> points = new ArrayList<>();
    List<OriginalThunderbolt> dmzThunderboltList = new ArrayList<>();
    originalThunderbolts.forEach(th -> {
        // keep only lightning points that fall inside the overall import region
        if (SpatialUtils.isInPolygon(th.getLongitude(), th.getLatitude(), findParasFromResource())) {
            Point point = new Point(String.valueOf(th.getLongitude()), String.valueOf(th.getLatitude()));
            dmzThunderboltList.add(th);
            points.add(point);
        }
    });
    // hand the whole batch of points to the sim service to match them against forest farms
    List<OrganizationVO> matchedData = simOrgClient.matchAlarmPointWithOrgRegion(points);
    List<Thunderbolt> list = new ArrayList<>();
    // convert to the persistence model and insert into the database;
    // this relies on the service returning results in the same order as the submitted points
    for (int i = 0; i < dmzThunderboltList.size(); i++) {
        OrganizationVO organizationVO = matchedData.get(i);
        OriginalThunderbolt originalThunderbolt = dmzThunderboltList.get(i);
        Thunderbolt thunderbolt = new Thunderbolt(organizationVO, originalThunderbolt);
        list.add(thunderbolt);
    }
    insertBatchThunderbolt(list);
}
@Override
public void transformCSV(Path filePath) {
    // fetch every "林场" (forest farm) organization once, up front
    List<SynOrganizationVO> organizationVOList = simOrgClient.findAllByType("林场");
    try (BufferedReader reader = new BufferedReader(
            new InputStreamReader(new FileInputStream(filePath.toFile())))) {
        CSVParser parser = CSVFormat.DEFAULT.parse(reader);
        Iterator<CSVRecord> iterator = parser.iterator();
        // skip the header row
        if (iterator.hasNext()) {
            iterator.next();
        }
        List<List<String>> csvList = new ArrayList<>();
        while (iterator.hasNext()) {
            CSVRecord record = iterator.next();
            // collect the columns of the current row
            List<String> value = new ArrayList<>();
            for (int j = 0; j < record.size(); j++) {
                value.add(record.get(j));
            }
            // once a full batch has accumulated, process it before adding the next row
            if (csvList.size() >= BATCH_SIZE) {
                thunderBoltDataFilter(csvList, organizationVOList);
                csvList.clear();
            }
            csvList.add(value);
        }
        // process the final, possibly partial batch
        thunderBoltDataFilter(csvList, organizationVOList);
    } catch (FileNotFoundException e) {
        log.error("临时文件未找到", e);
    } catch (Exception e) {
        log.error("未定义异常", e);
        // roll back only the records that have not been synchronized yet
        deleteByStatus("未同步");
    }
}
private void flushDb() {
    remove(new QueryWrapper<>());
}
private void deleteByStatus(String status) {
    QueryWrapper<Thunderbolt> wrapper = new QueryWrapper<>();
    if (StringUtils.isNotEmpty(status)) {
        wrapper.eq("status", status);
    }
    remove(wrapper);
}
private List<OriginalThunderbolt> loadDataFromCSV(List<List<String>> csvList) {
    List<OriginalThunderbolt> originalThunderboltList = new ArrayList<>();
    for (List<String> csvString : csvList) {
        OriginalThunderbolt originalThunderbolt = new OriginalThunderbolt();
        originalThunderbolt.setId(SnowFlakeIdGenerator.getInstance().nextId());
        originalThunderbolt.setCode(csvString.get(0));
        if (StringUtils.isNotEmpty(csvString.get(1))) {
            Long time = TimeUtils.stringToDateLong(csvString.get(1));
            originalThunderbolt.setTime(time);
        }
        originalThunderbolt.setType(csvString.get(2));
        originalThunderbolt.setHeight(Double.valueOf(csvString.get(3)));
        originalThunderbolt.setStrength(Double.valueOf(csvString.get(4)));
        originalThunderbolt.setLatitude(Double.valueOf(csvString.get(5)));
        originalThunderbolt.setLongitude(Double.valueOf(csvString.get(6)));
        originalThunderbolt.setProvinces(csvString.get(7));
        originalThunderbolt.setCities(csvString.get(8));
        originalThunderbolt.setCounties(csvString.get(9));
        originalThunderbolt.setLocationMode(csvString.get(10));
        originalThunderbolt.setSteepness(csvString.get(11));
        originalThunderbolt.setDeviation(csvString.get(12));
        originalThunderbolt.setLocatorNumber(csvString.get(13));
        originalThunderbolt.setStatus("未同步");
        originalThunderbolt.setCreateTime(System.currentTimeMillis());
        originalThunderbolt.setUpdateTime(System.currentTimeMillis());
        originalThunderbolt.setDeleted(false);
        originalThunderboltList.add(originalThunderbolt);
    }
    return originalThunderboltList;
}
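A note on the second transformCSV variant: it fetches every "林场" organization once up front and hands the list to a two-argument thunderBoltDataFilter(csvList, organizationVOList), whose body is not included in the listing. The following is only a minimal sketch of what matching locally against that pre-fetched list could look like, not the project's actual implementation; in particular, SynOrganizationVO.getBoundary() and a Thunderbolt constructor that accepts a SynOrganizationVO are hypothetical.

// Sketch only: getBoundary() and the Thunderbolt(SynOrganizationVO, OriginalThunderbolt)
// constructor are assumptions, not verified project APIs.
private void thunderBoltDataFilter(List<List<String>> csvList,
                                   List<SynOrganizationVO> organizationVOList) {
    List<OriginalThunderbolt> originalThunderbolts = loadDataFromCSV(csvList);
    List<Thunderbolt> list = new ArrayList<>();
    for (OriginalThunderbolt th : originalThunderbolts) {
        // discard points that fall outside the overall import region
        if (!SpatialUtils.isInPolygon(th.getLongitude(), th.getLatitude(), findParasFromResource())) {
            continue;
        }
        // take the first pre-fetched forest farm whose boundary contains the point
        for (SynOrganizationVO org : organizationVOList) {
            if (SpatialUtils.isInPolygon(th.getLongitude(), th.getLatitude(), org.getBoundary())) {
                list.add(new Thunderbolt(org, th));
                break;
            }
        }
    }
    insertBatchThunderbolt(list);
}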
With no changes made to the database at all, importing 20k records gave:

V3: 42s
V4: 41s
Worse still, V4's runtime was unstable: it sometimes stretched to nearly 2 minutes, and the time did not grow linearly as the data volume increased. V3, by contrast, always scaled linearly.
Seeing these test results was a real slap in the face, so I went back over the problem and found several holes in the earlier reasoning.
Given the results, the V4 changes were promptly rolled back...