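This article shows how to save a text file containing multiple entries into a database, with each entry in the file becoming one record in the database.
The configuration of each core processor, and what it does, is described in detail below.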
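Because the input file is not in a standard format such as CSV or JSON, it first has to be converted. Each line is split on "::" and the fields are joined with ";" (or "::" is simply replaced with ";"), which turns the data into CSV. A custom CSV header can be added at the same time. The Groovy script below does this; it relies on the session and REL_SUCCESS variables provided by a NiFi script processor such as ExecuteScript.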
```groovy
import org.apache.commons.io.IOUtils
import org.apache.nifi.processor.io.StreamCallback
import java.nio.charset.StandardCharsets

def flowFile = session.get()
if (!flowFile) return

flowFile = session.write(flowFile, { inputStream, outputStream ->
    def stringBuilder = new StringBuilder()
    // CSV header: the column names should match the field names of the record schema
    stringBuilder.append("id;director;filmType\n")
    def text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
    // Split on "::" and on line breaks (\n or \r\n), so every token is one field
    def words = text.split("::|\\r?\\n")
    def count = 0
    for (String word : words) {
        if (word) {   // skip empty tokens, e.g. from blank lines
            stringBuilder.append(word)
            count++
            // Every 3 fields (id, title, genres) end a row; otherwise add the ";" separator
            stringBuilder.append(count % 3 == 0 ? "\n" : ";")
        }
    }
    outputStream.write(stringBuilder.toString().getBytes(StandardCharsets.UTF_8))
} as StreamCallback)

flowFile = session.putAttribute(flowFile, 'filename', 'movies')
session.transfer(flowFile, REL_SUCCESS)
```
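To test the conversion logic outside NiFi, the same idea can be sketched as a plain Java method. This is a minimal sketch under stated assumptions, not the script above. The class name `MovieCsvConverter` and the method `toCsv` are hypothetical. The sketch also assumes that every non-empty input line has exactly three `::`-separated fields.

```java
/**
 * Hypothetical standalone sketch of the conversion done by the Groovy script:
 * turns "id::title::genres" lines into semicolon-separated CSV with a header.
 * Assumes exactly three "::"-separated fields per non-empty line.
 */
class MovieCsvConverter {

    // Header column names chosen to match the record schema field names.
    static final String HEADER = "id;director;filmType";

    static String toCsv(String input) {
        StringBuilder out = new StringBuilder(HEADER).append('\n');
        // Accept both \n and \r\n line endings.
        for (String line : input.split("\\r?\\n")) {
            if (line.isBlank()) {
                continue; // skip empty lines
            }
            // A limit of -1 keeps trailing empty fields instead of dropping them.
            String[] fields = line.split("::", -1);
            if (fields.length != 3) {
                throw new IllegalArgumentException(
                        "Expected 3 fields, got " + fields.length + ": " + line);
            }
            out.append(String.join(";", fields)).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String sample = "115::Happiness Is in the Field (1995)::Comedy\n"
                + "116::Anne Frank Remembered (1995)::Documentary\n";
        System.out.print(toCsv(sample));
    }
}
```

The sketch works line by line rather than with a running field counter, so a malformed line raises an error instead of shifting every following row. Titles that themselves contain ";" would still need CSV quoting, which this sketch does not do.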
The record schema (Avro format) for the movie records:
```json
{
  "type": "record",
  "name": "MovieRecord",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "director", "type": ["null", "string"]},
    {"name": "filmType", "type": ["null", "string"]}
  ]
}
```
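The schema declares `id` as a non-null `long` and the other two fields as `["null","string"]` unions. The hedged Java sketch below (Java 16+ for `record`) shows what that typing means for a single CSV row. `MovieRecordMapper` is a hypothetical name. Treating an empty column as null is an assumption of this sketch, not a description of how NiFi's CSVReader behaves.

```java
/**
 * Hypothetical sketch: maps one semicolon-separated row onto the three
 * schema fields. id must parse as a long; the nullable string fields use
 * null for an empty column (an assumption of this sketch).
 */
class MovieRecordMapper {

    // Mirrors the MovieRecord schema: id (long), director and filmType (nullable strings).
    record MovieRecord(long id, String director, String filmType) {}

    static MovieRecord fromCsvRow(String row) {
        String[] f = row.split(";", -1);
        if (f.length != 3) {
            throw new IllegalArgumentException("Expected 3 columns: " + row);
        }
        // Non-nullable long: a non-numeric id throws NumberFormatException.
        long id = Long.parseLong(f[0].trim());
        return new MovieRecord(id, emptyToNull(f[1]), emptyToNull(f[2]));
    }

    // Assumption: an empty column stands for the "null" branch of the union.
    private static String emptyToNull(String s) {
        return s.isEmpty() ? null : s;
    }

    public static void main(String[] args) {
        System.out.println(fromCsvRow("116;Anne Frank Remembered (1995);Documentary"));
        System.out.println(fromCsvRow("117;;"));
    }
}
```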
A sample of the converted records in JSON:
```json
[
{"id":115,"director":"Happiness Is in the Field (Bonheur est dans le pré, Le) (1995)","filmType":"Comedy"},
{"id":116,"director":"Anne Frank Remembered (1995)","filmType":"Documentary"},
{"id":117,"director":"Young Poisoner's Handbook, The (1995)","filmType":"Crime|Drama"},
{"id":118,"director":"If Lucy Fell (1996)","filmType":"Comedy|Romance"},
{"id":119,"director":"Steal Big, Steal Little (1995)","filmType":"Comedy"},
{"id":120,"director":"Race the Sun (1996)","filmType":"Drama"},
{"id":121,"director":"Boys of St. Vincent, The (1992)","filmType":"Drama"},
...
]
```
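The sample above has one JSON object per record, with `id` as a bare number and the other fields as quoted strings. Below is a minimal Java sketch (Java 14+ for arrow-style `switch`) that produces that shape. It assumes the rows are already split into three columns, and `MovieJsonWriter` is a hypothetical helper. In a NiFi flow this step would normally be left to a record writer service such as JsonRecordSetWriter rather than custom code.

```java
import java.util.List;

/**
 * Hypothetical sketch: serialises [id, director, filmType] rows into a JSON
 * array in the same shape as the sample output.
 */
class MovieJsonWriter {

    static String toJsonArray(List<String[]> rows) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < rows.size(); i++) {
            String[] r = rows.get(i);
            if (i > 0) sb.append(',');
            // id is written as a JSON number, the other fields as strings (or null).
            sb.append("{\"id\":").append(Long.parseLong(r[0]))
              .append(",\"director\":").append(quote(r[1]))
              .append(",\"filmType\":").append(quote(r[2]))
              .append('}');
        }
        return sb.append(']').toString();
    }

    // Minimal JSON string escaping; a Java null becomes the JSON literal null.
    static String quote(String s) {
        if (s == null) return "null";
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default -> {
                    // Other control characters use a 4-digit hex escape.
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
                }
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        System.out.println(toJsonArray(List.of(
                new String[]{"116", "Anne Frank Remembered (1995)", "Documentary"})));
    }
}
```

Non-ASCII characters such as "é" are copied through unchanged. The resulting text must therefore be written and read as UTF-8, because decoding the bytes with a different charset produces garbled characters.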