CSV is a very common format for storing data. In several previous articles, we used a number of different methods to import CSV-formatted files into Elasticsearch. You can refer to the following articles:
Logstash: Using the Elastic Stack to analyze CSDN view counts
Logstash: Importing a zipcode CSV file and trying out Geo Search
Kibana: Using the Data Visualizer to analyze CSV data
Today we will demonstrate yet another approach: using Logstash's dissect filter to import CSV-formatted data.
First, make sure Elasticsearch and Kibana are installed. If you haven't installed them yet, please refer to my earlier article "Elastic:菜鸟上手指南".
To install Logstash, you can refer to my earlier article "如何安装Elastic栈中的Logstash".
For illustration purposes, I created the following simple CSV file:
test.csv
"device1","London","Engineering","Computer"
"device2","Toronto","Consulting","Mouse"
"device3","Winnipeg","Sales","Computer"
"device4","Barcelona","Engineering","Phone"
"device5","Toronto","Consulting","Computer"
"device6","London","Consulting","Computer"
Save this file in the root of the Logstash installation directory. Note that each value in the CSV is separated by a comma. Also note that the file has no header row; that is, it does not look like this:
"Device_ID","Device_Location","Device_Owner","Device_Type"
"device1","London","Engineering","Computer"
"device2","Toronto","Consulting","Mouse"
"device3","Winnipeg","Sales","Computer"
"device4","Barcelona","Engineering","Phone"
"device5","Toronto","Consulting","Computer"
"device6","London","Consulting","Computer"
To have Logstash process the CSV file above, we create the following configuration file:
logstash_dissect_csv.conf
input {
  stdin {}
}

filter {
  # Strip all double quotes from the raw line.
  mutate {
    gsub => [
      "message", "\"", ""
    ]
  }

  # Split the comma-separated line into four named fields.
  dissect {
    mapping => {
      "message" => "%{Device_ID},%{Device_Location},%{Device_Owner},%{Device_Type}"
    }
  }

  # The original line is no longer needed once it has been parsed.
  mutate {
    remove_field => ["message"]
  }
}

output {
  stdout {
    codec => "rubydebug"
  }

  # By default this writes to the Elasticsearch instance at localhost:9200.
  elasticsearch {
    index => "devices"
  }
}
As shown above, the pipeline reads each line from stdin as an event and applies three filters:
a mutate filter whose gsub option strips the double quotes from the message field
a dissect filter that splits the message on commas into the fields Device_ID, Device_Location, Device_Owner, and Device_Type
a second mutate filter that removes the original message field once it has been parsed
The resulting events are written both to the console (via the rubydebug codec) and to an Elasticsearch index named devices.
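For example, the first line of test.csv is transformed step by step like this (a sketch; Logstash also adds the @timestamp, host, and @version metadata fields automatically):

message from stdin:  "device1","London","Engineering","Computer"
after mutate/gsub:   device1,London,Engineering,Computer
after dissect:       Device_ID => device1, Device_Location => London, Device_Owner => Engineering, Device_Type => Computer
after mutate:        the message field is removed; only the parsed fields remain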
We can then run Logstash like this:
cat test.csv | sudo ./bin/logstash -f ./logstash_dissect_csv.conf
We can see output like the following in the Logstash console:
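For example, the event for the first line of test.csv looks roughly like this (your @timestamp and host values will of course differ):

{
       "Device_Owner" => "Engineering",
         "@timestamp" => 2020-07-20T06:26:58.645Z,
    "Device_Location" => "London",
               "host" => "liuxg",
        "Device_Type" => "Computer",
           "@version" => "1",
          "Device_ID" => "device1"
}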
This shows that our Logstash pipeline is working correctly.
Next, let's check in Kibana whether an index called devices has been created:
GET devices/_search
The command above returns:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 6,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "devices",
"_type" : "_doc",
"_id" : "qE3nanMB6PiWomqxb5U3",
"_score" : 1.0,
"_source" : {
"Device_Owner" : "Engineering",
"@timestamp" : "2020-07-20T06:26:58.645Z",
"Device_Location" : "London",
"host" : "liuxg",
"Device_Type" : "Computer",
"@version" : "1",
"Device_ID" : "device1"
}
},
{
"_index" : "devices",
"_type" : "_doc",
"_id" : "rU3nanMB6PiWomqxb5X4",
"_score" : 1.0,
"_source" : {
"Device_Owner" : "Sales",
"@timestamp" : "2020-07-20T06:26:58.656Z",
"Device_Location" : "Winnipeg",
"host" : "liuxg",
"Device_Type" : "Computer",
"@version" : "1",
"Device_ID" : "device3"
}
},
{
"_index" : "devices",
"_type" : "_doc",
"_id" : "rE3nanMB6PiWomqxb5U5",
"_score" : 1.0,
"_source" : {
"Device_Owner" : "Consulting",
"@timestamp" : "2020-07-20T06:26:58.656Z",
"Device_Location" : "Toronto",
"host" : "liuxg",
"Device_Type" : "Mouse",
"@version" : "1",
"Device_ID" : "device2"
}
},
{
"_index" : "devices",
"_type" : "_doc",
"_id" : "qk3nanMB6PiWomqxb5U4",
"_score" : 1.0,
"_source" : {
"Device_Owner" : "Engineering",
"@timestamp" : "2020-07-20T06:26:58.657Z",
"Device_Location" : "Barcelona",
"host" : "liuxg",
"Device_Type" : "Phone",
"@version" : "1",
"Device_ID" : "device4"
}
},
{
"_index" : "devices",
"_type" : "_doc",
"_id" : "qU3nanMB6PiWomqxb5U4",
"_score" : 1.0,
"_source" : {
"Device_Owner" : "Consulting",
"@timestamp" : "2020-07-20T06:26:58.657Z",
"Device_Location" : "London",
"host" : "liuxg",
"Device_Type" : "Computer",
"@version" : "1",
"Device_ID" : "device6"
}
},
{
"_index" : "devices",
"_type" : "_doc",
"_id" : "q03nanMB6PiWomqxb5U4",
"_score" : 1.0,
"_source" : {
"Device_Owner" : "Consulting",
"@timestamp" : "2020-07-20T06:26:58.657Z",
"Device_Location" : "Toronto",
"host" : "liuxg",
"Device_Type" : "Computer",
"@version" : "1",
"Device_ID" : "device5"
}
}
]
}
}
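To double-check, we can also ask Elasticsearch for the document count directly; it should report a count of 6, one document per line in test.csv:

GET devices/_count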
In this article, we imported a CSV file using an approach completely different from the earlier ones. This shows how flexible the Elastic Stack is: we can use different methods to achieve the same result. As the saying goes, all roads lead to Beijing.