Importing data from a CSV file into Elasticsearch with Logstash

How to install and deploy Logstash is not covered here; plenty of guides are available online.
Note: the Logstash version should match your Elasticsearch version. Version 5.5.1 is used for both here.
1. Create a logstash.conf file in Logstash's bin directory:

input {
  file {
    path => ["C:\Users\Desktop\test.csv"]  
    start_position => "beginning"
  }
}
filter {
  csv {
    separator => ","
    columns => ["name","age"]
  }
  mutate {
    convert => {
      "name" => "string"
      "age" => "integer"
    }
  }
}

output {
  elasticsearch {
    hosts => ["127.0.0.1:9200"]
    index => "test2"
    document_type => "test2"
  }
}

Explanation of the configuration:

input

The input section is responsible for reading the data; different plugins handle different sources:
       the file plugin reads local text files,
       the stdin plugin reads from standard input,
       the tcp plugin reads data over the network,
       the log4j plugin reads events sent by log4j, and so on.
path: path to the CSV file
start_position: either beginning or end. beginning reads the file from the start; end only reads newly appended data and is typically combined with ignore_older.
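For a quick sanity check without touching Elasticsearch, the file input can be swapped for stdin and the output pointed at the console (a sketch; the filter block stays exactly as above):

```conf
input {
  stdin { }
}
# ... same filter block as above ...
output {
  stdout { codec => rubydebug }
}
```

Piping a CSV line into logstash -f with this config prints the parsed event, which makes it easy to verify separator and columns before writing to ES.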

filter

The filter section parses and transforms the data read by input.
For reading CSV files:
separator: the field delimiter
columns: the field names in the CSV file; note: they must be listed in the same order as the columns appear in the file

mutate: converts field types. Which target types are available depends on the Logstash version; see the official docs: https://www.elastic.co/guide/en/logstash/5.5/plugins-filters-mutate.html
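What the csv and mutate filters do to each row amounts to a delimited split plus a type cast. As a rough command-line analogy (the sample row zhangsan,20 is hypothetical, and awk is only standing in for Logstash here):

```shell
# split the row on "," (the separator) and bind the values to the
# columns name and age in order; printing age with %d mirrors the
# mutate/convert cast from string to integer
echo "zhangsan,20" | awk -F',' '{printf "name=%s age=%d\n", $1, $2}'
# → name=zhangsan age=20
```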

output

hosts: the Elasticsearch host address (ip:port)
index: the name of the ES index to write to
document_type: the type name under that index

A few things to note about the CSV file:
1. The first line must contain data values directly, not a header row of field names.
2. The file must end with a newline.
CSV file example:
[Figure 1: sample CSV file]
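A minimal file matching the name/age columns above might look like this (hypothetical values; note there is no header row, and the last line is followed by a newline):

```csv
zhangsan,20
lisi,25
wangwu,30
```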

2. From Logstash's bin directory, run logstash -f logstash.conf. Output like the following indicates a successful start:

E:\softwareInstallDirecory\logstash\logstash-5.5.1\bin>logstash -f logstash.conf
ERROR StatusLogger No log4j2 configuration file found. Using default configuration: logging only errors to the console.
Sending Logstash's logs to E:/softwareInstallDirecory/logstash/logstash-5.5.1/logs which is now configured via log4j2.properties
[2018-10-11T10:12:20,773][INFO ][logstash.outputs.elasticsearch] Elasticsearch pool URLs updated {:changes=>{:removed=>[], :added=>[http://127.0.0.1:9200/]}}
[2018-10-11T10:12:20,773][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://127.0.0.1:9200/, :path=>"/"}
[2018-10-11T10:12:20,914][WARN ][logstash.outputs.elasticsearch] Restored connection to ES instance {:url=>#}
[2018-10-11T10:12:20,930][INFO ][logstash.outputs.elasticsearch] Using mapping template from {:path=>nil}
[2018-10-11T10:12:20,991][INFO ][logstash.outputs.elasticsearch] Attempting to install template {:manage_template=>{"template"=>"logstash-*", "version"=>50001, "settings"=>{"index.refresh_interval"=>"5s"}, "mappings"=>{"_default_"=>{"_all"=>{"enabled"=>true, "norms"=>false}, "dynamic_templates"=>[{"message_field"=>{"path_match"=>"message", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false}}}, {"string_fields"=>{"match"=>"*", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false, "fields"=>{"keyword"=>{"type"=>"keyword", "ignore_above"=>256}}}}}], "properties"=>{"@timestamp"=>{"type"=>"date", "include_in_all"=>false}, "@version"=>{"type"=>"keyword", "include_in_all"=>false}, "geoip"=>{"dynamic"=>true, "properties"=>{"ip"=>{"type"=>"ip"}, "location"=>{"type"=>"geo_point"}, "latitude"=>{"type"=>"half_float"}, "longitude"=>{"type"=>"half_float"}}}}}}}}
[2018-10-11T10:12:21,007][INFO ][logstash.outputs.elasticsearch] New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>[#]}
[2018-10-11T10:12:21,007][INFO ][logstash.pipeline        ] Starting pipeline {"id"=>"main", "pipeline.workers"=>4, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>5, "pipeline.max_inflight"=>500}
[2018-10-11T10:12:21,304][INFO ][logstash.pipeline        ] Pipeline main started
[2018-10-11T10:12:21,413][INFO ][logstash.agent           ] Successfully started Logstash API endpoint {:port=>9600}

Note: this process keeps running. If the CSV file is modified, Logstash will automatically pick up and import the new data.
To re-import the file from scratch, delete the files under \data\plugins\inputs\file in the Logstash installation directory (this is where the file input records its read position), then run logstash -f logstash.conf again.

3. View the data with elasticsearch-head
[Figure 2: imported documents shown in elasticsearch-head]
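Without elasticsearch-head, the same check can be done with curl against the running cluster (this assumes ES is reachable at 127.0.0.1:9200 and uses the index name test2 from the config above):

```shell
# count the documents indexed into test2
curl -s "http://127.0.0.1:9200/test2/_count?pretty"

# fetch a few documents to inspect the parsed name/age fields
curl -s "http://127.0.0.1:9200/test2/_search?size=3&pretty"
```

These commands require a running Elasticsearch instance, so the output depends on your data.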
