An ingest node defines a processing pipeline that transforms documents before they are indexed. In my opinion, it can replace some of what Logstash does.
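Whether a node acts as an ingest node is set in elasticsearch.yml. Ingest is enabled by default on every node; the following setting disables it on a node: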
node.ingest: false
To send a document through a pipeline, specify the pipeline parameter on an index request or a bulk request:
PUT my-index/my-type/my-id?pipeline=my_pipeline_id
{
"foo": "bar"
}
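As a minimal sketch, the same index request can be assembled in Python with only the standard library. The host http://localhost:9200 and the helper name build_index_request are assumptions made for illustration; the request is built but never sent.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_index_request(index, doc_type, doc_id, pipeline, source,
                        host="http://localhost:9200"):
    """Build (but do not send) a PUT request that routes a document through a pipeline."""
    # Percent-encode each path segment so unusual ids cannot break the URL.
    path = "/".join(quote(part, safe="") for part in (index, doc_type, doc_id))
    url = f"{host}/{path}?{urlencode({'pipeline': pipeline})}"
    return Request(
        url,
        data=json.dumps(source).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = build_index_request("my-index", "my-type", "my-id", "my_pipeline_id", {"foo": "bar"})
print(req.get_method(), req.full_url)
```

Passing the resulting object to urllib.request.urlopen would send it, assuming a node is reachable at that host.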
A pipeline definition has the following structure:
{
"description" : "...",
"processors" : [ ... ]
}
description states what the pipeline does; processors is the list of processors, which run in the order they are listed.
Put API
Creates a new pipeline or updates an existing one.
PUT _ingest/pipeline/my-pipeline-id
{
"description" : "describe pipeline",
"processors" : [
{
"set" : {
"field": "foo",
"value": "bar"
}
}
]
}
Changes to a pipeline take effect immediately.
Get API
GET _ingest/pipeline/my-pipeline-id
Response:
{
"my-pipeline-id" : {
"description" : "describe pipeline",
"processors" : [
{
"set" : {
"field" : "foo",
"value" : "bar"
}
}
]
}
}
Delete API
DELETE _ingest/pipeline/my-pipeline-id
Simulate pipeline API
Simulate with a pipeline defined in the request body:
POST _ingest/pipeline/_simulate
{
"pipeline" : {
// pipeline definition here
},
"docs" : [
{ /** first document **/ },
{ /** second document **/ },
// ...
]
}
Simulate against an existing pipeline:
POST _ingest/pipeline/my-pipeline-id/_simulate
{
"docs" : [
{ /** first document **/ },
{ /** second document **/ },
// ...
]
}
Example:
POST _ingest/pipeline/_simulate
{
"pipeline" :
{
"description": "_description",
"processors": [
{
"set" : {
"field" : "field2",
"value" : "_value"
}
}
]
},
"docs": [
{
"_index": "index",
"_type": "type",
"_id": "id",
"_source": {
"foo": "bar"
}
},
{
"_index": "index",
"_type": "type",
"_id": "id",
"_source": {
"foo": "rab"
}
}
]
}
Response:
{
"docs": [
{
"doc": {
"_id": "id",
"_ttl": null,
"_parent": null,
"_index": "index",
"_routing": null,
"_type": "type",
"_timestamp": null,
"_source": {
"field2": "_value",
"foo": "bar"
},
"_ingest": {
"timestamp": "2016-01-04T23:53:27.186+0000"
}
}
},
{
"doc": {
"_id": "id",
"_ttl": null,
"_parent": null,
"_index": "index",
"_routing": null,
"_type": "type",
"_timestamp": null,
"_source": {
"field2": "_value",
"foo": "rab"
},
"_ingest": {
"timestamp": "2016-01-04T23:53:27.186+0000"
}
}
}
]
}
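To make the response easier to follow, here is a toy Python model of what the simulate call does with a single set processor. It is an illustration under simplifying assumptions, not Elasticsearch's implementation: the names apply_set and simulate are hypothetical, only the set processor is modeled, and the null metadata fields and _ingest.timestamp are left out.

```python
import copy


def apply_set(doc, field, value):
    """Toy model of the set processor: write value into the document's _source."""
    doc["_source"][field] = value


def simulate(pipeline, docs):
    """Run each processor over a copy of every document and wrap the results."""
    results = []
    for original in docs:
        doc = copy.deepcopy(original)  # leave the caller's documents untouched
        for processor in pipeline["processors"]:
            for name, options in processor.items():
                if name != "set":
                    raise ValueError(f"processor {name!r} is not modeled in this sketch")
                apply_set(doc, options["field"], options["value"])
        results.append({"doc": doc})
    return {"docs": results}


pipeline = {
    "description": "_description",
    "processors": [{"set": {"field": "field2", "value": "_value"}}],
}
docs = [
    {"_index": "index", "_type": "type", "_id": "id", "_source": {"foo": "bar"}},
    {"_index": "index", "_type": "type", "_id": "id", "_source": {"foo": "rab"}},
]

response = simulate(pipeline, docs)
for entry in response["docs"]:
    print(entry["doc"]["_source"])
```

Each simulated document gains field2 while its original foo value is kept, which matches the _source objects in the response.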
Accessing data in pipelines
Processors can read and modify a document's source fields as well as its metadata fields.
Accessing a source field:
{
"set": {
"field": "_source.my_field",
"value": 582.1
}
}
Modifying a metadata field, here the document _id:
{
"set": {
"field": "_id",
"value": "1"
}
}
The metadata fields _index, _type, _id, _routing, and _parent are accessible to processors.
Accessing ingest metadata:
{
"set": {
"field": "received",
"value": "{{_ingest.timestamp}}"
}
}
Accessing fields and metadata fields in templates:
{
"set": {
"field": "field_c",
"value": "{{field_a}} {{field_b}}"
}
}
{
"set": {
"field": "_index",
"value": "{{geoip.country_iso_code}}"
}
}
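The template snippets can be illustrated with a small Python sketch of placeholder substitution. This is a toy renderer that only handles plain {{path}} placeholders, not the template engine Elasticsearch uses; the helper names lookup and render and the sample field values are made up for the example.

```python
import re


def lookup(doc, path):
    """Resolve a dotted path such as 'geoip.country_iso_code' against nested dicts."""
    value = doc
    for key in path.split("."):
        value = value[key]
    return value


def render(template, doc):
    """Replace each {{path}} placeholder with the string form of the referenced value."""
    return re.sub(
        r"\{\{\s*([\w.]+)\s*\}\}",
        lambda match: str(lookup(doc, match.group(1))),
        template,
    )


# Hypothetical document contents, chosen only to exercise the templates.
doc = {
    "field_a": "hello",
    "field_b": "world",
    "geoip": {"country_iso_code": "SE"},
}

print(render("{{field_a}} {{field_b}}", doc))     # value that would be written to field_c
print(render("{{geoip.country_iso_code}}", doc))  # value that would be written to _index
```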
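Handling failures in pipelines
The following example renames the foo field to bar. If foo does not exist, the on_failure handler stores an error message in the document, so the failure can be analyzed later in Elasticsearch.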
{
"description" : "my first pipeline with handled exceptions",
"processors" : [
{
"rename" : {
"field" : "foo",
"target_field" : "bar",
"on_failure" : [
{
"set" : {
"field" : "error",
"value" : "field \"foo\" does not exist, cannot rename to \"bar\""
}
}
]
}
}
]
}
Ignoring failures:
{
"description" : "my first pipeline with handled exceptions",
"processors" : [
{
"rename" : {
"field" : "foo",
"target_field" : "bar",
"ignore_failure" : true
}
}
]
}
Accessing the error message inside a pipeline:
{
"description" : "my first pipeline with handled exceptions",
"processors" : [
{
"rename" : {
"field" : "foo",
"target_field" : "bar",
"on_failure" : [
{
"set" : {
"field" : "error",
"value" : "{{ _ingest.on_failure_message }}"
}
}
]
}
}
]
}
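The three failure-handling variants can be compared with a toy Python model. It is a sketch under stated assumptions, not how Elasticsearch implements processors: the names rename, run_rename, and store_error are hypothetical, the error text is invented, and ignore_failure is assumed to take precedence when both options are given.

```python
def rename(source, field, target_field):
    """Toy rename: move a value to a new key, failing if the field is missing."""
    if field not in source:
        raise KeyError(f"field [{field}] doesn't exist")
    source[target_field] = source.pop(field)


def run_rename(source, field, target_field, on_failure=None, ignore_failure=False):
    """Apply rename, then fall back to ignore_failure or the on_failure handler."""
    try:
        rename(source, field, target_field)
    except KeyError as exc:
        if ignore_failure:
            return source  # swallow the error; the document is left unchanged
        if on_failure is None:
            raise  # no handler configured: the failure propagates
        on_failure(source, exc.args[0])  # plays the role of _ingest.on_failure_message
    return source


def store_error(source, message):
    """Handler that mimics a set processor writing the failure message to 'error'."""
    source["error"] = message


print(run_rename({"foo": 1}, "foo", "bar", on_failure=store_error))
print(run_rename({"baz": 1}, "foo", "bar", on_failure=store_error))
print(run_rename({"baz": 1}, "foo", "bar", ignore_failure=True))
```

The first call succeeds and never touches the handler, the second records the message in error, and the third returns the document untouched.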
Available processors:
https://www.elastic.co/guide/en/elasticsearch/reference/current/ingest-processors.html