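Installing Elasticsearch 5.x
The kernel setting vm.max_map_count must be raised first (it is a Linux virtual-memory limit, not a JVM option); otherwise Elasticsearch will not start: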
sudo sysctl -w vm.max_map_count=262144
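A value set with sysctl -w only lasts until the next reboot. The following is a minimal sketch of checking the current value and persisting it, assuming a Linux host that reads /etc/sysctl.conf at boot. The helper name max_map_count_ok is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when the given value meets the minimum
# that the Elasticsearch 5.x bootstrap check expects (262144 by default).
max_map_count_ok() {
  local current="$1"
  local required="${2:-262144}"
  [ "$current" -ge "$required" ]
}

# Usage on a Linux host (commented out so the block has no side effects):
# if ! max_map_count_ok "$(sysctl -n vm.max_map_count)"; then
#   sudo sysctl -w vm.max_map_count=262144
#   # Persist the setting across reboots:
#   echo 'vm.max_map_count=262144' | sudo tee -a /etc/sysctl.conf
# fi
```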
| ES version | Plugin | Reference |
| --- | --- | --- |
| Before 5.0 | mapper-attachments | https://qbox.io/blog/index-attachments-files-elasticsearch-mapper |
| 5.0 and later | ingest-attachment | https://qbox.io/blog/how-to-index-attachments-and-files-to-elasticsearch-5-0-using-ingest-api , https://www.elastic.co/guide/en/elasticsearch/plugins/5.6/using-ingest-attachment.html |
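Our existing ES cluster ran 2.3.5, so I first tried installing mapper-attachments for 2.3.5. The install failed because the downloaded plugin reported that it was built for ES 2.0. The install may succeed on a 2.4 cluster, but I have not confirmed this, so please test it yourself. Since we also wanted to upgrade to 5.x, I went with ES 5.6 and installed ingest-attachment instead: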
sudo bin/elasticsearch-plugin install ingest-attachment
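Notes:
1. If Elasticsearch is deployed with Docker, run this command inside the container; otherwise it has no effect.
2. After installing the attachment plugin, restart the ES cluster so that the plugin takes effect.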
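One way to confirm that the plugin is actually loaded after the restart is the _cat/plugins endpoint, which prints one line per node and plugin. The following is a rough sketch, assuming the same localhost:19200 address used throughout this post. plugin_loaded is a hypothetical helper, and the container name es56 in the comments is a placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads _cat/plugins output on stdin and succeeds
# if the named plugin appears as a whole word on any line.
plugin_loaded() {
  grep -qw -- "$1"
}

# Usage (commented out; needs a running cluster):
# curl -s 'http://localhost:19200/_cat/plugins' | plugin_loaded ingest-attachment \
#   && echo "ingest-attachment is loaded"
#
# For a Docker deployment, the install and restart might look like this,
# where "es56" is a placeholder container name:
# docker exec es56 bin/elasticsearch-plugin install ingest-attachment
# docker restart es56
```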
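Note: properties selects which extracted fields are stored. This pipeline keeps "content", "title", "author", "keywords", "date", "content_length", and "content_type" (the 5.6 plugin documentation also lists "name" and "language").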
curl -XPUT 'http://localhost:19200/_ingest/pipeline/attachment?pretty' -H 'Content-Type: application/json' -d '
{
  "description": "Extract attachment information encoded in Base64 with UTF-8 charset",
  "processors": [
    {
      "attachment": {
        "field": "data",
        "properties": [ "content", "title", "author", "keywords", "date", "content_length", "content_type" ]
      }
    }
  ]
}'
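Before indexing real documents, the pipeline can be tried out with the ingest _simulate endpoint, which runs the processors on sample documents without indexing anything. The following is a sketch under the assumption that the pipeline above was created under the name attachment on localhost:19200. simulate_body is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wraps a Base64 string in the request body that
# the _simulate endpoint expects (a "docs" array of sample _source objects).
simulate_body() {
  printf '{ "docs": [ { "_source": { "data": "%s" } } ] }' "$1"
}

# Usage (commented out; needs a running cluster). "UWJveA==" is the
# Base64 encoding of the text "Qbox".
# curl -XPOST 'http://localhost:19200/_ingest/pipeline/attachment/_simulate?pretty' \
#   -H 'Content-Type: application/json' \
#   -d "$(simulate_body 'UWJveA==')"
```

If the plugin is working, the response should contain an attachment object with the extracted fields for each sample document.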
Then create the test index and the mapping for the document type:
curl -XPUT 'http://localhost:19200/test/' -d '
{
  "settings": {
    "index": { "number_of_shards": 1, "number_of_replicas": 1 }
  }
}'
curl -XPUT 'http://localhost:19200/test/_mapping/document/' -d '
{
  "document": {
    "_source": { "excludes": [ "data", "attachment.content" ] },
    "properties": {
      "filename": { "type": "text" },
      "attachment": {
        "properties": {
          "date": { "type": "date" },
          "content_type": { "type": "text", "fields": { "keyword": { "ignore_above": 256, "type": "keyword" } } },
          "author": { "type": "text", "fields": { "keyword": { "ignore_above": 256, "type": "keyword" } } },
          "title": { "type": "text", "fields": { "keyword": { "ignore_above": 256, "type": "keyword" } } },
          "content": { "type": "text" },
          "content_length": { "type": "long" }
        }
      },
      "data": { "type": "binary", "store": false },
      "filePath": { "type": "keyword" },
      "downloadTimes": { "type": "long" },
      "source": { "type": "keyword" },
      "type": { "type": "keyword" },
      "uploadTime": { "type": "date" },
      "viewTimes": { "type": "long" },
      "fileType": { "type": "keyword" }
    }
  }
}'
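Reference: http://blog.csdn.net/napoay/article/details/62233031
Notes on the mapping:
data and attachment.content are excluded from the stored _source, so search hits do not return them; attachment.content is still indexed and searchable:
"_source": { "excludes": [ "data", "attachment.content" ] },
A field of type "keyword" is matched exactly (it is not analyzed):
"source": { "type": "keyword" }
ES 5 removed the string type; full-text fields now use text:
"content": { "type": "text" }
data holds the Base64 encoding of the original document. It is stored as binary, does not need to be visible, and is therefore also excluded from _source:
"data": { "type": "binary", "store": false }
Note: because data is the Base64 encoding of the original document, when indexing through the Java API you need to read the file content into a Base64 string and put it in the data field.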
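For quick tests outside Java, the same Base64 step can be done in the shell. The following is a minimal sketch, assuming a base64 utility (GNU coreutils or the macOS equivalent) is available. file_to_b64 and doc_body are hypothetical helpers, and the document id test_id3 and the file report.pdf in the comments are placeholders. doc_body does no JSON escaping, so it only suits simple values without quotes or backslashes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the Base64 encoding of a file on one line.
# tr strips the line wraps that some base64 implementations insert.
file_to_b64() {
  base64 < "$1" | tr -d '\n'
}

# Hypothetical helper: builds the JSON body for the index request.
# Arguments: source, filename, path of the file to attach.
doc_body() {
  printf '{ "source": "%s", "filename": "%s", "data": "%s" }' \
    "$1" "$2" "$(file_to_b64 "$3")"
}

# Usage (commented out; needs a running cluster and the attachment pipeline):
# curl -XPUT 'http://localhost:19200/test/document/test_id3?pipeline=attachment&pretty' \
#   -H 'Content-Type: application/json' \
#   -d "$(doc_body '北京地区' 'report.pdf' ./report.pdf)"
```

For large files the body may exceed the shell's argument length limit; in that case it is safer to write the JSON to a file and pass it to curl with --data-binary @body.json.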
Index a sample document through the attachment pipeline:
curl -XPUT 'http://localhost:19200/test/document/test_id2?pipeline=attachment&pretty' -H 'Content-Type: application/json' -d '
{
  "source": "北京地区",
  "filename": "测试文档",
  "data": "UWJveCBlbmFibGVzIGxhdW5jaGluZyBzdXBwb3J0ZWQsIGZ1bGx5LW1hbmFnZWQsIFJFU1RmdWwgRWxhc3RpY3NlYXJjaCBTZXJ2aWNlIGluc3RhbnRseS4g"
}'
Search on the extracted content together with the source field:
curl -XPOST 'http://localhost:19200/test/document/_search?pretty' -d '
{
  "query": {
    "bool": {
      "must": [
        { "match_phrase": { "attachment.content": "Qbox" } },
        { "term": { "source": "北京地区" } }
      ]
    }
  }
}'
https://www.elastic.co/guide/en/elasticsearch/plugins/5.6/using-ingest-attachment.html
https://www.elastic.co/guide/en/elasticsearch/reference/5.5/binary.html