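When bulk-inserting into Elasticsearch, the Python client's default timeout of 10 seconds is sometimes too short. It can be raised manually by passing a timeout when creating the client in your program: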
es = Elasticsearch("IP", timeout=30)
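When the cluster runs into problems, check the log files in the Elasticsearch directory; they usually point to the cause. Bulk inserts go through the bulk API, which takes a list of actions in a JSON structure, as in the following code: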
import sys
import uuid

from elasticsearch import Elasticsearch
from elasticsearch import helpers

# Connect to several nodes of the cluster; timeout is in seconds
es = Elasticsearch(["192.158.71.86:9200","192.158.71.87:9200","192.158.71.88:9200"],timeout=30)

# Python 2 only: make UTF-8 the default encoding (not needed on Python 3)
if sys.version_info[0] == 2:
    reload(sys)
    sys.setdefaultencoding('utf-8')

if __name__ == '__main__':
    # One sample record: time, carId, lon, lat, then five extra fields
    line = '2017-11-06T00:00:00.000Z,aaaaaaaaaaa,118393528,24710266,1217260,80,20,0,1'
    items = line.split(',')
    actions = []
    action = {
        "_index": "20171106lorrygps-test",
        "_type": "gps",
        # str() so the id is a plain string that serializes to JSON
        "_id": str(uuid.uuid4()),
        "_source": {
            "carId": items[1],
            "lon": items[2],
            "lat": items[3],
            "time": items[0],
            "other1": items[4],
            "other2": items[5],
            "other3": items[6],
            "other4": items[7],
            "other5": items[8]
        }
    }
    actions.append(action)
    # Send the collected actions through the bulk API
    helpers.bulk(es, actions)
    print("-----------------------------------")
The actions list can hold as many action entries as the hardware and memory allow; sending more documents per bulk call speeds up insertion.
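As a rough sketch of how many records could be batched, the single hard-coded line in the script above can be generalized to a generator that turns each CSV line into one action. The helper name `iter_actions`, the `FIELDS` list, and the sample lines are hypothetical and only for illustration; the field order and index name follow the script above.

```python
import uuid

# Column order of each CSV line, matching the script above
FIELDS = ["time", "carId", "lon", "lat",
          "other1", "other2", "other3", "other4", "other5"]


def iter_actions(lines, index="20171106lorrygps-test", doc_type="gps"):
    """Yield one bulk action per CSV line (hypothetical helper)."""
    for line in lines:
        items = line.strip().split(",")
        if len(items) != len(FIELDS):
            # Skip malformed lines instead of failing the whole batch
            continue
        yield {
            "_index": index,
            "_type": doc_type,
            "_id": str(uuid.uuid4()),
            "_source": dict(zip(FIELDS, items)),
        }


if __name__ == "__main__":
    sample = [
        "2017-11-06T00:00:00.000Z,aaaaaaaaaaa,118393528,24710266,1217260,80,20,0,1",
        "broken,line",  # illustrative malformed input
    ]
    actions = list(iter_actions(sample))
    print(len(actions), actions[0]["_source"]["carId"])
```

The generator can be passed straight to the helper, for example `helpers.bulk(es, iter_actions(lines), chunk_size=1000)`, so that actions are sent in chunks rather than collected into one large list first. The chunk size of 1000 is an arbitrary starting point and would need tuning for a given cluster.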
The commands below are commonly used to inspect the cluster:
# Check the cluster health
curl -XGET 'http://20.42.93.10:9200/_cluster/health?pretty'
# List all indices in the cluster
curl -X GET 'http://20.42.93.10:9200/_cat/indices?v'
# Create an index with the given settings and mappings
curl -X PUT '192.168.20.76:9200/20171106lorrygps-test2' -d '
{
  "settings": {
    "number_of_shards": 12,
    "number_of_replicas": 0
  },
  "mappings": {
    "gps": {
      "properties": {
        "carId":  { "type": "string" },
        "lon":    { "type": "string" },
        "lat":    { "type": "string" },
        "time":   { "type": "string" },
        "other1": { "type": "string" },
        "other2": { "type": "string" },
        "other3": { "type": "string" },
        "other4": { "type": "string" },
        "other5": { "type": "string" }
      }
    }
  }
}'
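
Because all nine fields share the same type, the request body can also be generated instead of typed by hand. The sketch below builds the same settings and mapping as a Python dict and prints it as JSON. `build_index_body` and `FIELDS` are hypothetical names, and the lowercase `string` type assumes an Elasticsearch version that still accepts it (newer releases use `text` or `keyword` instead).

```python
import json

# Field names taken from the mapping above
FIELDS = ["carId", "lon", "lat", "time",
          "other1", "other2", "other3", "other4", "other5"]


def build_index_body(fields, doc_type="gps", field_type="string",
                     shards=12, replicas=0):
    """Return the create-index request body as a dict (hypothetical helper)."""
    return {
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        },
        "mappings": {
            doc_type: {
                # Every field gets the same type in this simple mapping
                "properties": {name: {"type": field_type} for name in fields}
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_index_body(FIELDS), indent=2))
```

The printed JSON can be saved to a file and sent with curl's `-d @file` form, or, with older versions of the Python client, the dict could be passed as the `body` argument of `es.indices.create`.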