Elastic Daily, Issue 1

Issue 1

Link: https://elasticsearch.cn/article/201

1. A Practical Introduction to Elasticsearch

Link: https://www.elastic.co/blog/a-practical-introduction-to-elasticsearch

Introduces Elasticsearch through practical examples; recommended reading for newcomers.

2. Elasticsearch 5.0 General Performance Recommendations

Link: https://qbox.io/blog/elasticsearch-5-0-general-performance-recommendations

Elasticsearch performance tuning

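1. Avoid indexing large documents;

  1. Large documents are usually unnecessary, and they put pressure on the network, disk, and memory;
  2. Except when fetching a document by _id, avoid requesting the whole _source field; name the fields you need to reduce network transfer;
  3. http.max_content_length defaults to 100mb (Lucene's own limit is about 2GB); indexing requests larger than this are rejected;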
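
The second point above can be sketched as a search body that names the fields to return. This is a minimal illustration in Python that only builds the JSON body; the helper name `build_search_body` and the field names `title` and `user` are hypothetical, and sending the request is left out.

```python
import json


def build_search_body(query, fields):
    """Return a search body that asks only for the listed _source fields."""
    if not fields:
        raise ValueError("list at least one field to return")
    # "_source" accepts a list of field names; only those fields come back
    # in each hit instead of the whole original document.
    return {"_source": list(fields), "query": query}


body = build_search_body({"match": {"title": "elasticsearch"}}, ["title", "user"])
print(json.dumps(body, indent=2))
```

The printed body can be sent to the _search endpoint as-is.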

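2. Avoid requesting large result sets

  1. To retrieve a large number of results, use the Scroll API;
  2. Always clear the scroll context once you are done with it;
  3. Use a sliced scroll search instead of from/size paging;
  4. Slicing is done on the _uid field by default; slicing on a separately created integer field makes little difference in memory and CPU impact;
  5. The recommended number of slices equals the number of shards.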
GET /twitter/tweet/_search?scroll=1m
{
    "slice": {
        "id": 0, 
        "max": 2 
    },
    "query": {
        "match" : {
            "title" : "elasticsearch"
        }
    }
}
GET /twitter/tweet/_search?scroll=1m
{
    "slice": {
        "id": 1,
        "max": 2
    },
    "query": {
        "match" : {
            "title" : "elasticsearch"
        }
    }
}


In the requests above, "id" is the id of the slice and "max" is the maximum number of slices.
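
The two requests above differ only in the slice id, so the bodies can be generated rather than written by hand. The sketch below is an illustration only: `build_slice_bodies` is a hypothetical helper, the slice count of 2 is an assumed shard count, and no request is actually sent.

```python
import json


def build_slice_bodies(query, num_slices):
    """Return one scroll request body per slice, with ids 0..num_slices-1."""
    if num_slices < 2:
        # A single slice would be no different from a plain scroll.
        raise ValueError("sliced scroll needs at least 2 slices")
    return [
        {"slice": {"id": slice_id, "max": num_slices}, "query": query}
        for slice_id in range(num_slices)
    ]


# Assumed for illustration: the index has 2 shards, so 2 slices are used.
bodies = build_slice_bodies({"match": {"title": "elasticsearch"}}, 2)
for slice_body in bodies:
    print(json.dumps(slice_body))
```

Each body would be sent as its own scroll request, so the slices can be consumed in parallel by separate workers.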

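3. Avoid sparse documents; maximize document density

  1. Keep documents that share the same fields together;
  2. Sparse documents slow down both searching and indexing;
  3. Avoid putting unrelated documents in the same index;
  4. Normalize document structure;
  5. Avoid overusing types; all types within an index should share the same fields as far as possible;
  6. norms can be disabled on fields that are used only for filtering;
  7. doc_values can be disabled on fields that are used for neither sorting nor aggregations.
Disable norms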
"properties": {
   "title": {
     "type": "text",
     "norms": false
   }
 }
 
Disable doc_values
"mappings": {
   "my_type": {
     "properties": {
       "session_id": {
         "type":       "keyword",
         "doc_values": false
       }
     }
   }
 }

3. The Authoritative Guide to Elasticsearch Performance Tuning

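  1. Choose the number of indices and shards carefully;
  2. Assign appropriate roles to nodes;
  3. Allow Elasticsearch to lock its memory;
  4. Set ES_HEAP_SIZE;
  5. Disable swap: run sudo swapoff -a, and comment out the swap entries in /etc/fstab to make the change permanent;
  6. Configure swappiness: make sure the sysctl value vm.swappiness is set to 1. This reduces the kernel's tendency to swap, so no swapping happens under normal conditions while swapping is still possible in emergencies;
  7. Set the cluster name;
  8. Set discovery.zen.ping.multicast.enabled to false;
  9. Set discovery.zen.minimum_master_nodes to N/2 + 1;
  10. Searching needs an inverted index, while sorting and aggregations need an uninverted structure, known as a column store;
  11. doc_values is enabled by default, so set it to false on fields that need neither sorting nor aggregations;
  12. A field can have search disabled while aggregations stay enabled;
  13. Disable the _all field;
  14. Set action.destructive_requires_name to true;
  15. Set indices.cluster.send_refresh_mapping: false;
  16. Several indices can share the same alias; the benefit is that a single index does not need many shards, which keeps index size under control;
  17. When bulk-indexing large volumes of documents, set the number of replicas to 0;
  18. Use dedicated data nodes rather than mixed-role nodes;
  19. Keep the size of each bulk request in the 5–15MB range;
  20. Use SSDs; use RAID 0;
  21. Avoid using EFS, because of the sacrifices it makes to offer durability;
  22. Do not use remote-mounted storage, such as NFS or SMB/CIFS.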
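
The bulk-size recommendation in the list above can be approximated on the client by cutting a stream of documents into size-bounded batches. The following is a rough sketch under stated assumptions: `batch_bulk_lines` is a hypothetical helper, the 10MB default is simply a value inside the 5–15MB range, and the index and type names are placeholders. It only builds newline-delimited bulk bodies and does not send them.

```python
import json


def batch_bulk_lines(docs, index, doc_type, max_bytes=10 * 1024 * 1024):
    """Yield bulk bodies of at most max_bytes each.

    A single document larger than max_bytes still gets a batch of its own.
    """
    batch, size = [], 0
    for doc in docs:
        action = json.dumps({"index": {"_index": index, "_type": doc_type}})
        entry = action + "\n" + json.dumps(doc) + "\n"
        entry_size = len(entry.encode("utf-8"))
        # Flush the current batch before it would exceed the size budget.
        if batch and size + entry_size > max_bytes:
            yield "".join(batch)
            batch, size = [], 0
        batch.append(entry)
        size += entry_size
    if batch:
        yield "".join(batch)


# Tiny budget (4KB) so the batching is visible with small sample documents.
docs = [{"title": "doc %d" % i} for i in range(1000)]
bodies = list(batch_bulk_lines(docs, "twitter", "tweet", max_bytes=4096))
print(len(bodies), "bulk bodies")
```

The right batch size depends on document shape and cluster load, so the budget is best treated as a starting point to measure against rather than a fixed rule.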

The discovery.zen.ping_timeout setting (which defaults to 3s) lets you tune election time to cope with slow or congested networks; higher values reduce the chance of failure. Once a node joins, it sends a join request to the master, governed by discovery.zen.join_timeout, which defaults to 20 times the ping timeout (60s with the default ping timeout). On a slow network, set the value higher: the higher the value, the smaller the chance of discovery failure.

If discovery.zen.master_election.filter_client is true, pings from client nodes are ignored during master election; the default value is true. If discovery.zen.master_election.filter_data is true, pings from non-master-eligible data nodes (nodes where node.data is true and node.master is false) are ignored during master election; the default value is false.

The discovery.zen.minimum_master_nodes setting controls the minimum number of master-eligible nodes that a node must "see" in order to operate within the cluster. It is recommended to set it higher than 1 when running more than 2 nodes in the cluster. One way to calculate the value is N/2 + 1, where N is the number of master-eligible nodes. The setting must be a quorum of the master-eligible nodes. Avoid having only two master-eligible nodes: a quorum of two is two, so the loss of either one leaves the cluster inoperable.
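
The N/2 + 1 rule is easy to check with a few lines of Python. This is a small illustrative sketch; the function name `minimum_master_nodes` is made up here to mirror the setting, and integer division is assumed for N/2.

```python
def minimum_master_nodes(master_eligible):
    """Quorum of master-eligible nodes: N/2 + 1 using integer division."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


for n in (1, 2, 3, 5):
    quorum = minimum_master_nodes(n)
    # n - quorum is how many master-eligible nodes can be lost
    # while a quorum is still reachable.
    print(n, quorum, n - quorum)
```

With two master-eligible nodes the quorum is 2 and no node can be lost, while with three the quorum is still 2 and one node can fail, which is why three is the usual minimum.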

## Fault detection settings
discovery.zen.fd:
    ping_interval - How often a node gets pinged, defaults to 1s.
    ping_timeout - How long to wait for a ping response, defaults to 30s.
    ping_retries - How many ping failures / timeouts cause a node to be considered failed, defaults to 3.


## Recommended configuration
discovery.zen.no_master_block: all/write
discovery.zen.fd.ping_timeout: 10s
discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.unicast.hosts: ["master_node_01","master_node_02","master_node_03"]


## Disable search on a field while keeping aggregations enabled
curl -XPUT localhost:9200/my_index -d'{
  "mappings": {
    "my_type": {
      "properties": {
        "user_id": {
          "type": "string",
          "doc_values": true,
          "index": "no"
        }
      }
    }
  }
}'

## Disallow deleting indices via _all
curl -XPUT localhost:9200/_cluster/settings -d '{
    "persistent" : {
        "action.destructive_requires_name" : true
    }
}'

curl -XPUT localhost:9200/_cluster/settings -d '{
    "transient" : {
        "action.destructive_requires_name" : true
    }
}'

## Add and remove index aliases
curl -XPOST 'localhost:9200/_aliases' -d '{
    "actions" : [
        { "remove" : { "index" : "test1", "alias" : "alias1" } },
        { "add" : { "index" : "test2", "alias" : "alias1" } }
    ]
}'
