Elasticsearch configuration files

Elasticsearch's configuration files live in the config folder, which contains two files: elasticsearch.yml and logging.yml. elasticsearch.yml configures the Elasticsearch service itself, while logging.yml configures logging. Below is an annotated elasticsearch.yml:

# ---------------------------------- Cluster -----------------------------------
# Cluster name, "elasticsearch" by default. Elasticsearch automatically discovers
# other nodes on the same network segment; if several clusters share a segment,
# this name is what keeps them apart.
cluster.name: elasticsearch
#
# ------------------------------------ Node ------------------------------------
# Node name. By default a random name is picked from a list shipped inside the
# es jar (a names file in its config folder), full of playful names added by the authors.
node.name: "Franz Kafka"
#
# Whether this node is eligible to be elected master. Defaults to true. By default
# the first machine in the cluster becomes master; if it goes down, a new master
# is elected.
node.master: true
#
# Whether this node stores index data. Defaults to true.
node.data: true
#
# ------------------------------------ Index -----------------------------------
# Default number of shards per index. Defaults to 5.
index.number_of_shards: 5
#
# Default number of replicas per index. Defaults to 1.
index.number_of_replicas: 1
#
# ----------------------------------- Paths ------------------------------------
# Path to the configuration files. Defaults to the config folder under the es root.
# path.conf: /path/to/conf
#
# Path to the index data. Defaults to the data folder under the es root. Several
# paths can be given, separated by commas, e.g.:
# path.data: /path/to/data1,/path/to/data2
# path.data: /path/to/data
#
# Path to the log files. Defaults to the logs folder under the es root.
# path.logs: /path/to/logs
#
# Path to temporary files. Defaults to the work folder under the es root.
# path.work: /path/to/work
#
# Path to plugins. Defaults to the plugins folder under the es root.
# path.plugins: /path/to/plugins
#
# ----------------------------------- Memory -----------------------------------
# Set to true to lock the process memory. Elasticsearch slows down badly once the
# JVM starts swapping, so make sure it never swaps: set the ES_MIN_MEM and
# ES_MAX_MEM environment variables to the same value and leave the machine enough
# memory for es. Also allow the elasticsearch process to lock memory, e.g. with
# `ulimit -l unlimited` on Linux.
bootstrap.mlockall: true
#
# ---------------------------------- Network -----------------------------------
# Bind address, IPv4 or IPv6. Defaults to 0.0.0.0.
# network.bind_host: 192.168.0.1
#
# Address other nodes use to talk to this node. Derived automatically if unset;
# must be a real IP address.
# network.publish_host: 192.168.0.1
#
# Sets both bind_host and publish_host at once.
# network.host: 192.168.0.1
#
# TCP port for node-to-node transport. Defaults to 9300.
transport.tcp.port: 9300
#
# HTTP port for client traffic. Defaults to 9200.
http.port: 9200
#
# Whether to compress data on the TCP transport. Defaults to false (no compression).
transport.tcp.compress: true
#
# Maximum allowed HTTP content length. Defaults to 100mb.
http.max_content_length: 100mb
#
# Whether to serve HTTP at all. Defaults to true (enabled).
http.enabled: true
#
# ---------------------------------- Gateway -----------------------------------
# Gateway type. Defaults to local (the local file system); can also be a shared
# file system, Hadoop HDFS, or Amazon S3.
gateway.type: local
#
# Start recovery once N nodes are up. Defaults to 1.
# gateway.recover_after_nodes: 1
#
# Timeout before the initial recovery process starts. Defaults to 5 minutes.
# gateway.recover_after_time: 5m
#
# Number of nodes expected in the cluster. Defaults to 2; once this many nodes
# have started, recovery begins immediately (without waiting for recover_after_time).
# gateway.expected_nodes: 2
#
# ----------------------------- Recovery Throttling ----------------------------
# Number of concurrent recoveries per node when recovering initial primaries. Defaults to 4.
# cluster.routing.allocation.node_initial_primaries_recoveries: 4
#
# Number of concurrent recoveries per node when adding/removing nodes or rebalancing. Defaults to 2.
# cluster.routing.allocation.node_concurrent_recoveries: 2
#
# Bandwidth cap while recovering data, e.g. 100mb. Defaults to 0, i.e. unlimited.
indices.recovery.max_size_per_sec: 20mb
#
# Maximum number of concurrent streams opened when recovering a shard from other nodes. Defaults to 5.
indices.recovery.concurrent_streams: 5
#
# --------------------------------- Discovery ----------------------------------
# Number of master-eligible nodes this node must see before it operates. Defaults
# to 1; for larger clusters a higher value (2-4) is advisable.
discovery.zen.minimum_master_nodes: 1
#
# Ping timeout used during node discovery. Defaults to 3s; raise it on poor
# networks to avoid discovery failures.
discovery.zen.ping.timeout: 3s
#
# Whether multicast discovery is enabled. Defaults to true.
discovery.zen.ping.multicast.enabled: true
#
# Initial list of master nodes used to discover other nodes joining the cluster.
# discovery.zen.ping.unicast.hosts: ["host1", "host2:port", "host3[portX-portY]"]
#
# ---------------------------------- Slow Log ----------------------------------
# Thresholds for the search and indexing slow logs:
#
#index.search.slowlog.threshold.query.warn: 10s
#index.search.slowlog.threshold.query.info: 5s
#index.search.slowlog.threshold.query.debug: 2s
#index.search.slowlog.threshold.query.trace: 500ms
#
#index.search.slowlog.threshold.fetch.warn: 1s
#index.search.slowlog.threshold.fetch.info: 800ms
#index.search.slowlog.threshold.fetch.debug: 500ms
#index.search.slowlog.threshold.fetch.trace: 200ms
#
#index.indexing.slowlog.threshold.index.warn: 10s
#index.indexing.slowlog.threshold.index.info: 5s
#index.indexing.slowlog.threshold.index.debug: 2s
#index.indexing.slowlog.threshold.index.trace: 500ms
#
# --------------------------------- GC Logging ---------------------------------
#
#monitor.jvm.gc.young.warn: 1000ms
#monitor.jvm.gc.young.info: 700ms
#monitor.jvm.gc.young.debug: 400ms
#
#monitor.jvm.gc.old.warn: 10s
#monitor.jvm.gc.old.info: 5s
#monitor.jvm.gc.old.debug: 2s
#
# ---------------------------------- Security ----------------------------------
# Whether JSONP is enabled. Disabled by default.
# http.jsonp.enable: false
#

-------------------------------------------------------------------------------------------------------------------------------

http://jingyan.baidu.com/article/48206aead42b53216bd6b372.html

I have covered installation and setup before; this post is about the important settings. It is mostly based on the "Important Configuration Changes" section of the Elasticsearch guide, and the key passages from the English original are included.

Why this section? Because it really is important, and someone happened to ask me a configuration question today, so I am writing it up. It also serves as a reference for myself: these settings are not used often, and having to start over from scratch every time is annoying. — by 歪歪

elasticsearch.yml

Please read this entire section! All configurations presented are equally important, and are not listed in any particular order. Please read through all configuration options and apply them to your cluster.


Steps

Elasticsearch ships with very good defaults, especially when it comes to performance-related settings and options. When in doubt, just leave the settings alone. We have witnessed countless dozens of clusters ruined by errant settings because the administrator thought he could turn a knob and gain 100-fold improvement.

Assign Names

Elasticsearch by default starts a cluster named elasticsearch. It is wise to rename your production cluster to something else, simply to prevent accidents whereby someone's laptop joins the cluster. A simple change to elasticsearch_production can save a lot of heartache.
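In elasticsearch.yml this is a one-line change, using the example name from the text:

    cluster.name: elasticsearch_production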

Node Names

Similarly, it is wise to change the names of your nodes. As you've probably noticed by now, Elasticsearch assigns a random Marvel superhero name to your nodes at startup. This is cute in development, but less cute when it is 3 a.m. and you are trying to remember which physical machine was Tagak the Leopard Lord.

More important, since these names are generated on startup, each time you restart your node, it will get a new name. This can make logs confusing, since the names of all the nodes are constantly changing.

Boring as it might be, we recommend you give each node a name that makes sense to you: a plain, descriptive name. This is also configured in your elasticsearch.yml:
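For example (the name itself is only a hypothetical placeholder; pick something that identifies the machine):

    node.name: "es_prod_data_01"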

Paths

By default, Elasticsearch will place the plug-ins, logs, and, most important, your data in the installation directory. This can lead to unfortunate accidents, whereby the installation directory is accidentally overwritten by a new installation of Elasticsearch. If you aren't careful, you can erase all your data.

Don't laugh, we've seen it happen more than a few times.

The best thing to do is relocate your data directory outside the installation location. You can optionally move your plug-in and log directories as well.

This can be changed as follows:
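A sketch with placeholder paths (all three keys also appear in the sample configuration at the end of this post):

    path.data: /path/to/data
    path.logs: /path/to/logs
    path.plugins: /path/to/plugins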

Notice that you can specify more than one directory for data by using comma-separated lists.

Data can be saved to multiple directories, and if each directory is mounted on a different hard drive, this is a simple and effective way to set up a software RAID 0. Elasticsearch will automatically stripe data between the different directories, boosting performance.
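For example, with two placeholder directories, each assumed to sit on its own drive:

    path.data: /path/to/data1,/path/to/data2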

Minimum Master Nodes

The minimum_master_nodes setting is extremely important to the stability of your cluster. This setting helps prevent split brains, the existence of two masters in a single cluster.

When you have a split brain, your cluster is in danger of losing data. Because the master is considered the supreme ruler of the cluster, it decides when new indices can be created, how shards are moved, and so forth. If you have two masters, data integrity becomes perilous, since you have two nodes that think they are in charge.

This setting tells Elasticsearch to not elect a master unless there are enough master-eligible nodes available. Only then will an election take place.

This setting should always be configured to a quorum (majority) of your master-eligible nodes. A quorum is (number of master-eligible nodes / 2) + 1. Here are some examples:

1. If you have ten regular nodes (can hold data, can become master), a quorum is 6.

2. If you have three dedicated master nodes and a hundred data nodes, the quorum is 2, since you need to count only the nodes that are master eligible.

3. If you have two regular nodes, you are in a conundrum. A quorum would be 2, but this means a loss of one node will make your cluster inoperable. A setting of 1 will allow your cluster to function, but doesn't protect against split brain. It is best to have a minimum of three nodes in situations like this.
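A minimal elasticsearch.yml sketch for the three-dedicated-masters example above:

    discovery.zen.minimum_master_nodes: 2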

The so-called split-brain problem (much like its psychiatric namesake) means that different nodes in the same cluster end up with different understandings of the cluster state.

Today an Elasticsearch cluster started answering queries extremely slowly. Checking the cluster state with:

curl -XGET 'es-1:9200/_cluster/health'

showed an overall status of red, and a cluster that should have had 9 nodes reported only 4. Worse, when the same request was sent to different nodes, the number of available nodes they reported was inconsistent, even though the overall status was red everywhere.

Normally all nodes in a cluster agree on which node is the master, so the state they report should be identical. Inconsistent state information means the nodes disagree about who the master is, which is exactly the split-brain problem. In that state, nodes lose track of the correct cluster state and the cluster cannot work properly.


Possible causes:

1. Network: the nodes communicate over an internal network, so it is unlikely that network problems made some nodes believe the master was dead and elect a new one. The Ganglia cluster monitoring also showed no unusual internal traffic, so this cause can be ruled out.

2. Node load: the master and data roles were mixed on the same nodes. When a heavily loaded worker node (and the load really was high) stops responding, and that server happens to be acting as master, part of the cluster decides the master has failed and elects a new one, producing a split brain. In addition, because the ES processes on data nodes use a lot of memory, large garbage-collection pauses can also make an ES process unresponsive. This is the most likely cause.


Remedies:

1. Following the analysis above, the likely root cause is that node load made the master process unresponsive, so some nodes started disagreeing about who the master was. The obvious fix is to separate the master role from the data role. We therefore added three servers to the ES cluster whose only role is master; they do not store data or serve searches, so they are comparatively lightweight processes. Their role is restricted with the following settings:

node.master: true
node.data: false

The other nodes must then no longer be eligible as master; just invert the two settings on them. That separates the master nodes from the data nodes. To help newly joining nodes find the masters quickly, you can also switch the data nodes' discovery from the default multicast to unicast:

discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["master1", "master2", "master3"]

2. Two other settings can also reduce the chance of a split brain:

discovery.zen.ping_timeout (defaults to 3 seconds): by default, a node assumes the master is dead if it does not answer within 3 seconds. Raising this value makes nodes wait longer for a response and, to some extent, reduces false positives.

discovery.zen.minimum_master_nodes (defaults to 1): the number of master-eligible nodes a node must see before it will operate in the cluster. The official recommendation is (N/2)+1, where N is the number of master-eligible nodes (3 in our case, so we set this to 2). Note that with only 2 master-eligible nodes, a value of 2 is problematic: as soon as one of them goes down, the remaining node cannot see two servers and the cluster cannot operate.
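A sketch of the two settings for the three-dedicated-master setup described above; the 10s timeout is only an illustrative value, not a recommendation from the text:

    discovery.zen.ping_timeout: 10s
    discovery.zen.minimum_master_nodes: 2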


But because Elasticsearch clusters are dynamic, you could easily add or remove nodes, which changes the quorum. It would be extremely irritating if you had to push new configurations to each node and restart your whole cluster just to change the setting.

For this reason, minimum_master_nodes (and other settings) can be configured via a dynamic API call. You can change the setting while your cluster is online:

PUT /_cluster/settings
{
    "persistent" : {
        "discovery.zen.minimum_master_nodes" : 2
    }
}

This will become a persistent setting that takes precedence over whatever is in the static configuration. You should modify this setting whenever you add or remove master-eligible nodes.

Recovery Settings

Several settings affect the behavior of shard recovery when your cluster restarts. First, we need to understand what happens if nothing is configured.

Imagine you have ten nodes, and each node holds a single shard, either a primary or a replica, in a 5 primary / 1 replica index. You take your entire cluster offline for maintenance (installing new drives, for example). When you restart your cluster, it just so happens that five nodes come online before the other five.

Maybe the switch to the other five is being flaky, and they didn't receive the restart command right away. Whatever the reason, you have five nodes online. These five nodes will gossip with each other, elect a master, and form a cluster. They notice that data is no longer evenly distributed, since five nodes are missing from the cluster, and immediately start replicating new shards between each other.

Finally, your other five nodes turn on and join the cluster. These nodes see that their data is being replicated to other nodes, so they delete their local data (since it is now redundant, and may be outdated). Then the cluster starts to rebalance even more, since the cluster size just went from five to ten.

During this whole process, your nodes are thrashing the disk and network, moving data around, for no good reason. For large clusters with terabytes of data, this useless shuffling of data can take a really long time. If all the nodes had simply waited for the cluster to come online, all the data would have been local and nothing would need to move.

Now that we know the problem, we can configure a few settings to alleviate it. First, we need to give Elasticsearch a hard limit:

gateway.recover_after_nodes: 8

This will prevent Elasticsearch from starting a recovery until at least eight (data or master) nodes are present. The value for this setting is a matter of personal preference: how many nodes do you want present before you consider your cluster functional? In this case, we are setting it to 8, which means the cluster is inoperable unless there are at least eight nodes.

Then we tell Elasticsearch how many nodes should be in the cluster, and how long we want to wait for all those nodes:

gateway.expected_nodes: 10
gateway.recover_after_time: 5m

What this means is that Elasticsearch will do the following:

1. Wait for eight nodes to be present.

2. Begin recovering after 5 minutes or after ten nodes have joined the cluster, whichever comes first.

These three settings allow you to avoid the excessive shard swapping that can occur on cluster restarts. It can literally make recovery take seconds instead of hours.

NOTE: These settings can only be set in the config/elasticsearch.yml file or on the command line (they are not dynamically updatable), and they are only relevant during a full cluster restart.
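Putting the three restart-recovery settings together in elasticsearch.yml, with the values used in the example above:

    gateway.recover_after_nodes: 8
    gateway.expected_nodes: 10
    gateway.recover_after_time: 5m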

Prefer Unicast over Multicast

Elasticsearch is configured to use unicast discovery out of the box to prevent nodes from accidentally joining a cluster. Only nodes running on the same machine will automatically form a cluster.

While multicast is still provided as a plugin, it should never be used in production. The last thing you want is for nodes to accidentally join your production network, simply because they received an errant multicast ping. There is nothing wrong with multicast per se. Multicast simply leads to silly problems, and can be a bit more fragile (for example, a network engineer fiddles with the network without telling you, and all of a sudden nodes can't find each other anymore).

To use unicast, you provide Elasticsearch a list of nodes that it should try to contact. When a node contacts a member of the unicast list, it receives a full cluster state that lists all of the nodes in the cluster. It then contacts the master and joins the cluster.

This means your unicast list does not need to include all of the nodes in your cluster. It just needs enough nodes that a new node can find someone to talk to. If you use dedicated masters, just list your three dedicated masters and call it a day. This setting is configured in elasticsearch.yml:

discovery.zen.ping.unicast.hosts: ["host1", "host2:port"]

The defaults for unicast and multicast differ somewhat between the 1.x and 2.x releases.

I am running 1.7.2, so there should be two settings to change:
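A sketch of those two settings for a 1.x cluster (host names are placeholders), matching the unicast switch shown earlier:

    discovery.zen.ping.multicast.enabled: false
    discovery.zen.ping.unicast.hosts: ["host1", "host2"]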




##################### ElasticSearch Configuration Example #####################

# This file contains an overview of various configuration settings,
# targeted at operations staff. Application developers should
# consult the guide at <http://elasticsearch.org/guide>.
#
# The installation procedure is covered at
# <http://elasticsearch.org/guide/reference/setup/installation.html>.
#
# ElasticSearch comes with reasonable defaults for most settings,
# so you can try it out without bothering with configuration.
#
# Most of the time, these defaults are just fine for running a production
# cluster. If you're fine-tuning your cluster, or wondering about the
# effect of certain configuration option, please _do ask_ on the
# mailing list or IRC channel [http://elasticsearch.org/community].

# Any element in the configuration can be replaced with environment variables
# by placing them in ${...} notation. For example:
#
# node.rack: ${RACK_ENV_VAR}

# See <http://elasticsearch.org/guide/reference/setup/configuration.html>
# for information on supported formats and syntax for the configuration file.

################################### Cluster ###################################

# Cluster name identifies your cluster for auto-discovery. If you're running
# multiple clusters on the same network, make sure you're using unique names.
#
# cluster.name: elasticsearch

#################################### Node #####################################

# Node names are generated dynamically on startup, so you're relieved
# from configuring them manually. You can tie this node to a specific name:
#
# node.name: "Franz Kafka"

# Every node can be configured to allow or deny being eligible as the master,
# and to allow or deny to store the data.
#
# Allow this node to be eligible as a master node (enabled by default):
#
# node.master: true
#
# Allow this node to store data (enabled by default):
#
# node.data: true

# You can exploit these settings to design advanced cluster topologies.
#
# 1. You want this node to never become a master node, only to hold data.
# This will be the "workhorse" of your cluster.
#
# node.master: false
# node.data: true
#
# 2. You want this node to only serve as a master: to not store any data and
# to have free resources. This will be the "coordinator" of your cluster.
#
# node.master: true
# node.data: false
#
# 3. You want this node to be neither master nor data node, but
# to act as a "search load balancer" (fetching data from nodes,
# aggregating results, etc.)
#
# node.master: false
# node.data: false

# Use the Cluster Health API [http://localhost:9200/_cluster/health], the
# Node Info API [http://localhost:9200/_cluster/nodes] or GUI tools
# such as <http://github.com/lukas-vlcek/bigdesk> and
# <http://mobz.github.com/elasticsearch-head> to inspect the cluster state.

# A node can have generic attributes associated with it, which can later be used
# for customized shard allocation filtering, or allocation awareness. An attribute
# is a simple key value pair, similar to node.key: value, here is an example:
#
# node.rack: rack314

# By default, multiple nodes are allowed to start from the same installation location
# to disable it, set the following:
# node.max_local_storage_nodes: 1

#################################### Index ####################################

# You can set a number of options (such as shard/replica options, mapping
# or analyzer definitions, translog settings, ...) for indices globally,
# in this file.
#
# Note, that it makes more sense to configure index settings specifically for
# a certain index, either when creating it or by using the index templates API.
#
# See <http://elasticsearch.org/guide/reference/index-modules/> and
# <http://elasticsearch.org/guide/reference/api/admin-indices-create-index.html>
# for more information.

# Set the number of shards (splits) of an index (5 by default):
#
# index.number_of_shards: 5

# Set the number of replicas (additional copies) of an index (1 by default):
#
# index.number_of_replicas: 1

# Note, that for development on a local machine, with small indices, it usually
# makes sense to "disable" the distributed features:
#
# index.number_of_shards: 1
# index.number_of_replicas: 0

# These settings directly affect the performance of index and search operations
# in your cluster. Assuming you have enough machines to hold shards and
# replicas, the rule of thumb is:
#
# 1. Having more *shards* enhances the _indexing_ performance and allows to
# _distribute_ a big index across machines.
# 2. Having more *replicas* enhances the _search_ performance and improves the
# cluster _availability_.
#
# The "number_of_shards" is a one-time setting for an index.
#
# The "number_of_replicas" can be increased or decreased anytime,
# by using the Index Update Settings API.
#
# ElasticSearch takes care about load balancing, relocating, gathering the
# results from nodes, etc. Experiment with different settings to fine-tune
# your setup.

# Use the Index Status API (<http://localhost:9200/A/_status>) to inspect
# the index status.

#################################### Paths ####################################

# Path to directory containing configuration (this file and logging.yml):
#
# path.conf: /path/to/conf

# Path to directory where to store index data allocated for this node.
#
# path.data: /path/to/data
#
# Can optionally include more than one location, causing data to be striped across
# the locations (a la RAID 0) on a file level, favouring locations with most free
# space on creation. For example:
#
# path.data: /path/to/data1,/path/to/data2

# Path to temporary files:
#
# path.work: /path/to/work

# Path to log files:
#
# path.logs: /path/to/logs

# Path to where plugins are installed:
#
# path.plugins: /path/to/plugins

#################################### Plugin ###################################

# If a plugin listed here is not installed for current node, the node will not start.
#
# plugin.mandatory: mapper-attachments,lang-groovy

################################### Memory ####################################

# ElasticSearch performs poorly when JVM starts swapping: you should ensure that
# it _never_ swaps.
#
# Set this property to true to lock the memory:
#
# bootstrap.mlockall: true

# Make sure that the ES_MIN_MEM and ES_MAX_MEM environment variables are set
# to the same value, and that the machine has enough memory to allocate
# for ElasticSearch, leaving enough memory for the operating system itself.
#
# You should also make sure that the ElasticSearch process is allowed to lock
# the memory, eg. by using `ulimit -l unlimited`.

############################## Network And HTTP ###############################

# ElasticSearch, by default, binds itself to the 0.0.0.0 address, and listens
# on port [9200-9300] for HTTP traffic and on port [9300-9400] for node-to-node
# communication. (the range means that if the port is busy, it will automatically
# try the next port).

# Set the bind address specifically (IPv4 or IPv6):
#
# network.bind_host: 192.168.0.1

# Set the address other nodes will use to communicate with this node. If not
# set, it is automatically derived. It must point to an actual IP address.
#
# network.publish_host: 192.168.0.1

# Set both 'bind_host' and 'publish_host':
#
# network.host: 192.168.0.1

# Set a custom port for the node to node communication (9300 by default):
#
# transport.tcp.port: 9300

# Enable compression for all communication between nodes (disabled by default):
#
# transport.tcp.compress: true

# Set a custom port to listen for HTTP traffic:
#
# http.port: 9200

# Set a custom allowed content length:
#
# http.max_content_length: 100mb

# Disable HTTP completely:
#
# http.enabled: false
