Elasticsearch Setup: Basic Configuration

Configuration


Environment variables

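Within the startup script, Elasticsearch passes the JAVA_OPTS options to the JVM that runs Elasticsearch. The most important of these is -Xmx, which controls the maximum amount of memory the Elasticsearch process may use. -Xms controls the minimum amount of memory allocated to the process. In general, the more memory allocated to the process, the better.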

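In most cases it is best to leave JAVA_OPTS at its defaults and use the ES_JAVA_OPTS environment variable to add or change JVM settings. The ES_HEAP_SIZE environment variable sets the heap memory allocated to the Elasticsearch Java process. It assigns the same value to both the minimum and the maximum, although the two can be set separately through ES_MIN_MEM (default 256m) and ES_MAX_MEM (default 1g).
Setting the minimum and maximum to the same value is recommended.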
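
The relationship between these variables can be sketched in a few lines of shell. This is only an illustration of the behavior described above, not the code Elasticsearch ships; heap_flags is a hypothetical helper name, and the defaults are the ones quoted in the text.

```shell
#!/bin/sh
# Sketch: derive the JVM heap flags from the environment variables above.
# heap_flags is a hypothetical helper, not part of Elasticsearch.
heap_flags() {
    # Defaults quoted in the text: 256m minimum, 1g maximum.
    min="${ES_MIN_MEM:-256m}"
    max="${ES_MAX_MEM:-1g}"
    # ES_HEAP_SIZE, when set, pins both bounds to the same value.
    if [ -n "$ES_HEAP_SIZE" ]; then
        min="$ES_HEAP_SIZE"
        max="$ES_HEAP_SIZE"
    fi
    printf '%s\n' "-Xms$min -Xmx$max"
}

heap_flags
```

In practice the variable is simply set in the environment before starting the node, for example ES_HEAP_SIZE=4g ./bin/elasticsearch.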
The Elasticsearch startup script:
#!/bin/sh

# OPTIONS:
#   -d: daemonize, start in the background
#   -p <filename>: log the pid to a file (useful to kill it later)

# CONTROLLING STARTUP:
#
# This script relies on few environment variables to determine startup
# behavior, those variables are:
#
#   ES_CLASSPATH -- A Java classpath containing everything necessary to run.
#   JAVA_OPTS    -- Additional arguments to the JVM for heap size, etc
#   ES_JAVA_OPTS -- External Java Opts on top of the defaults set
#
#
# Optionally, exact memory values can be set using the following values, note,
# they can still be set using the `ES_JAVA_OPTS`. Sample format include "512m", and "10g".
#
#   ES_HEAP_SIZE -- Sets both the minimum and maximum memory to allocate (recommended)
#
# As a convenience, a fragment of shell is sourced in order to set one or
# more of these variables. This so-called `include' can be placed in a
# number of locations and will be searched for in order. The lowest
# priority search path is the same directory as the startup script, and
# since this is the location of the sample in the project tree, it should
# almost work Out Of The Box.
#
# Any serious use-case though will likely require customization of the
# include. For production installations, it is recommended that you copy
# the sample to one of /usr/share/elasticsearch/elasticsearch.in.sh,
# /usr/local/share/elasticsearch/elasticsearch.in.sh, or
# /opt/elasticsearch/elasticsearch.in.sh and make your modifications there.
#
# Another option is to specify the full path to the include file in the
# environment. For example:
#
#   $ ES_INCLUDE=/path/to/in.sh elasticsearch -p /var/run/es.pid
#
# Note: This is particularly handy for running multiple instances on a
# single installation, or for quick tests.
#
# If you would rather configure startup entirely from the environment, you
# can disable the include by exporting an empty ES_INCLUDE, or by
# ensuring that no include files exist in the aforementioned search list.
# Be aware that you will be entirely responsible for populating the needed
# environment variables.


# Maven will replace the project.name with elasticsearch below. If that
# hasn't been done, we assume that this is not a packaged version and the
# user has forgotten to run Maven to create a package.
IS_PACKAGED_VERSION='elasticsearch'
if [ "$IS_PACKAGED_VERSION" != "elasticsearch" ]; then
    cat >&2 << EOF
Error: You must build the project with Maven or download a pre-built package
before you can run Elasticsearch. See 'Building from Source' in README.textile
or visit http://www.elasticsearch.org/download to get a pre-built package.
EOF
    exit 1
fi

CDPATH=""
SCRIPT="$0"

# SCRIPT may be an arbitrarily deep series of symlinks. Loop until we have the concrete path.
while [ -h "$SCRIPT" ] ; do
  ls=`ls -ld "$SCRIPT"`
  # Drop everything prior to ->
  link=`expr "$ls" : '.*-> \(.*\)$'`
  if expr "$link" : '/.*' > /dev/null; then
    SCRIPT="$link"
  else
    SCRIPT=`dirname "$SCRIPT"`/"$link"
  fi
done

# determine elasticsearch home
ES_HOME=`dirname "$SCRIPT"`/..

# make ELASTICSEARCH_HOME absolute
ES_HOME=`cd "$ES_HOME"; pwd`


# If an include wasn't specified in the environment, then search for one...
if [ "x$ES_INCLUDE" = "x" ]; then
    # Locations (in order) to use when searching for an include file.
    for include in /usr/share/elasticsearch/elasticsearch.in.sh \
                   /usr/local/share/elasticsearch/elasticsearch.in.sh \
                   /opt/elasticsearch/elasticsearch.in.sh \
                   ~/.elasticsearch.in.sh \
                   "`dirname "$0"`"/elasticsearch.in.sh; do
        if [ -r "$include" ]; then
            . "$include"
            break
        fi
    done
# ...otherwise, source the specified include.
elif [ -r "$ES_INCLUDE" ]; then
    . "$ES_INCLUDE"
fi

if [ -x "$JAVA_HOME/bin/java" ]; then
    JAVA="$JAVA_HOME/bin/java"
else
    JAVA=`which java`
fi

if [ ! -x "$JAVA" ]; then
    echo "Could not find any executable java binary. Please install java in your PATH or set JAVA_HOME"
    exit 1
fi

if [ -z "$ES_CLASSPATH" ]; then
    echo "You must set the ES_CLASSPATH var" >&2
    exit 1
fi

# Special-case path variables.
case `uname` in
    CYGWIN*)
        ES_CLASSPATH=`cygpath -p -w "$ES_CLASSPATH"`
        ES_HOME=`cygpath -p -w "$ES_HOME"`
    ;;
esac

launch_service()
{
    pidpath=$1
    daemonized=$2
    props=$3
    es_parms="-Delasticsearch"

    if [ "x$pidpath" != "x" ]; then
        es_parms="$es_parms -Des.pidfile=$pidpath"
    fi

    # The es-foreground option will tell Elasticsearch not to close stdout/stderr, but it's up to us not to daemonize.
    if [ "x$daemonized" = "x" ]; then
        es_parms="$es_parms -Des.foreground=yes"
        exec "$JAVA" $JAVA_OPTS $ES_JAVA_OPTS $es_parms -Des.path.home="$ES_HOME" -cp "$ES_CLASSPATH" $props \
                org.elasticsearch.bootstrap.Elasticsearch
        # exec without running it in the background, makes it replace this shell, we'll never get here...
        # no need to return something
    else
        # Startup Elasticsearch, background it, and write the pid.
        exec "$JAVA" $JAVA_OPTS $ES_JAVA_OPTS $es_parms -Des.path.home="$ES_HOME" -cp "$ES_CLASSPATH" $props \
                    org.elasticsearch.bootstrap.Elasticsearch <&- &
        return $?
    fi
}

# Parse any long getopt options and put them into properties before calling getopt below
# Be dash compatible to make sure running under ubuntu works
ARGV=""
while [ $# -gt 0 ]
do
    case $1 in
      --*=*) properties="$properties -Des.${1#--}"
           shift 1
           ;;
      --*) properties="$properties -Des.${1#--}=$2"
           shift 2
           ;;
      *) ARGV="$ARGV $1" ; shift
    esac
done

# Parse any command line options.
args=`getopt vdhp:D:X: $ARGV`
eval set -- "$args"

while true; do
    case $1 in
        -v)
            "$JAVA" $JAVA_OPTS $ES_JAVA_OPTS $es_parms -Des.path.home="$ES_HOME" -cp "$ES_CLASSPATH" $props \
                    org.elasticsearch.Version
            exit 0
        ;;
        -p)
            pidfile="$2"
            shift 2
        ;;
        -d)
            daemonized="yes"
            shift
        ;;
        -h)
            echo "Usage: $0 [-d] [-h] [-p pidfile]"
            exit 0
        ;;
        -D)
            properties="$properties -D$2"
            shift 2
        ;;
        -X)
            properties="$properties -X$2"
            shift 2
        ;;
        --)
            shift
            break
        ;;
        *)
            echo "Error parsing argument $1!" >&2
            exit 1
        ;;
    esac
done

# Start up the service
launch_service "$pidfile" "$daemonized" "$properties"

exit $?
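
The long-option handling in the script above relies on the ${1#--} parameter expansion, which strips the leading dashes before the option is turned into a system property. A minimal sketch of that one step, pulled out into a hypothetical helper (long_opt_to_property is not part of the script):

```shell
#!/bin/sh
# Sketch of how the script above maps long options to -Des.* properties.
# long_opt_to_property is a hypothetical helper used only for illustration.
long_opt_to_property() {
    case $1 in
        --*=*) printf '%s\n' "-Des.${1#--}" ;;      # --key=value form
        --*)   printf '%s\n' "-Des.${1#--}=$2" ;;   # --key value form
        *)     return 1 ;;                          # not a long option
    esac
}

long_opt_to_property --node.name=es1
long_opt_to_property --cluster.name demo
```

Both forms end up as ordinary -D system properties on the java command line, which is why the script can treat them the same way as values passed with -D.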


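System configuration

File descriptors

Make sure to raise the number of file descriptors that can be opened on the machine; 32k to 64k is recommended. To check how many files the process can open, start Elasticsearch with -Des.max-open-files set to true, which makes it print the number of file descriptors the process is allowed to open.

Alternatively, retrieve the max_file_descriptors value for each node with the Node Info API:

curl 'localhost:9200/_nodes/process?pretty'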
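
The limit can also be checked from the shell before the node is started. The sketch below compares the current ulimit -n value against the lower bound recommended above (32k, taken here as 32768); fd_limit_ok is a hypothetical helper name.

```shell
#!/bin/sh
# fd_limit_ok LIMIT [MINIMUM]: succeed when LIMIT meets the minimum.
# Hypothetical helper; 32768 is the 32k lower bound recommended in the text.
fd_limit_ok() {
    limit=$1
    minimum=${2:-32768}
    # Some systems report "unlimited" instead of a number.
    if [ "$limit" = "unlimited" ]; then
        return 0
    fi
    [ "$limit" -ge "$minimum" ]
}

current=$(ulimit -n)
if fd_limit_ok "$current"; then
    echo "open-file limit $current is sufficient"
else
    echo "open-file limit $current is below 32768; raise it before starting the node"
fi
```

Run it as the user that starts Elasticsearch, since limits are applied per user and per session.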

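Memory settings

The Linux kernel tries to use as much memory as possible for file system caches and eagerly swaps out unused application memory, which can result in the Elasticsearch process being swapped out. Swapping is very harmful to the performance and stability of Elasticsearch and should be avoided. Three options are available:
  • Disable swap
The simplest option is to disable swap completely. Usually Elasticsearch is the only service running on the machine, and its memory use is controlled by the ES_HEAP_SIZE environment variable, so there should be no need for swap. On Linux, swap can be disabled temporarily with: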
 sudo swapoff -a

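Swap can also be disabled permanently by editing /etc/fstab and commenting out every line that contains the word swap.
  • Configure swappiness
Setting vm.swappiness to 0 tells the kernel not to swap the memory of the Elasticsearch process under normal circumstances, while still allowing swapping in emergencies.
On kernel 3.5-rc1 and later, a swappiness of 0 causes the OOM killer to kill the process instead of swapping. On those kernels, set swappiness to 1 so that swapping remains possible in emergencies.
  • mlockall

This option applies only to Linux/Unix systems. It uses mlockall to lock the memory of the Elasticsearch process into RAM, which prevents that memory from being swapped out. To enable it, add this line to config/elasticsearch.yml: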

bootstrap.mlockall: true

After starting Elasticsearch, check the mlockall field in the output of this request to see whether the memory was locked:

curl 'localhost:9200/_nodes/process?pretty'

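If mlockall is reported as false, the setting was not applied. The most likely cause is that the user running Elasticsearch does not have permission to lock memory; this can be fixed by raising that limit, for example by running ulimit -l unlimited as root before starting Elasticsearch. Another possible cause is that the temporary directory /tmp is mounted with the noexec option; in that case, point Elasticsearch at a different temporary directory: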

./bin/elasticsearch -Djna.tmpdir=/path/to/new/dir

mlockall may cause the JVM or the shell session to exit if it tries to allocate more memory than is available.
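
For the swappiness option above, the current value can be read from /proc/sys/vm/swappiness on Linux. The sketch below turns the guidance from the list into a small check; swappiness_advice is a hypothetical helper and the messages are only illustrative.

```shell
#!/bin/sh
# swappiness_advice VALUE: print a recommendation for a vm.swappiness value.
# Hypothetical helper that restates the guidance from the list above.
swappiness_advice() {
    case $1 in
        0) echo "0: on kernels 3.5-rc1 and later the OOM killer may fire instead of swapping; consider 1" ;;
        1) echo "1: swapping only in emergencies" ;;
        *) echo "$1: the kernel may swap out Elasticsearch memory; consider 1" ;;
    esac
}

# Read the live value on Linux; the file is absent on other systems.
if [ -r /proc/sys/vm/swappiness ]; then
    swappiness_advice "$(cat /proc/sys/vm/swappiness)"
fi
```

The value can be changed for the running system with sudo sysctl -w vm.swappiness=1.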


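Elasticsearch settings

The Elasticsearch configuration files are in the ES_HOME/config directory. It contains two files: elasticsearch.yml, which configures the Elasticsearch modules, and logging.yml, which configures Elasticsearch logging.
The configuration format is YAML.

Paths

In a real deployment you will almost certainly want to change the paths where data files and log files are stored: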
path:
  logs: /var/log/elasticsearch
  data: /var/data/elasticsearch
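
The directories named in path.logs and path.data have to be writable by the user that runs Elasticsearch. A minimal sketch of a pre-flight check, assuming a hypothetical helper named ensure_writable_dir and using a temporary directory as a stand-in for the real paths:

```shell
#!/bin/sh
# ensure_writable_dir DIR: create DIR if it is missing and confirm it is writable.
# Hypothetical helper; run it as the user that will start Elasticsearch.
ensure_writable_dir() {
    mkdir -p "$1" 2>/dev/null || return 1
    [ -d "$1" ] && [ -w "$1" ]
}

# Demonstration against a scratch directory instead of /var/log or /var/data.
scratch=$(mktemp -d)
for dir in "$scratch/log/elasticsearch" "$scratch/data/elasticsearch"; do
    if ensure_writable_dir "$dir"; then
        echo "ok: $dir"
    else
        echo "not writable: $dir" >&2
    fi
done
rm -rf "$scratch"
```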

Cluster name

Do not forget to give your cluster a name. The name uniquely identifies the cluster and is used to discover and automatically join nodes:
cluster:
  name: 

Node name

You may also want to set a name for each node, for example its host name. By default, Elasticsearch picks a random node name.
node:
  name: 

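Configuration styles

Internally, all of the settings above are collapsed into namespaced settings, for example node.name, path.logs, and cluster.name. This means other configuration formats, such as JSON, can be used as well. To use JSON, rename elasticsearch.yml to elasticsearch.json and configure it like this: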

{
    "network" : {
        "host" : "10.0.0.4"
    }
}

This also makes it easy to pass settings in from outside, for example:
$ elasticsearch -Des.network.host=10.0.0.4

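Another option is to use the es.default. prefix instead of the es. prefix, in which case the value is used only when the setting is not set explicitly in the configuration file. A further option is the ${...} notation inside the configuration file, which resolves to the value of an environment variable, for example: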
{
    "network" : {
        "host" : "${ES_NET_HOST}"
    }
}
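
The difference between the es. and es.default. prefixes comes down to precedence. The sketch below models it under the assumption that a value passed with the es. prefix wins over the configuration file, and that the file in turn wins over a value passed with the es.default. prefix; effective_setting is a hypothetical helper, not an Elasticsearch command.

```shell
#!/bin/sh
# effective_setting CLI_VALUE FILE_VALUE DEFAULT_VALUE
#   CLI_VALUE     value passed as -Des.<key>=...          (assumed to win)
#   FILE_VALUE    value found in the configuration file
#   DEFAULT_VALUE value passed as -Des.default.<key>=...  (used last)
# Hypothetical helper that prints the first non-empty value in that order.
effective_setting() {
    if [ -n "$1" ]; then
        printf '%s\n' "$1"
    elif [ -n "$2" ]; then
        printf '%s\n' "$2"
    else
        printf '%s\n' "$3"
    fi
}

# Only a default is supplied and the file is silent, so the default applies.
effective_setting "" "" "127.0.0.1"
# The file sets the value, so the default is ignored.
effective_setting "" "10.0.0.4" "127.0.0.1"
```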

The location of the configuration file can be set externally with a system property:
$ elasticsearch -Des.config=/path/to/config/file

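Index settings

Indices created in the cluster can provide their own settings. For example, the following creates an index that uses memory-based storage instead of the default file system storage (the request body can be YAML or JSON):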
$ curl -XPUT http://localhost:9200/kimchy/ -d \
'
index :
    store:
        type: memory
'

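Index settings can also be set at the node level. With the following in the configuration file, every index created on that node is stored in memory unless the index explicitly sets its own store type: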
index :
    store:
        type: memory

In other words, index-level settings override node-level settings. The same setting can also be passed on the command line:
$ elasticsearch -Des.index.store.type=memory

Logging

Internally, Elasticsearch uses log4j for logging and simplifies log4j configuration by letting it be written in YAML.









