1. Isn't this a trivial question?
Just set it directly, like java -Xmx2000m, and you're done, right? No, that's not all I'm asking about. For example, when you see
/usr/local/bigdata/jdk/bin/java -Xmx2048m -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/usr/local/bigdata/hadoop/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/local/bigdata/hadoop -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,console -Djava.library.path=/usr/local/bigdata/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Xmx512m -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.util.RunJar /usr/local/bigdata/hive/lib/hive-cli-0.13.0.jar org.apache.hadoop.hive.cli.CliDriver
why are there two -Xmx flags? Where did these two come from?
The real problem, in fact, is that I wanted to adjust the JVM Xmx of hiveserver2. I figured I could just edit hive/conf/hive-env.sh:
# The heap size of the jvm stared by hive shell script can be controlled via:
#
export HADOOP_HEAPSIZE=2000
#
and that would be that. But I was too naive: no matter how I changed it, it had no effect at all!!
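You can confirm the setting isn't taking effect by inspecting the flags of the running process. A minimal sketch, assuming the JDK tools jps and jcmd are on the PATH (hiveserver2 shows up as RunJar):
jps -lm | grep RunJar                     # find the hiveserver2 pid
jcmd <pid> VM.flags | grep MaxHeapSize    # the resolved -Xmx; not the 2000m we set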
2. So how do you actually change it? Or rather, where do these parameters get set?
Let's trace through it together:
1) The hive and hiveserver2 commands both ultimately end up in hive/bin/ext/util/execHiveCmd.sh
cat hive/bin/ext/util/execHiveCmd.sh
execHiveCmd () {
......
# hadoop 20 or newer - skip the aux_jars option. picked up from hiveconf
exec $HADOOP jar ${HIVE_LIB}/hive-cli-*.jar $CLASS $HIVE_OPTS "$@"
}
In other words, every Hive command is executed via hadoop jar plus the relevant Hive jar.
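You can even reproduce the launch by hand, which makes it clear the heap flags must be coming from the hadoop side rather than from hive. A sketch, assuming HADOOP_HOME and HIVE_HOME point at the installs above:
# essentially what execHiveCmd.sh ends up running for the hive CLI; -H prints usage
$HADOOP_HOME/bin/hadoop jar $HIVE_HOME/lib/hive-cli-*.jar org.apache.hadoop.hive.cli.CliDriver -H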
2) So let's look at the hadoop/bin/hadoop command
cat hadoop/bin/hadoop
. $HADOOP_LIBEXEC_DIR/hadoop-config.sh
......
exec "$JAVA" $JAVA_HEAP_MAX $HADOOP_OPTS $CLASS "$@"
So it first sources hadoop/libexec/hadoop-config.sh and then execs the java command. Note $JAVA_HEAP_MAX and $HADOOP_OPTS, and note their order: $JAVA_HEAP_MAX comes first, $HADOOP_OPTS after it. That ordering will matter shortly.
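Because hadoop-config.sh and hadoop-env.sh are sourced into the same shell rather than exec'd, you can watch the whole expansion with shell tracing. A minimal sketch:
# trace bin/hadoop on a harmless subcommand and grep the heap-related lines
bash -x $HADOOP_HOME/bin/hadoop version 2>&1 | grep -E 'JAVA_HEAP_MAX|Xmx'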
3) Now look at hadoop/libexec/hadoop-config.sh
cat hadoop/libexec/hadoop-config.sh
if [ -f "${HADOOP_CONF_DIR}/hadoop-env.sh" ]; then
. "${HADOOP_CONF_DIR}/hadoop-env.sh"
fi
......
# some Java parameters
JAVA_HEAP_MAX=-Xmx1000m
# check envvars which might override default args
if [ "$HADOOP_HEAPSIZE" != "" ]; then
#echo "run with heapsize $HADOOP_HEAPSIZE"
JAVA_HEAP_MAX="-Xmx""$HADOOP_HEAPSIZE""m"
#echo $JAVA_HEAP_MAX
fi
So hadoop/conf/hadoop-env.sh is sourced first, and HADOOP_HEAPSIZE is picked up from it to build JAVA_HEAP_MAX.
Now it should be clear: even if hive/conf/hive-env.sh sets HADOOP_HEAPSIZE=2000, hadoop-config.sh sources hadoop/conf/hadoop-env.sh afterwards, so an unconditional export HADOOP_HEAPSIZE there clobbers the value from hive-env.sh every time the hadoop command runs.
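The clobbering is nothing more than shell sourcing semantics: whichever script is sourced last wins. A standalone reproduction, using hypothetical file names to stand in for the real ones:
echo 'export HADOOP_HEAPSIZE=2000' > fake-hive-env.sh    # stands in for hive-env.sh
echo 'export HADOOP_HEAPSIZE=2048' > fake-hadoop-env.sh  # stands in for hadoop-env.sh
. ./fake-hive-env.sh       # bin/hive sources hive-env.sh first
. ./fake-hadoop-env.sh     # hadoop-config.sh sources hadoop-env.sh afterwards
echo $HADOOP_HEAPSIZE      # prints 2048 -- the hive-env.sh value is gone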
4) Then where does the -Xmx512m in the earlier hive command line come from?
cat hadoop/conf/hadoop-env.sh
# The following applies to multiple commands (fs, dfs, fsck, distcp etc)
export HADOOP_CLIENT_OPTS="-Xmx512m $HADOOP_CLIENT_OPTS"
#HADOOP_JAVA_PLATFORM_OPTS="-XX:-UsePerfData $HADOOP_JAVA_PLATFORM_OPTS"
cat hadoop/bin/hadoop
# Always respect HADOOP_OPTS and HADOOP_CLIENT_OPTS
HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
export CLASSPATH=$CLASSPATH
exec "$JAVA" $JAVA_HEAP_MAX $HADOOP_OPTS $CLASS "$@"
So it is pinned via HADOOP_CLIENT_OPTS in hadoop/conf/hadoop-env.sh. And because HADOOP_CLIENT_OPTS is folded into HADOOP_OPTS, which is expanded after $JAVA_HEAP_MAX on the final java command line, its -Xmx512m appears last, and the JVM honors the last -Xmx it sees. That is why the heap stubbornly stayed at 512m.
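On HotSpot, when the same flag appears more than once the last occurrence wins, which you can verify directly (the PrintFlagsFinal output format varies slightly across JDK versions):
# two -Xmx flags, just like the hive command line above; the later 512m wins
java -Xmx2048m -Xmx512m -XX:+PrintFlagsFinal -version 2>/dev/null | grep MaxHeapSize
# uintx MaxHeapSize := 536870912   (= 512m)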
5) Then how are the heaps set when hadoop launches a namenode/datanode/resourcemanager/nodemanager? Long story short, it boils down to this picture:
The root of the problem is that the hadoop/hdfs commands all share the single HADOOP_HEAPSIZE parameter (and the hive command, as we saw, is really a hadoop command underneath), whereas YARN sensibly does not share it: YARN_HEAPSIZE, YARN_RESOURCEMANAGER_HEAPSIZE, and YARN_NODEMANAGER_HEAPSIZE set each piece separately.
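For reference, this is the pattern yarn-env.sh supports in Hadoop 2.x; the sizes below are just examples:
# yarn-env.sh
export YARN_HEAPSIZE=1000                   # default for all yarn commands
export YARN_RESOURCEMANAGER_HEAPSIZE=2048   # resourcemanager-specific override
export YARN_NODEMANAGER_HEAPSIZE=1024       # nodemanager-specific override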
3. Conclusion:
To let every component in the Hadoop cluster have its Xmx set independently, without interference (the key requirement being that Hive's Xmx can actually be set), and to keep things clean and layered, do the following (a combined sketch of the edits follows this list):
1) For the namenode/datanode, set Xmx via HADOOP_NAMENODE_OPTS / HADOOP_DATANODE_OPTS
2) Drop the HADOOP_HEAPSIZE setting from hadoop/conf/hadoop-env.sh
3) Drop the -Xmx setting from HADOOP_CLIENT_OPTS
4) Set HADOOP_HEAPSIZE in hive/conf/hive-env.sh
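Put together, the edits look roughly like this; treat it as a sketch, with the -Xmx values as examples to adjust for your own cluster:
# hadoop/conf/hadoop-env.sh
export HADOOP_NAMENODE_OPTS="-Xmx4096m $HADOOP_NAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Xmx2048m $HADOOP_DATANODE_OPTS"
# export HADOOP_HEAPSIZE=2048                    # removed: no shared default
export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS"  # the -Xmx512m is gone

# hive/conf/hive-env.sh
export HADOOP_HEAPSIZE=2000                      # now actually takes effect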
And at the very end: I've written until I was foaming at the mouth, but it's finally done. That's all. Bye!