Greenplum cluster fails to start: "Do not have enough valid segments to start the array."
Background:
After the cluster was set up, a few settings needed tuning.
Set work_mem to 64MB.
Check the current setting:
gpconfig -s work_mem
Values on all segments are consistent
GUC : work_mem
Master value: 32MB
Segment value: 32MB
Change the setting:
gpconfig -c work_mem -v 64M
Restart the cluster to load the new setting.
Reload the configuration files postgresql.conf and pg_hba.conf:
gpstop -u
The restart failed.
Check the error log:
/home/gpadmin/gpAdminLogs/gpstart_20180904.log
[INFO]:-----------------------------------------------------
[INFO]:- Successful segment starts = 0
[WARNING]:-Failed segment starts = 32 <<<<<<<<
[INFO]:- Skipped segment starts (segments are marked down in configuration) = 0
[INFO]:-----------------------------------------------------
[INFO]:-Successfully started 0 of 32 segment instances <<<<<<<<
[INFO]:-----------------------------------------------------
[WARNING]:-Segment instance startup failures reported
[WARNING]:-Failed start 32 of 32 segment instances <<<<<<<
[WARNING]:-Review /home/gpadmin/gpAdminLogs/gpstart_20180904.log
[INFO]:-----------------------------------------------------
[INFO]:-Commencing parallel segment instance shutdown, please wait...
[ERROR]:-gpstart error: Do not have enough valid segments to start the array.
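Solution:
Searching online for the error message showed that it is a very generic error: an oversized parameter value, a host failure, or a configuration mistake can all trigger it.
Following that hint, I commented out the changed setting in the master node's configuration and restarted the cluster again, but it still failed to start, with the following error: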
20180904:18:53:20:108168 gpstart:cndh1322-6-15:gpadmin-[INFO]:-Starting Master instance in admin mode
20180904:19:03:21:108168 gpstart:cndh1322-6-15:gpadmin-[CRITICAL]:-Failed to start Master instance in admin mode
20180904:19:03:21:108168 gpstart:cndh1322-6-15:gpadmin-[CRITICAL]:-Error occurred: non-zero rc: 1
Command was: 'env GPSESSID=0000000000 GPERA=None $GPHOME/bin/pg_ctl -D /usr/local/gpdata/gpmaster/gpseg-1 -l /usr/local/gpdata/gpmaster/gpseg-1/pg_log/startup.log -w -t 600 -o " -p 5432 --gp_dbid=1 --gp_num_contents_in_cluster=0 --silent-mode=true -i -M master --gp_contentid=-1 -x 34 -c gp_role=utility " start'
rc=1, stdout='waiting for server to start............................................................................................................................................................
.....................................................................................................................................................................................................
.....................................................................................................................................................................................................
..................................................... stopped waiting
The master startup log shows the following error:
more /usr/local/gpdata/gpmaster/gpseg-1/pg_log/startup.log
2018-09-04 11:04:07.898274 GMT,,,p127931,th1064769408,,,,0,,,seg-1,,,,,"FATAL","22023","invalid value for parameter ""work_mem"": ""64M""",,"Valid units for this parameter are ""kB"", ""MB"", and ""GB"".",,,,,,"set_config_option","guc.c",4874,
This error shows that an invalid parameter value caused the failure.
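Because the severity is a quoted field in the CSV-style log, the relevant entries can be pulled out with a simple filter. The following is a minimal sketch; show_fatal is a hypothetical helper name, and the example path is the one from the session above.

```shell
# Hypothetical helper: print only the FATAL entries from one or more log files.
# The severity is a quoted CSV field, so the quotes are part of the pattern.
show_fatal() {
    grep -h '"FATAL"' "$@"
}

# Example (path taken from the session above):
# show_fatal /usr/local/gpdata/gpmaster/gpseg-1/pg_log/startup.log
```

Matching on the quoted field avoids false hits on lines that merely mention the word FATAL inside a message.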
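The command should have been gpconfig -c work_mem -v 64MB, not 64M. The server treats 64M as an invalid value, so the cluster cannot start. But the setting on the master node had already been commented out during troubleshooting, so why did the cluster still fail to start? Logging in to a segment node showed that the segment configuration files had been modified as well, so the segment processes could not start.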
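To see what a given postgresql.conf currently sets for a parameter, the uncommented assignments can be listed directly. This is a minimal sketch; active_setting is a hypothetical helper name, and it would have to be run on each host against that host's own data directory.

```shell
# Hypothetical helper: print the uncommented assignments of a parameter.
# Usage: active_setting <path to postgresql.conf> <parameter name>
# The pattern is anchored at the start of the line, so commented-out lines
# and longer names such as maintenance_work_mem do not match.
active_setting() {
    grep -E "^[[:space:]]*$2([[:space:]]*=|[[:space:]])" "$1"
}

# Example (master data directory taken from the session above):
# active_setting /usr/local/gpdata/gpmaster/gpseg-1/postgresql.conf work_mem
```

If several lines are printed, the last one is the value the server uses.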
Final fix:
Start in maintenance mode without prompting:
gpstart -a -m
Correct the parameter:
gpconfig -c work_mem -v 64MB
Start the cluster; it now starts normally:
gpstart
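Lessons learned:
1: A parameter changed with gpconfig is propagated to the configuration file of every node in the cluster;
2: gpconfig is only loosely coupled to the cluster, so invalid input is written to the configuration as well;
3: Before changing a parameter, query its current value and model the new value on the existing one.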
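A small pre-check can catch this kind of mistake before anything is written to the configuration. The following is a minimal sketch for a POSIX shell; is_valid_mem_value and safe_set_mem are hypothetical helper names, not Greenplum commands. It accepts only an integer followed by kB, MB, or GB, the units listed in the FATAL message, and it deliberately rejects bare integers even where the server would accept them.

```shell
# Hypothetical helper: succeed only for values such as 512kB, 64MB, or 1GB.
is_valid_mem_value() {
    case "$1" in
        *kB) mem_digits=${1%kB} ;;
        *MB) mem_digits=${1%MB} ;;
        *GB) mem_digits=${1%GB} ;;
        *)   return 1 ;;           # no recognised unit, e.g. 64M
    esac
    case "$mem_digits" in
        ''|*[!0-9]*) return 1 ;;   # empty or non-numeric prefix
    esac
    return 0
}

# Hypothetical wrapper: call gpconfig only when the value passes the check.
safe_set_mem() {
    if ! is_valid_mem_value "$2"; then
        echo "refusing to set $1: '$2' needs a unit of kB, MB, or GB" >&2
        return 1
    fi
    gpconfig -c "$1" -v "$2"
}

# Example:
# safe_set_mem work_mem 64MB
```

With this wrapper, safe_set_mem work_mem 64M stops with an error message instead of writing the bad value to the configuration.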