Installing and Using the Greenplum (GP) Database

Author: 潘永雷

Date: 2017-04-13

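Notes

This document describes how to install Greenplum (greenplum-db-4.3.9.1-build-1-rhel5-x86_64.bin) on CentOS. Details may differ slightly between environments; the installation was verified to succeed in the environment described here.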

1. Machine environment

1.1 System information


[gpadmin@sdw1 ~]$ uname -a

Linux sdw1 2.6.32-642.el6.x86_64 #1 SMP Tue May 10 17:27:01 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

[gpadmin@sdw1 ~]$ cat /etc/issue

CentOS release 6.8 (Final)

Kernel \r on an \m

1.2 Address allocation

192.168.0.223 Master

192.168.0.224 Segment

192.168.0.225 Segment

 

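1.3 System parameters

    After changing these parameters, reboot each machine for them to take effect.

1.3.1 Edit /etc/sysctl.conf

The following is the minimum configuration (add any entries that are missing and correct any that differ):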

kernel.shmmax = 500000000  

kernel.shmmni = 4096  

kernel.shmall = 4000000000  

kernel.sem = 250 512000 100 2048  

kernel.sysrq = 1  

kernel.core_uses_pid = 1  

kernel.msgmnb = 65536  

kernel.msgmax = 65536  

kernel.msgmni = 2048  

net.ipv4.tcp_syncookies = 1  

net.ipv4.ip_forward = 0  

net.ipv4.conf.default.accept_source_route = 0  

net.ipv4.tcp_tw_recycle = 1  

net.ipv4.tcp_max_syn_backlog = 4096  

net.ipv4.conf.all.arp_filter = 1  

net.ipv4.ip_local_port_range = 1025 65535  

net.core.netdev_max_backlog = 10000  

net.core.rmem_max = 2097152  

net.core.wmem_max = 2097152  

vm.overcommit_memory = 2  
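One way to confirm that a file really contains the values listed above is to compare it against a reference copy of that list. The following is only a minimal sketch and not part of the original procedure; the helper names normalize_conf and check_sysctl and both file arguments are hypothetical.

```shell
#!/bin/sh
# Sketch: compare a sysctl-style file against a list of required "key = value" lines.
# normalize_conf and check_sysctl are illustrative names, not Greenplum tools.

# Print "key=value" lines with comments, blank lines and extra spaces removed.
normalize_conf() {
    sed -e 's/#.*//' \
        -e 's/[[:space:]][[:space:]]*/ /g' \
        -e 's/^ //' -e 's/ $//' \
        -e 's/ *= */=/' "$1" | grep -v '^$'
}

# Usage: check_sysctl REQUIRED_FILE TARGET_FILE
# Prints one line per missing or differing key; exits non-zero if any was found.
# The body runs in a subshell so its variables do not leak into the caller.
check_sysctl() (
    status=0
    while IFS= read -r line; do
        [ -n "$line" ] || continue
        key="${line%%=*}"
        want="${line#*=}"
        have="$(normalize_conf "$2" | awk -F= -v k="$key" '$1 == k { v = $2 } END { print v }')"
        if [ -z "$have" ]; then
            echo "MISSING $key"
            status=1
        elif [ "$have" != "$want" ]; then
            echo "DIFFERS $key (want $want, have $have)"
            status=1
        fi
    done <<EOF
$(normalize_conf "$1")
EOF
    exit "$status"
)
```

Assuming the list above has been saved to a file such as required.conf (a hypothetical name), running check_sysctl required.conf /etc/sysctl.conf prints nothing when every entry is present with the expected value.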

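1.3.2 Raise the open-file and process limits

  Edit /etc/security/limits.conf and add the following lines (the leading * is part of each line and must be included):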

* soft nofile 65536

* hard nofile 65536

* soft nproc 131072

* hard nproc 131072
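Because this edit has to be repeated on every machine, it can help to add the lines idempotently so that running the step twice does not duplicate them. The sketch below is an illustration under that assumption; append_if_missing and add_gp_limits are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: append each limits line only when that exact line is not already present.
# append_if_missing and add_gp_limits are illustrative helper names.

# Usage: append_if_missing FILE LINE
append_if_missing() {
    grep -Fxq -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Usage: add_gp_limits FILE   (FILE would be /etc/security/limits.conf, edited as root)
add_gp_limits() {
    for entry in \
        '* soft nofile 65536' \
        '* hard nofile 65536' \
        '* soft nproc 131072' \
        '* hard nproc 131072'
    do
        append_if_missing "$1" "$entry"
    done
}
```

Trying it against a scratch copy first, for example add_gp_limits /tmp/limits.conf.test, shows the result without touching the real file.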

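1.3.3 Disable the firewall before installing

  Check the firewall:

    # /sbin/chkconfig --list iptables

When it is disabled, the output is: iptables 0:off 1:off 2:off 3:off 4:off 5:off 6:off

Disable it with the following command:

    /sbin/chkconfig iptables off

Edit /etc/selinux/config and set:

    SELINUX=disabled

The firewall is off once chkconfig reports every runlevel as off, as shown above (do this on every machine).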

1.3.3 Edit /boot/grub/grub.conf

Add elevator=deadline to the end of the kernel line.

1.3.4 Edit hosts

On the master host, change the first line of /etc/hosts from

    localhost localhost.localdomain localhost4 localhost4.localdomain4

to

    localhost mdw localhost4 localhost4.localdomain4

and add:

192.168.0.223 mdw

192.168.0.224 sdw1

192.168.0.225 sdw2

On the other two segment hosts, only the first line of /etc/hosts needs to change, using sdw1 and sdw2 respectively.

 

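1.3.5 Edit /etc/sysconfig/network

Edit the file so that it records the host name, save and exit, then run the following command:

hostname mdw

mdw is the host name of the master node; the segment nodes are sdw1 and sdw2.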

1.3.6 Create the user and group

useradd gpadmin

passwd gpadmin

(Run these on all three machines.)

1.3.7 Create the all_hosts and all_segs files

On the master, create the file all_hosts in /home/gpadmin with the following content:

mdw

sdw1

sdw2

 

On the master, create the file all_segs in /home/gpadmin with the following content:

sdw1

sdw2
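Since all_segs is just all_hosts without the master, both files can be generated in one step. This is a small sketch rather than the method used in the original write-up; write_host_files is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: write all_hosts, then derive all_segs by dropping the master name.
# write_host_files is an illustrative helper name.

# Usage: write_host_files TARGET_DIR MASTER_HOST SEGMENT_HOST...
write_host_files() {
    dir="$1"
    master="$2"
    shift 2
    # One host name per line, master first.
    printf '%s\n' "$master" "$@" > "$dir/all_hosts"
    # Keep every line that is not exactly the master name.
    grep -Fxv -- "$master" "$dir/all_hosts" > "$dir/all_segs"
}
```

For the layout in this document the call would be write_host_files /home/gpadmin mdw sdw1 sdw2, run as gpadmin on the master.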

 

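2. Installation

2.1 Install the database software

2.1.1 Master installation

(a) Obtain the greenplum-db-*.*.*.*-build-1-RHEL5-x86_64.bin file and copy it to /usr/local to install it there. (The official installer defaults to this directory, so installing here avoids extra parameter changes later.)

/bin/bash greenplum-db-*.*.*.*-build-1-RHEL5-x86_64.bin

Run the command above to install; you may need to answer yes several times when prompted.

(b) Change the owner of the Greenplum files

    # chown -R gpadmin /usr/local/greenplum-db

    # chgrp -R gpadmin /usr/local/greenplum-db

    # chown -R gpadmin /usr/local/greenplum-db-*.*.*.*

    # chgrp -R gpadmin /usr/local/greenplum-db-*.*.*.*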

 

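(c) Configure the environment variables:

    The same variables also appear in /usr/local/greenplum-db/greenplum_path.sh, which can be used as a reference.

    Open /etc/profile to edit the environment variables:

        vim /etc/profile

    Add the following lines (set GPHOME to the directory of the version you actually installed):

        GPHOME=/usr/local/greenplum-db-4.3.6.2

        PATH=$GPHOME/bin:$GPHOME/ext/python/bin:$PATH

        export GPHOME

        export PATH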

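(d) Create the data directory on the master host. This is where the data is stored, so make sure it has enough space.

        # mkdir /gpmaster

        # chown -R gpadmin /gpmaster

        # chgrp -R gpadmin /gpmaster

Important: after creating this directory, update gpadmin's environment variables, otherwise GP initialization will not find the master's storage directory.

As the gpadmin user:

          vim ~/.bashrc

Append at the end:

        MASTER_DATA_DIRECTORY=/gpmaster

        export MASTER_DATA_DIRECTORY

Then remember to run     source ~/.bashrc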
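Before moving on, it may be worth confirming that both variables are visible in a fresh gpadmin shell and point at real directories. The function below is a hedged sketch of such a check; check_gp_env is a hypothetical name, not a Greenplum utility.

```shell
#!/bin/sh
# Sketch: confirm that GPHOME and MASTER_DATA_DIRECTORY are set and are directories.
# check_gp_env is an illustrative name, not a Greenplum utility.
check_gp_env() {
    status=0
    for name in GPHOME MASTER_DATA_DIRECTORY; do
        # Look up the variable whose name is stored in $name (fixed list, so eval is safe here).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set"
            status=1
        elif [ ! -d "$value" ]; then
            echo "$name=$value is not a directory"
            status=1
        else
            echo "$name=$value ok"
        fi
    done
    return "$status"
}
```

Calling check_gp_env after source ~/.bashrc prints one line per variable and returns non-zero if either one is missing.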

2.1.2 Segment installation

(a) On the master node, create a tar file of the GP installation

cd /usr/local

gtar -cvf /home/gpadmin/gp.tar greenplum-db-*.*.*.*

(b) Copy it to the segments

scp /home/gpadmin/gp.tar sdw1:/usr/local

scp /home/gpadmin/gp.tar sdw2:/usr/local

(c) Run the following on every segment node

    gtar --directory /usr/local -xvf /usr/local/gp.tar

    Create the symbolic link for the current GP version:

    ln -s /usr/local/greenplum-db-*.*.*.* /usr/local/greenplum-db

Change the directory owner:

chown -R gpadmin /usr/local/greenplum-db

chgrp -R gpadmin /usr/local/greenplum-db

chown -R gpadmin /usr/local/greenplum-db-*.*.*.*

chgrp -R gpadmin /usr/local/greenplum-db-*.*.*.*

 

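(d) Create the storage directories on each segment

mkdir /home/gpadmin/primary   # primary data

mkdir /home/gpadmin/mirror   # mirror data

Change the permissions and owner in the same way as in step (c) above.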
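With more than a couple of segment hosts, typing one scp line per host becomes error-prone. As a sketch only, the copy commands from step (b) can be generated from the all_segs file; print_copy_commands is a hypothetical helper that prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: print (not run) the scp command for every host listed in a host file.
# print_copy_commands is an illustrative helper name.

# Usage: print_copy_commands HOST_FILE TARBALL
print_copy_commands() {
    while IFS= read -r host; do
        [ -n "$host" ] || continue      # skip blank lines
        echo "scp $2 $host:/usr/local"
    done < "$1"
}
```

Printing first keeps the step reviewable: print_copy_commands /home/gpadmin/all_segs /home/gpadmin/gp.tar lists the commands, and piping that output to sh executes them.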

2.1.3 Initialization

(1) Synchronize the clocks

As gpadmin:

Check the clocks: gpssh -f /home/gpadmin/all_hosts -v date

Synchronize: gpssh -f /home/gpadmin/all_hosts -v ntpd

(2) System check:

As gpadmin:

gpcheckos -f /home/gpadmin/all_hosts

(3) Copy the configuration template, then edit gpinitsystem_config:

cp /usr/local/greenplum-db/docs/cli_help/gpconfigs/gpinitsystem_config /home/gpadmin/gpconfigs/

# FILE NAME: gpinitsystem_config
# Configuration file needed by the gpinitsystem
################################################
#### REQUIRED PARAMETERS
################################################
#### Name of this Greenplum system enclosed in quotes.
ARRAY_NAME="EMC Greenplum DW"
#### Naming convention for utility-generated data directories.
SEG_PREFIX=gpseg
#### Base number by which primary segment port numbers
#### are calculated.
PORT_BASE=40000
#### File system location(s) where primary segment data directories
#### will be created. The number of locations in the list dictate
#### the number of primary segments that will get created per
#### physical host (if multiple addresses for a host are listed in
#### the hostfile, the number of segments will be spread evenly across
#### the specified interface addresses).
declare -a DATA_DIRECTORY=(/home/gpadmin/primary)
#### OS-configured hostname or IP address of the master host.
MASTER_HOSTNAME=mdw
#### File system location where the master data directory
#### will be created.
MASTER_DIRECTORY=/gpmaster
#### Port number for the master instance.
MASTER_PORT=5432
#### Shell utility used to connect to remote hosts.
TRUSTED_SHELL=ssh
#### Maximum log file segments between automatic WAL checkpoints.
CHECK_POINT_SEGMENTS=8
#### Default server-side character set encoding.
ENCODING=UNICODE
################################################
#### OPTIONAL MIRROR PARAMETERS
################################################
#### Base number by which mirror segment port numbers
#### are calculated.
#MIRROR_PORT_BASE=50000
#### Base number by which primary file replication port
#### numbers are calculated.
#MIRROR_REPLICATION_PORT_BASE=51000
#### File system location(s) where mirror segment data directories
#### will be created. The number of mirror locations must equal the
#### number of primary locations as specified in the
#### DATA_DIRECTORY parameter.
##declare -a MIRROR_DATA_DIRECTORY=(/home/gpadmin/mirror /home/gpadmin/mirror /data1/mirror /data2/mirror /data2/mirror /data2/mirror)
declare -a MIRROR_DATA_DIRECTORY=(/home/gpadmin/mirror)
################################################
#### Create a database of this name after initialization.
#DATABASE_NAME=name_of_database
DATABASE_NAME=gpexmp
#### Specify the location of the host address file here instead of
#### with the -h option of gpinitsystem.
MACHINE_LIST_FILE=/home/gpadmin/all_segs
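To see what layout a configuration file like the one above describes before running the initialization, the relevant values can be read back out of it. This is a rough sketch: conf_value, conf_array and summarize_gp_config are hypothetical helper names, and the port arithmetic assumes that the i-th primary on a host listens on PORT_BASE + i, which should be checked against the Greenplum documentation.

```shell
#!/bin/sh
# Sketch: summarize the layout described by a gpinitsystem_config-style file.
# conf_value, conf_array and summarize_gp_config are illustrative names.

# Value of an uncommented KEY=value line (last occurrence wins).
conf_value() {
    sed -n "s/^$2=//p" "$1" | tail -n 1
}

# Space-separated entries of an uncommented "declare -a NAME=(...)" line.
conf_array() {
    sed -n "s/^declare -a $2=(\(.*\))[[:space:]]*\$/\1/p" "$1" | tail -n 1
}

# Usage: summarize_gp_config CONFIG_FILE
summarize_gp_config() {
    conf="$1"
    port_base="$(conf_value "$conf" PORT_BASE)"
    prefix="$(conf_value "$conf" SEG_PREFIX)"
    master_dir="$(conf_value "$conf" MASTER_DIRECTORY)"
    host_file="$(conf_value "$conf" MACHINE_LIST_FILE)"
    hosts="$(grep -c . "$host_file")"      # non-empty lines = segment hosts
    echo "master data directory: $master_dir/${prefix}-1"
    i=0
    for dir in $(conf_array "$conf" DATA_DIRECTORY); do
        # Assumption: primary ports count up from PORT_BASE on each host.
        echo "primary $i on each host: $dir, port $((port_base + i))"
        i=$((i + 1))
    done
    echo "total primaries: $((hosts * i))"
}
```

For the file above, with sdw1 and sdw2 in all_segs, this reports one primary per host and two primaries in total, and shows that the master instance ends up in /gpmaster/gpseg-1.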

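(4) Initialize the database:

gpinitsystem -c /home/gpadmin/gpconfigs/gpinitsystem_config

Many INFO and WARN messages are printed. Once the output indicates that initialization succeeded,

enter Y.

The initialization creates a database named gpexmp, which can be used for a quick test.

As gpadmin:

Connect with the following command and run a query:

    psql  -d  gpexmp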

 

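3. Handling installation errors

3.1 Error 1

 error while loading shared libraries: libnetsnmp.so.20: cannot open shared object file: No such file or directory

 This happens because the environment variables are not configured correctly. Copy the contents of /usr/local/greenplum-db/greenplum_path.sh into .bashrc; the fix relies on the LD_LIBRARY_PATH environment variable that the script sets.

3.2 Error 2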

 [gpadmin@mdw gpseg-1]$  gpstart -v

20170412:16:23:44:003224gpstart:mdw:gpadmin-[INFO]:-Starting gpstart with args: -v

20170412:16:23:44:003224gpstart:mdw:gpadmin-[DEBUG]:-Setting level of parallelism to: 64

20170412:16:23:44:003224gpstart:mdw:gpadmin-[INFO]:-Gathering information and validating the environment...

20170412:16:23:44:003224gpstart:mdw:gpadmin-[DEBUG]:-Checking if GPHOME env variable is set.

20170412:16:23:44:003224gpstart:mdw:gpadmin-[DEBUG]:-Checking if MASTER_DATA_DIRECTORY env variable is set.

20170412:16:23:44:003224gpstart:mdw:gpadmin-[DEBUG]:-Checking if LOGNAME or USER env variable is set.

20170412:16:23:44:003224gpstart:mdw:gpadmin-[DEBUG]:---Checking that current user can use GP binaries

20170412:16:23:44:003224gpstart:mdw:gpadmin-[DEBUG]:-Obtaining master's port from master data directory

20170412:16:23:44:003224gpstart:mdw:gpadmin-[DEBUG]:-Read from postgresql.conf port=5432

20170412:16:23:44:003224gpstart:mdw:gpadmin-[DEBUG]:-Read from postgresql.conf max_connections=250

20170412:16:23:44:003224gpstart:mdw:gpadmin-[DEBUG]:-gp_external_grant_privileges is None

20170412:16:23:44:003224gpstart:mdw:gpadmin-[INFO]:-Reading the gp_dbid file - /gpmaster/gp_dbid...

20170412:16:23:44:003224gpstart:mdw:gpadmin-[ERROR]:-gpstart failed. exiting...

Traceback (most recent call last):

  File "/usr/local/greenplum-db/lib/python/gppylib/mainUtils.py", line 281, in simple_main_locked

    exitCode = commandObject.run()

  File "/usr/local/greenplum-db/./bin/gpstart", line 95, in run

    self._prepare()

  File "/usr/local/greenplum-db/./bin/gpstart", line 196, in _prepare

    self._basic_setup()

  File "/usr/local/greenplum-db/./bin/gpstart", line 212, in _basic_setup

    self.dbidfile = GpDbidFile(self.master_datadir, do_read=True, logger=get_logger_if_verbose())

  File "/usr/local/greenplum-db/lib/python/gppylib/gp_dbid.py", line 39, in __init__

    self.read_gp_dbid()

  File "/usr/local/greenplum-db/lib/python/gppylib/gp_dbid.py", line 49, in read_gp_dbid

    with open(self.filepath) as f:

IOError: [Errno 2] No such file or directory: '/gpmaster/gp_dbid'

 

 

 

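The master instance's files live in /gpmaster/gpseg-1 (MASTER_DIRECTORY plus the SEG_PREFIX and a -1 suffix), not in /gpmaster itself, so pass that directory explicitly with -d:

 gpstart -d /gpmaster/gpseg-1/  -v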
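The traceback shows gpstart looking for a gp_dbid file directly inside the master data directory. A small check along the following lines can locate the directory that actually contains that file. It is a sketch under the assumption that the instance directory name ends in -1, as gpseg-1 does here; find_master_datadir is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: find the directory under a base path that actually holds gp_dbid,
# the file gpstart failed to open in the traceback.
# find_master_datadir is an illustrative helper name.

# Usage: find_master_datadir BASE_DIR
find_master_datadir() {
    base="${1%/}"                       # drop one trailing slash, if any
    for candidate in "$base" "$base"/*-1; do
        if [ -f "$candidate/gp_dbid" ]; then
            echo "$candidate"
            return 0
        fi
    done
    echo "no gp_dbid under $base" >&2
    return 1
}
```

On this installation find_master_datadir /gpmaster would be expected to print /gpmaster/gpseg-1, which is the value to pass to -d or to use for MASTER_DATA_DIRECTORY.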

 

3.3 Error 3

 gpstop -u

20170412:16:28:07:003451gpstop:mdw:gpadmin-[INFO]:-Starting gpstop with args: -u

20170412:16:28:07:003451gpstop:mdw:gpadmin-[INFO]:-Gathering information and validating the environment...

20170412:16:28:07:003451gpstop:mdw:gpadmin-[ERROR]:-gpstop error: postmaster.pid file does not exist.  is Greenplum instance already stopped?

Run it as follows instead:  gpstop -u -d /gpmaster/gpseg-1/

 

3.4 Error 4

 Command was: 'env GPSESSID=0000000000 GPKILL=NEVER GPERA=None $GPHOME/bin/pg_ctl -D /data/master/gpseg-1 -l /data/master/gpseg-1/pg_log/startup.log -w -t 600 -o " -p 5432 -b 1 -z 0 --silent-mode=true -i -M master -C -1 -x 0 -c gp_role=utility " start'
rc=1, stdout='waiting for server to start......could not start server
', stderr='pg_ctl: PID file "/data/master/gpseg-1/postmaster.pid" does not exist

Be careful when editing pg_hba.conf: a mistake in this file causes this error at startup.

 

4. Miscellaneous

4.1 Changing the port after installation

If you change the port after GP is already installed, update both .bashrc and postgresql.conf.

.bashrc needs: export PGPORT=15432
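Because the port has to agree in two places, a quick comparison can catch a mismatch between them. The following is a minimal sketch; conf_port and check_port are hypothetical helper names, and the path passed in is whatever postgresql.conf the master instance uses.

```shell
#!/bin/sh
# Sketch: read the port from a postgresql.conf-style file and compare it with PGPORT.
# conf_port and check_port are illustrative helper names.

# Print the value of the last uncommented "port = N" line.
conf_port() {
    sed -n 's/^[[:space:]]*port[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" | tail -n 1
}

# Usage: check_port POSTGRESQL_CONF
check_port() {
    port="$(conf_port "$1")"
    if [ -n "$port" ] && [ "$port" = "${PGPORT:-}" ]; then
        echo "port $port matches PGPORT"
    else
        echo "postgresql.conf has port '$port' but PGPORT is '${PGPORT:-unset}'"
        return 1
    fi
}
```

With the directory layout used in this document the call would be check_port /gpmaster/gpseg-1/postgresql.conf.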

4.2 Installation reference link

This installation followed http://blog.csdn.net/yxlllll/article/details/50266269

 

 

Installation directory: /usr/local/greenplum-db

Master pg_hba.conf: /gpmaster/gpseg-1/pg_hba.conf

 

 

 

5. GPLoad installation

Once gpfdist is installed, gpload is available as well and can be used directly.

Prerequisites

wget http://pyyaml.org/download/libyaml/yaml-0.1.7.tar.gz

tar -xvf yaml-0.1.7.tar.gz

cd yaml-0.1.7

./configure

make

make install

 

 

unzip greenplum-loaders-4.3.8.2-build-1-RHEL5-x86_64.zip

sh greenplum-loaders-4.3.8.2-build-1-RHEL5-x86_64.bin -y

By default it is installed under /usr/local/greenplum-loaders-4.3.8.1-build-1

Edit /etc/profile and add:

PATH=/usr/local/greenplum-loaders-4.3.8.1-build-1/bin:$JAVA_HOME/bin:$PATH

source /usr/local/greenplum-loaders-4.3.8.1-build-1/greenplum_loaders_path.sh

Then run: source /usr/local/greenplum-loaders-4.3.8.1-build-1/greenplum_loaders_path.sh

 

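You can create an external table that points at a gpfdist location; in that case the data is read through gpfdist and is not moved into GP. Alternatively, load the data with gpload.

gpload download link: https://network.pivotal.io/products/pivotal-gpdb/#/releases/2146

Reference links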

http://www.infocool.net/kb/OtherDB/201705/360938.html

http://blog.csdn.net/mchdba/article/details/72540806

https://discuss.pivotal.io/hc/en-us/articles/115002064167-GPLOAD-Unable-to-Import-the-PyGreSQL-Python-Module-pg-py-

 
