Greenplum学习实践-【安装部署】-2、 5.10集群部署

  • Greenplum学习实践-【安装部署】-2、 5.10集群部署

 

 

第 1 章** 环境说明

1.1 官方文档

Greenplum官方安装说明:

https://gpdb.docs.pivotal.io/5160/install_guide/install_extensions.html

 

 

 

1.2 系统要求

 

Operating System *• Red Hat Enterprise Linux 64-bit 7.x (See the following Note)**• Red Hat Enterprise Linux 64-bit 6.x**• SuSE Linux Enterprise Server 64-bit 12 SP2 and SP3 with kernel version greater than 4.4.73-5. (See**the following Note)**• SuSE Linux Enterprise Server 64-bit 11 SP4 (See the following Note)**• CentOS 64-bit 7.x**• CentOS 64-bit 6.x*
File Systems • xfs required for data storage on SUSE Linux and Red Hat (ext3 supported for root file system)
Minimum CPU Pentium Pro compatible (P3/Athlon and above)

 

Minimum Memory 16 GB RAM per server
Disk Requirements • 150MB per host for Greenplum installation • Approximately 300MB per segment instance for meta data • Appropriate free space for data with disks at no more than 70% capacity • High-speed, local storage
Network Requirements 10 Gigabit Ethernet within the array Dedicated, non-blocking switch NIC bonding is recommended when multiple interfaces are present
Software and Utilities zlib compression libraries bash shell GNU tars GNU zip GNU sed (used by Greenplum Database gpinitsystem) perl secure shell

*Important**:* SSL is supported only on the Greenplum Database master host system.

 

1.3 搭建环境介绍

 

操作系统:CentOS Linux release 7.4.1708 (Core)

 

机器型号 PowerEdge R330 *4
CPU Intel(R) Xeon(R) CPU E3-1220 v5 @ 3.00GHz *4物理核心(16cores)
内存 16G
磁盘 4T
Swap 32G

 

 

 

 

第 2 章 安装架构

2.1 安装大致步骤

Perform the following tasks in order: *1.* Make sure your systems meet the System Requirements *2.* Setting the Greenplum Recommended OS Parameters *3.* (master only) Creating the Greenplum Database Administrative User Account *4.* (master only) Installing the Greenplum Database Software *5.* Installing and Configuring Greenplum on all Hosts *6.* (Optional) Installing Oracle Compatibility Functions *7.* (Optional) Installing Optional Modules *8.* (Optional) Installing Greenplum Database Extensions *9.* (Optional) Installing and Configuring the Greenplum Platform Extension Framework (PXF) *10.*Creating the Data Storage Areas *11.*Synchronizing System Clocks *12.*Next Steps

 

2.2 软件信息

greenplum-db-5.10.2-rhel6-x86_64.zip是mpp软件,

greenplum-cc-web-4.3.1-LINUX-x86_64.zip是web监控平台

 

2.3 架构说明

准备 4 台服务器, 1 台做 master, 3台都做存储共部署 6 个segment 及其镜像

IP 主机名 cpu 内存 组件规划
10.102.254.24 sdw1 16 16 4*segment
10.102.254.25 sdw2 16 16 4*segment
10.102.254.26 sdw3 16 16 4*segment
10.102.254.27 mdw1 16 16 master

 

=架构目标==

 

mdw sdw1 sdw2 sdw3
master seg0p seg1p seg5m seg4m seg2p seg3p seg0m seg1m seg4p seg5p seg2m seg3m smdw

 

第 3 章 安装架构

3.1 关闭防火墙和selinux-所有节点

 systemctl stop firewalld.service 
 ​
 systemctl disable firewalld.service 
 ​
 iptables -F 
 ​
 vi /etc/selinux/config SELINUX=DISABLED
 ​
  # sestatus SELinuxstatus: disabled   
 ​
 ​

 

3.2 修改host文件 -所有节点

 

  1.  1. vi /etc/hosts 
     2. 
     3. 10.102.254.24      sdw1 
     4. 10.102.254.25      sdw2 
     5. 10.102.254.26      sdw3 
     6. 10.102.254.27      mdw 

     

 

 

 

3.2 修改主机名

 

 hostnamectl set-hostname  sdw1
 ​
 hostnamectl status  状态
 ​
 /etc/sysconfig/network
 ​
 ​

 

3.3 系统内核参数优化

 

 ​
 ​
 vi /etc/sysctl.conf
 ​
 kernel.shmmax = 500000000
 ​
 kernel.shmmni = 4096
 ​
 kernel.shmall = 4000000000
 ​
 kernel.sem = 250 512000 100 2048
 ​
 kernel.sysrq = 1
 ​
 kernel.core_uses_pid = 1
 ​
 kernel.msgmnb = 65536
 ​
 kernel.msgmax = 65536
 ​
 kernel.msgmni = 2048
 ​
 net.ipv4.tcp_syncookies = 1
 ​
 net.ipv4.ip_forward = 0
 ​
 net.ipv4.conf.default.accept_source_route = 0
 ​
 net.ipv4.tcp_tw_recycle = 1
 ​
 net.ipv4.tcp_max_syn_backlog = 4096
 ​
 net.ipv4.conf.all.arp_filter = 1
 ​
 net.ipv4.ip_local_port_range = 1025 65535
 ​
 net.core.netdev_max_backlog = 10000
 ​
 net.core.rmem_max = 2097152
 ​
 net.core.wmem_max = 2097152
 ​
 vm.overcommit_memory = 2 
 ​
  
 ​
 sysctl -p
 ​
  
 ​
  
 ​
 cat > /etc/sysctl.conf << EOF
 ​
 \# sysctl settings are defined through files in
 ​
 \# /usr/lib/sysctl.d/, /run/sysctl.d/, and /etc/sysctl.d/.
 ​
 \#
 ​
 \# Vendors settings live in /usr/lib/sysctl.d/.
 ​
 \# To override a whole file, create a new file with the same in
 ​
 \# /etc/sysctl.d/ and put new settings there. To override
 ​
 \# only specific settings, add a file with a lexically later
 ​
 \# name in /etc/sysctl.d/ and put new settings there.
 ​
 \#
 ​
 \# For more information, see sysctl.conf(5) and sysctl.d(5).
 ​
  
 ​
 kernel.shmmax = 500000000
 ​
 kernel.shmmni = 4096
 ​
 kernel.shmall = 4000000000
 ​
 kernel.sem = 500 1024000 200 4096
 ​
 kernel.sysrq = 1
 ​
 kernel.core_uses_pid = 1
 ​
 kernel.msgmnb = 65536
 ​
 kernel.msgmax = 65536
 ​
 kernel.msgmni = 2048
 ​
 net.ipv4.tcp_syncookies = 1
 ​
 net.ipv4.ip_forward = 0
 ​
 net.ipv4.conf.default.accept_source_route = 0
 ​
 net.ipv4.tcp_tw_recycle = 1
 ​
 net.ipv4.tcp_max_syn_backlog = 4096
 ​
 net.ipv4.conf.all.arp_filter = 1
 ​
 net.ipv4.ip_local_port_range = 1025 65535
 ​
 net.core.netdev_max_backlog = 10000
 ​
 net.core.rmem_max = 2097152
 ​
 net.core.wmem_max = 2097152
 ​
 vm.overcommit_memory = 2
 ​
 vm.swappiness = 1
 ​
 kernel.pid_max = 655350
 ​
 EOF
 ​
  
 ​
 sysctl -p

 

3.4 修改Linux最大限制

 

 cat /etc/security/limits.conf
 ​
  
 ​
  vi /etc/security/limits.conf
 ​
  
 ​
 * soft nofile 65536
 ​
 * hard nofile 65536
 ​
 * soft nproc 131072
 ​
 * hard nproc 131072
 ​
  
 ​
 cat > /etc/security/limits.conf << EOF
 ​
 * soft nofile 65536
 ​
 * hard nofile 65536
 ​
 * soft nproc 131072
 ​
 * hard nproc 131072
 ​
 EOF

 

如何是rhel 6.x 请注意 /etc/security/limits.d/90-nproc.conf,详细情况请见文档

 

3.5 设备与IO-文件系统

设置XFS文件系统并挂载

EXT4是第四代扩展文件系统(英语:Fourth EXtended filesystem,缩写为ext4)是Linux系统下的日志文件系统,是ext3文件系统的后继版本。

Ext4的文件系统容量达到1EB,而文件容量则达到16TB,这是一个非常大的数字了。对一般的台式机和服务器而言,这可能并不重要,但对于大型磁盘阵列的用户而言,这就非常重要了。

XFS是一个64位文件系统,最大支持8EB减1字节的单个文件系统,实际部署时取决于宿主操作系统的最大块限制。对于一个32位Linux系统,文件和文件系统的大小会被限制在16TB。

 

二者各有特点,而性能表现基本上是差不多的。例如,谷歌公司就考虑将EXT2系统升级,最终确定为EXT4系统。谷歌公司表示,他们还考虑过XFS和JFS。结果显示,EXT4和XFS的表现类似,不过从EXT2升级到EXT4比升级到XFS容易。

 

例子:

cat >> /etc/fstab << EOF

/dev/sdb1 /greenplum xfs rw,nodev,noatime,inode64,allocsize=16m 0 0

EOF

rw,nodev,noatime,nobarrier,inode64

 

cat /etc/fstab

 

 

 

3.6 磁盘访问策略

Linux磁盘I/O调度器对磁盘的访问支持不同的策略,默认的为CFQ,GP建议设置为deadline

 

查看磁盘的I/O调度策略,看到默认的为[cfq]

 

The deadline scheduler option is recommended. To specify a scheduler until the next system reboot,

run the following:

# echo schedulername > /sys/block/devname/queue/scheduler

 

echo deadline > /sys/block/sda/queue/scheduler

 

linux 7

# grubby --update-kernel=ALL --args="elevator=deadline"

grubby --info=ALL

 

 

3.7 调整磁盘预读扇区数

 

fdisk -l

 

检查

/sbin/blockdev --getra /dev/sda

设置

/sbin/blockdev --setra 16384 /dev/sda

 

 

在参数文件 /etc/rc.d/rc.local 中增加

DELL : blockdev --setra 16384 /dev/sd* (红色部分为硬盘设备标识) HP:blockdev --setra 16384 /dev/cciss/c?d?*

 

 

3.8 禁用THP

On systems that use grub2 such as RHEL 7.x or CentOS 7.x, use the system utility grubby. This

command adds the parameter when run as root.

# grubby --update-kernel=ALL --args="transparent_hugepage=never"

After adding the parameter, reboot the system.

This cat command checks the state of THP. The output indicates that THP is disabled.

$ cat /sys/kernel/mm/*transparent_hugepage/enabled

always [never]

 

服务方式注册

# 创建 init.d 脚本

echo '#!/bin/sh

case $1 in

start)

if [ -d /sys/kernel/mm/transparent_hugepage ]; then

thp_path=/sys/kernel/mm/transparent_hugepage

elif [ -d /sys/kernel/mm/redhat_transparent_hugepage ]; then

thp_path=/sys/kernel/mm/redhat_transparent_hugepage

else

exit 0

fi

 

echo never > ${thp_path}/enabled

echo never > ${thp_path}/defrag

 

unset thp_path

;;

esac' > /etc/init.d/disable-transparent-hugepages

 

# 注册systemd文件

echo '[Unit]

Description=Disable Transparent Hugepages

After=multi-user.target

 

[Service]

ExecStart=/etc/init.d/disable-transparent-hugepages start

Type=simple

 

[Install]

WantedBy=multi-user.target' > /etc/systemd/system/disable-thp.service

 

# 磁盘预读扇区数

/sbin/blockdev --getra /dev/sdb1 # 查看大小

/sbin/blockdev --setra 65535 /dev/sdb1 # 设置大小

 

# 创建 init.d 脚本

echo '#!/bin/sh

device_name=/dev/sdb1

case $1 in

start)

if mount | grep "^${device_name}" > /dev/null;then

/sbin/blockdev --setra 65535 ${device_name}

else

exit 0

fi

 

unset device_name

;;

esac' > /etc/init.d/blockdev-setra-sdb

 

# 注册systemd文件

echo '[Unit]

Description=Blocdev --setra N

After=multi-user.target

 

[Service]

ExecStart=/etc/init.d/blockdev-setra-sdb start

Type=simple

 

[Install]

WantedBy=multi-user.target' > /etc/systemd/system/blockdev-setra-sdb.service

 

# 授权并设置开机启动

chmod 755 /etc/init.d/disable-transparent-hugepages

chmod 755 /etc/init.d/blockdev-setra-sdb

chmod 755 /etc/systemd/system/disable-thp.service

chmod 755 /etc/systemd/system/blockdev-setra-sdb.service

systemctl enable disable-thp blockdev-setra-sdb

 

 

 

3.9 Disable IPC object removal for RHEL 7 or CentOS 7

 

Set this parameter in /etc/systemd/logind.conf on the Greenplum

Database host systems.

RemoveIPC=no

The setting takes effect after restarting the systemd-login service or rebooting the system. To

restart the service, run this command as the root user.

service systemd-logind restart

 

cat /etc/systemd/logind.conf

 

3.10 时间同步

 

/etc/chrony.conf

 

systemctl status chronyd.service --查看状态

systemctl start chronyd.service --启动

systemctl enable chronyd.service --使其开机自启

systemctl status chronyd.service

server 10.1.3.1 prefer

 

查看时间同步源

chronyc sources -v

chronyc sourcestats -v

 

3.11 控制ssh连接数

/etc/ssh/sshd_config

MaxStartups 10:30:200

 

systemctl restart sshd.service

 

 

 

3.12 系统依赖包

yum -y install epel-release

yum -y install wget cmake3 git gcc gcc-c++ bison flex libedit-devel zlib zlib-devel perl-devel perl-ExtUtils-Embed python-devel libevent libevent-devel libxml2 libxml2-devel libcurl libcurl-devel bzip2 bzip2-devel net-tools libffi-devel openssl-devel

 

第 4章 安装软件

4.1 创建用户和组

master only

 # groupadd gpadmin
 ​
 # useradd gpadmin -g gpadmin
 ​
 # passwd gpadmin
 ​
 New password: 
 ​
 Retype new password: 
 ​
 ​
 ​
 echo gpadmin | passwd  gpadmin --stdin

 

 

 

4.2 root用户解压缩和安装

 

 ./greenplum-db-5.10.2-rhel6-x86_64.bin
 ​
 ​
 ​
 I HAVE READ AND AGREE TO THE TERMS OF THE ABOVE PIVOTAL SOFTWARE
 ​
 LICENSE AGREEMENT.
 ​
 ​
 ​
 ​
 ​
 ********************************************************************************
 ​
 Do you accept the Pivotal Database license agreement? [yes|no]
 ​
 ********************************************************************************
 ​
 ​
 ​
 yes
 ​
 ​
 ​
 ********************************************************************************
 ​
 Provide the installation path for Greenplum Database or press ENTER to
 ​
 accept the default installation path: /usr/local/greenplum-db-5.10.2
 ​
 ********************************************************************************
 ​
 ​
 ​
 ​
 ​
 ​
 ​
 ********************************************************************************
 ​
 Install Greenplum Database into /usr/local/greenplum-db-5.10.2? [yes|no]
 ​
 ********************************************************************************
 ​
 ​
 ​
 yes
 ​
 ​
 ​
 ********************************************************************************
 ​
 /usr/local/greenplum-db-5.10.2 does not exist.
 ​
 Create /usr/local/greenplum-db-5.10.2 ? [yes|no]
 ​
 (Selecting no will exit the installer)
 ​
 ********************************************************************************
 ​
 ​
 ​
 ​
 ​
 ​
 ​
 安装完成后授权
 ​
 ​
 ​
 ​
 ​
 \# chown -R gpadmin /usr/local/greenplum*(在创建gpadmin后执行)
 ​
 ​
 ​
 \# chgrp -R gpadmin /usr/local/greenplum*(在创建gpadmin后执行)

 

 

4.3 编辑环境变量

 

 cat >> .bashrc << EOF
 ​
 export MASTER_DATA_DIRECTORY=/greenplum/gpdata/master/gpseg-1
 ​
 source /usr/local/greenplum-db/greenplum_path.sh
 ​
 EOF
 ​
 source .bashrc
 ​
 ​
 ​
 ​
 ​
 cat >> /home/gpadmin/.bash_profile < 
  

 

 

 

4.4 进行文件配置

 

切换root

 

source /usr/local/greenplum-db/greenplum_path.sh

 

 

 

------只在mdw,smdw执行

mkdir /home/gpadmin/gpconfig

chown -R gpadmin:gpadmin /home/gpadmin/gpconfig

 

------只在mdw,smdw执行

cat >> /home/gpadmin/gpconfig/all_host <

mdw

sdw1

sdw2

sdw3

EOF

 

------只在mdw,smdw执行

cat >> /home/gpadmin/gpconfig/all_segment <

sdw1

sdw2

sdw3

EOF

 

chown -R gpadmin:gpadmin /home/gpadmin/gpconfig/all_host

chown -R gpadmin:gpadmin /home/gpadmin/gpconfig/all_segment

 

4.5 设置主机免密码登陆 -

 

source /usr/local/greenplum-db/greenplum_path.sh

 

/usr/local/greenplum-db/bin/gpssh-exkeys -f /home/gpadmin/gpconfig/all_host

 

4.6 确认检查主机连接状态

 

gpssh -f /home/gpadmin/gpconfig/all_host -e "ls -l"

 

 

4.7 批量创建其他节点的用户

 

gpssh -f /home/gpadmin/gpconfig/all_segment

 

groupadd gpadmin

useradd gpadmin -g gpadmin

passwd gpadmin

echo gpadmin | passwd gpadmin --stdin

 

 

4.8 gpadmin用户-互信

 

source /usr/local/greenplum-db/greenplum_path.sh

 

/usr/local/greenplum-db/bin/gpssh-exkeys -f /home/gpadmin/gpconfig/all_host

 

gpssh -f /home/gpadmin/gpconfig/all_host -e "ls -l"

 

4.9 检查时间同步

gpssh -f /home/gpadmin/gpconfig/all_host -e "date"

 

4.10 分发所有seg节点软件

 

root执行

 

source /usr/local/greenplum-db/greenplum_path.sh

 

gpseginstall -f /home/gpadmin/gpconfig/all_host -u gpadmin -p gpadmin

 

 

4.11 检查安装情况

 

o Log in as the gpadmin user and source

• source /usr/local/greenplum-db/greenplum_path.sh

o Use the gpssh utility to see if you can login to all hosts without a password prompt

 

4.12 创建相关目录(root用户)

 

mkdir -p /greenplum/gpdata/master

chown gpadmin:gpadmin /greenplum/gpdata/master

 

gpssh -f /home/gpadmin/gpconfig/all_segment -e 'mkdir -p /greenplum/gpdata/primary1'

gpssh -f /home/gpadmin/gpconfig/all_segment -e 'mkdir -p /greenplum/gpdata/primary2'

gpssh -f /home/gpadmin/gpconfig/all_segment -e 'chown -R gpadmin:gpadmin /greenplum/gpdata'

 

 

gpssh -f /home/gpadmin/gpconfig/all_segment -e 'mkdir -p /greenplum/gpdata/mirror1'

gpssh -f /home/gpadmin/gpconfig/all_segment -e 'mkdir -p /greenplum/gpdata/mirror2'

gpssh -f /home/gpadmin/gpconfig/all_segment -e 'chown -R gpadmin:gpadmin /greenplum/gpdata'

 

or 批量创建

gpssh -f /home/gpadmin/gpconfig/all_segment -e 'mkdir -p /greenplum/gpdata/primary{1..2}'

gpssh -f /home/gpadmin/gpconfig/all_segment -e 'chown -R gpadmin:gpadmin /greenplum/gpdata'

 

4.13 验证系统

 

检查系统参数和测试性能

检查命令:gpcheck -f host_file -m mdw -ssmdw

Validating Hardware Performance

o gpcheckperf can be used to identify hardware and system-level issues on the machines in your Greenplum

Database array.

o Network Performance (gpnetbench*)

• gpcheckperf -f hostfile_gpchecknet_ic1 -r N -d /tmp > subnet1.out

o Disk I/O Performance (dd test) & Memory Bandwidth (stream test)

• gpcheckperf -f hostfile_gpcheckperf -r ds -D -d /data/primary -d /data/mirror

 

 

4.14 验证OS配置

 

source /usr/local/greenplum-db/greenplum_path.sh

gpcheck -f /home/gpadmin/gpconfig/all_host -m mdw

 

 

验证硬件性能--这个需要确认(网络和IO)

 

  gpcheckperf -f /home/gpadmin/gpconfig/all_host -r N -d /tmp >  checknetwork.out
 ​
  [root@mdw greenplum-db]# cat checknetwork.out
 ​
 /usr/local/greenplum-db/./bin/gpcheckperf -f /home/gpadmin/gpconfig/all_host -r N -d /tmp
 ​
 ​
 ​
 \-------------------
 ​
 --  NETPERF TEST
 ​
 \-------------------
 ​
 ​
 ​
 ====================
 ​
 ==  RESULT
 ​
 ====================
 ​
 Netperf bisection bandwidth test
 ​
 mdw -> sdw1 = 112.340000
 ​
 sdw2 -> sdw3 = 112.340000
 ​
 sdw1 -> mdw = 112.330000
 ​
 sdw3 -> sdw2 = 112.330000
 ​
 ​
 ​
 Summary:
 ​
 sum = 449.34 MB/sec
 ​
 min = 112.33 MB/sec
 ​
 max = 112.34 MB/sec
 ​
 avg = 112.33 MB/sec
 ​
 median = 112.34 MB/sec
 ​
 ​
 ​
  gpcheckperf -f /home/gpadmin/gpconfig/all_host -r ds -D -d /greenplum/gpdata/primary1 -d /greenplum/gpdata/mirror1 >  checkDISKIO.out
 ​
  [root@mdw greenplum-db]# gpcheckperf -f /home/gpadmin/gpconfig/all_host -r ds -D -d /greenplum/gpdata/primary1 -d /greenplum/gpdata/mirror1
 ​
  /usr/local/greenplum-db/./bin/gpcheckperf -f /home/gpadmin/gpconfig/all_host -r ds -D -d /greenplum/gpdata/primary1 -d /greenplum/gpdata/mirror1
 ​
 ​
 ​
 \--------------------
 ​
 --  DISK WRITE TEST
 ​
 \--------------------
 ​
 ​
 ​
 \--------------------
 ​
 --  DISK READ TEST
 ​
 \--------------------
 ​
 ​
 ​
 \--------------------
 ​
 --  STREAM TEST
 ​
 \--------------------
 ​
 ​
 ​
 ====================
 ​
 ==  RESULT
 ​
 ====================
 ​
 ​
 ​
  disk write avg time (sec): 20.88
 ​
  disk write tot bytes: 132920115200
 ​
  disk write tot bandwidth (MB/s): 6074.65
 ​
  disk write min bandwidth (MB/s): 1476.04 [ mdw]
 ​
  disk write max bandwidth (MB/s): 1551.18 [sdw3]
 ​
  -- per host bandwidth --
 ​
   disk write bandwidth (MB/s): 1476.04 [ mdw]
 ​
   disk write bandwidth (MB/s): 1537.63 [sdw1]
 ​
   disk write bandwidth (MB/s): 1509.80 [sdw2]
 ​
   disk write bandwidth (MB/s): 1551.18 [sdw3]
 ​
 ​
 ​
 ​
 ​
  disk read avg time (sec): 59.80
 ​
  disk read tot bytes: 132920115200
 ​
  disk read tot bandwidth (MB/s): 2175.57
 ​
  disk read min bandwidth (MB/s): 454.54 [sdw2]
 ​
  disk read max bandwidth (MB/s): 700.04 [sdw1]
 ​
  -- per host bandwidth --
 ​
   disk read bandwidth (MB/s): 520.03 [ mdw]
 ​
   disk read bandwidth (MB/s): 700.04 [sdw1]
 ​
   disk read bandwidth (MB/s): 454.54 [sdw2]
 ​
   disk read bandwidth (MB/s): 500.96 [sdw3]
 ​
 ​
 ​
 ​
 ​
  stream tot bandwidth (MB/s): 49348.52
 ​
  stream min bandwidth (MB/s): 12297.76 [ mdw]
 ​
  stream max bandwidth (MB/s): 12388.57 [sdw2]
 ​
  -- per host bandwidth --
 ​
   stream bandwidth (MB/s): 12297.76 [ mdw]
 ​
   stream bandwidth (MB/s): 12321.47 [sdw1]
 ​
   stream bandwidth (MB/s): 12388.57 [sdw2]
 ​
   stream bandwidth (MB/s): 12340.73 [sdw3]

 

第 5 章 初始化database

  1. copy配置文件

 cp $GPHOME/docs/cli_help/gpconfigs/gpinitsystem_config /home/gpadmin/gpconfig/gpinitsystem_config

 

  1. 设置初始化系统文件列表

 cat >> /home/gpadmin/gpconfig/hostfile_gpinitsystem < 
  

 

  1. 调整参数

 ARRAY_NAME="EMC Greenplum DW"
 ​
 PORT_BASE=40000
 ​
 SEG_PREFIX=gpseg
 ​
 declare -a DATA_DIRECTORY=(/greenplum/gpdata/primary1 /greenplum/gpdata/primary2)
 ​
 MASTER_HOSTNAME=mdw
 ​
 MASTER_DIRECTORY=/greenplum/gpdata/master
 ​
 MASTER_PORT=5432
 ​
 TRUSTED_SHELL=ssh
 ​
 CHECK_POINT_SEGMENTS=8
 ​
 ENCODING=UNICODE
 ​
 MIRROR_PORT_BASE=50000
 ​
 REPLICATION_PORT_BASE=41000
 ​
 MIRROR_REPLICATION_PORT_BASE=51000
 ​
 declare -a MIRROR_DATA_DIRECTORY=(/greenplum/gpdata/mirror1 /greenplum/gpdata/mirror2)
 ​
 ​
 ​
 vim /home/gpadmin/gpconfig/gpinitsystem_config
 ​
 ​
 ​
 修改如下
 ​
 ​
 ​
 [gpadmin@mdw ~]$ cat  /home/gpadmin/gpconfig/gpinitsystem_config
 ​
 \# FILE NAME: gpinitsystem_config
 ​
 ​
 ​
 \# Configuration file needed by the gpinitsystem
 ​
 ​
 ​
 \################################################
 ​
 \#### REQUIRED PARAMETERS
 ​
 \################################################
 ​
 ​
 ​
 \#### Name of this Greenplum system enclosed in quotes.
 ​
 ARRAY_NAME="Greenplum Data Platform"
 ​
 ​
 ​
 \#### Naming convention for utility-generated data directories.
 ​
 SEG_PREFIX=gpseg
 ​
 ​
 ​
 \#### Base number by which primary segment port numbers
 ​
 \#### are calculated.
 ​
 PORT_BASE=40000
 ​
 ​
 ​
 \#### File system location(s) where primary segment data directories
 ​
 \#### will be created. The number of locations in the list dictate
 ​
 \#### the number of primary segments that will get created per
 ​
 \#### physical host (if multiple addresses for a host are listed in
 ​
 \#### the hostfile, the number of segments will be spread evenly across
 ​
 \#### the specified interface addresses).
 ​
 ​
 ​
 declare -a DATA_DIRECTORY=(/greenplum/gpdata/primary1 /greenplum/gpdata/primary2)
 ​
 \#### OS-configured hostname or IP address of the master host.
 ​
 MASTER_HOSTNAME=mdw
 ​
 ​
 ​
 \#### File system location where the master data directory
 ​
 \#### will be created.
 ​
 MASTER_DIRECTORY=/greenplum/gpdata/master
 ​
 ​
 ​
 \#### Port number for the master instance.
 ​
 MASTER_PORT=5432
 ​
 ​
 ​
 \#### Shell utility used to connect to remote hosts.
 ​
 TRUSTED_SHELL=ssh
 ​
 ​
 ​
 \#### Maximum log file segments between automatic WAL checkpoints.
 ​
 CHECK_POINT_SEGMENTS=8
 ​
 ​
 ​
 \#### Default server-side character set encoding.
 ​
 ENCODING=UNICODE
 ​
 ​
 ​
 \################################################
 ​
 \#### OPTIONAL MIRROR PARAMETERS
 ​
 \################################################
 ​
 ​
 ​
 \#### Base number by which mirror segment port numbers
 ​
 \#### are calculated.
 ​
 MIRROR_PORT_BASE=50000
 ​
 ​
 ​
 \#### Base number by which primary file replication port
 ​
 \#### numbers are calculated.
 ​
 REPLICATION_PORT_BASE=41000
 ​
 ​
 ​
 \#### Base number by which mirror file replication port
 ​
 \#### numbers are calculated.
 ​
 MIRROR_REPLICATION_PORT_BASE=51000
 ​
 ​
 ​
 \#### File system location(s) where mirror segment data directories
 ​
 \#### will be created. The number of mirror locations must equal the
 ​
 \#### number of primary locations as specified in the
 ​
 \#### DATA_DIRECTORY parameter.
 ​
 ​
 ​
 declare -a MIRROR_DATA_DIRECTORY=(/greenplum/gpdata/mirror1 /greenplum/gpdata/mirror2)
 ​
 ​
 ​
 \################################################
 ​
 \#### OTHER OPTIONAL PARAMETERS
 ​
 \################################################
 ​
 ​
 ​
 \#### Create a database of this name after initialization.
 ​
 \#DATABASE_NAME=name_of_database
 ​
 ​
 ​
 \#### Specify the location of the host address file here instead of
 ​
 \#### with the the -h option of gpinitsystem.
 ​
 \#MACHINE_LIST_FILE=/home/gpadmin/gpconfigs/hostfile_gpinitsystem

 

 

  1. 初始化database

 

 

gpadmin用户

 

 gpinitsystem -c /home/gpadmin/gpconfig/gpinitsystem_config -h /home/gpadmin/gpconfig/hostfile_gpinitsystem
 ​
 ​
 ​
 如何添加master standby和修改mirror分布策略spread mirror
 ​
 gpinitsystem -c gpconfigs/gpinitsystem_config -h gpconfigs/hostfile_gpinitsystem -s
 ​
 standby_master_hostname -S (with a standby master and a spread mirror configuration)

 

  1. 检查环境变量

 

 ​
 ​
 MASTER_DATA_DIRECTORY=/data/master/gpseg-1
 ​
 ​
 ​
 GPHOME=/usr/local/greenplum-db
 ​
 ​
 ​
 PGDATABASE=gpadmin
 ​
 ​
 ​
 [gpadmin@mdw ~]$ cat .bash_profile
 ​
 \# .bash_profile
 ​
 ​
 ​
 \# Get the aliases and functions
 ​
 if [ -f ~/.bashrc ]; then
 ​
 •    . ~/.bashrc
 ​
 fi
 ​
 ​
 ​
 \# User specific environment and startup programs
 ​
 ​
 ​
 PATH=$PATH:$HOME/.local/bin:$HOME/bin
 ​
 ​
 ​
 export PATH
 ​
 export MASTER_DATA_DIRECTORY=/greenplum/gpdata/master/gpseg-1
 ​
 source /usr/local/greenplum-db/greenplum_path.sh
 ​
 export PGPORT=5432
 ​
 export PGDATABASE=archdata
 ​
 ​

 

第 6 章 连接测试

 

 

6.1设置gpadmin远程访问密码

psql postgres gpadmin

alter user gpadmin encrypted password 'gpadmin';

\q

 

6.2查询测试

psql -hmdw -p 5432 -d postgres -U gpadmin -c 'select dfhostname, dfspace,dfdevice from gp_toolkit.gp_disk_free order by dfhostname;'

 

[gpadmin@mdw ~]$ psql -hmdw -p 5432 -d postgres -U gpadmin -c 'select dfhostname, dfspace,dfdevice from gp_toolkit.gp_disk_free order by dfhostname;'

dfhostname | dfspace | dfdevice

------------+----------+----------------------------

sdw1 | 98708120 | /dev/mapper/VolGroup-root

sdw1 | 98708120 | /dev/mapper/VolGroup-root

sdw2 | 98705600 | /dev/mapper/VolGroup-root

sdw2 | 98705600 | /dev/mapper/VolGroup-root

sdw3 | 98705144 | /dev/mapper/VolGroup-root

sdw3 | 98705144 | /dev/mapper/VolGroup-root

(6 rows)

 

psql -h hmdw -p 5432 -d postgres -U gpadmin -c '\l+'

[gpadmin@mdw ~]$ psql -h mdw -p 5432 -d postgres -U gpadmin -c '\l+'

List of databases

Name | Owner | Encoding | Access privileges | Size | Tablespace | Description

-----------+---------+----------+---------------------+-------+------------+---------------------------

postgres | gpadmin | UTF8 | | 73 MB | pg_default |

template0 | gpadmin | UTF8 | =c/gpadmin | 72 MB | pg_default |

: gpadmin=CTc/gpadmin

template1 | gpadmin | UTF8 | =c/gpadmin | 73 MB | pg_default | default template database

: gpadmin=CTc/gpadmin

(3 rows)

 

[gpadmin@mdw ~]$

 

第 7 章 常用命令

 

  1. 数据库启动:gpstart 常用可参数: -a : 直接启动,不提示终端用户输入确认 -m:只启动master 实例,主要在故障处理时使用

  1. 数据库停止:gpstop: 常用可参数:-a:直接停止,不提示终端用户输入确认 -m:只停止master 实例,与gpstart –m 对应使用 -M fast:停止数据库,中断所有数据库连接,回滚正在运 行的事务 -u:不停止数据库,只加载pg_hba.conf 和postgresql.conf中运行时参数,当改动参数配置时候使用。 评:-a用在shell里,最多用的还是-M fast。

  1. 查看实例配置和状态 select * from gp_configuration order by 1 ; 主要字段说明: Content:该字段相等的两个实例,是一对P(primary instance)和M(mirror Instance) Isprimary:实例是否作为primary instance 运行 Valid:实例是否有效,如处于false 状态,则说明该实例已经down 掉。 Port:实例运行的端口 Datadir:实例对应的数据目录

  1. gpstate :显示Greenplum数据库运行状态,详细配置等信息 常用可参数:-c:primary instance 和 mirror instance 的对应关系 -m:只列出mirror 实例的状态和配置信息 -f:显示standby master 的详细信息 -Q:显示状态综合信息 该命令默认列出数据库运行状态汇总信息,常用于日常巡检。 评:最开始由于网卡驱动的问题,做了mirror后,segment经常down掉,用-Q参数查询综合信息还是比较有用的。

  1. 查看用户会话和提交的查询等信息 select * from pg_stat_activity 该表能查看到当前数据库连接的IP 地址,用户名,提交的查询等。另外也可以在master 主机上查看进程,对每个客户端连接,master 都会创建一个进程。ps -ef |grep -i postgres |grep -i con 评:常用的命令,我经常用这个查看数据库死在那个sql上了。

  1. 查看数据库、表占用空间 select pg_size_pretty(pg_relation_size('schema.tablename')); select pg_size_pretty(pg_database_size('databasename')); 必须在数据库所对应的存储系统里,至少保留30%的自由空间,日常巡检,要检查存储空间的剩余容量。 评:可以查看任何数据库对象的占用空间,pg_size_pretty可以显示如mb之类的易读数据,另外,可以与pg_tables,pg_indexes之类的系统表链接,统计出各类关于数据库对象的空间信息。

  2. 收集统计信息,回收空间 定期使用Vacuum analyze tablename 回收垃圾和收集统计信息,尤其在大数据量删除,导入以后,非常重要 评:这个说的不全面,vacuum分两种,一种是analize,优化查询计划的,还有一种是清理垃圾数据,postres删除工作,并不是真正删除数据,而是在被删除的数据上,坐一个标记,只有执行vacuum时,才会真正的物理删除,这个非常重用,有些经常更新的表,各种查询、更新效率会越来越慢,这个多是因为没有做vacuum的原因。

  3. 查看数据分布情况 两种方式: l Select gp_segment_id,count(*) from tablename group by 1 ; l 在命令运行:gpskew -t public.ate -a postgres 如数据分布不均匀,将发挥不了并行计算的优势,严重影响性能。 评:非常用,gp要保障数据分布均匀。

  4. 实例恢复:gprecoverseg 通过gpstate 或gp_configuration 发现有实例down 掉以后,使用该命令进行回复。

  5. 查看锁信息: SELECT locktype, database, c.relname, l.relation, l.transactionid, l.transaction, l.pid, l.mode, l.granted, a.current_query FROM pg_locks l, pg_class c, pg_stat_activity a WHERE l.relation=c.oid AND l.pid=a.procpid ORDER BY c.relname; 主要字段说明: relname: 表名 locktype、mode 标识了锁的类型

  6. explain:在提交大的查询之前,使用explain分析执行计划、发现潜在优化机会,避免将系统资源熬尽。 评:少写了个analyze,如果只是explain,统计出来的执行时间,是非常坑爹的,如果希望获得准确的执行时间,必须加上analyze。

  7. 数据库备份 gp_dump 常用参数:-s: 只导出对象定义(表结构,函数等) -n: 只导出某个schema gp_dump 默认在master 的data 目录上产生这些文件: gp_catalog_1__ :关于数据库系统配置的备份文件 gp_cdatabase_1:数据库创建语句的备份文件 gp_dump_1:数据库对象ddl语句 gp_dump_status_1:备份操作的日志 在每个segment instance 上的data目录上产生的文件: gp_dump_0:用户数据备份文件 gp_dump_status_0:备份日志

  8. 数据库恢复 gp_restore 必参数:--gp-k=key :key 为gp_dump 导出来的文件的后缀时间戳 -d dbname :将备份文件恢复到dbname

  9. 登陆与退出Greenplum #正常登陆 psql gpdb psql -d gpdb -h gphostm -p 5432 -U gpadmin #使用utility方式 PGOPTIONS="-c gp_session_role=utility" psql -h -d dbname hostname -p port #退出 在psql命令行执行\q

  10. 参数查询 psql -c 'SHOW ALL;' -d gpdb gpconfig --show max_connections 评:这个用,可以管道给grep。

  11. 创建数据库 createdb -h localhost -p 5432 dhdw

  12. 创建GP文件系统 # 文件系统名 gpfsdw # 子节点,视segment数创建目录 mkdir -p /gpfsdw/seg1 mkdir -p /gpfsdw/seg2 chown -R gpadmin:gpadmin /gpfsdw # 主节点 mkdir -p /gpfsdw/master chown -R gpadmin:gpadmin /gpfsdw gpfilespace -o gpfilespace_config gpfilespace -c gpfilespace_config

  13. 创建GP表空间 psql gpdb create tablespace TBS_DW_DATA filespace gpfsdw; SET default_tablespace = TBS_DW_DATA;

  14. 删除GP数据库 gpdeletesystem -d /gpmaster/gpseg-1 -f

  15. 查看segment配置 select * from gp_segment_configuration;

  16. 文件系统 select * from pg_filespace_entry;

  17. 磁盘、数据库空间 SELECT * FROM gp_toolkit.gp_disk_free ORDER BY dfsegment; SELECT * FROM gp_toolkit.gp_size_of_database ORDER BY sodddatname; 日志 SELECT * FROM gp_toolkit.gp_log_master_ext; SELECT * FROM gp_toolkit.gp_log_segment_ext;

  18. 表数据分布 SELECT gp_segment_id, count(*) FROM GROUP BY gp_segment_id;

  19. 表占用空间 SELECT relname as name, sotdsize/1024/1024 as size_MB, sotdtoastsize as toast, sotdadditionalsize as other FROM gp_toolkit.gp_size_of_table_disk as sotd, pg_class WHERE sotd.sotdoid = pg_class.oid ORDER BY relname;

  20. 索引占用空间 SELECT soisize/1024/1024 as size_MB, relname as indexname FROM pg_class, gp_toolkit.gp_size_of_index WHERE pg_class.oid = gp_size_of_index.soioid AND pg_class.relkind='i';

  21. OBJECT的操作统计 SELECT schemaname as schema, objname as table, usename as role, actionname as action, subtype as type, statime as time FROM pg_stat_operations WHERE objname = '';

  22. 锁 SELECT locktype, database, c.relname, l.relation, l.transactionid, l.transaction, l.pid, l.mode, l.granted, a.current_query FROM pg_locks l, pg_class c, pg_stat_activity a WHERE l.relation=c.oid AND l.pid=a.procpid ORDER BY c.relname;

  23. 队列 SELECT * FROM pg_resqueue_status;

  24. gpfdist外部表 # 启动服务 gpfdist -d /share/txt -p 8081 –l /share/txt/gpfdist.log & # 创建外部表,分隔符为’/t’ drop EXTERNAL TABLE TD_APP_LOG_BUYER; CREATE EXTERNAL TABLE TD_APP_LOG_BUYER ( IP text, ACCESSTIME text, REQMETHOD text, URL text, STATUSCODE int, REF text, name text, VID text) LOCATION ('gpfdist://gphostm:8081/xxx.txt') FORMAT 'TEXT' (DELIMITER E'/t' FILL MISSING FIELDS) SEGMENT REJECT LIMIT 1 percent;

  25. 创建普通表 create table test select * from TD_APP_LOG_BUYER; # 索引 # CREATE INDEX idx_test ON test USING bitmap (ip); # 查询数据 select ip , count() from test group by ip order by count(); gpload # 创建控制文件 # 加载数据 gpload -f my_load.yml copy COPY country FROM '/data/gpdb/country_data' WITH DELIMITER '|' LOG ERRORS INTO err_country SEGMENT REJECT LIMIT 10 ROWS;

  26. gpfdist外部表 创建可写外部表 CREATE WRITABLE EXTERNAL TABLE unload_expenses ( LIKE expenses ) LOCATION ('gpfdist://etlhost-1:8081/expenses1.out', 'gpfdist://etlhost-2:8081/expenses2.out') FORMAT 'TEXT' (DELIMITER ',') DISTRIBUTED BY (exp_id); # 写权限 GRANT INSERT ON writable_ext_table TO ; # 写数据 INSERT INTO writable_ext_table SELECT * FROM regular_table; copy COPY (SELECT * FROM country WHERE country_name LIKE 'A%') TO '/home/gpadmin/a_list_countries.out'; 执行sql文件 psql gpdbname –f yoursqlfile.sql 或者psql登陆后执行 \i yoursqlfile.sql

你可能感兴趣的:(Greenplum,#,安装部署)