Greenplum natively supports loading data from HDFS into the database in parallel via the gphdfs protocol. This article briefly walks through the deployment and testing details.
1. Install Java on the master and segments
1.1 Remove any previously installed Java packages
yum -y remove java
Or:
rpm -qa | grep java    # list the installed Java packages
rpm -e --nodeps java-1.4.2-gcj-compat-1.4.2.0-40jpp.115    # remove them one by one
1.2 Install Java (Java 6 in this example)
cd /usr
chmod 777 ./jdk-6u45-linux-x64-rpm.bin
./jdk-6u45-linux-x64-rpm.bin
1.3 Configure environment variables
vi /etc/profile
export JAVA_HOME=/usr/java/jdk1.6.0_45
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export JAVA_HOME PATH CLASSPATH
PATH=$PATH:$HOME/bin
export PATH
1.4 Verify
source /etc/profile
java -version    # confirm the Java version
java version "1.6.0_45"
Java(TM) SE Runtime Environment (build 1.6.0_45-b06)
Java HotSpot(TM) 64-Bit Server VM (build 20.45-b01, mixed mode)
2. Hadoop-related setup
2.1 Copy the files under the hadoop directory on the Hadoop machines directly to the GPDB master and segments
2.2 Make sure the JAVA_HOME setting in hadoop/etc/hadoop/hadoop-env.sh matches the Java installation on the GPDB hosts; if it does not, edit hadoop-env.sh
2.3 Edit /etc/hosts to add the Hadoop namenode and datanode machines (see the sketch after this list)
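A minimal sketch of steps 2.2 and 2.3, assuming hypothetical IP addresses and datanode hostnames (only c9test91 is taken from this article):

# hadoop/etc/hadoop/hadoop-env.sh -- must match the Java installed on the GPDB hosts
export JAVA_HOME=/usr/java/jdk1.6.0_45

# /etc/hosts on every GPDB master and segment host (addresses are placeholders)
192.168.1.91  c9test91    # namenode
192.168.1.92  c9test92    # datanode
192.168.1.93  c9test93    # datanode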
3. Environment settings
3.1 Add the following on both the master and the segments
su - gpadmin
vi .bashrc
export JAVA_HOME=/usr/java/jdk1.6.0_45
export CLASSPATH=$JAVA_HOME/lib/tools.jar
export HADOOP_HOME=/home/hadoop/hadoop
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
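After editing .bashrc, reload it and run a quick sanity check (not part of the original steps) to confirm that the Hadoop client is on the PATH:

source ~/.bashrc
hadoop version    # should print the Hadoop build, confirming HADOOP_HOME and PATH are correct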
3.2 Set the configuration parameters on the master
gpconfig -c gp_hadoop_target_version -v "'hadoop2'"
gpconfig -c gp_hadoop_home -v "'/home/hadoop/hadoop'"
gpstop -M fast -ra    # restart, then check the configuration parameters
gpconfig -s gp_hadoop_target_version
gpconfig -s gp_hadoop_home
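When the parameters have been applied everywhere, gpconfig -s should report consistent values; the output looks roughly like the following (exact format varies by Greenplum version):

Values on all segments are consistent
GUC          : gp_hadoop_target_version
Master  value: hadoop2
Segment value: hadoop2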
4. Basic verification
su - gpadmin
hdfs dfs -ls /
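To prepare for the external table test in section 6, you can also stage a small tab-delimited file on HDFS; /tmp/test.txt is a hypothetical local file with lines such as "1<TAB>tom":

hdfs dfs -put /tmp/test.txt /test.txt
hdfs dfs -cat /test.txt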
5. Grant privileges
psql dbname
-- write privilege
GRANT INSERT ON PROTOCOL gphdfs TO gpadmin;
-- read privilege
GRANT SELECT ON PROTOCOL gphdfs TO gpadmin;
-- all privileges
GRANT ALL ON PROTOCOL gphdfs TO gpadmin;
6. External table access
CREATE EXTERNAL TABLE external_test
(
id int,
name text
)
LOCATION ('gphdfs://c9test91:9000/test.txt')
FORMAT 'TEXT' (delimiter '\t');
select * from external_test;
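Writing back to HDFS works the same way through a writable external table. A minimal sketch, assuming a hypothetical output directory /test_out on the same namenode:

CREATE WRITABLE EXTERNAL TABLE external_test_out
(
id int,
name text
)
LOCATION ('gphdfs://c9test91:9000/test_out')
FORMAT 'TEXT' (delimiter '\t');

INSERT INTO external_test_out SELECT * FROM external_test;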
Note: the port number here is the namenode IPC (ClientProtocol) port, which is set by the fs.default.name parameter and defaults to 9000.
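To confirm the actual port, check core-site.xml on the Hadoop side; the property is fs.default.name in older releases and fs.defaultFS in newer ones, and its value should look like hdfs://c9test91:9000:

grep -A1 "fs.default" $HADOOP_HOME/etc/hadoop/core-site.xml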
If the wrong port number is specified, an error like the following appears:
ERROR: external table gphdfs protocol command ended with error. 17/03/28 12:15:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable (seg0 slice1 viltual2:40000 pid=27368)
Detail:
17/03/28 12:15:12 WARN conf.Configuration: hdfs-site.xml:an attempt to override final parameter: dfs.namenode.name.dir; Ignoring.
17/03/28 12:15:12 WARN conf.Configuration: hdfs-site.xml:an attempt to override final parameter: dfs.datanode.data.dir; Ignoring.
17/03/28 12:15:12 WARN conf.Configuration: hdfs-site.xml:an attempt to override final paramete
Command: 'gphdfs://c9test91:8020/test.txt'
External table external_test, file gphdfs://c9test91:8020/test.txt