Connecting Oracle and Hadoop (1): Loading HDFS Files into Oracle with OLH

OLH is short for Oracle Loader for Hadoop, a component of Oracle Big Data Connectors (BDC) that loads data in a variety of formats from HDFS into an Oracle database.

 


This article simulates both the Oracle database and the Hadoop cluster on a single server. The goal of the exercise: use OLH to load data from HDFS on the Hadoop side into an Oracle table.

 

Oracle side:

  Server    OS user   Software installed          Install path
  -------   -------   -------------------------   ---------------------------------------
  Server1   oracle    Oracle Database 12.1.0.2    /u01/app/oracle/product/12.1.0/dbhome_1

 

Hadoop cluster side:

  Server    OS user   Software installed   Install path
  -------   -------   ------------------   -------------------------------
  Server1   hadoop    Hadoop 2.6.2         /home/hadoop/hadoop-2.6.2
                      Hive 1.1.1           /home/hadoop/hive-1.1.1
                      HBase 1.1.2          /home/hadoop/hbase-1.1.2
                      JDK 1.8.0_65         /home/hadoop/jdk1.8.0_65
                      OLH 3.5.0            /home/hadoop/oraloader-3.5.0-h2

 

  • Deploy the Hadoop/Hive/HBase/OLH software

Unpack the Hadoop/Hive/HBase/OLH archives into their respective directories:

[hadoop@server1 ~]$ tree -L 1
├── hadoop-2.6.2
├── hbase-1.1.2
├── hive-1.1.1
├── jdk1.8.0_65
├── oraloader-3.5.0-h2

 

  • Configure the Hadoop/Hive/HBase/OLH environment variables

export JAVA_HOME=/home/hadoop/jdk1.8.0_65

export HADOOP_USER_NAME=hadoop
export HADOOP_YARN_USER=hadoop
export HADOOP_HOME=/home/hadoop/hadoop-2.6.2
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HADOOP_LOG_DIR=${HADOOP_HOME}/logs
export HADOOP_LIBEXEC_DIR=${HADOOP_HOME}/libexec
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_YARN_HOME=${HADOOP_HOME}
export HDFS_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export YARN_CONF_DIR=${HADOOP_HOME}/etc/hadoop

export HIVE_HOME=/home/hadoop/hive-1.1.1
export HIVE_CONF_DIR=${HIVE_HOME}/conf

export HBASE_HOME=/home/hadoop/hbase-1.1.2
export HBASE_CONF_DIR=/home/hadoop/hbase-1.1.2/conf

export OLH_HOME=/home/hadoop/oraloader-3.5.0-h2

export HADOOP_CLASSPATH=/usr/share/java/mysql-connector-java.jar
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HIVE_HOME/lib/*:$HIVE_CONF_DIR:$HBASE_HOME/lib/*:$HBASE_CONF_DIR
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$OLH_HOME/jlib/*

export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin:$HBASE_HOME/bin:$PATH
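With the variables above sourced, a quick sanity check confirms the OLH jars actually landed on the Hadoop classpath; a missing oraloader.jar is a common cause of ClassNotFoundException at job submission. This is a minimal sketch using the install layout from this article; adjust the path if yours differs.

```shell
# Sketch: verify that the OLH jars are present on HADOOP_CLASSPATH.
# Paths follow the install layout above; adjust if yours differs.
export OLH_HOME=/home/hadoop/oraloader-3.5.0-h2
export HADOOP_CLASSPATH="${HADOOP_CLASSPATH}:${OLH_HOME}/jlib/*"
case "$HADOOP_CLASSPATH" in
  *oraloader*) echo "OLH jars on HADOOP_CLASSPATH" ;;
  *)           echo "OLH jars missing" ;;
esac
```

If this prints "OLH jars missing", recheck the export lines above before submitting any job.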

 

  • Configure the Hadoop/Hive/HBase software

core-site.xml

<configuration>
<property>
        <name>fs.defaultFS</name>
        <value>hdfs://server1:8020</value>
</property>
<property>
        <name>fs.checkpoint.period</name>
        <value>3600</value>
</property>
<property>
        <name>fs.checkpoint.size</name>
        <value>67108864</value>
</property>
<property>
        <name>hadoop.proxyuser.hadoop.groups</name>
        <value>*</value>
</property>
<property>
        <name>hadoop.proxyuser.hadoop.hosts</name>
        <value>*</value>
</property>
</configuration>

 

hdfs-site.xml

<configuration>
<property>
        <name>hadoop.tmp.dir</name>
        <value>file:///home/hadoop</value>
</property>
<property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///home/hadoop/dfs/nn</value>
</property>
<property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///home/hadoop/dfs/dn</value>
</property>
<property>
        <name>dfs.namenode.checkpoint.dir</name>
        <value>file:///home/hadoop/dfs/sn</value>
</property>
<property>
        <name>dfs.replication</name>
        <value>1</value>
</property>
<property>
        <name>dfs.permissions.superusergroup</name>
        <value>supergroup</value>
</property>
<property>
        <name>dfs.namenode.http-address</name>
        <value>server1:50070</value>
</property>
<property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>server1:50090</value>
</property>
<property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
</property>
</configuration>

 

yarn-site.xml

<configuration>
<property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>server1:8030</value>
</property>
<property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>server1:8031</value>
</property>
<property>
        <name>yarn.resourcemanager.address</name>
        <value>server1:8032</value>
</property>
<property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>server1:8033</value>
</property>
<property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>server1:8088</value>
</property>
<property>
        <name>yarn.nodemanager.local-dirs</name>
        <value>file:///home/hadoop/yarn/local</value>
</property>
<property>
        <name>yarn.nodemanager.log-dirs</name>
        <value>file:///home/hadoop/yarn/logs</value>
</property>
<property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
</property>
<property>
        <name>yarn.nodemanager.remote-app-log-dir</name>
        <value>/yarn/apps</value>
</property>
<property>
        <name>yarn.app.mapreduce.am.staging-dir</name>
        <value>/user</value>
</property>
<property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
</property>
<property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
        <name>yarn.log.server.url</name>
        <value>http://server1:19888/jobhistory/logs/</value>
</property>
</configuration>

 

  • Set up the OraLoader job configuration file (the original listing was missing the closing </configuration> tag, added here)

<?xml version="1.0" encoding="UTF-8" ?>
<configuration>

<!-- Input settings -->
 <property>
   <name>mapreduce.inputformat.class</name>
   <value>oracle.hadoop.loader.lib.input.DelimitedTextInputFormat</value>
 </property>
 <property>
   <name>mapred.input.dir</name>
   <value>/catalog</value>
 </property>
 <property>
   <name>oracle.hadoop.loader.input.fieldTerminator</name>
   <value>\u002C</value>
 </property>
 <property>
   <name>oracle.hadoop.loader.input.fieldNames</name>
   <value>CATALOGID,JOURNAL,PUBLISHER,EDITION,TITLE,AUTHOR</value>
 </property>

<!-- Output settings -->
 <property>
   <name>mapreduce.job.outputformat.class</name>
   <value>oracle.hadoop.loader.lib.output.JDBCOutputFormat</value>
 </property>
 <property>
   <name>mapreduce.output.fileoutputformat.outputdir</name>
   <value>oraloadout</value>
 </property>

<!-- Table information -->
 <property>
   <name>oracle.hadoop.loader.loaderMap.targetTable</name>
   <value>catalog</value>
 </property>

<!-- Connection information -->
 <property>
   <name>oracle.hadoop.loader.connection.url</name>
   <value>jdbc:oracle:thin:@${HOST}:${TCPPORT}/${SERVICE_NAME}</value>
 </property>
 <property>
   <name>TCPPORT</name>
   <value>1521</value>
 </property>
 <property>
   <name>HOST</name>
   <value>192.168.56.101</value>
 </property>
 <property>
   <name>SERVICE_NAME</name>
   <value>orcl</value>
 </property>
 <property>
   <name>oracle.hadoop.loader.connection.user</name>
   <value>baron</value>
 </property>
 <property>
   <name>oracle.hadoop.loader.connection.password</name>
   <value>baron</value>
 </property>
</configuration>
  • Generate test data

$ cat catalog.txt
1,Oracle Magazine,Oracle Publishing,Nov-Dec 2004,Database Resource Manager,Kimberly Floss
2,Oracle Magazine,Oracle Publishing,Nov-Dec 2004,From ADF UIX to JSF,Jonas Jacobi
3,Oracle Magazine,Oracle Publishing,March-April 2005,Starting with Oracle ADF,Steve Muench

$ hdfs dfs -mkdir /catalog
$ hdfs dfs -put catalog.txt /catalog/catalog.txt
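Before uploading, it is worth checking locally that every record splits into exactly the six fields declared in oracle.hadoop.loader.input.fieldNames. A sketch with awk (the heredoc just recreates the sample file in the current directory):

```shell
# Recreate the sample file, then verify each record has exactly 6
# comma-separated fields, matching the declared fieldNames.
cat > catalog.txt <<'EOF'
1,Oracle Magazine,Oracle Publishing,Nov-Dec 2004,Database Resource Manager,Kimberly Floss
2,Oracle Magazine,Oracle Publishing,Nov-Dec 2004,From ADF UIX to JSF,Jonas Jacobi
3,Oracle Magazine,Oracle Publishing,March-April 2005,Starting with Oracle ADF,Steve Muench
EOF
awk -F',' 'NF != 6 { bad++ } END { print (bad ? bad : 0) " malformed record(s)" }' catalog.txt
# → 0 malformed record(s)
```

Records with stray commas would silently shift columns in the target table, so catching them before the MapReduce job is cheaper than cleaning up afterwards.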

 

  • Load the HDFS file into the Oracle database with OraLoader

hadoop jar $OLH_HOME/jlib/oraloader.jar oracle.hadoop.loader.OraLoader -conf OraLoadJobConf.xml -libjars $OLH_HOME/jlib/oraloader.jar
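If the job aborts immediately, a frequent culprit is a configuration file that is not well-formed XML (for example a missing </configuration>). A quick pre-flight parse, sketched here against a trimmed copy named conf-check.xml so the real file is left untouched:

```shell
# Write a trimmed copy of the job configuration, then parse it with
# Python's stdlib XML parser; any well-formedness error raises loudly.
cat > conf-check.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8" ?>
<configuration>
 <property>
   <name>oracle.hadoop.loader.loaderMap.targetTable</name>
   <value>catalog</value>
 </property>
</configuration>
EOF
python3 -c 'import sys, xml.dom.minidom as m; m.parse(sys.argv[1]); print("well-formed")' conf-check.xml
# → well-formed
```

Run the same one-liner against the real OraLoadJobConf.xml before submitting.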

 

The output is as follows:

Oracle Loader for Hadoop Release 3.4.0 - Production
Copyright (c) 2011, 2015, Oracle and/or its affiliates. All rights reserved.
15/12/07 08:35:52 INFO loader.OraLoader: Oracle Loader for Hadoop Release 3.4.0 - Production
Copyright (c) 2011, 2015, Oracle and/or its affiliates. All rights reserved.
15/12/07 08:35:52 INFO loader.OraLoader: Built-Against: hadoop-2.2.0 hive-0.13.0 avro-1.7.3 jackson-1.8.8
15/12/07 08:35:52 INFO Configuration.deprecation: mapreduce.outputformat.class is deprecated. Instead, use mapreduce.job.outputformat.class
15/12/07 08:35:52 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
15/12/07 08:36:27 INFO Configuration.deprecation: mapred.submit.replication is deprecated. Instead, use mapreduce.client.submit.file.replication
15/12/07 08:36:29 INFO loader.OraLoader: oracle.hadoop.loader.loadByPartition is disabled because table: CATALOG is not partitioned
15/12/07 08:36:29 INFO output.DBOutputFormat: Setting reduce tasks speculative execution to false for : oracle.hadoop.loader.lib.output.JDBCOutputFormat
15/12/07 08:36:29 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
15/12/07 08:36:32 WARN loader.OraLoader: Sampler is disabled because the number of reduce tasks is less than two. Job will continue without sampled information.
15/12/07 08:36:32 INFO loader.OraLoader: Submitting OraLoader job OraLoader
15/12/07 08:36:32 INFO client.RMProxy: Connecting to ResourceManager at server1/192.168.56.101:8032
15/12/07 08:36:34 INFO input.FileInputFormat: Total input paths to process : 1
15/12/07 08:36:34 INFO mapreduce.JobSubmitter: number of splits:1
15/12/07 08:36:35 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1449494864827_0001
15/12/07 08:36:36 INFO impl.YarnClientImpl: Submitted application application_1449494864827_0001
15/12/07 08:36:37 INFO mapreduce.Job: The url to track the job: http://server1:8088/proxy/application_1449494864827_0001/
15/12/07 08:37:05 INFO loader.OraLoader: map 0% reduce 0%
15/12/07 08:37:22 INFO loader.OraLoader: map 100% reduce 0%
15/12/07 08:37:36 INFO loader.OraLoader: map 100% reduce 67%
15/12/07 08:38:05 INFO loader.OraLoader: map 100% reduce 100%
15/12/07 08:38:06 INFO loader.OraLoader: Job complete: OraLoader (job_1449494864827_0001)
15/12/07 08:38:06 INFO loader.OraLoader: Counters: 49
        File System Counters
                FILE: Number of bytes read=395
                FILE: Number of bytes written=244157
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=367
                HDFS: Number of bytes written=1861
                HDFS: Number of read operations=7
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=5
        Job Counters
                Launched map tasks=1
                Launched reduce tasks=1
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=12516
                Total time spent by all reduces in occupied slots (ms)=40696
                Total time spent by all map tasks (ms)=12516
                Total time spent by all reduce tasks (ms)=40696
                Total vcore-seconds taken by all map tasks=12516
                Total vcore-seconds taken by all reduce tasks=40696
                Total megabyte-seconds taken by all map tasks=12816384
                Total megabyte-seconds taken by all reduce tasks=41672704
        Map-Reduce Framework
                Map input records=3
                Map output records=3
                Map output bytes=383
                Map output materialized bytes=395
                Input split bytes=104
                Combine input records=0
                Combine output records=0
                Reduce input groups=1
                Reduce shuffle bytes=395
                Reduce input records=3
                Reduce output records=3
                Spilled Records=6
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=556
                CPU time spent (ms)=9450
                Physical memory (bytes) snapshot=444141568
                Virtual memory (bytes) snapshot=4221542400
                Total committed heap usage (bytes)=331350016
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=263
        File Output Format Counters
                Bytes Written=1620

 

 

  • Verify the load result in Oracle

select * from catalog;

 

 CATALOGID JOURNAL                   PUBLISHER                 EDITION                   TITLE                          AUTHOR
---------- ------------------------- ------------------------- ------------------------- ------------------------------ ---------------------
         1 Oracle Magazine           Oracle Publishing         Nov-Dec 2004              Database Resource Manager      Kimberly Floss
         2 Oracle Magazine           Oracle Publishing         Nov-Dec 2004              From ADF UIX to JSF            Jonas Jacobi
         3 Oracle Magazine           Oracle Publishing         March-April 2005          Starting with Oracle ADF       Steve Muench
