Connecting Oracle and Hadoop (2): Loading a Hive Table into Oracle with OLH

OLH is short for Oracle Loader for Hadoop, a component of Oracle's Big Data Connectors (BDC) that loads data in a variety of formats from HDFS into an Oracle database.


In this article the Oracle database and the Hadoop cluster are simulated on the same server. The goal of the experiment is to use OLH to load data from a Hive table on the Hadoop side into an Oracle table.

 

Oracle side:

| Server  | OS user | Installed software       | Installation path                       |
|---------|---------|--------------------------|-----------------------------------------|
| Server1 | oracle  | Oracle Database 12.1.0.2 | /u01/app/oracle/product/12.1.0/dbhome_1 |

 

Hadoop cluster side:

| Server  | OS user | Installed software | Installation path               |
|---------|---------|--------------------|---------------------------------|
| Server1 | hadoop  | Hadoop 2.6.2       | /home/hadoop/hadoop-2.6.2       |
|         |         | Hive 1.1.1         | /home/hadoop/hive-1.1.1         |
|         |         | HBase 1.1.2        | /home/hadoop/hbase-1.1.2        |
|         |         | JDK 1.8.0_65       | /home/hadoop/jdk1.8.0_65        |
|         |         | OLH 3.5.0          | /home/hadoop/oraloader-3.5.0-h2 |

 

  • Deploy the Hadoop/Hive/HBase/OLH software

Extract the Hadoop/Hive/HBase/OLH archives into their respective directories; the resulting layout is shown below, followed by a sketch of the extraction commands:

[hadoop@server1 ~]$ tree -L 1
├── hadoop-2.6.2
├── hbase-1.1.2
├── hive-1.1.1
├── jdk1.8.0_65
└── oraloader-3.5.0-h2
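For reference, a minimal sketch of the extraction step that produces this layout, assuming the release archives have already been copied to /home/hadoop (the archive file names are illustrative and may differ from your downloads):

cd /home/hadoop
tar -xzf hadoop-2.6.2.tar.gz
tar -xzf apache-hive-1.1.1-bin.tar.gz && mv apache-hive-1.1.1-bin hive-1.1.1
tar -xzf hbase-1.1.2-bin.tar.gz
tar -xzf jdk-8u65-linux-x64.tar.gz
unzip oraloader-3.5.0-h2.x86_64.zip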

 

  • Configure the Hadoop/Hive/HBase/OLH environment variables (a quick verification sketch follows the listing)
export JAVA_HOME=/home/hadoop/jdk1.8.0_65

export HADOOP_USER_NAME=hadoop
export HADOOP_YARN_USER=hadoop
export HADOOP_HOME=/home/hadoop/hadoop-2.6.2
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HADOOP_LOG_DIR=${HADOOP_HOME}/logs
export HADOOP_LIBEXEC_DIR=${HADOOP_HOME}/libexec
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_YARN_HOME=${HADOOP_HOME}
export HDFS_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export YARN_CONF_DIR=${HADOOP_HOME}/etc/hadoop

export HIVE_HOME=/home/hadoop/hive-1.1.1
export HIVE_CONF_DIR=${HIVE_HOME}/conf

export HBASE_HOME=/home/hadoop/hbase-1.1.2
export HBASE_CONF_DIR=/home/hadoop/hbase-1.1.2/conf

export OLH_HOME=/home/hadoop/oraloader-3.5.0-h2

export HADOOP_CLASSPATH=/usr/share/java/mysql-connector-java.jar
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HIVE_HOME/lib/*:$HIVE_CONF_DIR:$HBASE_HOME/lib/*:$HBASE_CONF_DIR
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$OLH_HOME/jlib/*

export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin:$HBASE_HOME/bin:$PATH
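After appending these to the hadoop user's profile, a quick sanity check (a sketch; it assumes the variables were added to ~/.bash_profile):

source ~/.bash_profile
hadoop version
hive --version
echo $HADOOP_CLASSPATH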

 

  • Configure the Hadoop/Hive/HBase software

core-site.xml

<configuration>
<property>
        <name>fs.defaultFS</name>
        <value>hdfs://server1:8020</value>
</property>
<property>
        <name>fs.checkpoint.period</name>
        <value>3600</value>
</property>
<property>
        <name>fs.checkpoint.size</name>
        <value>67108864</value>
</property>
<property>
        <name>hadoop.proxyuser.hadoop.groups</name>
        <value>*</value>
</property>
<property>
        <name>hadoop.proxyuser.hadoop.hosts</name>
        <value>*</value>
</property>
</configuration>

 

hdfs-site.xml

<configuration>
<property>
        <name>hadoop.tmp.dir</name>
        <value>file:///home/hadoop</value>
</property>
<property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///home/hadoop/dfs/nn</value>
</property>
<property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///home/hadoop/dfs/dn</value>
</property>
<property>
        <name>dfs.namenode.checkpoint.dir</name>
        <value>file:///home/hadoop/dfs/sn</value>
</property>
<property>
        <name>dfs.replication</name>
        <value>1</value>
</property>
<property>
        <name>dfs.permissions.superusergroup</name>
        <value>supergroup</value>
</property>
<property>
        <name>dfs.namenode.http-address</name>
        <value>server1:50070</value>
</property>
<property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>server1:50090</value>
</property>
<property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
</property>
</configuration>

 

yarn-site.xml

<configuration>
<property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>server1:8030</value>
</property>
<property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>server1:8031</value>
</property>
<property>
        <name>yarn.resourcemanager.address</name>
        <value>server1:8032</value>
</property>
<property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>server1:8033</value>
</property>
<property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>server1:8088</value>
</property>
<property>
        <name>yarn.nodemanager.local-dirs</name>
        <value>file:///home/hadoop/yarn/local</value>
</property>
<property>
        <name>yarn.nodemanager.log-dirs</name>
        <value>file:///home/hadoop/yarn/logs</value>
</property>
<property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
</property>
<property>
        <name>yarn.nodemanager.remote-app-log-dir</name>
        <value>/yarn/apps</value>
</property>
<property>
        <name>yarn.app.mapreduce.am.staging-dir</name>
        <value>/user</value>
</property>
<property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
</property>
<property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
        <name>yarn.log.server.url</name>
        <value>http://server1:19888/jobhistory/logs/</value>
</property>
</configuration>
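With these three files in place, HDFS and YARN can be brought up; a minimal sketch (run the format step only once on a fresh cluster, since it erases existing HDFS metadata):

hdfs namenode -format
start-dfs.sh
start-yarn.sh
mr-jobhistory-daemon.sh start historyserver   # backs yarn.log.server.url on port 19888
jps   # should list NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager, JobHistoryServer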

 

  • Configure the Hive metastore

We use MySQL as the Hive metastore database. Create the database and grant the hive user access:

mysql> create database metastore DEFAULT CHARACTER SET latin1;
Query OK, 1 row affected (0.00 sec)

mysql> grant all on metastore.* TO 'hive'@'server1' IDENTIFIED BY '123456';
Query OK, 0 rows affected (0.00 sec)

mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)
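Optionally, the metastore schema can be initialized up front with Hive's schematool (a sketch; if this step is skipped, Hive will typically create the schema on first use through DataNucleus auto-creation):

schematool -dbType mysql -initSchema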

 

  • Configure Hive

hive-site.xml

<configuration>
<property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://server1:3306/metastore?createDatabaseIfNotExist=true</value>
</property>
<property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
</property>
<property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hive</value>
</property>
<property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>123456</value>
</property>
<property>
        <name>hbase.master</name>
        <value>server1:16000</value>
</property>
<property>
        <name>hive.cli.print.current.db</name>
        <value>true</value>
</property>
</configuration>
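A quick way to confirm that Hive can reach the MySQL metastore (a sketch; the MySQL JDBC driver must be visible on the classpath, as arranged via HADOOP_CLASSPATH above):

hive -e "show databases;"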

 

  • Create the Hive table

CREATE EXTERNAL TABLE catalog(CATALOGID INT, JOURNAL STRING, PUBLISHER STRING,
  EDITION STRING, TITLE STRING, AUTHOR STRING) ROW FORMAT DELIMITED FIELDS
  TERMINATED BY ',' LINES TERMINATED BY '\n' STORED AS TEXTFILE LOCATION '/catalog';
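The load job needs data under the table's location. A hedged way to seed a few test rows (the rows below are made up purely for illustration):

cat > catalog.txt <<'EOF'
1,Oracle Magazine,Oracle Publishing,November-December 2015,What's New,Tom
2,Oracle Magazine,Oracle Publishing,September-October 2015,In-Memory OLTP,Kim
3,Java Magazine,Oracle Publishing,November-December 2015,JShell Preview,Raj
EOF
hdfs dfs -mkdir -p /catalog
hdfs dfs -put catalog.txt /catalog/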

 

  • Create the Oracle Loader configuration file
<?xml version="1.0" encoding="UTF-8" ?>
<configuration>

<!-- Input settings -->
<property>
  <name>mapreduce.inputformat.class</name>
  <value>oracle.hadoop.loader.lib.input.HiveToAvroInputFormat</value>
</property>
<property>
  <name>oracle.hadoop.loader.input.hive.databaseName</name>
  <value>default</value>
</property>
<property>
  <name>oracle.hadoop.loader.input.hive.tableName</name>
  <value>catalog</value>
</property>
<property>
  <name>mapred.input.dir</name>
  <value>/user/hive/warehouse/catalog</value>
</property>
<property>
  <name>oracle.hadoop.loader.input.fieldTerminator</name>
  <value>\u002C</value>
</property>

<!-- Output settings -->
<property>
  <name>mapreduce.job.outputformat.class</name>
  <value>oracle.hadoop.loader.lib.output.JDBCOutputFormat</value>
</property>
<property>
  <name>mapreduce.output.fileoutputformat.outputdir</name>
  <value>oraloadout</value>
</property>

<!-- Table information -->
<property>
  <name>oracle.hadoop.loader.loaderMap.targetTable</name>
  <value>catalog</value>
</property>
<property>
  <name>oracle.hadoop.loader.input.fieldNames</name>
  <value>CATALOGID,JOURNAL,PUBLISHER,EDITION,TITLE,AUTHOR</value>
</property>

<!-- Connection information -->
<property>
  <name>oracle.hadoop.loader.connection.url</name>
  <value>jdbc:oracle:thin:@${HOST}:${TCPPORT}:${SID}</value>
</property>
<property>
  <name>TCPPORT</name>
  <value>1521</value>
</property>
<property>
  <name>HOST</name>
  <value>server1</value>
</property>
<property>
  <name>SID</name>
  <value>orcl</value>
</property>
<property>
  <name>oracle.hadoop.loader.connection.user</name>
  <value>baron</value>
</property>
<property>
  <name>oracle.hadoop.loader.connection.password</name>
  <value>baron</value>
</property>
</configuration>
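JDBCOutputFormat inserts into an existing table in the baron schema, so the target table must be created in Oracle first. A sketch of compatible DDL (the column types and lengths are assumptions; adjust them to your data):

CREATE TABLE catalog (
  catalogid  NUMBER,
  journal    VARCHAR2(100),
  publisher  VARCHAR2(100),
  edition    VARCHAR2(100),
  title      VARCHAR2(100),
  author     VARCHAR2(100)
);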

 

  • Load the Hive table into Oracle with Oracle Loader for Hadoop

Two points to note here:

1. The Hive configuration directory must be added to the HADOOP_CLASSPATH environment variable.

2. hive-exec-*.jar, hive-metastore-*.jar, and libfb303*.jar must be supplied on the command line via -libjars.

export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$OLH_HOME/jlib/*:$HIVE_HOME/lib/*:$HIVE_CONF_DIR
hadoop jar $OLH_HOME/jlib/oraloader.jar oracle.hadoop.loader.OraLoader \
-conf OraLoadJobConf-hive.xml \
-libjars $OLH_HOME/jlib/oraloader.jar,$HIVE_HOME/lib/hive-exec-1.1.1.jar,$HIVE_HOME/lib/hive-metastore-1.1.1.jar,$HIVE_HOME/lib/libfb303-0.9.2.jar

The output is as follows:

Oracle Loader for Hadoop Release 3.4.0 - Production
Copyright (c) 2011, 2015, Oracle and/or its affiliates. All rights reserved.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-2.6.2/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/hive-1.1.1/lib/hive-jdbc-1.1.1-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
15/12/08 04:53:51 INFO loader.OraLoader: Oracle Loader for Hadoop Release 3.4.0 - Production
Copyright (c) 2011, 2015, Oracle and/or its affiliates. All rights reserved.
15/12/08 04:53:51 INFO loader.OraLoader: Built-Against: hadoop-2.2.0 hive-0.13.0 avro-1.7.3 jackson-1.8.8
15/12/08 04:53:51 INFO Configuration.deprecation: mapreduce.outputformat.class is deprecated. Instead, use mapreduce.job.outputformat.class
15/12/08 04:53:51 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
15/12/08 04:54:23 INFO Configuration.deprecation: mapred.submit.replication is deprecated. Instead, use mapreduce.client.submit.file.replication
15/12/08 04:54:24 INFO loader.OraLoader: oracle.hadoop.loader.loadByPartition is disabled because table: CATALOG is not partitioned
15/12/08 04:54:24 INFO output.DBOutputFormat: Setting reduce tasks speculative execution to false for : oracle.hadoop.loader.lib.output.JDBCOutputFormat
15/12/08 04:54:24 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
15/12/08 04:54:26 WARN loader.OraLoader: Sampler is disabled because the number of reduce tasks is less than two. Job will continue without sampled information.
15/12/08 04:54:26 INFO loader.OraLoader: Submitting OraLoader job OraLoader
15/12/08 04:54:26 INFO client.RMProxy: Connecting to ResourceManager at server1/192.168.56.101:8032
15/12/08 04:54:28 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
15/12/08 04:54:28 INFO metastore.ObjectStore: ObjectStore, initialize called
15/12/08 04:54:29 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
15/12/08 04:54:29 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
15/12/08 04:54:31 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
15/12/08 04:54:33 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
15/12/08 04:54:33 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
15/12/08 04:54:34 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
15/12/08 04:54:34 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
15/12/08 04:54:34 INFO DataNucleus.Query: Reading in results for query "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is closing
15/12/08 04:54:34 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is MYSQL
15/12/08 04:54:34 INFO metastore.ObjectStore: Initialized ObjectStore
15/12/08 04:54:34 INFO metastore.HiveMetaStore: Added admin role in metastore
15/12/08 04:54:34 INFO metastore.HiveMetaStore: Added public role in metastore
15/12/08 04:54:35 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
15/12/08 04:54:35 INFO metastore.HiveMetaStore: 0: get_table : db=default tbl=catalog
15/12/08 04:54:35 INFO HiveMetaStore.audit: ugi=hadoop ip=unknown-ip-addr cmd=get_table : db=default tbl=catalog
15/12/08 04:54:36 INFO mapred.FileInputFormat: Total input paths to process : 1
15/12/08 04:54:36 INFO metastore.HiveMetaStore: 0: Shutting down the object store...
15/12/08 04:54:36 INFO HiveMetaStore.audit: ugi=hadoop ip=unknown-ip-addr cmd=Shutting down the object store...
15/12/08 04:54:36 INFO metastore.HiveMetaStore: 0: Metastore shutdown complete.
15/12/08 04:54:36 INFO HiveMetaStore.audit: ugi=hadoop ip=unknown-ip-addr cmd=Metastore shutdown complete.
15/12/08 04:54:37 INFO mapreduce.JobSubmitter: number of splits:2
15/12/08 04:54:37 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1449544601730_0015
15/12/08 04:54:38 INFO impl.YarnClientImpl: Submitted application application_1449544601730_0015
15/12/08 04:54:38 INFO mapreduce.Job: The url to track the job: http://server1:8088/proxy/application_1449544601730_0015/
15/12/08 04:54:49 INFO loader.OraLoader: map 0% reduce 0%
15/12/08 04:55:07 INFO loader.OraLoader: map 100% reduce 0%
15/12/08 04:55:22 INFO loader.OraLoader: map 100% reduce 67%
15/12/08 04:55:47 INFO loader.OraLoader: map 100% reduce 100%
15/12/08 04:55:47 INFO loader.OraLoader: Job complete: OraLoader (job_1449544601730_0015)
15/12/08 04:55:47 INFO loader.OraLoader: Counters: 49
        File System Counters
                FILE: Number of bytes read=395
                FILE: Number of bytes written=370110
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=6005
                HDFS: Number of bytes written=1861
                HDFS: Number of read operations=9
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=5
        Job Counters
                Launched map tasks=2
                Launched reduce tasks=1
                Data-local map tasks=2
                Total time spent by all maps in occupied slots (ms)=29809
                Total time spent by all reduces in occupied slots (ms)=36328
                Total time spent by all map tasks (ms)=29809
                Total time spent by all reduce tasks (ms)=36328
                Total vcore-seconds taken by all map tasks=29809
                Total vcore-seconds taken by all reduce tasks=36328
                Total megabyte-seconds taken by all map tasks=30524416
                Total megabyte-seconds taken by all reduce tasks=37199872
        Map-Reduce Framework
                Map input records=3
                Map output records=3
                Map output bytes=383
                Map output materialized bytes=401
                Input split bytes=5610
                Combine input records=0
                Combine output records=0
                Reduce input groups=1
                Reduce shuffle bytes=401
                Reduce input records=3
                Reduce output records=3
                Spilled Records=6
                Shuffled Maps =2
                Failed Shuffles=0
                Merged Map outputs=2
                GC time elapsed (ms)=1245
                CPU time spent (ms)=14220
                Physical memory (bytes) snapshot=757501952
                Virtual memory (bytes) snapshot=6360301568
                Total committed heap usage (bytes)=535298048
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=0
        File Output Format Counters
                Bytes Written=1620

 

  • Issues

The drawback of Oracle Loader for Hadoop is that rows which fail to load into the database produce no error message; a row-count check such as the sketch below helps catch these silent losses.
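A minimal sketch of that check, comparing row counts on both sides (it assumes orcl is also the service name of the target database):

# Hadoop side: count rows in the Hive table
hive -e "select count(*) from catalog;"

# Oracle side: count rows in the target table
sqlplus -s baron/baron@//server1:1521/orcl <<'EOF'
select count(*) from catalog;
exit
EOF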
