In a previous post I showed how to setup Hadoop/Hive to use Derby in server mode as the metastore. Many believe MySQL is a better choice for such purpose, so here I'm going to show how we can configure our cluster which we created previously to use a MySQL server as the metastore for Hive.
First we need to install MySQL. In this scenario, I'm going to install MySQL on our Master node, which is named centos1.
When logged in as root user:
yum install mysql-server
Now make sure MySQL server is started:
/etc/init.d/mysqld start
Next, I'm going to create a new MySQL user for hadoop/hive:
mysql
mysql> CREATE USER 'hadoop'@'centos1' IDENTIFIED BY 'hadoop';
mysql> GRANT ALL PRIVILEGES ON *.* TO 'hadoop'@'centos1' WITH GRANT OPTION;
mysql> exit
To make sure this new user can connect to MySQL server, switch to user hadoop:
mysql -h centos1 -u hadoop -p
We need to change the hive configuration so it can use MySQL:
nano /hadoop/hive/conf/hive-site.xml
and new configuration values are:
<property>
<name>hive.metastore.
local</name>
<value>true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://centos1:3306/hive?createDatabaseIfNotExist=true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hadoop</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hadoop</value>
</property>
Some of the above parameters do not match what we did to setup derby server in previous post, so I decided to delete the jpox.properties file:
rm /hadoop/hive/conf/jpox.properties
hive needs to have the MySQL jdbc drivers, so we need to download and copy it to hive/lib folder:
cd /hadoop
wget http://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.11.tar.gz/from/http://mysql.he.net/
tar -xvzf mysql-connector-java-5.1.11.tar.gz
cp mysql-connector-java-5.1.11/*.jar /hadoop/hive/lib
To make sure all settings are done correctly, we can do this:
cd /hadoop/hive
bin/hive
hive> show tables;