Software | Version |
---|---|
Hadoop | 3.3 |
Hive | 3.1.3 |
Doris | 1.1-pr2 |
Spark | 3.1.2 |
Iceberg | 0.13.2 |
Download Hadoop 3.3:
wget https://dlcdn.apache.org/hadoop/common/hadoop-3.3.3/hadoop-3.3.3.tar.gz
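Extract the archive (assuming it is unpacked under /opt/software/hadoop, which matches the HADOOP_HOME set later):
tar -xzvf hadoop-3.3.3.tar.gz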
Set up passwordless SSH to localhost:
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
ssh localhost
When you run ssh localhost, a host-key security prompt appears; answer yes.
In a Docker environment, the sshd service needs to be started manually:
/usr/sbin/sshd
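If sshd refuses to start because the host keys are missing (common in minimal container images), generate them first; a quick sketch:
ssh-keygen -A
/usr/sbin/sshd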
core-site.xml configuration:
[root@17a5da45700b hadoop]# cat core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
    <property>
        <name>hadoop.proxyuser.root.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.root.groups</name>
        <value>*</value>
    </property>
</configuration>
hdfs-site.xml configuration:
[root@17a5da45700b hadoop]# cat hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
hadoop-env.sh configuration: add the following settings to hadoop-env.sh.
export JAVA_HOME=/opt/software/jdk8
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
Format the HDFS filesystem:
bin/hdfs namenode -format
Start HDFS:
sbin/start-dfs.sh
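To confirm the daemons came up, jps should list the NameNode, DataNode, and SecondaryNameNode processes:
jps
# expect NameNode, DataNode and SecondaryNameNode in the output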
Create the HDFS user directories:
bin/hdfs dfs -mkdir /user
bin/hdfs dfs -mkdir /user/root
List the HDFS directories to make sure the HDFS service is working. (The listing below was captured after the Hive setup further down, which is why /user/hive already shows up.)
[root@17a5da45700b hadoop-3.3.3]# bin/hdfs dfs -ls /user
Found 2 items
drwxr-xr-x - root supergroup 0 2022-06-21 03:00 /user/hive
drwxr-xr-x - root supergroup 0 2022-06-15 09:38 /user/root
Configure the Hadoop environment variables:
export HADOOP_HOME=/opt/software/hadoop/hadoop-3.3.3
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_HDFS_HOME=$HADOOP_HOME
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
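These exports (like the Hive and Spark ones below) only apply to the current shell. A minimal sketch to persist them, assuming a bash login shell:
cat >> ~/.bashrc <<'EOF'
export HADOOP_HOME=/opt/software/hadoop/hadoop-3.3.3
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_HDFS_HOME=$HADOOP_HOME
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
EOF
source ~/.bashrc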
Download and install Hive:
wget https://dlcdn.apache.org/hive/hive-3.1.3/apache-hive-3.1.3-bin.tar.gz
tar -xzvf apache-hive-3.1.3-bin.tar.gz
cd apache-hive-3.1.3-bin
Configure the Hive environment variables:
export HIVE_HOME=/opt/software/hive/apache-hive-3.1.3-bin
export PATH=$HIVE_HOME/bin:$PATH
Create the Hive directories on HDFS and make them group-writable (note the -p flag: /user/hive does not exist yet at this point):
$HADOOP_HOME/bin/hadoop fs -mkdir -p /tmp
$HADOOP_HOME/bin/hadoop fs -mkdir -p /user/hive/warehouse
$HADOOP_HOME/bin/hadoop fs -chmod g+w /tmp
$HADOOP_HOME/bin/hadoop fs -chmod g+w /user/hive/warehouse
Add the HADOOP_HOME setting to hive-env.sh.
# Create hive-env.sh from the template
mv hive-env.sh.template hive-env.sh
# Add the HADOOP_HOME setting
HADOOP_HOME=/opt/software/hadoop/hadoop-3.3.3
hive-site.xml configuration: create a new hive-site.xml file here.
vim hive-site.xml
[root@17a5da45700b conf]# cat hive-site.xml
<configuration>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>root</value>
    </property>
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
        <description>location of default database for the warehouse</description>
    </property>
    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://127.0.0.1:9083</value>
        <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
    </property>
    <property>
        <name>javax.jdo.PersistenceManagerFactoryClass</name>
        <value>org.datanucleus.api.jdo.JDOPersistenceManagerFactory</value>
    </property>
    <property>
        <name>hive.metastore.schema.verification</name>
        <value>false</value>
    </property>
    <property>
        <name>datanucleus.schema.autoCreateAll</name>
        <value>true</value>
    </property>
</configuration>
On the MySQL side, create a hive database, which will be used to store the Hive metadata:
mysql --host=localhost --user=root --password=root
create database hive;
use hive;
Download the MySQL JDBC driver into Hive's lib directory:
cd /opt/software/hive/apache-hive-3.1.3-bin/lib
wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/5.1.49/mysql-connector-java-5.1.49.jar
Note: the mysql-connector-java version must match the MySQL server's major version; a connector newer than the MySQL server is likely to throw errors. The MySQL version here is 5.7.38:
mysql> show variables like '%version%';
+-------------------------+------------------------------+
| Variable_name | Value |
+-------------------------+------------------------------+
| innodb_version | 5.7.38 |
| protocol_version | 10 |
| slave_type_conversions | |
| tls_version | TLSv1,TLSv1.1,TLSv1.2 |
| version | 5.7.38 |
| version_comment | MySQL Community Server (GPL) |
| version_compile_machine | x86_64 |
| version_compile_os | Linux |
+-------------------------+------------------------------+
8 rows in set (0.00 sec)
mysql>
Initialize the Hive metastore schema:
schematool -initSchema -dbType mysql
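To double-check the result, schematool can also print the schema version it finds in the database:
schematool -info -dbType mysql
Back in MySQL, the hive database now contains the metastore tables: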
mysql> show tables;
+-------------------------------+
| Tables_in_hive |
+-------------------------------+
| AUX_TABLE |
| BUCKETING_COLS |
| CDS |
| COLUMNS_V2 |
| COMPACTION_QUEUE |
| COMPLETED_COMPACTIONS |
| COMPLETED_TXN_COMPONENTS |
| CTLGS |
| DATABASE_PARAMS |
| DBS |
| DB_PRIVS |
| DELEGATION_TOKENS |
| FUNCS |
| FUNC_RU |
| GLOBAL_PRIVS |
| HIVE_LOCKS |
| IDXS |
| INDEX_PARAMS |
| I_SCHEMA |
| KEY_CONSTRAINTS |
| MASTER_KEYS |
| MATERIALIZATION_REBUILD_LOCKS |
| METASTORE_DB_PROPERTIES |
| MIN_HISTORY_LEVEL |
| MV_CREATION_METADATA |
| MV_TABLES_USED |
| NEXT_COMPACTION_QUEUE_ID |
| NEXT_LOCK_ID |
| NEXT_TXN_ID |
| NEXT_WRITE_ID |
| NOTIFICATION_LOG |
| NOTIFICATION_SEQUENCE |
| NUCLEUS_TABLES |
| PARTITIONS |
| PARTITION_EVENTS |
| PARTITION_KEYS |
| PARTITION_KEY_VALS |
| PARTITION_PARAMS |
| PART_COL_PRIVS |
| PART_COL_STATS |
| PART_PRIVS |
| REPL_TXN_MAP |
| ROLES |
| ROLE_MAP |
| RUNTIME_STATS |
| SCHEMA_VERSION |
| SDS |
| SD_PARAMS |
| SEQUENCE_TABLE |
| SERDES |
| SERDE_PARAMS |
| SKEWED_COL_NAMES |
| SKEWED_COL_VALUE_LOC_MAP |
| SKEWED_STRING_LIST |
| SKEWED_STRING_LIST_VALUES |
| SKEWED_VALUES |
| SORT_COLS |
| TABLE_PARAMS |
| TAB_COL_STATS |
| TBLS |
| TBL_COL_PRIVS |
| TBL_PRIVS |
| TXNS |
| TXN_COMPONENTS |
| TXN_TO_WRITE_ID |
| TYPES |
| TYPE_FIELDS |
| VERSION |
| WM_MAPPING |
| WM_POOL |
| WM_POOL_TO_TRIGGER |
| WM_RESOURCEPLAN |
| WM_TRIGGER |
| WRITE_SET |
+-------------------------------+
74 rows in set (0.00 sec)
Start HiveServer2 and the Metastore:
hive --service hiveserver2 >/dev/null 2>/dev/null &
hive --service metastore 1>/dev/null 2>&1 &
Check the background jobs:
[root@17a5da45700b lib ~]# jobs
[1]- Running hive --service hiveserver2 > /dev/null 2> /dev/null &
[2]+ Running hive --service metastore > /dev/null 2>&1 &
Check the metastore port:
[root@17a5da45700b lib]# netstat -ano |grep 9083
tcp 0 0 0.0.0.0:9083 0.0.0.0:* LISTEN off (0.00/0/0)
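Optionally verify HiveServer2 as well; assuming it listens on its default port 10000, a one-off beeline query confirms it accepts connections:
beeline -u jdbc:hive2://127.0.0.1:10000 -n root -e 'show databases;'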
For Doris installation, follow the official docs: https://doris.apache.org/zh-CN/docs/get-starting/get-starting.html#%E5%8D%95%E6%9C%BA%E9%83%A8%E7%BD%B2
Doris and Hadoop have some port conflicts, so Doris's default ports need to be changed.
vim be/conf/be.conf
Change webserver_port = 8040 to webserver_port = 18040.
vim fe/conf/fe.conf
Change http_port = 8030 to http_port = 18030.
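After restarting FE and BE with the new configuration, a quick check confirms the ports actually moved:
netstat -ano | grep -E '18030|18040'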
Download Spark:
wget https://dlcdn.apache.org/spark/spark-3.1.2/spark-3.1.2-bin-hadoop3.2.tgz
tar -xzvf spark-3.1.2-bin-hadoop3.2.tgz
Configure the Spark environment variables:
export SPARK_HOME=/opt/software/spark-3.1.2-bin-hadoop3.2
export PATH=$PATH:$SPARK_HOME/bin
Copy hive-site.xml into Spark's conf directory:
cp /opt/software/hive/apache-hive-3.1.3-bin/conf/hive-site.xml /opt/software/spark-3.1.2-bin-hadoop3.2/conf
Add the MySQL driver plus the Iceberg Spark and Hive runtime dependencies to Spark:
cd /opt/software/spark-3.1.2-bin-hadoop3.2/jars/
wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/5.1.49/mysql-connector-java-5.1.49.jar
wget https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-spark-runtime-3.1_2.12/0.13.2/iceberg-spark-runtime-3.1_2.12-0.13.2.jar
wget https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-hive-runtime/0.13.2/iceberg-hive-runtime-0.13.2.jar
Note: the last two downloads may end up with file names like the one below, which need to be renamed to the proper jar names with mv:
remotecontent?filepath=org%2Fapache%2Ficeberg%2Ficeberg-spark-runtime-3.1_2.12%2F0.13.2%2Ficeberg-spark-runtime-3.1_2.12-0.13.2.jar
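For example, rename the file (name copied verbatim from the download above), or sidestep the problem by giving wget an explicit output name with -O:
mv 'remotecontent?filepath=org%2Fapache%2Ficeberg%2Ficeberg-spark-runtime-3.1_2.12%2F0.13.2%2Ficeberg-spark-runtime-3.1_2.12-0.13.2.jar' iceberg-spark-runtime-3.1_2.12-0.13.2.jar
wget -O iceberg-hive-runtime-0.13.2.jar 'https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-hive-runtime/0.13.2/iceberg-hive-runtime-0.13.2.jar'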
Start spark-sql with the Iceberg extensions and a Hive-backed catalog named hive_prod:
spark-sql --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
--conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \
--conf spark.sql.catalog.spark_catalog.type=hive \
--conf spark.sql.catalog.hive_prod=org.apache.iceberg.spark.SparkCatalog \
--conf spark.sql.catalog.hive_prod.type=hive \
--conf spark.sql.catalog.hive_prod.uri=thrift://127.0.0.1:9083
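Once the shell is up, listing the catalog's namespaces is a quick way to confirm it is wired to the metastore (at this point it should only show the default database):
SHOW NAMESPACES IN hive_prod;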
In the SQL session, enable Hive engine compatibility for the Iceberg tables and register the Iceberg runtime jars (the paths below point at copies placed under the Hive lib directory):
SET iceberg.engine.hive.enabled=true;
ADD JAR /opt/software/hive/apache-hive-3.1.3-bin/lib/iceberg-hive-runtime-0.13.2.jar;
ADD JAR /opt/software/hive/apache-hive-3.1.3-bin/lib/iceberg-spark-runtime-3.1_2.12-0.13.2.jar;
Create an Iceberg database and table, and insert some test data:
create database hive_prod.testdb;
use hive_prod.testdb;
CREATE TABLE hive_prod.testdb.test_table (id bigint, data string) USING iceberg;
INSERT INTO hive_prod.testdb.test_table VALUES (1, 'a'), (2, 'b'), (3, 'c');
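Before moving over to Doris, a quick sanity check from the spark-sql side:
SELECT * FROM hive_prod.testdb.test_table;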
Connect to Doris over the MySQL protocol:
mysql -h 127.0.0.1 -P 9030 -uroot
Create a Doris database mapped to the Iceberg database:
CREATE DATABASE test_iceberg_db
PROPERTIES (
    "iceberg.database" = "testdb",
    "iceberg.hive.metastore.uris" = "thrift://127.0.0.1:9083",
    "iceberg.catalog.type" = "HIVE_CATALOG"
);
Query the data from Doris:
use test_iceberg_db;
mysql> show tables;
+----------------------------+
| Tables_in_test_iceberg_db |
+----------------------------+
| test_table |
+----------------------------+
1 row in set (0.00 sec)
mysql> select * from test_table;
+------+------+
| id | data |
+------+------+
| 1 | a |
| 2 | b |
| 3 | c |
+------+------+
3 rows in set (0.40 sec)