1. Download Hadoop
https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/
2. Install the JDK
rpm -ivh jdk-8u261-linux-x64.rpm
3. Unpack the Hadoop archive
tar zxvf hadoop-3.3.1.tar.gz
4. Point Hadoop at the JDK
Edit the hadoop-3.3.1/etc/hadoop/hadoop-env.sh file. From the hadoop-3.3.1 directory, run
vi etc/hadoop/hadoop-env.sh
and add
export JAVA_HOME=/usr/java/jdk1.8.0_261-amd64
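To confirm Hadoop picks up the JDK, a quick check (run from the hadoop-3.3.1 directory) is to print the version; this fails immediately if JAVA_HOME is wrong:
bin/hadoop version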
5. Pseudo-distributed deployment
1) Configure the etc/hadoop/core-site.xml file
vi etc/hadoop/core-site.xml
Add the following inside <configuration>:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:9000</value>
</property>
2) Configure the etc/hadoop/hdfs-site.xml file
vi etc/hadoop/hdfs-site.xml
Add the following inside <configuration>:
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
3) Set up passwordless ssh login to localhost
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
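Verify that passwordless login works (the first connection may ask you to confirm the host key, but it should not prompt for a password):
ssh localhost
exit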
4) Format the filesystem
bin/hdfs namenode -format
5) Running sbin/start-dfs.sh fails with the following errors
[root@iZ2zeb8tcng37z21t5bk9cZ hadoop-3.3.1]# sbin/start-dfs.sh
Starting namenodes on [localhost]
ERROR: Attempting to operate on hdfs namenode as root
ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting operation.
Starting datanodes
ERROR: Attempting to operate on hdfs datanode as root
ERROR: but there is no HDFS_DATANODE_USER defined. Aborting operation.
Starting secondary namenodes [iZ2zeb8tcng37z21t5bk9cZ]
ERROR: Attempting to operate on hdfs secondarynamenode as root
ERROR: but there is no HDFS_SECONDARYNAMENODE_USER defined. Aborting operation.
To fix this, add the following near the top of the start-dfs.sh and stop-dfs.sh files in the sbin directory:
HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
6) Similarly, add the following to the sbin/start-yarn.sh and sbin/stop-yarn.sh files:
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
Start DFS and YARN
sbin/start-dfs.sh
sbin/start-yarn.sh
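A quick way to confirm all daemons came up is jps; with this pseudo-distributed setup you should see NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager:
jps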
7) Create the HDFS directories required to run MapReduce jobs (replace <username> with your user name)
bin/hdfs dfs -mkdir /user
bin/hdfs dfs -mkdir /user/<username>
8) Copy the input files into the distributed filesystem
bin/hdfs dfs -mkdir input
bin/hdfs dfs -put etc/hadoop/*.xml input
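Step 9 below assumes a job has already produced an output directory. The official quickstart runs the bundled grep example at this point, searching the copied config files for strings matching the given regex:
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.1.jar grep input output 'dfs[a-z.]+'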
9) Check the output: copy the output files from the distributed filesystem to the local filesystem and inspect them
bin/hdfs dfs -get output output
cat output/*
Or view the output files directly on the distributed filesystem
bin/hdfs dfs -cat output/*
10) Configure single-node YARN
Configure the etc/hadoop/mapred-site.xml file
vi etc/hadoop/mapred-site.xml
Add the following inside <configuration>:
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<property>
  <name>mapreduce.application.classpath</name>
  <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>
Configure the etc/hadoop/yarn-site.xml file
vi etc/hadoop/yarn-site.xml
Add the following inside <configuration>:
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.env-whitelist</name>
  <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ,HADOOP_MAPRED_HOME</value>
</property>
11) Install and configure Hive
Download from
https://hive.apache.org/general/downloads/
Unpack the archive
tar zxvf apache-hive-2.3.9-bin.tar.gz
Rename the directory
mv apache-hive-2.3.9-bin/ hive-2.3.9
12) In the hive-2.3.9/conf directory, rename the configuration template:
mv hive-env.sh.template hive-env.sh
13) Edit the hive-env.sh file
vi hive-env.sh
Add the following:
# Set HADOOP_HOME to point to a specific hadoop install directory
HADOOP_HOME=/root/hadoop-3.3.1
# Hive Configuration Directory can be controlled by:
export HIVE_CONF_DIR=/root/hive-2.3.9/conf
14) Update the environment variables
vi /etc/profile
Add the following:
export HIVE_HOME=/root/hive-2.3.9
export PATH=$PATH:$HIVE_HOME/bin
# Make Hive's libraries visible to Hadoop
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HIVE_HOME/lib/*
Reload the environment:
source /etc/profile
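A quick sanity check that the variables took effect:
echo $HIVE_HOME
hive --version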
15) Install MySQL
Unpack the bundle
tar xvf mysql-8.0.26-1.el8.x86_64.rpm-bundle.tar
Install the RPMs in dependency order:
rpm -ivh mysql-community-common-8.0.26-1.el8.x86_64.rpm
rpm -ivh mysql-community-client-plugins-8.0.26-1.el8.x86_64.rpm
rpm -ivh mysql-community-libs-8.0.26-1.el8.x86_64.rpm
rpm -ivh mysql-community-client-8.0.26-1.el8.x86_64.rpm
rpm -ivh mysql-community-server-8.0.26-1.el8.x86_64.rpm
16) Initialize the database
mysqld --initialize
17) Check the configuration file and fix ownership
cat /etc/my.cnf
It contains the line datadir=/var/lib/mysql, which is where the data files live. This directory must be owned by the mysql user or the database will not start. Grant ownership with
chown mysql:mysql /var/lib/mysql -R
The ownership change must be done after the database has been initialized (this applies only if you installed MySQL as a user other than mysql); otherwise starting the database will fail.
18) Start the database
Start MySQL
systemctl start mysqld.service
Stop MySQL
systemctl stop mysqld.service
Restart MySQL
systemctl restart mysqld.service
Enable MySQL at boot
systemctl enable mysqld
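To confirm the service is running:
systemctl status mysqld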
19) Look up and change the root password
Find the initial password generated by the database
cat /var/log/mysqld.log | grep password
Change the initial password
mysqladmin -uroot -p'Ush&4PGR=0Vj' password Mysql123456
If the initial password contains special characters such as < or &, wrap it in single quotes as shown above.
20) Allow remote client logins to MySQL
mysql -u root -p
use mysql
update user set host='%' where user = 'root';
select host,user from user;
Restart the database for the change to take effect
systemctl restart mysqld.service
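You can then verify remote access from another machine; the server address below is a placeholder for your host's IP:
mysql -h <server-ip> -u root -p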
21) Provide the MySQL JDBC driver to Hive
Install the driver package
rpm -ivh mysql-connector-java-8.0.26-1.el8.noarch.rpm
This fails with a dependency error saying java-headless is required.
Install it with
yum install java-headless
then install the driver package again
rpm -ivh mysql-connector-java-8.0.26-1.el8.noarch.rpm
Locate the driver jar
find / -name mysql-connector-java.jar
and copy mysql-connector-java.jar into hive-2.3.9/lib
cp /usr/share/java/mysql-connector-java.jar /root/hive-2.3.9/lib
22) Create the configuration file hive-site.xml in hive-2.3.9/conf
vi hive-site.xml
Add the following:
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/metastore?createDatabaseIfNotExist=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.cj.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
    <description>username to use against metastore database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>HadoopMysql123456</value>
    <description>password to use against metastore database</description>
  </property>
  <!-- Show column headers in query results -->
  <property>
    <name>hive.cli.print.header</name>
    <value>true</value>
  </property>
  <!-- Show the current database in the prompt -->
  <property>
    <name>hive.cli.print.current.db</name>
    <value>true</value>
  </property>
</configuration>
Initialize the metastore, then start the metastore service
schematool -dbType mysql -initSchema
hive --service metastore &
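A quick sanity check that the metastore service is reachable:
hive -e "show databases;"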
23) Edit the Hadoop configuration file etc/hadoop/core-site.xml and add the following entries
vi etc/hadoop/core-site.xml
Add the following inside <configuration>:
<property>
  <name>hadoop.proxyuser.root.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.root.groups</name>
  <value>*</value>
</property>
Then restart DFS and YARN:
sbin/stop-dfs.sh
sbin/stop-yarn.sh
sbin/start-dfs.sh
sbin/start-yarn.sh
24) Create a database and grant permissions
Log in to Hive and create the chinese_consul database
hive
create database if not exists chinese_consul;
quit;
Grant permissions on the database's warehouse directory
hadoop fs -chmod -R 777 /user/hive/warehouse/chinese_consul.db
Inserting data through a client then fails with the following error:
org.apache.hadoop.security.AccessControlException: Permission denied: user=anonymous, access=EXECUTE, inode="/tmp/hadoop-yarn":root:supergroup:drwx------
This is caused by insufficient permissions on /tmp/hadoop-yarn.
Fix it with:
hadoop fs -chown hadoop:hadoop /tmp/hadoop-yarn
hadoop fs -chmod -R 777 /tmp/hadoop-yarn
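Confirm the new owner and mode on the directory itself:
hadoop fs -ls -d /tmp/hadoop-yarn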
Known issues
Exiting with status 1: java.io.IOException: NameNode is not formatted.
When this happens, port 9000 is not open because the NameNode never started.
Fix: reformat the filesystem (note that this erases any existing HDFS data)
bin/hdfs namenode -format
To change the ResourceManager web UI address (default http://localhost:8088), edit the hadoop-3.3.1/etc/hadoop/yarn-site.xml file and add:
<property>
  <name>yarn.resourcemanager.webapp.address</name>
  <value>${yarn.resourcemanager.hostname}:8088</value>
</property>
Fixing garbled Chinese comments in Hive table definitions
Log in to the MySQL database
mysql -u root -p
Switch to the metastore database
use metastore;
Run the following statements:
alter table COLUMNS_V2 modify column COMMENT varchar(256) character set utf8;
alter table TABLE_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;
alter table PARTITION_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;
alter table PARTITION_KEYS modify column PKEY_COMMENT varchar(4000) character set utf8;
alter table INDEX_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;
Then recreate the affected tables and the comments will display correctly.
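To verify, you can create a throwaway table with Chinese comments (the table name here is just an example) and describe it:
hive -e "create table comment_test (id int comment '编号') comment '中文注释测试'"
hive -e "describe formatted comment_test"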
Changing the ResourceManager's default web UI port: the default is 8088, which is a common target for cryptomining attacks.
Edit the yarn-site.xml file
vi etc/hadoop/yarn-site.xml
Add the following inside <configuration>, where 8888 is the new port:
<property>
  <name>yarn.resourcemanager.webapp.address</name>
  <value>${yarn.resourcemanager.hostname}:8888</value>
</property>
Restart:
sbin/stop-dfs.sh
sbin/stop-yarn.sh
sbin/start-dfs.sh
sbin/start-yarn.sh
References:
https://hadoop.apache.org/docs/r3.3.1/hadoop-yarn/hadoop-yarn-common/yarn-default.xml