spark01 | spark02 | spark03 |
---|---|---|
192.168.171.101 | 192.168.171.102 | 192.168.171.103 |
namenode | namenode | |
journalnode | journalnode | journalnode |
datanode | datanode | datanode |
nodemanager | nodemanager | nodemanager |
resource manager | resource manager | |
job history | | |
job log | job log | job log |
yum -y update
Rebooting after the upgrade is recommended
yum -y install gcc gcc-c++ autoconf automake cmake make rsync vim man zip unzip net-tools zlib zlib-devel openssl openssl-devel pcre-devel tcpdump lrzsz tar wget openssh-server
hostnamectl set-hostname spark01
hostnamectl set-hostname spark02
hostnamectl set-hostname spark03
vim /etc/sysconfig/network-scripts/ifcfg-ens160
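Example network interface configuration file (the ifcfg file name suffix and the NAME/DEVICE values must both match the actual interface name on your machine, for example ens160 or ens32)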
TYPE="Ethernet"
PROXY_METHOD="none"
BROWSER_ONLY="no"
BOOTPROTO="none"
DEFROUTE="yes"
IPV4_FAILURE_FATAL="no"
IPV6INIT="yes"
IPV6_AUTOCONF="yes"
IPV6_DEFROUTE="yes"
IPV6_FAILURE_FATAL="no"
IPV6_ADDR_GEN_MODE="stable-privacy"
NAME="ens32"
DEVICE="ens32"
ONBOOT="yes"
IPADDR="192.168.171.101"
PREFIX="24"
GATEWAY="192.168.171.2"
DNS1="192.168.171.2"
IPV6_PRIVACY="no"
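The `PREFIX="24"` entry above is CIDR notation for the netmask 255.255.255.0. As a quick sanity check when adapting the file to another network, the conversion can be sketched in shell. `prefix_to_netmask` is a hypothetical helper name introduced only for this illustration.

```shell
# Convert a CIDR prefix length (0-32) to a dotted-quad netmask.
# prefix_to_netmask is an illustrative helper, not a system command.
prefix_to_netmask() {
  remaining=$1
  mask=""
  for i in 1 2 3 4; do
    if [ "$remaining" -ge 8 ]; then
      octet=255
      remaining=$((remaining - 8))
    else
      # Set the top $remaining bits of this octet.
      octet=$((256 - (1 << (8 - remaining))))
      remaining=0
    fi
    mask="${mask:+$mask.}$octet"
  done
  echo "$mask"
}

prefix_to_netmask 24   # prints 255.255.255.0
```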
sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
setenforce 0
systemctl stop firewalld
systemctl disable firewalld
vim /etc/hosts
Add the following entries:
192.168.171.101 spark01
192.168.171.102 spark02
192.168.171.103 spark03
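Because the same three entries have to be added on every node, it can help to append them idempotently instead of editing by hand. The following is a minimal sketch under that assumption; `add_hosts_entries` is a hypothetical helper, and the target file is passed as an argument so the function can be tried on a scratch file before touching `/etc/hosts`.

```shell
# Append "IP hostname" pairs to a hosts file, skipping names already present.
# add_hosts_entries is a hypothetical helper: $1 = hosts file, then IP/name pairs.
add_hosts_entries() {
  hosts_file=$1
  shift
  while [ "$#" -ge 2 ]; do
    ip=$1
    name=$2
    shift 2
    # Look for the host name as a whole field on any non-comment line.
    if ! awk -v n="$name" '
          /^[[:space:]]*#/ { next }
          { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
          END { exit !found }' "$hosts_file" 2>/dev/null; then
      printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
    fi
  done
}

# Example (run as root on each node):
# add_hosts_entries /etc/hosts 192.168.171.101 spark01 192.168.171.102 spark02 192.168.171.103 spark03
```

Running the function twice leaves the file unchanged the second time, which makes it safe to include in a provisioning script.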
Create the software directory on every host
mkdir -p /opt/soft
Perform the following steps on host spark01
Change to the software directory
cd /opt/soft
Download the JDK (the AuthParam token in the Oracle URL is tied to a download session, so a fresh link may be needed)
wget -O jdk-8u391-linux-x64.tar.gz 'https://download.oracle.com/otn/java/jdk/8u391-b13/b291ca3e0c8548b5a51d5a5f50063037/jdk-8u391-linux-x64.tar.gz?AuthParam=1698206552_11c0bb831efdf87adfd187b0e4ccf970'
Download ZooKeeper
wget https://dlcdn.apache.org/zookeeper/zookeeper-3.8.3/apache-zookeeper-3.8.3-bin.tar.gz
Download Hadoop
wget https://dlcdn.apache.org/hadoop/common/hadoop-3.3.5/hadoop-3.3.5.tar.gz
Extract the JDK and rename the directory
tar -zxvf jdk-8u391-linux-x64.tar.gz -C /opt/soft/
mv jdk1.8.0_391/ jdk-8
Extract ZooKeeper and rename the directory
tar -zxvf apache-zookeeper-3.8.3-bin.tar.gz -C /opt/soft/
mv apache-zookeeper-3.8.3-bin zookeeper-3
Extract Hadoop and rename the directory
tar -zxvf hadoop-3.3.5.tar.gz -C /opt/soft/
mv hadoop-3.3.5/ hadoop-3
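Configure environment variables
vim /etc/profile.d/my_env.sh
Write the following content:
export JAVA_HOME=/opt/soft/jdk-8
# Caution: --add-opens exists only in JDK 9 and later; a JDK 8 JVM rejects it if a script passes JAVA_OPTS through
export JAVA_OPTS="--add-opens java.base/java.lang=ALL-UNNAMED"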
export ZOOKEEPER_HOME=/opt/soft/zookeeper-3
export HDFS_NAMENODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_ZKFC_USER=root
export HDFS_JOURNALNODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
export HADOOP_HOME=/opt/soft/hadoop-3
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$JAVA_HOME/bin:$ZOOKEEPER_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Load the new environment variables
Note: after distributing the software and configuration files, run this step on every host
source /etc/profile
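After sourcing the profile, it is worth confirming that the key variables are set and point at real directories, since a typo here only surfaces much later as a failed daemon start. A minimal sketch follows; `check_env_dirs` is a hypothetical helper, not something shipped with Hadoop or ZooKeeper.

```shell
# Report whether each named environment variable is set and refers to a directory.
# check_env_dirs is an illustrative helper: arguments are variable names.
check_env_dirs() {
  status=0
  for name in "$@"; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "$name is not set"
      status=1
    elif [ ! -d "$value" ]; then
      echo "$name=$value is not a directory"
      status=1
    else
      echo "$name=$value OK"
    fi
  done
  return "$status"
}

# Example: check_env_dirs JAVA_HOME ZOOKEEPER_HOME HADOOP_HOME HADOOP_CONF_DIR
```

The function returns a non-zero status if any variable fails the check, so it can gate later steps in a script.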
cd $ZOOKEEPER_HOME/conf
vim zoo.cfg
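# Basic time unit (tick) in milliseconds: 2 s
tickTime=2000
# Timeout for the initial synchronization with the leader: 10 ticks, i.e. 20 s
initLimit=10
# Timeout for a regular sync request and its response: 5 ticks, i.e. 10 s
syncLimit=5
# Directory for in-memory database snapshots
dataDir=/home/zookeeper-3/data
# Directory for transaction logs
dataLogDir=/home/zookeeper-3/datalog
# Client port of this ZooKeeper node
clientPort=2181
# Maximum concurrent connections from one client (identified by IP) to one node; default 60
maxClientCnxns=1000
# Keep 7 snapshots in dataDir; the default is 3
autopurge.snapRetainCount=7
# Interval of the snapshot purge task in hours; 0 (the default) disables purging
autopurge.purgeInterval=1
# Minimum session timeout a client may negotiate; default is 2 ticks
minSessionTimeout=4000
# Maximum session timeout a client may negotiate; default is 20 ticks, i.e. 40 s
maxSessionTimeout=300000
# ZooKeeper 3.5.x starts the AdminServer by default, which listens on port 8080; move it to 9001
admin.serverPort=9001
# Ensemble members
server.1=spark01:2888:3888
server.2=spark02:2888:3888
server.3=spark03:2888:3888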
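Several of the settings above are expressed in ticks rather than in seconds, so the effective timeouts depend on `tickTime`. The arithmetic can be checked with a short sketch; `ticks_to_seconds` is a hypothetical helper used only for illustration.

```shell
# Convert a ZooKeeper limit expressed in ticks to seconds.
# ticks_to_seconds is an illustrative helper: $1 = tickTime in ms, $2 = number of ticks.
ticks_to_seconds() {
  echo $(( $1 * $2 / 1000 ))
}

tick_time=2000
echo "initLimit: $(ticks_to_seconds "$tick_time" 10) s"    # prints initLimit: 20 s
echo "syncLimit: $(ticks_to_seconds "$tick_time" 5) s"     # prints syncLimit: 10 s
# Default session timeout bounds are 2 and 20 ticks.
echo "default minSessionTimeout: $((tick_time * 2)) ms"    # 4000 ms
echo "default maxSessionTimeout: $((tick_time * 20)) ms"   # 40000 ms
```

With `tickTime=2000`, the explicit `minSessionTimeout=4000` equals the default, while `maxSessionTimeout=300000` raises the upper bound from 40 s to 300 s.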
Run on every server
mkdir -p /home/zookeeper-3/data
mkdir -p /home/zookeeper-3/datalog
spark01
echo 1 > /home/zookeeper-3/data/myid
more /home/zookeeper-3/data/myid
spark02
echo 2 > /home/zookeeper-3/data/myid
more /home/zookeeper-3/data/myid
spark03
echo 3 > /home/zookeeper-3/data/myid
more /home/zookeeper-3/data/myid
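The number written to `myid` must match the `N` of the `server.N` line that names the host in `zoo.cfg`. Rather than typing it per host, the value can be looked up from the configuration. This is a minimal sketch assuming short host names without dots, as used in this setup; `myid_for_host` is a hypothetical helper.

```shell
# Print the N from the "server.N=<host>:..." line that matches a host name.
# myid_for_host is an illustrative helper: $1 = path to zoo.cfg, $2 = host name.
# Assumes short host names (no dots), since "." is used as a field separator.
myid_for_host() {
  awk -F'[.=:]' -v host="$2" '
    $1 == "server" && $3 == host { print $2; found = 1 }
    END { exit !found }' "$1"
}

# Example (run on each node):
# myid_for_host "$ZOOKEEPER_HOME/conf/zoo.cfg" "$(hostname)" > /home/zookeeper-3/data/myid
```

The function exits with a non-zero status when the host is not listed, which guards against writing an empty `myid` unnoticed.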
Create the unit file zookeeper.service under /etc/systemd/system/
Note: write this file on every server
cd /etc/systemd/system
vim zookeeper.service
Contents:
[Unit]
Description=zookeeper
After=syslog.target network.target
[Service]
Type=forking
# ZooKeeper log directory; it can also be defined in zkServer.sh
Environment=ZOO_LOG_DIR=/home/zookeeper-3/datalog
# JDK path; it can also be defined in zkServer.sh
Environment=JAVA_HOME=/opt/soft/jdk-8
ExecStart=/opt/soft/zookeeper-3/bin/zkServer.sh start
ExecStop=/opt/soft/zookeeper-3/bin/zkServer.sh stop
Restart=always
User=root
Group=root
[Install]
WantedBy=multi-user.target
systemctl daemon-reload
# Run the following commands only after all hosts have been configured
systemctl start zookeeper
systemctl enable zookeeper
systemctl status zookeeper
Edit the configuration files
cd $HADOOP_HOME/etc/hadoop
- hadoop-env.sh
- core-site.xml
- hdfs-site.xml
- workers
- mapred-site.xml
- yarn-site.xml
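hadoop-env.sh:
export JAVA_HOME=/opt/soft/jdk-8
# --add-opens is a JDK 9+ option that a JDK 8 JVM rejects at startup; enable this line only on JDK 9 or later
# export HADOOP_OPTS="--add-opens java.base/java.lang=ALL-UNNAMED"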
export HDFS_NAMENODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_ZKFC_USER=root
export HDFS_JOURNALNODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
core-site.xml:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://lihaozhe</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/data</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>spark01:2181,spark02:2181,spark03:2181</value>
  </property>
  <property>
    <name>hadoop.http.staticuser.user</name>
    <value>root</value>
  </property>
  <property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>hadoop.proxyuser.root.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.root.groups</name>
    <value>*</value>
  </property>
</configuration>
hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>lihaozhe</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.lihaozhe</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.lihaozhe.nn1</name>
    <value>spark01:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.lihaozhe.nn2</name>
    <value>spark02:8020</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.lihaozhe.nn1</name>
    <value>spark01:9870</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.lihaozhe.nn2</name>
    <value>spark02:9870</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://spark01:8485;spark02:8485;spark03:8485/lihaozhe</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.lihaozhe</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/root/.ssh/id_rsa</value>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/home/hadoop/journalnode/data</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.safemode.threshold.pct</name>
    <value>1</value>
  </property>
</configuration>
workers:
spark01
spark02
spark03
mapred-site.xml:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.application.classpath</name>
    <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>spark01:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>spark01:19888</value>
  </property>
</configuration>
yarn-site.xml:
<configuration>
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>cluster1</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>spark01</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>spark02</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm1</name>
    <value>spark01:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm2</name>
    <value>spark02:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>spark01:2181,spark02:2181,spark03:2181</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.nodemanager.env-whitelist</name>
    <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
  </property>
  <property>
    <name>yarn.nodemanager.pmem-check-enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.log.server.url</name>
    <value>http://spark01:19888/jobhistory/logs</value>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
  </property>
</configuration>
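With this many properties spread over several files, a quick way to read back a single value helps catch typos before the daemons are started. The sketch below parses the simple one-element-per-line layout used in these files with `awk`; `get_prop` is a hypothetical helper and not a full XML parser. On a working installation, `hdfs getconf -confKey <name>` reports the effective value.

```shell
# Print the <value> that follows a given <name> in a Hadoop *-site.xml file.
# get_prop is an illustrative helper: $1 = XML file, $2 = property name.
# It assumes each <name> and <value> element fits on one line.
get_prop() {
  awk -v key="$2" '
    /<name>/ {
      n = $0
      sub(/.*<name>/, "", n)
      sub(/<\/name>.*/, "", n)
      found = (n == key)
    }
    /<value>/ && found {
      v = $0
      sub(/.*<value>/, "", v)
      sub(/<\/value>.*/, "", v)
      print v
      exit
    }' "$1"
}

# Example:
# get_prop "$HADOOP_CONF_DIR/core-site.xml" fs.defaultFS
# get_prop "$HADOOP_CONF_DIR/hdfs-site.xml" dfs.nameservices
```

For this cluster, the authority part of `fs.defaultFS` and the value of `dfs.nameservices` should name the same nameservice.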
Create a local key pair and add the public key to the authorized keys on each host
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
ssh-copy-id root@spark01
ssh-copy-id root@spark02
ssh-copy-id root@spark03
ssh root@spark01
exit
ssh root@spark02
exit
ssh root@spark03
exit
# Environment variables
scp -r /etc/profile.d root@spark02:/etc
scp -r /etc/profile.d root@spark03:/etc
# JDK (installed only on spark01 so far)
scp -r /opt/soft/jdk-8 root@spark02:/opt/soft
scp -r /opt/soft/jdk-8 root@spark03:/opt/soft
# ZooKeeper
scp -r /opt/soft/zookeeper-3 root@spark02:/opt/soft
scp -r /opt/soft/zookeeper-3 root@spark03:/opt/soft
# Hadoop, including the configuration under etc/hadoop
scp -r /opt/soft/hadoop-3 root@spark02:/opt/soft
scp -r /opt/soft/hadoop-3 root@spark03:/opt/soft
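The copy commands above repeat the same pattern per host and per path. When the list grows, a loop that first prints the commands for review is less error-prone. This is a dry-run sketch only: `print_distribution` is a hypothetical helper that echoes `scp` commands and copies nothing.

```shell
# Print the scp commands needed to copy each path to each target host.
# print_distribution is an illustrative dry-run helper; it only echoes commands.
# $1 = space-separated target hosts, remaining arguments = absolute paths.
print_distribution() {
  targets=$1
  shift
  for host in $targets; do
    for path in "$@"; do
      # Copy into the same parent directory on the remote side.
      echo "scp -r $path root@$host:$(dirname "$path")"
    done
  done
}

print_distribution "spark02 spark03" /etc/profile.d /opt/soft/zookeeper-3
```

Once the printed commands look right, the output can be piped to `sh` to execute them.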
source /etc/profile
spark01
echo 1 > /home/zookeeper-3/data/myid
more /home/zookeeper-3/data/myid
spark02
echo 2 > /home/zookeeper-3/data/myid
more /home/zookeeper-3/data/myid
spark03
echo 3 > /home/zookeeper-3/data/myid
more /home/zookeeper-3/data/myid
Run the following commands on every node
systemctl daemon-reload
systemctl start zookeeper
systemctl enable zookeeper
systemctl status zookeeper
jps
zkServer.sh status
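In a healthy three-node ensemble, `zkServer.sh status` reports `Mode: leader` on exactly one node and `Mode: follower` on the other two. The sketch below extracts that role from the command output so the check can be scripted; `zk_mode` and `has_single_leader` are hypothetical helper names.

```shell
# Extract the role from `zkServer.sh status` output read on stdin.
# zk_mode is an illustrative helper; it prints e.g. "leader" or "follower".
zk_mode() {
  sed -n 's/^Mode: //p'
}

# Succeed when the modes given on stdin (one per line) contain exactly one leader.
has_single_leader() {
  [ "$(grep -c '^leader$')" -eq 1 ]
}

# Example on one node:
# zkServer.sh status 2>&1 | zk_mode
```

Collecting the mode from all three hosts, for instance over `ssh`, and piping the three lines into `has_single_leader` gives a one-line health check.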
1. Start ZooKeeper on all three nodes: zkServer.sh start
2. Start the three JournalNodes: hadoop-daemon.sh start journalnode, or hdfs --daemon start journalnode
7. Format HDFS on one of the NameNodes: hdfs namenode -format
8. Copy the freshly formatted metadata to the other NameNode
a) Start the NameNode that was just formatted: hadoop-daemon.sh start namenode
b) On the NameNode that was not formatted, run: hdfs namenode -bootstrapStandby
c) Start the second NameNode: hadoop-daemon.sh start namenode
9. Initialize the HA state in ZooKeeper from one of the NameNodes: hdfs zkfc -formatZK
10. Stop the daemons started above: stop-dfs.sh
11. Start everything: start-all.sh
12. Start the ResourceManager nodes: yarn-daemon.sh start resourcemanager
start-yarn.sh
http://dl.bintray.com/sequenceiq/sequenceiq-bin/hadoop-native-64-2.5.0.tar
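Newer versions do not need step 12
13. Start the history server
mapred --daemon start historyserver
Steps 14, 15, and 16 do not need to be run
14. Safe mode
hdfs dfsadmin -safemode enter
hdfs dfsadmin -safemode leave
15. List the NameNodes and query their state (haadmin takes the NameNode ID from dfs.ha.namenodes, nn1 or nn2, not the host name)
hdfs getconf -namenodes
hdfs haadmin -getServiceState nn1
16. Force a state transition
hdfs haadmin -transitionToActive --forcemanual nn1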
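# Before shutting down, stop the services in order
stop-yarn.sh
stop-dfs.sh
# After booting, start the services in order
start-dfs.sh
start-yarn.sh
Or
# Stop the services before shutting down
stop-all.sh
# Start the services after booting
start-all.sh
# After starting or stopping, check the processes with jps before doing anything else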
On the Windows client, edit C:\Windows\System32\drivers\etc\hosts
Append the following entries:
192.168.171.101 spark01
192.168.171.102 spark02
192.168.171.103 spark03
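Note for Windows 11: the file permissions must be changed first
Search for cmd from the Start menu
Find Command Prompt and run it as administrator
Change to the C:\Windows\System32\drivers\etc directory
cd drivers/etc
Remove the read-only attribute from the hosts file
attrib -r hosts
Open the hosts file
start hosts
Append the following entries and save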
192.168.171.101 spark01
192.168.171.102 spark02
192.168.171.103 spark03
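Open in a browser (NameNode web UI): http://spark01:9870
Open in a browser (ResourceManager web UI): http://spark01:8088
Open in a browser (JobHistory web UI): http://spark01:19888/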
Create a test file wcdata.txt on the local file system
vim wcdata.txt
Spark HBaseHive Flink
Storm Hadoop HBase SparkFlinkHBase
StormHBase Hadoop Hive
FlinkHBase Flink
Hive StormHive Flink HadoopHBase
HiveHadoop Spark HBase StormHBase
Hadoop Hive FlinkHBase Flink Hive StormHive
Flink HadoopHBase Hive
Spark HBaseHive Flink
Storm Hadoop HBase SparkFlinkHBase
StormHBase Hadoop Hive
FlinkHBase Flink
Hive StormHive Flink HadoopHBase
HiveHadoop Spark HBase StormHBase
Hadoop Hive FlinkHBase Flink Hive StormHive
Flink HadoopHBase Hive
Spark HBaseHive Flink
Storm Hadoop HBase SparkFlinkHBase
StormHBase Hadoop Hive
FlinkHBase Flink
Hive StormHive Flink HadoopHBase
HiveHadoop Spark HBase StormHBase
Hadoop Hive FlinkHBase Flink Hive StormHive
Flink HadoopHBase Hive
HiveHadoop Spark HBase StormHBase
Hadoop Hive FlinkHBase Flink Hive StormHive
Flink HadoopHBase Hive
Spark HBaseHive Flink
Storm Hadoop HBase SparkFlinkHBase
StormHBase Hadoop Hive
FlinkHBase Flink
Hive StormHive Flink HadoopHBase
HiveHadoop Spark HBase StormHBase
Hadoop Hive FlinkHBase Flink Hive StormHive
Flink HadoopHBase Hive
Spark HBaseHive Flink
Storm Hadoop HBase SparkFlinkHBase
StormHBase Hadoop Hive
HiveHadoop Spark HBase StormHBase
Hadoop Hive FlinkHBase Flink Hive StormHive
Flink HadoopHBase Hive
Spark HBaseHive Flink
Storm Hadoop HBase SparkFlinkHBase
StormHBase Hadoop Hive
FlinkHBase Flink
Hive StormHive Flink HadoopHBase
HiveHadoop Spark HBase StormHBase
Hadoop Hive FlinkHBase Flink Hive StormHive
Flink HadoopHBase Hive
Spark HBaseHive Flink
Storm Hadoop HBase SparkFlinkHBase
StormHBase Hadoop Hive
Spark HBaseHive Flink
Storm Hadoop HBase SparkFlinkHBase
StormHBase Hadoop Hive
FlinkHBase Flink
Hive StormHive Flink HadoopHBase
HiveHadoop Spark HBase StormHBase
Hadoop Hive FlinkHBase Flink Hive StormHive
Flink HadoopHBase Hive
Spark HBaseHive Flink
Storm Hadoop HBase SparkFlinkHBase
StormHBase Hadoop Hive
FlinkHBase Flink
Hive StormHive Flink HadoopHBase
HiveHadoop Spark HBase StormHBase
Hadoop Hive FlinkHBase Flink Hive StormHive
Flink HadoopHBase Hive
HiveHadoop Spark HBase StormHBase
Hadoop Hive FlinkHBase Flink Hive StormHive
Flink HadoopHBase Hive
Spark HBaseHive Flink
Storm Hadoop HBase SparkFlinkHBase
StormHBase Hadoop Hive
FlinkHBase Flink
Hive StormHive Flink HadoopHBase
HiveHadoop Spark HBase StormHBase
Hadoop Hive FlinkHBase Flink Hive StormHive
Flink HadoopHBase Hive
Spark HBaseHive Flink
Storm Hadoop HBase SparkFlinkHBase
StormHBase Hadoop Hive
Spark HBaseHive Flink
Storm Hadoop HBase SparkFlinkHBase
StormHBase Hadoop Hive
FlinkHBase Flink
Hive StormHive Flink HadoopHBase
HiveHadoop Spark HBase StormHBase
Hadoop Hive FlinkHBase Flink Hive StormHive
Flink HadoopHBase Hive
Spark HBaseHive Flink
Storm Hadoop HBase SparkFlinkHBase
StormHBase Hadoop Hive
FlinkHBase Flink
Hive StormHive Flink HadoopHBase
HiveHadoop Spark HBase StormHBase
Hadoop Hive FlinkHBase Flink Hive StormHive
Flink HadoopHBase Hive
HiveHadoop Spark HBase StormHBase
Hadoop Hive FlinkHBase Flink Hive StormHive
Flink HadoopHBase Hive
Spark HBaseHive Flink
Storm Hadoop HBase SparkFlinkHBase
StormHBase Hadoop Hive
FlinkHBase Flink
Hive StormHive Flink HadoopHBase
HiveHadoop Spark HBase StormHBase
Hadoop Hive FlinkHBase Flink Hive StormHive
Flink HadoopHBase Hive
Spark HBaseHive Flink
Storm Hadoop HBase SparkFlinkHBase
StormHBase Hadoop Hive
Spark HBaseHive Flink
Storm Hadoop HBase SparkFlinkHBase
StormHBase Hadoop Hive
FlinkHBase Flink
Hive StormHive Flink HadoopHBase
HiveHadoop Spark HBase StormHBase
Hadoop Hive FlinkHBase Flink Hive StormHive
Flink HadoopHBase Hive
Spark HBaseHive Flink
Storm Hadoop HBase SparkFlinkHBase
StormHBase Hadoop Hive
FlinkHBase Flink
Hive StormHive Flink HadoopHBase
HiveHadoop Spark HBase StormHBase
Hadoop Hive FlinkHBase Flink Hive StormHive
Flink HadoopHBase Hive
Spark HBaseHive Flink
Storm Hadoop HBase SparkFlinkHBase
StormHBase Hadoop Hive
FlinkHBase Flink
Hive StormHive Flink HadoopHBase
HiveHadoop Spark HBase StormHBase
Hadoop Hive FlinkHBase Flink Hive StormHive
Flink HadoopHBase Hive
HiveHadoop Spark HBase StormHBase
Hadoop Hive FlinkHBase Flink Hive StormHive
Flink HadoopHBase Hive
Spark HBaseHive Flink
Storm Hadoop HBase SparkFlinkHBase
StormHBase Hadoop Hive
FlinkHBase Flink
Hive StormHive Flink HadoopHBase
HiveHadoop Spark HBase StormHBase
Hadoop Hive FlinkHBase Flink Hive StormHive
Flink HadoopHBase Hive
Spark HBaseHive Flink
Storm Hadoop HBase SparkFlinkHBase
StormHBase Hadoop Hive
HiveHadoop Spark HBase StormHBase
Hadoop Hive FlinkHBase Flink Hive StormHive
Flink HadoopHBase Hive
Spark HBaseHive Flink
Storm Hadoop HBase SparkFlinkHBase
StormHBase Hadoop Hive
FlinkHBase Flink
Hive StormHive Flink HadoopHBase
HiveHadoop Spark HBase StormHBase
Hadoop Hive FlinkHBase Flink Hive StormHive
Flink HadoopHBase Hive
Spark HBaseHive Flink
Storm Hadoop HBase SparkFlinkHBase
StormHBase Hadoop Hive
Spark HBaseHive Flink
Storm Hadoop HBase SparkFlinkHBase
StormHBase Hadoop Hive
FlinkHBase Flink
Hive StormHive Flink HadoopHBase
HiveHadoop Spark HBase StormHBase
Hadoop Hive FlinkHBase Flink Hive StormHive
Flink HadoopHBase Hive
Spark HBaseHive Flink
Storm Hadoop HBase SparkFlinkHBase
StormHBase Hadoop Hive
FlinkHBase Flink
Hive StormHive Flink HadoopHBase
HiveHadoop Spark HBase StormHBase
Hadoop Hive FlinkHBase Flink Hive StormHive
Flink HadoopHBase Hive
HiveHadoop Spark HBase StormHBase
Hadoop Hive FlinkHBase Flink Hive StormHive
Flink HadoopHBase Hive
Spark HBaseHive Flink
Storm Hadoop HBase SparkFlinkHBase
StormHBase Hadoop Hive
FlinkHBase Flink
Hive StormHive Flink HadoopHBase
HiveHadoop Spark HBase StormHBase
Hadoop Hive FlinkHBase Flink Hive StormHive
Flink HadoopHBase Hive
Spark HBaseHive Flink
Storm Hadoop HBase SparkFlinkHBase
StormHBase Hadoop Hive
Spark HBaseHive Flink
Storm Hadoop HBase SparkFlinkHBase
StormHBase Hadoop Hive
FlinkHBase Flink
Hive StormHive Flink HadoopHBase
HiveHadoop Spark HBase StormHBase
Hadoop Hive FlinkHBase Flink Hive StormHive
Flink HadoopHBase Hive
Spark HBaseHive Flink
Storm Hadoop HBase SparkFlinkHBase
StormHBase Hadoop Hive
FlinkHBase Flink
Hive StormHive Flink HadoopHBase
HiveHadoop Spark HBase StormHBase
Hadoop Hive FlinkHBase Flink Hive StormHive
Flink HadoopHBase Hive
HiveHadoop Spark HBase StormHBase
Hadoop Hive FlinkHBase Flink Hive StormHive
Flink HadoopHBase Hive
Spark HBaseHive Flink
Storm Hadoop HBase SparkFlinkHBase
StormHBase Hadoop Hive
FlinkHBase Flink
Hive StormHive Flink HadoopHBase
HiveHadoop Spark HBase StormHBase
Hadoop Hive FlinkHBase Flink Hive StormHive
Flink HadoopHBase Hive
Spark HBaseHive Flink
Storm Hadoop HBase SparkFlinkHBase
StormHBase Hadoop Hive
Create the directory /wordcount/input on HDFS
hdfs dfs -mkdir -p /wordcount/input
Inspect the HDFS directory structure
hdfs dfs -ls /
hdfs dfs -ls /wordcount
hdfs dfs -ls /wordcount/input
Upload the local test file wcdata.txt to /wordcount/input on HDFS
hdfs dfs -put wcdata.txt /wordcount/input
Check that the upload succeeded
hdfs dfs -ls /wordcount/input
hdfs dfs -cat /wordcount/input/wcdata.txt
Compute the value of pi
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.5.jar pi 10 10
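The two arguments of the `pi` example are the number of map tasks and the number of samples per map, so `pi 10 10` uses only 100 sample points and gives a coarse estimate. As far as I understand, the bundled example uses a quasi-Monte Carlo method based on a Halton sequence. The sketch below reproduces that idea locally in `awk` to show how the estimate improves with more points; it is an illustration of the technique, not the Hadoop implementation, and `estimate_pi` is a hypothetical helper.

```shell
# Estimate pi with a 2D Halton sequence (bases 2 and 3).
# estimate_pi is an illustrative helper: $1 = number of sample points.
estimate_pi() {
  awk -v n="$1" '
    # Radical inverse of i in the given base: the i-th Halton value in [0, 1).
    function halton(i, base,    f, r) {
      f = 1; r = 0
      while (i > 0) {
        f = f / base
        r = r + f * (i % base)
        i = int(i / base)
      }
      return r
    }
    BEGIN {
      inside = 0
      for (i = 1; i <= n; i++) {
        x = halton(i, 2) - 0.5
        y = halton(i, 3) - 0.5
        # Count points inside the circle of radius 0.5 centred in the unit square.
        if (x * x + y * y <= 0.25) inside++
      }
      # Area ratio circle/square is pi/4.
      printf "%.4f\n", 4 * inside / n
    }'
}

estimate_pi 10000
```

On the cluster, raising the second argument, for example `pi 10 100000`, has the same effect of tightening the estimate.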
Word count
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.5.jar wordcount /wordcount/input/wcdata.txt /wordcount/result
hdfs dfs -ls /wordcount/result
hdfs dfs -cat /wordcount/result/part-r-00000
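To confirm that the job output is right, the same count can be computed locally on the input file and compared with `part-r-00000`. The sketch below splits on whitespace and prints `word<TAB>count` sorted by word in byte order, which, assuming the default single reducer, should match the layout of the job result. `local_wordcount` is a hypothetical helper.

```shell
# Count whitespace-separated words in a local file.
# local_wordcount is an illustrative helper: $1 = input file.
# Output: one "word<TAB>count" line per distinct word, sorted by word in byte order.
local_wordcount() {
  awk '
    { for (i = 1; i <= NF; i++) count[$i]++ }
    END { for (w in count) printf "%s\t%d\n", w, count[w] }' "$1" | LC_ALL=C sort
}

# Example comparison against the job result:
# local_wordcount wcdata.txt > /tmp/expected.txt
# hdfs dfs -cat /wordcount/result/part-r-00000 | diff - /tmp/expected.txt
```

Note that tokens such as `HBaseHive` are counted as single words, because the test data contains no separator inside them.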
spark01
cd /home/hadoop/data/dfs/name/current
ls
The listing looks like this:
edits_0000000000000000001-0000000000000000009 edits_inprogress_0000000000000000299 fsimage_0000000000000000298 VERSION
edits_0000000000000000010-0000000000000000011 fsimage_0000000000000000011 fsimage_0000000000000000298.md5
edits_0000000000000000012-0000000000000000298 fsimage_0000000000000000011.md5 seen_txid
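The number in each `fsimage_` file name is the ID of the last transaction contained in that checkpoint, zero-padded to a fixed width, so the newest image is simply the one that sorts last. A small sketch for picking it out of a directory listing follows; `latest_fsimage` is a hypothetical helper.

```shell
# Print the fsimage file name with the highest transaction ID.
# latest_fsimage is an illustrative helper; it reads file names
# (separated by spaces or newlines) on stdin and ignores the .md5 files.
latest_fsimage() {
  awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^fsimage_[0-9]+$/) print $i }' |
    LC_ALL=C sort | tail -n 1
}

# Example, run inside the current/ directory:
# ls | latest_fsimage
```

For the listing above this selects `fsimage_0000000000000000298`, and `edits_inprogress_0000000000000000299` holds the transactions recorded since that checkpoint.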
View the fsimage
hdfs oiv -p XML -i fsimage_0000000000000000011
Read the metadata in the specified format and write it to a new file
hdfs oiv -p XML -i fsimage_0000000000000000011 -o /opt/soft/fsimage.xml
View the edits
Read the edit log in the specified format and write it to a new file
hdfs oev -p XML -i edits_inprogress_0000000000000000299 -o /opt/soft/edit.xml