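Taking advantage of the holiday, I'm going through my old big-data study notes to review and consolidate the key points: the Hadoop family, stream-processing frameworks, the ELK Stack and so on. My big-data notes are fairly complete. I had also wanted to share a series on ANTLR4, the open-source lexer/parser tool, but those notes are too scattered and code-heavy, and I don't have the energy to pull them together. I hope this series gets finished.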
2.1 OS
Prepare 3 CentOS/RHEL 7 servers, either VMs or physical machines; a default OS install is fine. IP & HOSTNAME:
192.168.100.101 ipsnode1
192.168.100.102 ipsnode2
192.168.100.103 ipsnode3
2.2 java
All nodes:
rpm -ivh jdk-8u251-linux-x64.rpm
vi /etc/profile
…
export JAVA_HOME=/usr/java/jdk1.8.0_251-amd64
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
…
2.3 vi /etc/hosts
All nodes:
192.168.100.101 ipsnode1
192.168.100.102 ipsnode2
192.168.100.103 ipsnode3
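If you'd rather not edit /etc/hosts by hand on every node, a small shell function can append the three entries and skip any that are already there, so re-running it is harmless. This is a minimal sketch: the name `add_cluster_hosts` is my own, and the target file is a parameter (defaulting to /etc/hosts) so you can try it on a scratch copy first. Writing the real /etc/hosts needs root.

```shell
# Append the cluster's IP/hostname pairs to a hosts file, skipping lines
# that already exist so the function is safe to run more than once.
# Hypothetical helper; the file defaults to /etc/hosts.
add_cluster_hosts() {
  local hosts_file="${1:-/etc/hosts}"
  local entry
  while read -r entry; do
    [ -z "$entry" ] && continue
    # -x: whole-line match, -F: fixed string (dots are not regex wildcards)
    grep -qxF "$entry" "$hosts_file" 2>/dev/null || echo "$entry" >> "$hosts_file"
  done <<'EOF'
192.168.100.101 ipsnode1
192.168.100.102 ipsnode2
192.168.100.103 ipsnode3
EOF
}
```

Usage: run `add_cluster_hosts` as root on each node, or `add_cluster_hosts /tmp/hosts.test` to see what it would write.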
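2.4 passwordless SSH login. Note: with NameNode HA, the active and standby NameNodes must also be able to SSH into each other without a password.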
node1:
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
node2 & node3: mkdir ~/.ssh
node1:
scp ~/.ssh/authorized_keys root@ipsnode2:~/.ssh/authorized_keys
scp ~/.ssh/authorized_keys root@ipsnode3:~/.ssh/authorized_keys
node1 & node2 & node3: chmod 600 ~/.ssh/authorized_keys
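The chmod step matters because sshd, with its default `StrictModes yes`, refuses to use an authorized_keys file (or ~/.ssh directory) that is writable by group or others. Here is a sketch that creates the directory and sets the usual 700/600 modes in one go; `fix_ssh_perms` is a hypothetical helper name, and the directory defaults to ~/.ssh.

```shell
# Create an .ssh directory (if needed) and set the permissions sshd expects:
# 700 on the directory, 600 on authorized_keys. Hypothetical helper.
fix_ssh_perms() {
  local ssh_dir="${1:-$HOME/.ssh}"
  mkdir -p "$ssh_dir"
  chmod 700 "$ssh_dir"
  # touch so the chmod works even before the scp from node1 has happened
  touch "$ssh_dir/authorized_keys"
  chmod 600 "$ssh_dir/authorized_keys"
}
```

Running `fix_ssh_perms` on node2 and node3 before the scp covers both the `mkdir ~/.ssh` and the `chmod 600` steps; touching the file first is harmless because scp simply overwrites it.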
2.5 others
All nodes:
systemctl stop firewalld.service
systemctl disable firewalld.service
vi /etc/selinux/config
SELINUX=disabled
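The SELinux change can also be made non-interactively with sed, which is handy when scripting all three nodes. A sketch, assuming the stock config layout where the mode sits on a line starting with `SELINUX=`; the `^SELINUX=` anchor leaves the `SELINUXTYPE=` line alone. `disable_selinux_config` is a made-up name.

```shell
# Rewrite the SELINUX= line of an SELinux config file to "disabled".
# Hypothetical helper; the file defaults to /etc/selinux/config.
disable_selinux_config() {
  local cfg="${1:-/etc/selinux/config}"
  # ^SELINUX= matches only the mode line, not SELINUXTYPE=
  sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$cfg"
}
```

The file edit only takes effect after a reboot; `setenforce 0` switches the running system to permissive mode in the meantime.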
2.6 reboot the OS on all nodes
3.1 install package (psmisc provides fuser, which the sshfence method uses during HDFS HA failover)
yum -y install psmisc # all nodes
3.2 download
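Just download and extract it into the target directory. I recommend an older release: the stream-processing frameworks covered later only support Hadoop up to 2.x.x. If you only need Hadoop's core features (HDFS, MapReduce), the version hardly matters. I originally set up a 3.x release and later dropped back to hadoop-2.10.0 because of compatibility problems with other components. Versioning is one of the biggest headaches with the vanilla Hadoop ecosystem: mismatches and problems everywhere. The upside is that every problem you solve takes you deeper and improves your understanding of how Hadoop works ^_^
For production, I'd still recommend the CDH distribution.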
3.3 vi /etc/profile
…
export HADOOP_HOME=/usr/local/hadoop-2.10.0/
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin
…
3.4 create dirs # all nodes
/hadoop/tmp, /hadoop/hdfs/data, /hadoop/hdfs/name
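All three directories can be created with one `mkdir -p`, which also creates missing parents and does nothing if a directory already exists. They are the paths used for hadoop.tmp.dir, dfs.namenode.name.dir and dfs.datanode.data.dir in the config files. In this sketch the optional root prefix is an assumption added only so the function can be tried outside /.

```shell
# Create the local Hadoop directories. ROOT is a hypothetical prefix
# (empty by default, i.e. the real /hadoop tree).
make_hadoop_dirs() {
  local root="${1:-}"
  mkdir -p "$root/hadoop/tmp" "$root/hadoop/hdfs/data" "$root/hadoop/hdfs/name"
}
```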
3.5 edit config files
in /usr/local/hadoop-2.10.0/etc/hadoop: hadoop-env.sh,yarn-env.sh,core-site.xml,hdfs-site.xml,mapred-site.xml,yarn-site.xml,slaves
3.5.1 vi hadoop-env.sh
###################
export JAVA_HOME=… #UPDATE
#after 3.1.x please add
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
#added for HDFS ha
export HDFS_JOURNALNODE_USER=root
export HDFS_ZKFC_USER=root
3.5.2 vi yarn-env.sh #after 3.1.x skip
###################
export JAVA_HOME=… #UPDATE
3.5.3 vi core-site.xml # add inside <configuration>
###################
<property>
<name>fs.defaultFS</name>
<value>hdfs://ipsnode1:9000</value>
<description>URI of HDFS</description>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/hadoop/tmp</value>
<description>node tmp dir</description>
</property>
3.5.4 vi hdfs-site.xml
###################
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/hadoop/hdfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/hadoop/hdfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
3.5.5 vi mapred-site.xml
###################
cp mapred-site.xml.template mapred-site.xml; vi mapred-site.xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
3.5.6 vi yarn-site.xml
###################
<property>
<name>yarn.resourcemanager.hostname</name>
<value>ipsnode1</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
3.5.7 vi slaves #after 3.1.x edit workers
###################
ipsnode1
ipsnode2
ipsnode3
3.5.8 scp the hadoop dir to the other nodes
###################
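One way to do this step is to loop over the hosts in the slaves file and scp the whole directory, skipping the node you're copying from. A sketch under these assumptions: the hypothetical function `distribute_hadoop` takes the slaves file and the local hostname, Hadoop sits at /usr/local/hadoop-2.10.0 on every node, and setting `RUN=echo` prints the scp commands instead of running them. The loop reads the host list on file descriptor 3 so nothing inside the loop can consume it from stdin.

```shell
# Copy the Hadoop directory to every host in the slaves file except SELF.
# Hypothetical helper; set RUN=echo for a dry run.
distribute_hadoop() {
  local slaves_file="$1" self="$2" hadoop_dir="${3:-/usr/local/hadoop-2.10.0}"
  local host
  while read -r host <&3; do
    # skip blank lines and the local node
    [ -z "$host" ] || [ "$host" = "$self" ] && continue
    ${RUN:-} scp -r "$hadoop_dir" "root@${host}:$(dirname "$hadoop_dir")/"
  done 3< "$slaves_file"
}
```

Usage on ipsnode1: `RUN=echo distribute_hadoop /usr/local/hadoop-2.10.0/etc/hadoop/slaves ipsnode1` to preview, then the same command without `RUN=echo` to copy.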
3.5.9 format namenode #primary node
###################
hdfs namenode -format
[root@ipsnode1 hadoop]# tree /hadoop/
/hadoop/
├── hdfs
│ ├── data
│ └── name
│ └── current
│ ├── fsimage_0000000000000000000
│ ├── fsimage_0000000000000000000.md5
│ ├── seen_txid
│ └── VERSION
└── tmp
3.5.10 start hadoop #primary node
###################
cd /usr/local/hadoop-2.10.0/sbin
./start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [ipsnode1]
The authenticity of host 'ipsnode1 (192.168.100.101)' can't be established.
ECDSA key fingerprint is 65:37:6b:13:53:56:ea:a2:1c:02:50:2e:b6:2e:fc:25.
Are you sure you want to continue connecting (yes/no)? yes
ipsnode1: Warning: Permanently added 'ipsnode1,192.168.100.101' (ECDSA) to the list of known hosts.
ipsnode1: starting namenode, logging to /usr/local/hadoop-2.10.0/logs/hadoop-root-namenode-ipsnode1.out
ipsnode1: starting datanode, logging to /usr/local/hadoop-2.10.0/logs/hadoop-root-datanode-ipsnode1.out
ipsnode2: starting datanode, logging to /usr/local/hadoop-2.10.0/logs/hadoop-root-datanode-ipsnode2.out
ipsnode3: starting datanode, logging to /usr/local/hadoop-2.10.0/logs/hadoop-root-datanode-ipsnode3.out
Starting secondary namenodes [0.0.0.0]
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
ECDSA key fingerprint is 65:37:6b:13:53:56:ea:a2:1c:02:50:2e:b6:2e:fc:25.
Are you sure you want to continue connecting (yes/no)? yes
0.0.0.0: Warning: Permanently added '0.0.0.0' (ECDSA) to the list of known hosts.
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop-2.10.0/logs/hadoop-root-secondarynamenode-ipsnode1.out
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop-2.10.0/logs/yarn-root-resourcemanager-ipsnode1.out
ipsnode2: starting nodemanager, logging to /usr/local/hadoop-2.10.0/logs/yarn-root-nodemanager-ipsnode2.out
ipsnode3: starting nodemanager, logging to /usr/local/hadoop-2.10.0/logs/yarn-root-nodemanager-ipsnode3.out
ipsnode1: starting nodemanager, logging to /usr/local/hadoop-2.10.0/logs/yarn-root-nodemanager-ipsnode1.out
[root@ipsnode1 sbin]# jps
10465 NameNode
10593 DataNode
10753 SecondaryNameNode
10900 ResourceManager
11003 NodeManager
11295 Jps
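Instead of eyeballing the jps output, you can compare it against the list of daemons you expect. The defaults below are the five daemons shown above for ipsnode1; on ipsnode2/3 you'd only expect DataNode and NodeManager. `check_daemons` is a hypothetical helper that reads jps output on stdin.

```shell
# Read `jps` output (PID NAME per line) on stdin and report missing daemons.
# Pass the expected names as arguments; defaults match the master node here.
check_daemons() {
  local running missing="" d
  running=$(awk '{print $2}')
  [ $# -gt 0 ] || set -- NameNode DataNode SecondaryNameNode ResourceManager NodeManager
  for d in "$@"; do
    printf '%s\n' "$running" | grep -qx "$d" || missing="$missing $d"
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing"
    return 1
  fi
  echo "all running"
}
```

Usage: `jps | check_daemons` on ipsnode1, `jps | check_daemons DataNode NodeManager` on the workers. The non-zero return code on failure makes it usable in scripts.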
4.1 web check hadoop status
YARN ResourceManager UI: http://<namenode-ip>:8088; NameNode UI: http://<namenode-ip>:50070 (9870 on 3.x), which shows HDFS and DataNode health
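Only the NameNode UI port changed between major versions, so a scripted check has to pick the port from the version. A sketch, assuming default ports (dfs.namenode.http-address not overridden); `namenode_ui_url` is a made-up name.

```shell
# Print the default NameNode web UI URL for a host and Hadoop version:
# 50070 on 2.x, 9870 on 3.x and later. Hypothetical helper.
namenode_ui_url() {
  local host="$1" version="$2"
  case "$version" in
    2.*) echo "http://${host}:50070" ;;
    *)   echo "http://${host}:9870" ;;
  esac
}
```

Usage: `curl -s -o /dev/null -w '%{http_code}\n' "$(namenode_ui_url ipsnode1 2.10.0)"` prints the HTTP status code returned by the NameNode UI.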
4.2 test upload files
Syntax:
hadoop fs -put localfile /path #path like /path/… or hdfs://ipsnode1:9000/
hdfs dfs -put localfile /path #path like /path/… or hdfs://ipsnode1:9000/
Example:
[root@ipsnode1 data]# ls -lt
-rw-r--r--. 1 root root 302335130 Mar 13 17:06 UID1548997664_FILE1_1.csv
-rw-r--r--. 1 root root 212817771 Oct 5 17:00 UID1548996165_FILE1_1.csv
-rw-r--r--. 1 root root 272970904 Oct 9 09:00 UID1548994589_FILE1_4.csv
[root@ipsnode1 data]# hadoop fs -ls /
[root@ipsnode1 data]# hadoop fs -put UID1548997664_FILE1_1.csv /
2019-11-02 17:56:16,082 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2019-11-02 17:56:20,865 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2019-11-02 17:56:26,602 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
[root@ipsnode1 data]# hdfs dfs -put UID1548996165_FILE1_1.csv /
2019-11-02 17:56:47,802 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2019-11-02 17:56:52,465 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
[root@ipsnode1 data]# hadoop fs -put UID1548994589_FILE1_4.csv hdfs://ipsnode1:9000/
2019-11-02 18:08:30,811 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2019-11-02 18:08:38,654 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2019-11-02 18:08:47,179 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
[root@ipsnode1 data]# hadoop fs -ls /
Found 3 items
-rw-r--r-- 3 root supergroup 272970904 2019-11-02 18:08 /UID1548994589_FILE1_4.csv
-rw-r--r-- 3 root supergroup 212817771 2019-11-02 17:56 /UID1548996165_FILE1_1.csv
-rw-r--r-- 3 root supergroup 302335130 2019-11-02 17:56 /UID1548997664_FILE1_1.csv
[root@ipsnode1 data]# hdfs dfs -ls /
Found 3 items
-rw-r--r-- 3 root supergroup 272970904 2019-11-02 18:08 /UID1548994589_FILE1_4.csv
-rw-r--r-- 3 root supergroup 212817771 2019-11-02 17:56 /UID1548996165_FILE1_1.csv
-rw-r--r-- 3 root supergroup 302335130 2019-11-02 17:56 /UID1548997664_FILE1_1.csv
HDFS file block analysis:
[root@ipsnode1 subdir0]# pwd
/hadoop/hdfs/data/current/BP-1624759233-192.168.100.101-1587376817767/current/finalized/subdir0/subdir0
[root@ipsnode1 subdir0]# ls -lt
total 775696
-rw-r--r--. 1 root root 4535448 Nov 2 19:46 blk_1073741832
-rw-r--r--. 1 root root 35443 Nov 2 19:46 blk_1073741832_1008.meta
-rw-r--r--. 1 root root 134217728 Nov 2 19:46 blk_1073741831 #128M
-rw-r--r--. 1 root root 1048583 Nov 2 19:46 blk_1073741831_1007.meta
-rw-r--r--. 1 root root 134217728 Nov 2 19:45 blk_1073741830 #128M
-rw-r--r--. 1 root root 1048583 Nov 2 19:45 blk_1073741830_1006.meta
-rw-r--r--. 1 root root 33899674 Nov 2 19:45 blk_1073741829
-rw-r--r--. 1 root root 264851 Nov 2 19:45 blk_1073741829_1005.meta
-rw-r--r--. 1 root root 134217728 Nov 2 19:45 blk_1073741828 #128M
-rw-r--r--. 1 root root 1048583 Nov 2 19:45 blk_1073741828_1004.meta
-rw-r--r--. 1 root root 134217728 Nov 2 19:45 blk_1073741827 #128M
-rw-r--r--. 1 root root 1048583 Nov 2 19:45 blk_1073741827_1003.meta
-rw-r--r--. 1 root root 78600043 Nov 2 19:44 blk_1073741826
-rw-r--r--. 1 root root 614071 Nov 2 19:44 blk_1073741826_1002.meta
-rw-r--r--. 1 root root 134217728 Nov 2 19:44 blk_1073741825 #128M
-rw-r--r--. 1 root root 1048583 Nov 2 19:44 blk_1073741825_1001.meta
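The .meta files hold the block checksums, and their sizes follow from the block sizes. Assuming the defaults of one 4-byte CRC per 512 data bytes (dfs.bytes-per-checksum = 512) plus a 7-byte file header, the arithmetic reproduces every .meta size in the listing above, for example 134217728 / 512 × 4 + 7 = 1048583. `meta_size` is a hypothetical helper for checking this.

```shell
# Expected .meta size for a block of N bytes, assuming 512-byte checksum
# chunks, a 4-byte CRC per chunk, and a 7-byte header. Hypothetical helper.
meta_size() {
  local block_bytes="$1" chunk=512 crc=4 header=7
  # round the chunk count up: a partial last chunk still gets a CRC
  echo $(( header + (block_bytes + chunk - 1) / chunk * crc ))
}
```

So checksum overhead is under 1% of the data (4 bytes per 512), which is why each full 128M block carries a roughly 1M .meta file.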
[root@ipsnode1 /]# tree hadoop
hadoop
|-- hdfs
| |-- data
| | |-- current
| | | |-- BP-1624759233-192.168.100.101-1587376817767
| | | | |-- current
| | | | | |-- finalized
| | | | | | `-- subdir0
| | | | | | `-- subdir0
| | | | | | |-- blk_1073741825 #UID1548996165_FILE1_1.csv
| | | | | | |-- blk_1073741825_1001.meta
| | | | | | |-- blk_1073741826 #UID1548996165_FILE1_1.csv
| | | | | | |-- blk_1073741826_1002.meta
| | | | | | |-- blk_1073741827 #UID1548997664_FILE1_1.csv
| | | | | | |-- blk_1073741827_1003.meta
| | | | | | |-- blk_1073741828 #UID1548997664_FILE1_1.csv
| | | | | | |-- blk_1073741828_1004.meta
| | | | | | |-- blk_1073741829 #UID1548997664_FILE1_1.csv
| | | | | | |-- blk_1073741829_1005.meta
| | | | | | |-- blk_1073741830 #UID1548994589_FILE1_4.csv
| | | | | | |-- blk_1073741830_1006.meta
| | | | | | |-- blk_1073741831 #UID1548994589_FILE1_4.csv
| | | | | | |-- blk_1073741831_1007.meta
| | | | | | |-- blk_1073741832 #UID1548994589_FILE1_4.csv
| | | | | | `-- blk_1073741832_1008.meta
| | | | | |-- rbw
| | | | | `-- VERSION
| | | | |-- scanner.cursor
| | | | `-- tmp
| | | `-- VERSION
| | `-- in_use.lock
| `-- name
| |-- current
| | |-- edits_0000000000000000001-0000000000000000002
| | |-- edits_0000000000000000003-0000000000000000004
| | |-- edits_0000000000000000005-0000000000000000006
| | |-- edits_0000000000000000007-0000000000000000008
| | |-- edits_0000000000000000009-0000000000000000010
| | |-- edits_0000000000000000011-0000000000000000012
| | |-- edits_0000000000000000013-0000000000000000014
| | |-- edits_0000000000000000015-0000000000000000016
| | |-- edits_0000000000000000017-0000000000000000018
| | |-- edits_0000000000000000019-0000000000000000020
| | |-- edits_0000000000000000021-0000000000000000022
| | |-- edits_0000000000000000023-0000000000000000024
| | |-- edits_0000000000000000025-0000000000000000026
| | |-- edits_0000000000000000027-0000000000000000028
| | |-- edits_0000000000000000029-0000000000000000030
| | |-- edits_0000000000000000031-0000000000000000032
| | |-- edits_0000000000000000033-0000000000000000034
| | |-- edits_0000000000000000035-0000000000000000036
| | |-- edits_0000000000000000037-0000000000000000038
| | |-- edits_0000000000000000039-0000000000000000040
| | |-- edits_0000000000000000041-0000000000000000042
| | |-- edits_0000000000000000043-0000000000000000044
| | |-- edits_0000000000000000045-0000000000000000046
| | |-- edits_0000000000000000047-0000000000000000048
| | |-- edits_0000000000000000049-0000000000000000050
| | |-- edits_0000000000000000051-0000000000000000051
| | |-- edits_0000000000000000052-0000000000000000087
| | |-- edits_0000000000000000088-0000000000000000089
| | |-- edits_0000000000000000090-0000000000000000091
| | |-- edits_0000000000000000092-0000000000000000093
| | |-- edits_0000000000000000094-0000000000000000095
| | |-- edits_0000000000000000096-0000000000000000097
| | |-- edits_0000000000000000098-0000000000000000099
| | |-- edits_0000000000000000100-0000000000000000101
| | |-- edits_0000000000000000102-0000000000000000103
| | |-- edits_0000000000000000104-0000000000000000105
| | |-- edits_0000000000000000106-0000000000000000107
| | |-- edits_0000000000000000108-0000000000000000109
| | |-- edits_0000000000000000110-0000000000000000111
| | |-- edits_0000000000000000112-0000000000000000113
| | |-- edits_0000000000000000114-0000000000000000115
| | |-- edits_0000000000000000116-0000000000000000117
| | |-- edits_0000000000000000118-0000000000000000119
| | |-- edits_0000000000000000120-0000000000000000121
| | |-- edits_0000000000000000122-0000000000000000123
| | |-- edits_0000000000000000124-0000000000000000125
| | |-- edits_inprogress_0000000000000000126
| | |-- fsimage_0000000000000000123
| | |-- fsimage_0000000000000000123.md5
| | |-- fsimage_0000000000000000125
| | |-- fsimage_0000000000000000125.md5
| | |-- seen_txid
| | `-- VERSION
| `-- in_use.lock
`-- tmp
|-- dfs
| `-- namesecondary
| |-- current
| | |-- edits_0000000000000000001-0000000000000000002
| | |-- edits_0000000000000000003-0000000000000000004
| | |-- edits_0000000000000000005-0000000000000000006
| | |-- edits_0000000000000000007-0000000000000000008
| | |-- edits_0000000000000000009-0000000000000000010
| | |-- edits_0000000000000000011-0000000000000000012
| | |-- edits_0000000000000000013-0000000000000000014
| | |-- edits_0000000000000000015-0000000000000000016
| | |-- edits_0000000000000000017-0000000000000000018
| | |-- edits_0000000000000000019-0000000000000000020
| | |-- edits_0000000000000000021-0000000000000000022
| | |-- edits_0000000000000000023-0000000000000000024
| | |-- edits_0000000000000000025-0000000000000000026
| | |-- edits_0000000000000000027-0000000000000000028
| | |-- edits_0000000000000000029-0000000000000000030
| | |-- edits_0000000000000000031-0000000000000000032
| | |-- edits_0000000000000000033-0000000000000000034
| | |-- edits_0000000000000000035-0000000000000000036
| | |-- edits_0000000000000000037-0000000000000000038
| | |-- edits_0000000000000000039-0000000000000000040
| | |-- edits_0000000000000000041-0000000000000000042
| | |-- edits_0000000000000000043-0000000000000000044
| | |-- edits_0000000000000000045-0000000000000000046
| | |-- edits_0000000000000000047-0000000000000000048
| | |-- edits_0000000000000000049-0000000000000000050
| | |-- edits_0000000000000000052-0000000000000000087
| | |-- edits_0000000000000000088-0000000000000000089
| | |-- edits_0000000000000000090-0000000000000000091
| | |-- edits_0000000000000000092-0000000000000000093
| | |-- edits_0000000000000000094-0000000000000000095
| | |-- edits_0000000000000000096-0000000000000000097
| | |-- edits_0000000000000000098-0000000000000000099
| | |-- edits_0000000000000000100-0000000000000000101
| | |-- edits_0000000000000000102-0000000000000000103
| | |-- edits_0000000000000000104-0000000000000000105
| | |-- edits_0000000000000000106-0000000000000000107
| | |-- edits_0000000000000000108-0000000000000000109
| | |-- edits_0000000000000000110-0000000000000000111
| | |-- edits_0000000000000000112-0000000000000000113
| | |-- edits_0000000000000000114-0000000000000000115
| | |-- edits_0000000000000000116-0000000000000000117
| | |-- edits_0000000000000000118-0000000000000000119
| | |-- edits_0000000000000000120-0000000000000000121
| | |-- edits_0000000000000000122-0000000000000000123
| | |-- edits_0000000000000000124-0000000000000000125
| | |-- fsimage_0000000000000000123
| | |-- fsimage_0000000000000000123.md5
| | |-- fsimage_0000000000000000125
| | |-- fsimage_0000000000000000125.md5
| | `-- VERSION
| `-- in_use.lock
`-- nm-local-dir
|-- filecache
|-- nmPrivate
`-- usercache
20 directories, 125 files
[root@ipsnode1 /]#
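The block-to-file mapping in the tree follows from the default block size of 128M (dfs.blocksize = 134217728 bytes in 2.x): each file is cut into full 128M blocks plus one smaller final block holding the remainder. The sketch below does that arithmetic, and for the three uploaded files it gives exactly the block sizes in the listing. And since dfs.replication is 3 and the cluster has 3 DataNodes, every node keeps a replica of every block, which is why all eight blocks show up on ipsnode1. `split_blocks` is a hypothetical name.

```shell
# Print the HDFS block sizes a file of SIZE bytes is split into,
# assuming the default 128 MiB block size. Hypothetical helper.
split_blocks() {
  local size="$1" block=134217728
  while [ "$size" -gt "$block" ]; do
    echo "$block"
    size=$((size - block))
  done
  # the last block only occupies the remaining bytes
  if [ "$size" -gt 0 ]; then
    echo "$size"
  fi
}
```

For example, `split_blocks 302335130` prints 134217728, 134217728 and 33899674, matching blk_1073741827 to blk_1073741829.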
4.3 test mapreduce
[root@ipsnode2 bin]# hadoop fs -cat /input/a.txt
hello test world
test
world
test
world
hello
[root@ipsnode2 bin]# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount /input /output2 # /output2 must not already exist
[root@ipsnode1 hadoop-3.1.3]# hadoop fs -ls /output2
Found 2 items
-rw-r--r-- 3 root supergroup 0 2019-11-06 18:48 /output2/_SUCCESS
-rw-r--r-- 3 root supergroup 45 2019-11-06 18:48 /output2/part-r-00000
[root@ipsnode1 hadoop-3.1.3]# hadoop fs -cat /output2/_SUCCESS
[root@ipsnode1 hadoop-3.1.3]# hadoop fs -cat /output2/part-r-00000
hello 2
test 3
world 3
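The wordcount result is easy to cross-check locally, because the job is conceptually a map (split lines into words), a shuffle (bring equal words together) and a reduce (count each group). A rough single-machine equivalent with standard Unix tools, assuming words are separated by whitespace as in the example's tokenizer; `local_wordcount` is just an illustrative name, not part of Hadoop.

```shell
# Single-process analogue of the wordcount example:
# map = one word per line, shuffle = sort, reduce = count each run.
# Output is "word<TAB>count", the same layout as part-r-00000.
local_wordcount() {
  tr -s '[:space:]' '\n' | grep -v '^$' | sort | uniq -c |
    awk '{printf "%s\t%s\n", $2, $1}'
}
```

Usage: `hadoop fs -cat /input/a.txt | local_wordcount` should print the same three lines as /output2/part-r-00000.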