1. Install Docker Toolbox
Reference 1: https://www.cnblogs.com/weschen/p/6697926.html
Oracle VirtualBox, Git, and Kitematic are optional during installation. If you have never used Microsoft's built-in Hyper-V, the installer will not show the corresponding Hyper-V setting.
2. Optimizing the Docker Toolbox configuration
2.1 Set an Aliyun registry mirror
Launch Git Bash and, at the MINGW64 prompt, SSH into the boot2docker VM:
$ docker-machine ssh default
$ sudo sed -i "s|EXTRA_ARGS='|EXTRA_ARGS='--registry-mirror=<your mirror URL> |g" /var/lib/boot2docker/profile
$ exit
$ docker-machine restart default
This restarts the VirtualBox VM.
Launch the Docker Quickstart Terminal.
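The sed command above prepends --registry-mirror to EXTRA_ARGS in /var/lib/boot2docker/profile. A minimal local sketch of what that substitution does, run against a throwaway copy of the profile (the mirror URL is a placeholder, and GNU sed is assumed):

```shell
# Create a throwaway file mimicking the EXTRA_ARGS block of /var/lib/boot2docker/profile.
profile=$(mktemp)
cat > "$profile" <<'EOF'
EXTRA_ARGS='
--label provider=virtualbox
'
EOF
# Same substitution as run inside the boot2docker VM; the mirror URL is a placeholder.
sed -i "s|EXTRA_ARGS='|EXTRA_ARGS='--registry-mirror=https://example.mirror.aliyuncs.com |g" "$profile"
# The first line now carries the mirror flag inside the quoted EXTRA_ARGS value.
result=$(head -n 1 "$profile")
echo "$result"
rm -f "$profile"
```

After restarting, `docker info` inside the Quickstart Terminal should list the mirror under "Registry Mirrors".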
2.2 Relocate the virtual hard disk
Open Oracle VM VirtualBox and go to Global Tools: use Release (the disk cannot be moved while its VM is running) or Move (available once the VM is stopped). Then, in the VM's settings, go to Storage -> add storage attachment -> add the virtual hard disk.
3. Pull an image and start a container
Reference 2: https://blog.csdn.net/yangym2002/article/details/79000241
4. Install and configure the Hadoop cluster
To install and configure the environment from a bare base image, see Reference 3: https://blog.51cto.com/865516915/2150651
This approach makes full use of the encapsulation and reusability of images.
Alternatively, pull a ready-made Hadoop image and configure the cluster directly; see Reference 4: https://www.jianshu.com/p/30604c820e9d
5. Pitfalls
5.1 Container errors on startup: add -itd
docker run --name hadoop2 --add-host ..................... Start the container with the -itd flags added, then use docker attach to enter it.
Since Hadoop needs SSH anyway, it is best to start the container with /usr/sbin/sshd -D, then use an SSH client to manage it.
If the container is started with /bin/bash instead, you must start sshd manually after entering it with docker attach.
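Putting 5.1 together, the run command might look like the sketch below. The image name, hostname, and --add-host entry are placeholders, and the command is only assembled and printed here since no Docker daemon is assumed:

```shell
# Assemble the docker run command from 5.1; image name, hostname, and host mapping are placeholders.
run_cmd="docker run -itd --name hadoop2 \
  --hostname hadoop2 \
  --add-host hadoop-master:172.17.0.2 \
  example/hadoop:latest \
  /usr/sbin/sshd -D"
# Only print the command; this sketch does not assume a running Docker daemon.
echo "$run_cmd"
```

With sshd as PID 1 (-D keeps it in the foreground), the container stays up and can be managed over SSH instead of docker attach.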
5.2 Slimming down after installing from scratch
An image built with the Dockerfile came out at 1.59 GB; one built by installing everything by hand was 1.04 GB.
Run the instructions from Reference 3's Dockerfile by hand; once Hadoop is installed and configured, you only need to commit a single image.
Hadoop's share/doc directory takes up a lot of space and can be deleted.
When using a JDK rather than a JRE, the two src archives inside the JDK can be deleted to save space.
The Java and Hadoop environment variables must go into .bashrc.
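For that last point, the .bashrc entries might look like the sketch below (the install paths are placeholders; adjust them to your layout). The sketch writes them to a temporary file standing in for ~/.bashrc and sources it to show the variables take effect:

```shell
# Append JAVA_HOME / HADOOP_HOME settings to a temp file standing in for ~/.bashrc.
# The install paths below are placeholders.
rcfile=$(mktemp)
cat >> "$rcfile" <<'EOF'
export JAVA_HOME=/usr/local/jdk1.8.0
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
EOF
# Sourcing the file (as a login shell does with ~/.bashrc) exports the variables.
. "$rcfile"
echo "JAVA_HOME=$JAVA_HOME HADOOP_HOME=$HADOOP_HOME"
rm -f "$rcfile"
```

Putting these in .bashrc (rather than only in the current shell) matters because SSH sessions into the container get a fresh shell each time.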
5.3 Test runs fail with a maximum-memory-related error
Allocate more memory to the VM.
For YARN tuning, see Reference 5: https://blog.csdn.net/stark_summer/article/details/48494391
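Besides giving the VM more RAM, the usual memory knobs live in yarn-site.xml. A hedged example with illustrative values sized for a small VM (these are standard YARN property names, but the numbers are assumptions to tune for your machine):

```xml
<!-- yarn-site.xml: illustrative memory settings for a small VM -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>2048</value> <!-- total RAM the NodeManager may hand out to containers -->
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>256</value>  <!-- smallest container the scheduler will grant -->
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>2048</value> <!-- cap on a single container request -->
</property>
```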
5.4 Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster
参考6:https://www.cnblogs.com/tele-share/p/9497613.html
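For reference, a common fix for this error on Hadoop 3.x is telling MapReduce tasks where the Hadoop installation lives via mapred-site.xml (this assumes HADOOP_HOME is set; see Reference 6 for the full walkthrough):

```xml
<!-- mapred-site.xml: point MR jobs at the Hadoop install; assumes HADOOP_HOME is set -->
<property>
  <name>yarn.app.mapreduce.am.env</name>
  <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
  <name>mapreduce.map.env</name>
  <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
  <name>mapreduce.reduce.env</name>
  <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
```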
6. Successful test run
[root@966316f84c04 mapreduce]# hadoop jar hadoop-mapreduce-examples-3.2.0.jar wordcount /input /output1
2019-06-24 09:49:01,834 INFO client.RMProxy: Connecting to ResourceManager at hadoop-master/172.17.0.2:8032
2019-06-24 09:49:03,201 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/root/.staging/job_1561369728889_0001
2019-06-24 09:49:04,593 INFO input.FileInputFormat: Total input files to process : 1
2019-06-24 09:49:05,098 INFO mapreduce.JobSubmitter: number of splits:1
2019-06-24 09:49:05,216 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2019-06-24 09:49:05,557 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1561369728889_0001
2019-06-24 09:49:05,560 INFO mapreduce.JobSubmitter: Executing with tokens: []
2019-06-24 09:49:05,864 INFO conf.Configuration: resource-types.xml not found
2019-06-24 09:49:05,865 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2019-06-24 09:49:06,628 INFO impl.YarnClientImpl: Submitted application application_1561369728889_0001
2019-06-24 09:49:06,729 INFO mapreduce.Job: The url to track the job: http://hadoop-master:8088/proxy/application_1561369728889_0001/
2019-06-24 09:49:06,731 INFO mapreduce.Job: Running job: job_1561369728889_0001
2019-06-24 09:49:20,084 INFO mapreduce.Job: Job job_1561369728889_0001 running in uber mode : false
2019-06-24 09:49:20,092 INFO mapreduce.Job: map 0% reduce 0%
2019-06-24 09:49:30,281 INFO mapreduce.Job: map 100% reduce 0%
2019-06-24 09:49:37,356 INFO mapreduce.Job: map 100% reduce 100%
2019-06-24 09:49:38,385 INFO mapreduce.Job: Job job_1561369728889_0001 completed successfully
2019-06-24 09:49:38,525 INFO mapreduce.Job: Counters: 54
    File System Counters
        FILE: Number of bytes read=1831
        FILE: Number of bytes written=447569
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=1468
        HDFS: Number of bytes written=1301
        HDFS: Number of read operations=8
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
        HDFS: Number of bytes read erasure-coded=0
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=6141
        Total time spent by all reduces in occupied slots (ms)=4617
        Total time spent by all map tasks (ms)=6141
        Total time spent by all reduce tasks (ms)=4617
        Total vcore-milliseconds taken by all map tasks=6141
        Total vcore-milliseconds taken by all reduce tasks=4617
        Total megabyte-milliseconds taken by all map tasks=6288384
        Total megabyte-milliseconds taken by all reduce tasks=4727808
    Map-Reduce Framework
        Map input records=31
        Map output records=179
        Map output bytes=2050
        Map output materialized bytes=1831
        Input split bytes=107
        Combine input records=179
        Combine output records=131
        Reduce input groups=131
        Reduce shuffle bytes=1831
        Reduce input records=131
        Reduce output records=131
        Spilled Records=262
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=148
        CPU time spent (ms)=1120
        Physical memory (bytes) snapshot=324399104
        Virtual memory (bytes) snapshot=5145825280
        Total committed heap usage (bytes)=170004480
        Peak Map Physical memory (bytes)=211697664
        Peak Map Virtual memory (bytes)=2569519104
        Peak Reduce Physical memory (bytes)=112701440
        Peak Reduce Virtual memory (bytes)=2576306176
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=1361
    File Output Format Counters
        Bytes Written=1301