7. Graphical Presentation of Data

This chapter is based on Chapter 5 of the original book.

1. Start HDFS and YARN

root@10049605-ThinkPad-T470-W10DG:~# start-dfs.sh
Starting namenodes on [localhost]
Starting datanodes
Starting secondary namenodes [10049605-ThinkPad-T470-W10DG]
root@10049605-ThinkPad-T470-W10DG:~# start-yarn.sh
Starting resourcemanager
Starting nodemanagers
root@10049605-ThinkPad-T470-W10DG:~# 

2. MapReduce examples

2.1. Put the input files into HDFS

root@10049605-ThinkPad-T470-W10DG:/# hdfs dfs -mkdir data
mkdir: `data': No such file or directory
root@10049605-ThinkPad-T470-W10DG:/# hdfs dfs -mkdir /user/root
mkdir: `/user/root': No such file or directory
root@10049605-ThinkPad-T470-W10DG:/# hdfs dfs -mkdir /user
root@10049605-ThinkPad-T470-W10DG:/# hdfs dfs -mkdir /user/root
root@10049605-ThinkPad-T470-W10DG:/# hdfs dfs -mkdir data
root@10049605-ThinkPad-T470-W10DG:/# hdfs dfs -mkdir data/weblogs
root@10049605-ThinkPad-T470-W10DG:/# hdfs dfs -copyFromLocal /home/yay/下载/hcb/chapter4/resources data/weblogs
root@10049605-ThinkPad-T470-W10DG:/# hdfs dfs -ls data/weblogs
Found 1 items
drwxr-xr-x   - root supergroup          0 2019-01-12 23:25 data/weblogs/resources
root@10049605-ThinkPad-T470-W10DG:/# hdfs dfs -ls data/weblogs/resources
Found 1 items
-rw-r--r--   1 root supergroup      10851 2019-01-12 23:25 data/weblogs/resources/NASA_log_sample.txt
root@10049605-ThinkPad-T470-W10DG:/# 
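
The first two mkdir attempts above fail because the parent directories (/user and /user/root) did not exist yet. As a minimal sketch, the -p flag of hdfs dfs -mkdir creates missing parents in one call. The helper name make_hdfs_dir is hypothetical and only wraps that command.

```shell
# Hypothetical helper: create an HDFS directory together with any missing
# parent directories. A relative path such as data/weblogs resolves against
# the current user's HDFS home directory (/user/root in this walkthrough).
make_hdfs_dir() {
    hdfs dfs -mkdir -p "$1"
}
```

For example, make_hdfs_dir data/weblogs would stand in for the four successful mkdir calls shown above.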

2.2. Run the MapReduce job

root@10049605-ThinkPad-T470-W10DG:~# hadoop jar /home/yay/eclipse-workspace/countwords/target/countwords-0.0.1-SNAPSHOT.jar chapter5.weblog.MsgSizeAggregateMapReduce data/weblogs/resources data/msgsize-out

root@10049605-ThinkPad-T470-W10DG:~# hdfs dfs -cat data/msgsize-out/part*
Mean    15195
Max 305722
Min 0
root@10049605-ThinkPad-T470-W10DG:~# 

You can also copy the output to the local file system and view it there:

root@10049605-ThinkPad-T470-W10DG:~# hdfs dfs -copyToLocal data/msgsize-out /home/yay/

root@10049605-ThinkPad-T470-W10DG:~# cat /home/yay/msgsize-out/*
Mean    15195
Max 305722
Min 0
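
To sanity-check figures like these without a cluster, similar statistics can be approximated locally with awk. This is a sketch under assumptions, not the book's job: it assumes each log line follows the Common Log Format with the response size in bytes as the last field, skips lines whose last field is not a number, and truncates the mean to an integer. The function name summarize_sizes is hypothetical, and the results may differ from the MapReduce output if the job treats such lines differently.

```shell
# Hypothetical local check: mean, max and min of the last field (assumed to be
# the response size in bytes) over all lines where that field is numeric.
summarize_sizes() {
    awk '
        $NF ~ /^[0-9]+$/ {
            size = $NF + 0
            if (n == 0 || size > max) max = size
            if (n == 0 || size < min) min = size
            sum += size
            n++
        }
        END {
            # Print nothing when no numeric sizes were seen.
            if (n > 0) printf "Mean\t%d\nMax\t%d\nMin\t%d\n", sum / n, max, min
        }
    ' "$@"
}
```

It could be run against a local copy of the input, for example summarize_sizes NASA_log_sample.txt.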

2.3. Run another MapReduce job

root@10049605-ThinkPad-T470-W10DG:~# hadoop jar /home/yay/eclipse-workspace/countwords/target/countwords-0.0.1-SNAPSHOT.jar chapter5.weblog.HitCountMapReduce  data/weblogs/resources data/hit-count-out

root@10049605-ThinkPad-T470-W10DG:~# hdfs dfs -cat data/hit-count-out/part*

2.4. Run a MapReduce job that takes the output of 2.3 as its input

root@10049605-ThinkPad-T470-W10DG:~# hadoop jar /home/yay/eclipse-workspace/countwords/target/countwords-0.0.1-SNAPSHOT.jar chapter5.weblog.FrequencyDistributionMapReduce  data/hit-count-out data/freq-dist-out
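
The jobs in 2.3 and 2.4 form a small pipeline: the output directory of one job becomes the input of the next. The following is a local sketch of that idea with awk and sort. It assumes (the source does not confirm this) that the first stage counts hits per requested URL and the second orders those counts ascending. The function names hit_counts and sort_by_hits are hypothetical, and field 7 is the request path only if the log follows the Common Log Format.

```shell
# Stage 1 (assumed behaviour): count hits per requested path (field 7).
hit_counts() {
    awk '{ count[$7]++ } END { for (url in count) printf "%s\t%d\n", url, count[url] }' "$@"
}

# Stage 2 (assumed behaviour): order "path<TAB>count" lines by count, ascending.
sort_by_hits() {
    sort -t "$(printf '\t')" -k2,2n
}
```

Chaining the two mirrors the job chaining above, for example hit_counts NASA_log_sample.txt | sort_by_hits.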

3. Install the gnuplot plotting program

yay@10049605-ThinkPad-T470-W10DG:~/下载$ mv gnuplot-5.2.6 /home/yay/software/gnuplot
yay@10049605-ThinkPad-T470-W10DG:~/software$ cd gnuplot
yay@10049605-ThinkPad-T470-W10DG:~/software/gnuplot$ dir
aclocal.m4  configure.ac   GNUmakefile  Makefile.maint  README         VERSION
BUGS        configure.vms  INSTALL  man     RELEASE_NOTES  win
ChangeLog   Copyright      INSTALL.gnu  missing     share
compile     demo       install-sh   mkinstalldirs   src
config      depcomp    m4       NEWS        term
config.hin  docs       Makefile.am  PATCHLEVEL  TODO
configure   FAQ.pdf    Makefile.in  PGPKEYS     tutorial


yay@10049605-ThinkPad-T470-W10DG:~/software/gnuplot$ ./configure --prefix=/home/yay/gnuplot5
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /bin/mkdir -p
checking for gawk... no
........

yay@10049605-ThinkPad-T470-W10DG:~/software/gnuplot$ make
yay@10049605-ThinkPad-T470-W10DG:~/software/gnuplot$ make install

Then append the following to the end of /etc/profile:

# gnuplot
export GNUPLOT=/home/yay/gnuplot5
export PATH=$PATH:$GNUPLOT/bin
export MANPATH=$GNUPLOT/share/man:$MANPATH

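You also need to install the gnuplot-x11 package; otherwise selecting the png terminal fails with the following error: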

gnuplot> set term png
Terminal type is now 'unknown'
^
unknown or ambiguous terminal type; type just 'set terminal' for a list

root@10049605-ThinkPad-T470-W10DG:~/plots# sudo apt-get install gnuplot-x11

4. Plot the data

root@10049605-ThinkPad-T470-W10DG:~# hdfs dfs -copyToLocal data/freq-dist-out/part-r-00000 2.dat
root@10049605-ThinkPad-T470-W10DG:~# dir
2.dat
root@10049605-ThinkPad-T470-W10DG:~# cp -r /home/yay/下载/hcb/chapter5/plots . 
root@10049605-ThinkPad-T470-W10DG:~# 
root@10049605-ThinkPad-T470-W10DG:~# cd plots
root@10049605-ThinkPad-T470-W10DG:~/plots# mv ../2.dat .
root@10049605-ThinkPad-T470-W10DG:~/plots# dir
2.dat  httpfreqdist.plot    httphitsvsmsgsize.plot  sendvsreceive.plot
data   httphistbyhour.plot  plot-images
root@10049605-ThinkPad-T470-W10DG:~/plots# mv 2.dat 2.data
root@10049605-ThinkPad-T470-W10DG:~/plots# gnuplot httpfreqdist.plot
root@10049605-ThinkPad-T470-W10DG:~/plots# dir
2.data  freqdist.png       httphistbyhour.plot     plot-images
data    httpfreqdist.plot  httphitsvsmsgsize.plot  sendvsreceive.plot
root@10049605-ThinkPad-T470-W10DG:~/plots# 

Open the generated file freqdist.png to view the plot.
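
For readers without the book's plot scripts, a script of this kind might look roughly as follows. This is a hypothetical sketch, not the contents of httpfreqdist.plot: the file names my-freqdist.plot and my-freqdist.png, the labels, and the assumption that the hit count sits in column 2 of 2.data are all illustrative.

```shell
# Write a small, hypothetical gnuplot script next to the data file.
cat > my-freqdist.plot <<'EOF'
set terminal png
set output "my-freqdist.png"
set title "Frequency distribution of hits"
set xlabel "Rank"
set ylabel "Hits"
# Column 2 is assumed to hold the hit count; adjust to match the data file.
plot "2.data" using 2 title "Frequency" with linespoints
EOF
```

Running gnuplot my-freqdist.plot in the plots directory would then write my-freqdist.png, provided the png terminal is available.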

5. Plot another graph

root@10049605-ThinkPad-T470-W10DG:~/plots# hadoop jar /home/yay/eclipse-workspace/countwords/target/countwords-0.0.1-SNAPSHOT.jar chapter5.weblog.HistogramGenerationMapReduce  data/weblogs/resources data/histogram-out