macOS 10.14 Big Data Environment Setup

Package version list (as of 2019-04-17)

Name        Version
zookeeper 3.4.13
hadoop 3.1.2
flume 1.9.0
hbase 1.2.9
hive 3.1.1
kafka 2.1.1
sqoop 1.4.6_1
storm 1.2.2
mysql 8.0.15

 

Install JDK 1.8 (omitted).
Install ZooKeeper (3.4.13)


Set the ZooKeeper environment variables.
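A minimal sketch of the ~/.profile entries, assuming the Homebrew install path used throughout this guide; the variable name ZOOKEEPER_HOME is just a convention:

# in ~/.profile (assumed Homebrew layout)
export ZOOKEEPER_HOME=/usr/local/Cellar/zookeeper/3.4.13
export PATH=$PATH:$ZOOKEEPER_HOME/libexec/bin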
In the conf directory:

cp zoo_sample.cfg zoo.cfg


Edit zoo.cfg:

dataDir=/usr/local/Cellar/zookeeper/3.4.13/tmp
dataLogDir=/usr/local/Cellar/zookeeper/3.4.13/logs



Create the tmp and logs directories, then start ZooKeeper with zkServer.sh start from the bin directory and check it with zkServer.sh status.
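For example, a sketch assuming the Homebrew libexec/bin layout used in the startup list at the end of this guide:

mkdir -p /usr/local/Cellar/zookeeper/3.4.13/tmp /usr/local/Cellar/zookeeper/3.4.13/logs
cd /usr/local/Cellar/zookeeper/3.4.13/libexec/bin
./zkServer.sh start
./zkServer.sh status    # should report Mode: standalone for a single-node setup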

Hadoop pseudo-distributed install (3.1.2)

Configure the environment variables: set $HADOOP_HOME=… and add ${HADOOP_HOME}/bin to PATH, then reload:

source ~/.profile
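The corresponding ~/.profile entries might look like this (a sketch; the Homebrew libexec path is the same one hive-env.sh uses later in this guide):

# in ~/.profile (assumed Homebrew layout)
export HADOOP_HOME=/usr/local/Cellar/hadoop/3.1.2/libexec
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin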


Configure passwordless SSH login:

ssh-keygen
ssh-copy-id [email protected]



Turn off the firewall.
Configure hadoop-env.sh:

export JAVA_HOME="/Library/Java/JavaVirtualMachines/jdk1.8.0_201.jdk/Contents/Home"

Configure yarn-env.sh:

JAVA_HOME="/Library/Java/JavaVirtualMachines/jdk1.8.0_201.jdk/Contents/Home"


Configure core-site.xml:


  
  
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/Cellar/hadoop/hdfs/tmp/hadoop-${user.name}</value>
  </property>
</configuration>
  



Configure hdfs-site.xml:



<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>



Configure yarn-site.xml:

 
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>127.0.0.1:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>127.0.0.1:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>127.0.0.1:8031</value>
  </property>
</configuration>



Configure mapred-site.xml:

 
    
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>


Run hadoop version to check that the installation succeeded.
Then run the following in the bin directory; if the output contains "……has been successfully formatted", the format succeeded:

hdfs namenode -format


Start everything from the sbin directory:
start-all.sh
Run jps; seeing the five processes SecondaryNameNode, DataNode, NodeManager, ResourceManager, and NameNode means the cluster is up.
Visit http://localhost:9870 (older Hadoop 2.x used port 50070).
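Putting the startup check together, a sketch assuming the Homebrew layout used in the startup list at the end of this guide:

cd /usr/local/Cellar/hadoop/3.1.2/libexec/sbin
./start-all.sh
jps    # expect NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager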

Install MySQL (8.0.15)

 

sudo apt-get install mysql-server

sudo apt-get install mysql-client

sudo apt-get install libmysqlclient-dev



Check that the installation succeeded:

Linux:

sudo netstat -tap | grep mysql

macOS:

lsof -i:<port>

# list all mysql-related open ports/files
sudo lsof | grep mysql


Log in to verify:

mysql -u root -p


Start/stop/restart MySQL:

service mysql start

service mysql stop

service mysql restart



Running the commands above makes service look up the corresponding mysql script under /etc/init.d and execute the start or stop action.
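The service commands above apply to Linux; on macOS with a Homebrew-installed MySQL, the rough equivalents are (a sketch, matching the mysql.server command used in the troubleshooting section below):

brew install mysql      # install via Homebrew
mysql.server start
mysql.server stop
mysql.server restart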

Install Hive (3.1.1)


Set the environment variables.
In the conf directory:

cp hive-env.sh.template hive-env.sh
cp hive-default.xml.template hive-site.xml
cp hive-log4j2.properties.template hive-log4j2.properties
cp hive-exec-log4j2.properties.template hive-exec-log4j2.properties



Create three directories in HDFS to hold Hive data, and grant them 777 permissions.

Note: the HDFS services must be running first, otherwise these commands fail.

The created directories can be browsed in the HDFS web UI at http://localhost:9870 under Utilities → Browse the file system.

hdfs dfs -mkdir -p /user/hive/warehouse
hdfs dfs -mkdir -p /user/hive/tmp
hdfs dfs -mkdir -p /user/hive/log
hdfs dfs -chmod -R 777 /user/hive/warehouse
hdfs dfs -chmod -R 777 /user/hive/tmp
hdfs dfs -chmod -R 777 /user/hive/log



Edit hive-env.sh:

export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_201.jdk/Contents/Home
export HADOOP_HOME=/usr/local/Cellar/hadoop/3.1.2/libexec
export HIVE_HOME=/usr/local/Cellar/hive/3.1.1/libexec
export HIVE_CONF_DIR=/usr/local/Cellar/hive/3.1.1/libexec/conf



Edit hive-site.xml:



 
    
<configuration>
  <property>
    <name>hive.exec.scratchdir</name>
    <value>/user/hive/tmp</value>
    <description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. For each connecting user, an HDFS scratch dir: ${hive.exec.scratchdir}/&lt;username&gt; is created, with ${hive.scratch.dir.permission}.</description>
  </property>

  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
    <description>location of default database for the warehouse</description>
  </property>

  <property>
    <name>hive.querylog.location</name>
    <value>/user/hive/log</value>
    <description>Location of Hive run time structured log file</description>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8&amp;useSSL=false</value>
    <description>
      JDBC connect string for a JDBC metastore.
      To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
      For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
    </description>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>root</value>
  </property>
</configuration>
  



Create a tmp directory under the Hive install directory.
Put mysql-connector-java-5.1.46-bin.jar into the lib directory.
Initialize Hive; for Hive 2.0 and later the command is (a consolidated sketch follows the command below):

schematool -dbType mysql -initSchema
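A sketch of the preparation and initialization steps above, assuming the Homebrew Hive path from hive-env.sh; the connector jar's source location is only an example:

cd /usr/local/Cellar/hive/3.1.1/libexec
mkdir -p tmp                                              # local tmp directory under the Hive directory
cp ~/Downloads/mysql-connector-java-5.1.46-bin.jar lib/   # JDBC driver into lib (source path is an example)
bin/schematool -dbType mysql -initSchema                  # initialize the metastore schema in MySQL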

Note: if MySQL was originally installed some other way, it is strongly recommended to uninstall it and reinstall with brew, otherwise you may hit errors such as:

failed to get schema version or access denied for user

The permission tweaks suggested online for these errors not only tend not to help, they can also break MySQL completely, leaving a reinstall as the only way out.


Alternatively, run the following in the mysql client:
# create the database
mysql> create database hive;
# grant access (MySQL 8.0 no longer accepts GRANT ... IDENTIFIED BY; create or alter the user separately if needed)
mysql> grant all privileges on hive.* to 'root'@'localhost' with grant option;
mysql> flush privileges;



Start Hive by running hive from the bin directory.

Install Flume (1.9.0)
Configure the environment variables.
Simple example: in the conf directory, cp flume-conf.properties.template flume-conf.properties
vim flume-conf.properties, delete all of its contents, and add the following:

a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = avro
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 44444

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://localhost:9000/test
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.rollInterval = 0
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 1000

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1



Start Flume:

flume-ng agent --conf ../conf --conf-file /usr/local/Cellar/flume/1.9.0/libexec/conf/flume-conf.properties --name a1 -Dflume.root.logger=INFO,console



Create a test.txt file with some content to use for testing.
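For example (the path is an assumption that matches the relative ../../test.txt used in the avro-client command below when run from the bin directory):

echo "hello flume" > /usr/local/Cellar/flume/1.9.0/test.txt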
Then send it to the avro source to test:

./flume-ng avro-client --conf /usr/local/Cellar/flume/1.9.0/libexec/conf --host 0.0.0.0 --port 44444 --filename ../../test.txt



Check the Flume process:

Linux:

ps aux | grep flume

macOS:

sudo lsof | grep flume


Configuration for tailing a file and writing the data to both HDFS and Kafka:

a1.sources=r1
a1.sinks=fs kfk
a1.channels=c1 c2

a1.sources.r1.type=exec
a1.sources.r1.command=tail -F /home/tellhow-iot2/doc/test.log
a1.sources.r1.interceptors=i1
a1.sources.r1.interceptors.i1.type=org.apache.flume.interceptor.TimestampInterceptor$Builder
a1.sources.r1.selector.type=replicating 

a1.sinks.kfk.type=org.apache.flume.sink.kafka.KafkaSink
a1.sinks.kfk.topic=test
a1.sinks.kfk.brokerList=localhost:9092
# ack level for delivered data: 0, 1, or -1
a1.sinks.kfk.requiredAcks=1
#a1.sinks.kfk.batchSize = 2
a1.sinks.kfk.serializer.class=kafka.serializer.StringEncoder

a1.sinks.fs.type=hdfs
#%y-%m-%d/%H%M/
a1.sinks.fs.hdfs.path=hdfs://localhost:9000/source/%y-%m-%d/
# file name prefix
#a1.sinks.fs.hdfs.filePrefix = events-
# file name suffix
a1.sinks.fs.hdfs.fileSuffix=.log
# temp-file name prefix: inUsePrefix; temp-file name suffix: inUseSuffix
# switch to a new directory every 10 minutes, e.g. 2018-11-20/1010  2018-11-20/1020  2018-11-20/1030
#a1.sinks.fs.hdfs.round = true
#a1.sinks.fs.hdfs.roundValue = 10
#a1.sinks.fs.hdfs.roundUnit = minute
# compression codec: gzip, bzip2, lzo, lzop, snappy
#a1.sinks.fs.hdfs.codeC = gzip
# time-based rolling interval in seconds; 0 disables rolling by time

a1.sinks.fs.hdfs.rollInterval=0
# size-based rolling limit in bytes; 0 disables rolling by size
a1.sinks.fs.hdfs.rollSize=0
# roll after this many events have been written; 0 disables rolling by event count
a1.sinks.fs.hdfs.rollCount=0

# number of events written per batch
#a1.sinks.fs.hdfs.batchSize = 5
# use the local time (instead of the event timestamp header) to format the path
a1.sinks.fs.hdfs.useLocalTimeStamp=false
# file type after sinking: the default is SequenceFile; DataStream writes plain text
a1.sinks.fs.hdfs.fileType=DataStream
# maximum number of open HDFS files; when reached, the earliest opened file is closed
a1.sinks.fs.hdfs.maxOpenFiles=5000
# minimum HDFS block replicas to write; affects rolling, usually set to 1 so files roll as configured
#a1.sinks.fs.hdfs.minBlockReplicas = 1
# callTimeout: default 10000 ms, timeout for HDFS operations
# threadsPoolSize: default 10, number of threads the HDFS sink uses for HDFS operations
# rollTimerPoolSize: default 1, number of threads for time-based file rolling

a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=100

a1.channels.c2.type=memory
a1.channels.c2.capacity=1000
a1.channels.c2.transactionCapacity=100

a1.sources.r1.channels=c1 c2
a1.sinks.fs.channel=c1
a1.sinks.kfk.channel=c2


Install Kafka (2.11-2.1.1)
Create a log directory under the Kafka directory:

mkdir logs


Edit config/server.properties, changing lines 21, 31, 36, and 60 of that file:

broker.id=1
listeners=PLAINTEXT://localhost:9092
advertised.listeners=PLAINTEXT://localhost:9092
log.dirs=/usr/local/Cellar/kafka/2.1.1/libexec/logs



With ZooKeeper already running, start Kafka to verify the installation:

bin/kafka-server-start.sh config/server.properties



Create a topic:

bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test



List topics:

bin/kafka-topics.sh --list --zookeeper localhost:2181



Producer:

./kafka-console-producer.sh --broker-list localhost:9092 --topic test


Consumer:

./kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning


Describe the topic:

bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic test



The first line gives a summary of all partitions; each subsequent line describes one partition. Since this topic has only one partition, there is only one such line.
"Leader": the node responsible for all reads and writes for the given partition. Each node becomes the leader for a randomly selected share of partitions.
"Replicas": the list of nodes that replicate this partition's log, whether or not they are the leader and whether or not they are currently alive.
"Isr": the set of "in-sync" replicas, i.e. the subset of the replica list that is currently alive and caught up with the leader.
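Illustrative output for the single-partition topic created above (broker id 1, as configured earlier; exact values depend on your cluster):

Topic:test   PartitionCount:1   ReplicationFactor:1   Configs:
    Topic: test   Partition: 0   Leader: 1   Replicas: 1   Isr: 1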
Start Kafka in the background:

bin/kafka-server-start.sh config/server.properties &



Install Storm (1.2.2) (requires JDK and ZooKeeper)
Edit storm.yaml:

storm.zookeeper.servers:
    - "127.0.0.1"
storm.zookeeper.port: 2181
supervisor.slots.ports:
    - 6700
    - 6701
    - 6702
    - 6703
    - 6704
storm.local.dir: "/usr/local/Cellar/storm/1.2.2/data"
nimbus.seeds: ["127.0.0.1"]


Start Storm:
Start nimbus:

./storm nimbus >> /usr/local/Cellar/storm/1.2.2/logs/nimbus.out 2>&1 &
tail -f /usr/local/Cellar/storm/1.2.2/logs/nimbus.log



Start the UI:

./storm ui>> /usr/local/Cellar/storm/1.2.2/logs/ui.out 2>&1 &
tail -f /usr/local/Cellar/storm/1.2.2/logs/ui.log



Start the supervisor:

./storm supervisor >> /usr/local/Cellar/storm/1.2.2/logs/supervisor.out 2>&1 &
tail -f /usr/local/Cellar/storm/1.2.2/logs/supervisor.log



Start the logviewer:

./storm logviewer>> /usr/local/Cellar/storm/1.2.2/logs/logviewer.out 2>&1 &
tail -f /usr/local/Cellar/storm/1.2.2/logs/logviewer.log



Verify: open the web UI in a browser at http://localhost:8080


* Once your project jar is built, submit the topology, e.g.:

./bin/storm jar /usr/local/Cellar/storm/1.2.2/libexec/examples/storm-starter/storm-starter-topologies-0.9.5.jar storm.starter.WordCountTopology wordcount


Install HBase (1.2.9)

Create the following directories under /usr/local/Cellar/hbase/1.2.9:

/hadoop/pids

/hbasetmp

/zookeepertmp
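For example:

cd /usr/local/Cellar/hbase/1.2.9
mkdir -p hadoop/pids hbasetmp zookeepertmp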


Edit conf/hbase-env.sh:

export JAVA_HOME="/Library/Java/JavaVirtualMachines/jdk1.8.0_201.jdk/Contents/Home"

export HBASE_PID_DIR="/usr/local/Cellar/hbase/1.2.9/hadoop/pids"



Edit conf/hbase-site.xml:


    
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>/usr/local/Cellar/hbase/1.2.9/hbasetmp</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/usr/local/Cellar/hbase/1.2.9/zookeepertmp</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
</configuration>
      


Set the environment variables.
Start:

start-hbase.sh

hbase shell



Install Sqoop (1.4.6_1)
Set the environment variables.
In the conf directory: cp sqoop-env-template.sh sqoop-env.sh

Edit:

export HADOOP_COMMON_HOME=/usr/local/Cellar/hadoop/3.1.2
export HADOOP_MAPRED_HOME=/usr/local/Cellar/hadoop/3.1.2
export HIVE_HOME=/usr/local/Cellar/hive/3.1.1
export HBASE_HOME=/usr/local/Cellar/hbase/1.2.9



Put mysql-connector-java-5.1.41.jar into the lib directory.
Copy sqoop-1.4.4.jar into Hadoop's share/hadoop/mapreduce/lib directory.
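A sketch of the two copy steps (the source locations and the Homebrew sqoop layout are assumptions, and the exact sqoop jar name depends on the version you installed):

cp ~/Downloads/mysql-connector-java-5.1.41.jar /usr/local/Cellar/sqoop/1.4.6_1/libexec/lib/
cp /usr/local/Cellar/sqoop/1.4.6_1/libexec/sqoop-1.4.4.jar /usr/local/Cellar/hadoop/3.1.2/libexec/share/hadoop/mapreduce/lib/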

Sqoop commands:
Create the target table in MySQL first (note the engine/charset suffix):

create table stu2 (`id` varchar(20),`name` varchar(20)) ENGINE=InnoDB DEFAULT CHARSET=utf8;


# export Hive data to MySQL with sqoop

sqoop export --connect jdbc:mysql://localhost:3306/test --username root --password root --table stu1 --export-dir '/user/hive/warehouse/parkdb.db/stu1' --fields-terminated-by '\t';



# import from MySQL into Hive

sqoop import --connect jdbc:mysql://localhost:3306/test --username root --password root --table kafka --hive-import --create-hive-table --hive-table parkdb.stu2 -m 1



Individual start commands for each service:

zookeeper/3.4.13/libexec/bin/zkServer.sh start

hadoop/3.1.2/libexec/sbin/start-all.sh

kafka/2.1.1/libexec/bin/kafka-server-start.sh kafka/2.1.1/libexec/config/server.properties &

kafka/2.1.1/libexec/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning

flume-ng agent --conf /usr/local/Cellar/flume/1.9.0/libexec/conf --conf-file /usr/local/Cellar/flume/1.9.0/libexec/conf/bridge03.properties --name a1 -Dflume.root.logger=INFO,console



Delete a Kafka topic:

kafka/2.1.1/bin/kafka-topics.sh --delete --zookeeper localhost:2181 --topic test


Shell script that appends a line to the monitored file every second:

#!/bin/bash
# rand min max: print a random integer in [min, max]
# (date +%s%N needs GNU date, i.e. run this on the Linux host that Flume tails)
function rand(){
    min=$1
    max=$(($2-$min+1))
    num=$(date +%s%N)
    echo $(($num%$max+$min))
}
for i in {1..100}; do
  random=$(rand 1 100)
  echo "$random $random $random $random $random 2018-11-$i" >> /home/tellhow-iot2/doc/test.log
  sleep 1
done

 

Troubleshooting installation errors:

1. ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/tmp/mysql.sock' (2)

 

First check /usr/etc/my.cnf; you can optionally add:

[client]
port = 3306
socket = /tmp/mysql.sock
default-character-set = utf8

[mysqld]
collation-server = utf8_unicode_ci
character-set-server = utf8
init-connect ='SET NAMES utf8'
max_allowed_packet = 64M
bind-address = 127.0.0.1
port = 3306
socket = /tmp/mysql.sock
innodb_file_per_table=1

[mysqld_safe]
timezone = '+0:00'

Then edit .bash_profile:

export PATH=$PATH:/usr/local/Cellar/mysql/8.0.15/bin

Next, go to /usr/local/var and set permissions as follows (the key step):

sudo chmod -R 777 mysql

Finally, start MySQL from /usr/local/Cellar/mysql/8.0.15/bin:

mysql.server start

 
