Notes on building a big data cluster with Cloudera (Docker)

1. Install Docker

Install the latest stable version:

# Step 1: install the required system tools
sudo yum install -y yum-utils device-mapper-persistent-data lvm2
# Step 2: add the Docker CE repository
sudo yum-config-manager --add-repo https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
# Step 3: refresh the package cache and install Docker CE
sudo yum makecache fast
sudo yum -y install docker-ce
# Step 4: start the Docker service
sudo service docker start

Install a specific version:

# Step 1: list the Docker versions available in the repository
yum list docker-ce.x86_64 --showduplicates | sort -r
# Step 2: install the chosen version
sudo yum install -y docker-ce-18.09.9 docker-ce-cli-18.09.9 containerd.io

2. Set up Cloudera on Docker (requires sudo)

# Step 1: pull the Cloudera image
sudo docker pull cloudera/quickstart:latest
    # If the pull is too slow, switch to a registry mirror:
    # add the following to /etc/docker/daemon.json (create the file if it does not exist):

    {
          "registry-mirrors": ["https://9cpn8tt6.mirror.aliyuncs.com"]
    }

    # Restart the service:
    sudo systemctl daemon-reload
    sudo systemctl restart docker
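
If /etc/docker/daemon.json already holds other settings, overwriting it by hand risks losing them. The following is a minimal sketch of merging the mirror into existing content instead. The function name add_registry_mirror and the sample "log-driver" setting are illustrative assumptions, not part of the original steps; reading and writing the actual file (which needs root) is left out.

```python
import json


def add_registry_mirror(config_text: str, mirror_url: str) -> str:
    """Return daemon.json content with mirror_url listed in "registry-mirrors".

    Hypothetical helper: it only transforms text and does not touch the file.
    """
    # An empty or missing file is treated as an empty JSON object.
    config = json.loads(config_text) if config_text.strip() else {}
    mirrors = config.setdefault("registry-mirrors", [])
    # Avoid adding the same mirror twice.
    if mirror_url not in mirrors:
        mirrors.append(mirror_url)
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    existing = '{"log-driver": "json-file"}'  # sample content, assumed
    print(add_registry_mirror(existing, "https://9cpn8tt6.mirror.aliyuncs.com"))
```

The output can be written back to /etc/docker/daemon.json before restarting Docker.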
    
# Step 2: create the container
# (each line ends with a backslash so the shell reads this as one command)
sudo docker run -t -i -d \
  --name cdh \
  --hostname=quickstart.cloudera \
  --privileged=true \
  -v /data/CDH:/src \
  -p 8020:8020 -p 8022:8022 -p 7180:7180 -p 21050:21050 -p 50070:50070 \
  -p 50075:50075 -p 50010:50010 -p 50020:50020 -p 8890:8890 -p 60010:60010 \
  -p 10002:10002 -p 25010:25010 -p 25020:25020 -p 18088:18088 -p 8088:8088 \
  -p 19888:19888 -p 7187:7187 -p 11000:11000 -p 8888:8888 \
  cloudera/quickstart \
  /bin/bash -c '/usr/bin/docker-quickstart'

The options are:

Option | Description
--hostname=quickstart.cloudera | Required. The pseudo-distributed configuration assumes this hostname. Sets the container hostname (the name recorded in /etc/hosts).
--privileged=true | Required for HBase, MySQL-backed Hive metastore, Hue, Oozie, Sentry, and Cloudera Manager.
-t | Required. Allocates a pseudoterminal. Once services are started, a Bash shell takes over. This switch starts a terminal emulator to run the services.
-i | Required if you want to use the terminal, either immediately or by connecting to it later.
-p 8888 | Recommended. Maps the Hue port in the guest to a port on the host. The format is -p host_port:container_port, for example -p 8888:8888: the port left of the colon is on the host, the port right of it is in the container.
-p [PORT] | Optional. Maps any other ports (for example, 7180 for Cloudera Manager, 80 for a guided tutorial).
-d | Optional. Runs the container in the background.
--name | The name of the container.
-v host_path:container_path | Mounts a host directory onto a directory in the container. Anything placed in the host directory is directly accessible from the corresponding directory inside the container.
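
The long run of -p flags is easy to mistype. As a sketch, the same command can be assembled from a list of ports, with each port mapped to the same number on the host as in the command above. The function name build_run_command and its parameters are illustrative assumptions; the option values are the ones used in this post.

```python
import shlex

# Ports published by the docker run command in this post.
PORTS = [8020, 8022, 7180, 21050, 50070, 50075, 50010, 50020, 8890, 60010,
         10002, 25010, 25020, 18088, 8088, 19888, 7187, 11000, 8888]


def build_run_command(ports, name="cdh", host_dir="/data/CDH", container_dir="/src"):
    """Return the docker run command as an argument list (hypothetical helper)."""
    cmd = ["sudo", "docker", "run", "-t", "-i", "-d",
           "--name", name,
           "--hostname=quickstart.cloudera",
           "--privileged=true",
           "-v", f"{host_dir}:{container_dir}"]
    # Publish every port under the same number on the host.
    for port in ports:
        cmd += ["-p", f"{port}:{port}"]
    cmd += ["cloudera/quickstart", "/bin/bash", "-c", "/usr/bin/docker-quickstart"]
    return cmd


if __name__ == "__main__":
    # Print a shell-quoted command line that can be pasted into a terminal.
    print(shlex.join(build_run_command(PORTS)))
```

Printing the command rather than executing it keeps the sketch safe to run on a machine without Docker.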

CDH port summary

Service name | Parameter | Port number
HBase REST Server Port | hbase.rest.port | 20550
HBase REST Server Web UI Port | hbase.rest.info.port | 8085
HBase Thrift Server Port | hbase.regionserver.thrift.port | 9090
HBase Thrift Server Web UI Port | hbase.thrift.info.port | 9095
HBase Master Port | hbase.master.port | 60000
HBase Master Web UI Port | hbase.master.info.port | 60010
HBase RegionServer Port | hbase.regionserver.port | 60020
HBase RegionServer Web UI Port | hbase.regionserver.info.port | 60030
DataNode Protocol Port | dfs.datanode.ipc.address | 50020
DataNode Transceiver Port | dfs.datanode.address | 50010
DataNode HTTP Web UI Port | dfs.datanode.http.address | 50075
Secure DataNode Web UI Port (TLS/SSL) | dfs.datanode.https.address | 50475
REST Port | hdfs.httpfs.http.port | 14000
Administration Port | hdfs.httpfs.admin.port | 14001
JournalNode RPC Port | dfs.journalnode.rpc-address | 8485
JournalNode HTTP Port | dfs.journalnode.http-address | 8480
Secure JournalNode Web UI Port (TLS/SSL) | dfs.journalnode.https-address | 8481
NFS Gateway Server Port | nfs3.server.port | 2049
NFS Gateway MountD Port | nfs3.mountd.port | 4242
Portmap (or Rpcbind) Port | - | 111
NameNode Port | fs.default.name, fs.defaultFS | 8020
NameNode Service RPC Port | dfs.namenode.servicerpc-address | 8022
NameNode Web UI Port | dfs.http.address, dfs.namenode.http-address | 50070
Secure NameNode Web UI Port (TLS/SSL) | dfs.https.port | 50470
SecondaryNameNode Web UI Port | dfs.secondary.http.address, dfs.namenode.secondary.http-address | 50090
Secure SecondaryNameNode Web UI Port (TLS/SSL) | dfs.secondary.https.port | 50495
HBase Indexer HTTP Port | hbaseindexer.http.port | 11060
Solr HTTP Port | solr_http_port | 8983
Solr Admin Port | - | 8984
Solr HTTPS Port | solr_https_port | 8985
Client Port | clientPort | 2181
Quorum Port | - | 3181
Election Port | - | 4181
JMX Remote Port | - | 9010
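
Once the container is up, it can be useful to confirm which of the published ports actually accept connections. The following is a minimal sketch using only the standard library. The function name open_ports, the host 127.0.0.1, and the sample port selection are assumptions for illustration; an open TCP port does not by itself prove that the service behind it is healthy.

```python
import socket


def open_ports(host, ports, timeout=2.0):
    """Return the ports on host that accept a TCP connection (hypothetical helper)."""
    reachable = []
    for port in ports:
        try:
            # create_connection raises OSError if the port is closed or times out.
            with socket.create_connection((host, port), timeout=timeout):
                reachable.append(port)
        except OSError:
            pass
    return reachable


if __name__ == "__main__":
    # Cloudera Manager, NameNode, NameNode web UI, Hue (as published in this post).
    print(open_ports("127.0.0.1", [7180, 8020, 50070, 8888]))
```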

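If docker run fails with an error while creating the container, the container name stays taken. Remove the container that was already created, then run the command again.

# List all containers, including stopped ones
sudo docker ps -a
# Remove the selected container
sudo docker rm container_id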
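
The lookup and removal can also be scripted by name. This sketch relies on the --format option of docker ps with the {{.ID}} and {{.Names}} template fields. The helper names find_container_ids and remove_container are hypothetical, and the sketch assumes the current user may run docker without sudo. Note that docker rm refuses to remove a running container, which is fine here because a container left over from a failed run is normally stopped.

```python
import subprocess


def find_container_ids(ps_output: str, name: str) -> list:
    """Return IDs of containers named `name`.

    ps_output is expected as "<id> <name>", one container per line.
    """
    ids = []
    for line in ps_output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1] == name:
            ids.append(parts[0])
    return ids


def remove_container(name: str) -> None:
    """Remove every stopped container with the given name."""
    listing = subprocess.run(
        ["docker", "ps", "-a", "--format", "{{.ID}} {{.Names}}"],
        check=True, capture_output=True, text=True,
    ).stdout
    for container_id in find_container_ids(listing, name):
        subprocess.run(["docker", "rm", container_id], check=True)
```

Calling remove_container("cdh") would free the name used in the docker run command above.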

3. Start Cloudera Manager

# Start the cdh container
sudo docker start CONTAINER_ID
# Open a shell in the running cdh container
sudo docker exec -it CONTAINER_ID /bin/bash
# The prompt changes to: [root@quickstart /]#
# Launch Cloudera Manager
sudo /home/cloudera/cloudera-manager --force --enterprise

Once it is running, open IP:7180 in a browser (7180 is the Cloudera Manager port) and log in with username cloudera and password cloudera.

As shown in the figure:

[Image 1]

Start the cluster services: HDFS, Hive, Hue, YARN, and so on.

[Image 2]

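4. Test the components from a client

Create a file named test.py:

# Requires the hdfs package (pip install hdfs)
from hdfs.client import Client

# Connect to the NameNode web UI port (50070) published by the container
client = Client("http://192.168.31.3:50070", root="/", timeout=100)
# List the entries under the HDFS root directory
print(client.list("/"))

The script prints the entries under the HDFS root directory.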
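
The hdfs package talks to the NameNode over the WebHDFS REST API, so the same listing can be requested without installing it. Below is a minimal sketch using only the standard library and the WebHDFS LISTSTATUS operation. The helper names are hypothetical, the address is the one from test.py, and the sketch assumes WebHDFS is enabled on the NameNode and needs no authentication.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def liststatus_url(namenode: str, path: str = "/") -> str:
    """Build the WebHDFS LISTSTATUS URL for an absolute HDFS path."""
    return f"{namenode.rstrip('/')}/webhdfs/v1{quote(path)}?op=LISTSTATUS"


def parse_liststatus(body: str) -> list:
    """Extract the entry names from a LISTSTATUS JSON response."""
    statuses = json.loads(body)["FileStatuses"]["FileStatus"]
    return [status["pathSuffix"] for status in statuses]


def list_hdfs(namenode: str, path: str = "/", timeout: float = 100) -> list:
    """Fetch and parse the listing for path (requires a reachable NameNode)."""
    with urlopen(liststatus_url(namenode, path), timeout=timeout) as resp:
        return parse_liststatus(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Only prints the URL; call list_hdfs(...) once the cluster is reachable.
    print(liststatus_url("http://192.168.31.3:50070", "/"))
```

Opening the printed URL in a browser is a quick way to check that port 50070 is published correctly before debugging client code.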

5. Install Kafka

https://blog.csdn.net/nevergiveup54/article/details/50545020

 
