ClickHouse + Spark + Flink Integrated Real-Time Data Warehouse (Open Source)
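(With default settings, Spark runs with only about 300 MB of memory, which limits it to processing roughly 2 million rows at most.)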
CDH 6 is now paid software and no longer open source, so I looked for a replacement.
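Minimum configuration:
A single machine with 8 CPU cores, 16 GB of RAM and a 500 GB disk is enough to build an offline + real-time data warehouse for data volumes under 100 million rows.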
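It is mainly intended for small big-data projects on low-spec hardware that still need a data warehouse (for example, schools or other organizations).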
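Pros: low hardware requirements; every component is open source and on a recent version, which makes passing vulnerability scans easy; query performance is better than CDH 6; even a single machine can run statistics over more than 100 million rows.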
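Cons: there is no monitoring UI like the one CDH 6 provides, the operations staff need stronger skills, and existing CDH 6 code and SQL require substantial changes.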
Resource download:
https://download.csdn.net/download/qq_37401291/65269276
Prerequisite: a working CentOS 7 virtual machine.
Disable the firewall:
systemctl stop firewalld
systemctl disable firewalld
iptables -F
Disable SELinux on all nodes:
vi /etc/selinux/config
Change SELINUX=enforcing to SELINUX=disabled, then save and exit with :wq.
The change only takes effect after a reboot.
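When the same change has to be made on several nodes, it can be done without opening vi. A minimal sketch, assuming GNU sed (the CentOS 7 default) for in-place editing; the function name disable_selinux_config is hypothetical, and the file path is a parameter so the edit can be tried on a copy first:

```shell
#!/bin/sh
# disable_selinux_config [FILE]
# Hypothetical helper: rewrites the SELINUX= line in FILE (default
# /etc/selinux/config) to SELINUX=disabled. The pattern is anchored on
# "SELINUX=", so the SELINUXTYPE= line is left untouched.
# Assumes GNU sed, which supports in-place editing with -i.
disable_selinux_config() {
    cfg=${1:-/etc/selinux/config}
    if [ ! -f "$cfg" ]; then
        echo "config file not found: $cfg" >&2
        return 1
    fi
    # Replace whatever mode is set (enforcing/permissive) with disabled.
    sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$cfg"
}
```

Run it as root with disable_selinux_config /etc/selinux/config, then reboot; getenforce should print Disabled afterwards.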
Install the JDK
mkdir /usr/java
tar -zxvf jdk-8u212-linux-x64.tar.gz -C /usr/java/
chown -R root:root /usr/java/jdk1.8.0_212
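echo 'export JAVA_HOME=/usr/java/jdk1.8.0_212' >> /etc/profile
echo 'export PATH=$JAVA_HOME/bin:$PATH' >> /etc/profile
(Use straight single quotes: they keep $JAVA_HOME and $PATH literal in the file, so they are expanded at each login instead of being frozen at the moment echo runs.)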
source /etc/profile
which java
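Running the echo lines more than once appends duplicate exports to /etc/profile. A minimal sketch of an idempotent variant; the helper name append_once is hypothetical, and the target file is a parameter so it can be tested on a scratch file:

```shell
#!/bin/sh
# append_once FILE LINE
# Hypothetical helper: appends LINE to FILE only if an identical line is
# not already present, so re-running the install steps adds no duplicates.
append_once() {
    file=$1
    line=$2
    touch "$file" || return 1
    # -F: fixed string, -x: whole-line match, -q: quiet; -- guards lines starting with "-"
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}
```

Usage, with single quotes so the variables stay literal in the file:
append_once /etc/profile 'export JAVA_HOME=/usr/java/jdk1.8.0_212'
append_once /etc/profile 'export PATH=$JAVA_HOME/bin:$PATH'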
Install Scala
mkdir /usr/scala
tar -zxvf scala-2.11.12.tgz -C /usr/scala/
echo 'export PATH=/usr/scala/scala-2.11.12/bin:$PATH' >> /etc/profile
source /etc/profile
Run scala; if the installation worked, the Scala REPL starts.
Press Ctrl+C to exit.
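After source /etc/profile, both java and scala should resolve on PATH. A small check, as a sketch; require_cmds is a hypothetical helper, not part of either distribution:

```shell
#!/bin/sh
# require_cmds CMD...
# Hypothetical helper: prints each command that cannot be found on PATH
# (to stderr) and returns non-zero if any is missing, 0 if all are present.
require_cmds() {
    missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Usage: require_cmds java scala && java -version && scala -version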
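Install MySQL 5.7 (the packages used below are version 5.7.33)
First uninstall MariaDB to avoid conflicts with MySQL (the exact mariadb-libs version may differ on your system; use the name printed by rpm -qa):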
rpm -qa |grep mariadb
rpm -e --nodeps mariadb-libs-5.5.60-1.el7_5.x86_64
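The installed MariaDB package version varies between CentOS 7 images, so the hard-coded mariadb-libs-5.5.60-1.el7_5.x86_64 may not match yours. A sketch that removes whatever rpm -qa reports instead; remove_mariadb is a hypothetical name, and the RPM variable (also an assumption) lets another command stand in for rpm during a dry run:

```shell
#!/bin/sh
# remove_mariadb
# Hypothetical helper: reads package names (one per line) from stdin,
# e.g. piped from rpm -qa, and removes each one whose name contains
# "mariadb" using the same rpm -e --nodeps flags as the manual step.
# RPM may name another command (for example a function that only prints
# its arguments) for a dry run; it defaults to rpm.
remove_mariadb() {
    rpm_cmd=${RPM:-rpm}
    while IFS= read -r pkg; do
        case $pkg in
            *mariadb*)
                # </dev/null keeps the command from consuming the package list.
                $rpm_cmd -e --nodeps "$pkg" </dev/null || return 1
                ;;
        esac
    done
    return 0
}
```

Usage: rpm -qa | remove_mariadb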
Install MySQL by running the following five commands in order:
rpm -ivh mysql-community-common-5.7.33-1.el7.x86_64.rpm --nodeps --force
rpm -ivh mysql-community-libs-5.7.33-1.el7.x86_64.rpm --nodeps --force
rpm -ivh mysql-community-libs-compat-5.7.33-1.el7.x86_64.rpm
rpm -ivh mysql-community-client-5.7.33-1.el7.x86_64.rpm --nodeps --force
rpm -ivh mysql-community-server-5.7.33-1.el7.x86_64.rpm --nodeps --force
Manage the MySQL service (check status, start, stop, restart) with:
service mysqld status
service mysqld start
service mysqld stop
service mysqld restart
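To start MySQL automatically at boot, edit /etc/rc.local:
vi /etc/rc.local
and add the line: service mysqld start
On CentOS 7 this file only runs at boot if it is executable (chmod +x /etc/rc.d/rc.local); systemctl enable mysqld is the systemd alternative.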
Look up the temporary root password generated on first startup, then log in with it:
more /var/log/mysqld.log |grep password
mysql -p
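The grep above can return several lines if mysqld was initialized more than once. A sketch that prints only the most recent temporary password; it assumes the MySQL 5.7 log wording "A temporary password is generated for root@localhost:", and mysql_temp_password is a hypothetical name:

```shell
#!/bin/sh
# mysql_temp_password [LOGFILE]
# Hypothetical helper: prints the most recent temporary root password that
# mysqld wrote to its log (default /var/log/mysqld.log) during
# initialization. Returns non-zero if no such line is found.
mysql_temp_password() {
    log=${1:-/var/log/mysqld.log}
    # Strip everything up to the marker text; keep only the last match.
    pw=$(sed -n 's/.*A temporary password is generated for root@localhost: //p' "$log" | tail -n 1)
    [ -n "$pw" ] || return 1
    printf '%s\n' "$pw"
}
```

Usage: run mysql_temp_password, then paste the result at the mysql -p prompt.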
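Change the password. It must mix uppercase and lowercase letters and include a special character (MySQL 5.7's default validate_password policy also requires a digit and at least 8 characters):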
mysql> set password = password('Mysql_123456');
Query OK, 0 rows affected, 1 warning (0.00 sec)
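A rejected password fails with a policy error, so a local pre-check can save a round trip. A minimal sketch that approximates MySQL 5.7's default MEDIUM policy (length, mixed case, digit, special character); it is only an approximation, the server plugin has the final say, and meets_medium_policy is a hypothetical name:

```shell
#!/bin/sh
# meets_medium_policy PASSWORD
# Hypothetical helper approximating validate_password's MEDIUM policy:
# at least 8 characters, with at least one lowercase letter, one
# uppercase letter, one digit and one non-alphanumeric character.
# Returns 0 if PASSWORD satisfies all checks, 1 otherwise.
meets_medium_policy() {
    pw=$1
    [ "${#pw}" -ge 8 ] || return 1
    case $pw in *[[:lower:]]*) ;; *) return 1 ;; esac
    case $pw in *[[:upper:]]*) ;; *) return 1 ;; esac
    case $pw in *[[:digit:]]*) ;; *) return 1 ;; esac
    case $pw in *[![:alnum:]]*) ;; *) return 1 ;; esac
    return 0
}
```

For example, meets_medium_policy 'Mysql_123456' succeeds, while 'Mysql123456' fails for lack of a special character.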
Configure remote access to MySQL:
grant all privileges on *.* to 'root'@'%' identified by 'Mysql_123456';
flush privileges;
exit;   (leave the MySQL client)
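Log back in and check the current character set settings:
show variables like 'character%';
Then edit /etc/my.cnf, adding any of the sections or keys below that are missing from the file:
vim /etc/my.cnf
The key change is character-set-server=utf8 in the [mysqld] section; the complete settings are: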
[client]
default-character-set = utf8
[mysqld]
default-storage-engine = INNODB
character-set-server = utf8
collation-server = utf8_general_ci
Done! Restart MySQL so the new settings take effect: service mysqld restart
Connect to MySQL with Navicat and create the metadata databases you need.
ClickHouse data warehouse setup (replaces the Hive warehouse in CDH 6)
cd /root/software/
rpm -ivh clickhouse-*
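When prompted during installation, enter a password for the default user.
The default user name is default.
In this guide the password is set to Mysql_123456.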
Alternatively, overwrite the following files in /etc/clickhouse-server/ with the ones in my download package:
users.xml
config.xml
Start ClickHouse, then connect with clickhouse-client (replace 192.168.80.131 with your server's IP address):
sudo clickhouse start
clickhouse-client -h 192.168.80.131 --port 9000 -u default --password Mysql_123456
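Right after clickhouse start, the server may need a few seconds before it accepts connections, so scripts that run queries immediately can fail. A sketch that polls with SELECT 1 until the server answers; wait_for_clickhouse and the CH_HOST, CH_PORT, CH_USER, CH_PASSWORD and CLICKHOUSE_CLIENT variables are hypothetical, with defaults taken from the command above:

```shell
#!/bin/sh
# wait_for_clickhouse [TRIES]
# Hypothetical helper: runs "SELECT 1" until the server answers or TRIES
# attempts (default 30, one second apart) are used up. Connection values
# come from CH_HOST, CH_PORT, CH_USER and CH_PASSWORD (defaults match
# this guide); CLICKHOUSE_CLIENT can name a different client command.
wait_for_clickhouse() {
    tries=${1:-30}
    client=${CLICKHOUSE_CLIENT:-clickhouse-client}
    i=0
    while [ "$i" -lt "$tries" ]; do
        if $client --host "${CH_HOST:-192.168.80.131}" --port "${CH_PORT:-9000}" \
                --user "${CH_USER:-default}" --password "${CH_PASSWORD:-Mysql_123456}" \
                --query 'SELECT 1' >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        # Sleep only if another attempt will follow.
        [ "$i" -lt "$tries" ] && sleep 1
    done
    echo "ClickHouse did not respond after $tries attempts" >&2
    return 1
}
```

Usage: sudo clickhouse start && wait_for_clickhouse && echo "ClickHouse is up"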
Spark batch processing setup (replaces Spark in CDH 6)
mkdir /opt/spark
tar -zxvf spark-2.4.8-bin-hadoop2.7.tgz -C /opt/spark/
Configuring Spark caches or a cluster is omitted here.
Verify the installation by starting the Spark shell:
/opt/spark/spark-2.4.8-bin-hadoop2.7/bin/spark-shell
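The low default memory mentioned at the top can be raised in conf/spark-defaults.conf. A minimal sketch that writes driver and executor memory settings; spark.driver.memory and spark.executor.memory are standard Spark properties, but the 4g defaults and the helper name write_spark_defaults are assumptions for a 16 GB single machine, not tested values:

```shell
#!/bin/sh
# write_spark_defaults CONF_DIR [DRIVER_MEM] [EXECUTOR_MEM]
# Hypothetical helper: writes spark-defaults.conf into CONF_DIR (normally
# $SPARK_HOME/conf), backing up any existing file to spark-defaults.conf.bak.
# Memory values use Spark's size syntax, e.g. 4g or 2048m; the 4g
# defaults are assumptions, not recommendations.
write_spark_defaults() {
    conf_dir=$1
    driver_mem=${2:-4g}
    executor_mem=${3:-4g}
    [ -d "$conf_dir" ] || { echo "no such directory: $conf_dir" >&2; return 1; }
    conf="$conf_dir/spark-defaults.conf"
    [ -f "$conf" ] && cp "$conf" "$conf.bak"
    cat > "$conf" <<EOF
# Generated by write_spark_defaults
spark.driver.memory    $driver_mem
spark.executor.memory  $executor_mem
EOF
}
```

Usage, assuming the archive was extracted under /opt/spark: write_spark_defaults /opt/spark/spark-2.4.8-bin-hadoop2.7/conf 6g 4g. In local mode (plain spark-shell on one machine) the executor runs inside the driver JVM, so spark.driver.memory is the setting that matters there.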
Real-time sync between ClickHouse and MySQL (optional)
MaterializeMySQL database engine (experimental)
Edit my.cnf and add the following under [mysqld] to enable the MySQL binlog:
vi /etc/my.cnf
For details, see:
https://www.jianshu.com/p/d0d4306411b3?hmsr=toutiao.io
log-bin=mysql-bin # enable the binlog
binlog-format=ROW # use ROW format
server_id=1 # required for MySQL replication; must not equal canal's slaveId
gtid-mode=ON
enforce-gtid-consistency = ON
Then restart MySQL: service mysqld restart
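The authoritative check is on the running server (for example SHOW VARIABLES LIKE 'gtid_mode'; in the MySQL client), but a quick file-level check catches typos before restarting. A sketch; check_binlog_cnf is a hypothetical name, it accepts - or _ in option names, and it only inspects the file (not which section each line is in):

```shell
#!/bin/sh
# check_binlog_cnf [FILE]
# Hypothetical helper: reports (to stderr) each binlog/GTID option needed
# by MaterializeMySQL that is missing from FILE (default /etc/my.cnf) or
# set to a different value. Commented-out lines do not count.
# Returns 0 if every pattern is found, 1 otherwise.
check_binlog_cnf() {
    cfg=${1:-/etc/my.cnf}
    status=0
    for opt in 'log[-_]bin' 'binlog[-_]format *= *ROW' 'server[-_]id' \
               'gtid[-_]mode *= *ON' 'enforce[-_]gtid[-_]consistency *= *ON'; do
        # Case-insensitive extended regex, anchored at the start of the line.
        if ! grep -Eiq "^[[:space:]]*$opt" "$cfg"; then
            echo "missing or different: $opt" >&2
            status=1
        fi
    done
    return "$status"
}
```

Usage: check_binlog_cnf /etc/my.cnf && service mysqld restart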
Connect to ClickHouse (use your own server's address) and create the replicated database:
clickhouse-client -h 114.132.247.190 --port 9000 -u default --password Mysql_123456
SET allow_experimental_database_materialize_mysql = 1;
CREATE DATABASE scene_mms ENGINE = MaterializeMySQL('127.0.0.1:3306', 'clickhouse', 'root', 'Mysql_123456');
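The SET statement only applies to the current client session, so it must run in the same session as CREATE DATABASE. A non-interactive sketch that sends both statements in one clickhouse-client call using --multiquery; create_materialized_db and the CH_* and CLICKHOUSE_CLIENT variables are hypothetical, and the MySQL connection values are the ones used above. Later ClickHouse releases renamed the engine to MaterializedMySQL (setting allow_experimental_database_materialized_mysql), so check which name your version expects.

```shell
#!/bin/sh
# create_materialized_db CH_DB MYSQL_HOSTPORT MYSQL_DB MYSQL_USER MYSQL_PASS
# Hypothetical helper: enables the experimental setting and creates a
# ClickHouse database that replicates MYSQL_DB, in a single client
# session so the SET still applies when CREATE DATABASE runs.
# Arguments are pasted into SQL literals without escaping, so they must
# not contain single quotes. CH_HOST defaults to 127.0.0.1 (assumption).
create_materialized_db() {
    client=${CLICKHOUSE_CLIENT:-clickhouse-client}
    sql="SET allow_experimental_database_materialize_mysql = 1;
CREATE DATABASE IF NOT EXISTS $1 ENGINE = MaterializeMySQL('$2', '$3', '$4', '$5');"
    $client --host "${CH_HOST:-127.0.0.1}" --port "${CH_PORT:-9000}" \
        --user "${CH_USER:-default}" --password "${CH_PASSWORD:-Mysql_123456}" \
        --multiquery --query "$sql"
}
```

Usage, reproducing the statement above: create_materialized_db scene_mms 127.0.0.1:3306 clickhouse root Mysql_123456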