创建MySQL实例:
设置 SequoiaSQL-MySQL 实例程序权限为可执行:chmod +x sequoiasql-mysql-3.4-linux_x86_64-installer.run
创建myinst实例:/opt/sequoiasql/mysql/bin/sdb_sql_ctl addinst myinst -D database/3306/
配置参数有三种修改方式:
/opt/sequoiasql/mysql/bin/mysql -h 127.0.0.1 -P 3306 -u root
db.list(SDB_LIST_COLLECTIONS);
创建postgreSQL实例:
创建PostgreSQL实例:/opt/sequoiasql/postgresql/bin/sdb_sql_ctl addinst myinst -D database/5432
启动PostgreSQL实例:/opt/sequoiasql/postgresql/bin/sdb_sql_ctl start myinst
检查创建的实例状态:/opt/sequoiasql/postgresql/bin/sdb_sql_ctl status
进入SequoiaDB:
创建company_domain逻辑域:db.createDomain("company_domain", [ "group1", "group2", "group3" ], { AutoSplit: true } );
创建company集合空间:db.createCS("company", { Domain: "company_domain" } );
创建employee集合:db.company.createCL("employee", {"ShardingKey": { "_id": 1}, "ShardingType": "hash", "ReplSize": -1, "Compressed": true, "CompressionType": "lzw", "AutoSplit": true, "EnsureShardingIndex": false } );
插入数据:
db.company.employee.insert( { "empno": 1, "ename": "Georgi", "age": 48 } );
db.company.employee.insert( { "empno": 2, "ename": "Bezalel", "age": 21 } );
创建company数据库:/opt/sequoiasql/postgresql/bin/sdb_sql_ctl createdb company myinst
进入PostgreSQL shell:/opt/sequoiasql/postgresql/bin/psql -p 5432 company
加载SequoiaDB连接驱动:create extension sdb_fdw
配置与SequoiaDB连接参数:CREATE SERVER sdb_server FOREIGN DATA WRAPPER sdb_fdw OPTIONS (address '127.0.0.1', service '11810', preferedinstance 'A', transaction 'on');
CREATE FOREIGN TABLE employee
(
empno INTEGER,
ename TEXT,
age INTEGER
) SERVER sdb_server
OPTIONS (collectionspace 'company', collection 'employee', decimal 'on');
ANALYZE employee;
创建SparkSQL实例
连接MySQL实例:/opt/sequoiasql/mysql/bin/mysql -h 127.0.0.1 -P 3306 -u root
创建metauser用户:CREATE USER 'metauser'@'%' IDENTIFIED BY 'metauser';
给metauser用户授权:GRANT ALL ON *.* TO 'metauser'@'%';
刷新权限:FLUSH PRIVILEGES;
创建元数据库:CREATE DATABASE metastore CHARACTER SET 'latin1' COLLATE 'latin1_bin';
执行 ssh-keygen 生成公钥和密钥,执行后连续回车即可:ssh-keygen -t rsa
执行 ssh-copy-id,把公钥拷贝到本机的 sdbadmin 用户:ssh-copy-id sdbadmin@sdbserver1
进行Spark的配置目录:cd /opt/spark-2.4.4-bin-hadoop2.7/conf
从模板中拷贝spark-env.sh文件:cp spark-env.sh.template spark-env.sh
设置Spark实例的Master:echo "SPARK_MASTER_HOST=sdbserver1" >> spark-env.sh
指定Spark实例的元数据信息存放的数据库信息。
创建设置元数据库配置文件 hive-site.xml
cat > /opt/spark-2.4.4-bin-hadoop2.7/conf/hive-site.xml << EOF
hive.metastore.schema.verification
false
javax.jdo.option.ConnectionURL
jdbc:mysql://localhost:3306/metastore?useSSL=false
JDBC connect string for a JDBC metastore
javax.jdo.option.ConnectionDriverName
com.mysql.jdbc.Driver
Driver class name for a JDBC metastore
javax.jdo.option.ConnectionUserName
metauser
javax.jdo.option.ConnectionPassword
metauser
datanucleus.autoCreateSchema
true
creates necessary schema on a startup if one doesn't exist. set this to false, after creating it once
EOF
检查是否创建成功:cat /opt/spark-2.4.4-bin-hadoop2.7/conf/hive-site.xml
拷贝相关驱动:
用户只要将 SequoiaDB for Spark 连接器 spark-sequoiadb_2.11-3.4.jar 和 SequoiaDB 的 Java 驱动 sequoiadb-driver-3.4.jar 加入 Spark 的 jar 目录即可,另外本示例使用了 MySQL 作为元数据存储数据库,也需要加入 MySQL 的 Java 驱动 mysql-jdbc.jar。
拷贝spark-sequoiadb_2.11-3.4.jar驱动连接器:cp /opt/sequoiadb/spark/spark-sequoiadb_2.11-3.4.jar /opt/spark-2.4.4-bin-hadoop2.7/jars/
拷贝 SequoiaDB 的 java 驱动 sequoiadb-driver-3.4.jar:cp /opt/sequoiadb/java/sequoiadb-driver-3.4.jar /opt/spark-2.4.4-bin-hadoop2.7/jars/
拷贝 MySQL 的 java 驱动 mysql-jdbc.jar:cp /home/sdbadmin/soft/mysql-jdbc.jar /opt/spark-2.4.4-bin-hadoop2.7/jars/
设置Spark日志级别:
由于 Spark 默认日志级别为 INFO ,运行 spark-sql 客户端时会打印大量日志输出屏幕,为了避免这个问题把日志级别改为 ERROR。
拷贝 log4j.properties:cp log4j.properties.template log4j.properties
log4j.properties 中设置日志级别:sed -i 's/log4j.rootCategory=INFO, console/log4j.rootCategory=ERROR, console/g' log4j.properties
检查日志输出配置是否成功:cat log4j.properties
测试SparkSQL实例:
进入 Spark 的安装目录:cd /opt/spark-2.4.4-bin-hadoop2.7
启动 Spark:sbin/start-all.sh
查看 Spark 的 master 和 worker 是否启动完成:jps
启动 spark-sql 客户端:bin/spark-sql
创建company数据库:create database company; use company;
创建映射表:
CREATE TABLE company.employee
(
empno INT,
ename STRING,
age INT
) USING com.sequoiadb.spark OPTIONS (host 'localhost:11810', collectionspace 'company', collection 'employee', username '', password '');
测试运行sql:SELECT AVG(age) FROM company.employee;
PostgreSQL实例访问数据:
创建company数据库:/opt/sequoiasql/postgresql/bin/sdb_sql_ctl createdb company myinst
进入PostgreSQL shell:/opt/sequoiasql/postgresql/bin/psql -p 5432 company
加载SequoiaDB连接驱动:CREATE EXTENSION sdb_fdw;
配置与SequoiaDB连接参数:
CREATE SERVER sdb_server FOREIGN DATA WRAPPER sdb_fdw OPTIONS
(
address '127.0.0.1',
service '11810',
preferedinstance 'A',
transaction 'off'
);
关联存储引擎中的集合:
创建company数据库外表:
CREATE FOREIGN TABLE employee
(
empno INTEGER,
ename TEXT,
age INTEGER
) SERVER sdb_server
OPTIONS (collectionspace 'company', collection 'employee', decimal 'on');
更新表的统计信息:ANALYZE employee;
CREATE TABLE company.employee
(
empno INT,
ename STRING,
age INT
) USING com.sequoiadb.spark OPTIONS (host 'localhost:11810', collectionspace 'company', collection 'employee', username '', password '');
运行测试sql:SELECT AVG(age) FROM company.employee;
创建S3对象存储实例:
配置巨杉数据库事务级别:
设置事务级别及配置为等锁模式:
db.updateConf( { transactionon: true, transisolation: 1, translockwait: true } );
配置SequoiaS3:
进入SequoiaS3程序目录:cd /opt/sequoiadb/tools/sequoias3
配置 SequoiaS3,打开 config 目录中的 application.properties 文件:cat config/application.properties
增加以下内容至 application.properties 文件,配置对外监听端口:echo "server.port=8002" >> /opt/sequoiadb/tools/sequoias3/config/application.properties
增加以下内容至 application.properties 文件,配置 coord 节点的 IP 和端口,可以配置多组并使用逗号分隔:echo "sdbs3.sequoiadb.url=sequoiadb://localhost:11810" >> /opt/sequoiadb/tools/sequoias3/config/application.properties
SequoiaS3配置
启动S3实例:
启动SequoiaS3:./sequoias3.sh start
Note: 如需停止 SequoiaS3 进程,执行 stop -p {port} 停止监听指定端口的 SequoiaS3 进程,或执行 stop -a 停止所有 SequoiaS3 进程:./sequoias3.sh stop -p 8002
操作bucket及文件对象:
在本例中将使用 curl restful 方式来测试 s3 接口。
创建存放数据文件的目录 /home/sdbadmin/s3:mkdir -p /home/sdbadmin/s3
进入目录:cd /home/sdbadmin/s3
创建同sdbbucket:
在 shell 环境中设置变量及相关值:
bucket="sdbbucket"
dateValue=`date -R`
resource="/${bucket}/"
contentType="application/octet-stream"
stringToSign="PUT\n\n\n${dateValue}\n${resource}"
s3Key="ABCDEFGHIJKLMNOPQRST"
s3Secret="abcdefghijklmnopqrstuvwxyz0123456789ABCD"
signature=`echo -en ${stringToSign} | openssl sha1 -hmac ${s3Secret} -binary | base64`
使用 curl 创建一个 sdbbucket:curl -v -X PUT "http://localhost:8002/${bucket}" -H "Host: localhost:8002" -H "Date: ${dateValue}" -H "Authorization: AWS ${s3Key}:${signature}"
获取S3中的sdbbucket信息:
使用 curl 获取上个小节创建的 sdbbucket 的信息:curl -v -X GET "http://localhost:8002" -H "Host: localhost:8002" -H "Date: ${dateValue}" -H "Authorization: AWS ${s3Key}:${signature}"
向桶中写入文件:
查看 /opt/sequoiadb/bin/sdb 文件的信息:ls -l /opt/sequoiadb/bin/sdb
在 shell 环境中设置变量及相关值:
file="/opt/sequoiadb/bin/sdb"
objname="sdb"
bucket=sdbbucket
url="localhost:8002"
resource="/${bucket}/${objname}"
contentType="application/x-compressed-tar"
dateValue=`date -R`
stringToSign="PUT\n\n${contentType}\n${dateValue}\n${resource}"
s3Key="ABCDEFGHIJKLMNOPQRST"
s3Secret="abcdefghijklmnopqrstuvwxyz0123456789ABCD"
signature=`echo -en ${stringToSign} | openssl sha1 -hmac ${s3Secret} -binary | base64`
使用 curl 向 sdbbucket 中写入文件 sdb,在 S3 中的名称是 sdb:curl -X PUT -T "${file}" -H "Host: ${url}" -H "Date: ${dateValue}" -H "Content-Type: ${contentType}" -H "Authorization: AWS ${s3Key}:${signature}" "http://${url}/${bucket}/${objname}"
从桶中下载文件:
使用 curl 从 sdbbucket 中读取文件对象"sdb",并存放到当前目录,保存的文件名为"sdb_download"。
在 shell 环境中设置变量及相关值:
file="./sdb_download"
objname="sdb"
bucket=sdbbucket
url="localhost:8002"
resource="/${bucket}/${objname}"
contentType="application/x-compressed-tar"
dateValue=`date -R`
stringToSign="GET\n\n${contentType}\n${dateValue}\n${resource}"
s3Key="ABCDEFGHIJKLMNOPQRST"
s3Secret="abcdefghijklmnopqrstuvwxyz0123456789ABCD"
signature=`echo -en ${stringToSign} | openssl sha1 -hmac ${s3Secret} -binary | base64`
使用 curl 从 sdbbucket 中下载文件:curl -o ${file} -X GET -H "Host: ${url}" -H "Date: ${dateValue}" -H "Content-Type: ${contentType}" -H "Authorization: AWS ${s3Key}:${signature}" "http://${url}/${bucket}/${objname}"
观察S3元数据:
查看创建的“sdbbucket” bucket的元数据:db.S3_SYS_Meta.S3_Bucket.find();
查看上载的文件的元数据:db.S3_SYS_Meta.S3_ObjectMeta.find();
文件对象所在的集合,这个集合由 S3 实例自动创建,并且默认按照每年每个季度分表,及每个季度产生一个新的Collection:
db.S3_SYS_Data_2020_1.S3_ObjectData_Q1.listLobs();
在S3实例中删除文件和桶:
在 shell 环境中设置变量及相关值:
objname="sdb"
bucket=sdbbucket
url="localhost:8002"
resource="/${bucket}/${objname}"
contentType="application/x-compressed-tar"
dateValue=`date -R`
stringToSign="GET\n\n${contentType}\n${dateValue}\n${resource}"
s3Key="ABCDEFGHIJKLMNOPQRST"
s3Secret="abcdefghijklmnopqrstuvwxyz0123456789ABCD"
signature=`echo -en ${stringToSign} | openssl sha1 -hmac ${s3Secret} -binary | base64`
使用 curl 命令 删除 sdbbucket 中的文件对象"sdb":curl -X DELETE -H "Host: ${url}" -H "Date: ${dateValue}" -H "Content-Type: ${contentType}" -H "Authorization: AWS ${s3Key}:${signature}" "http://${url}/${bucket}/${objname}"
使用 curl 命令,从 SequoiaS3 中删除桶"sdbbucket":curl -X DELETE -H "Host: ${url}" -H "Date: ${dateValue}" -H "Content-Type: ${contentType}" -H "Authorization: AWS ${s3Key}:${signature}" "http://${url}/${bucket}"
创建NFS网络文件系统实例:
查看fuse的版本信息:fusermount --version
查看 SequoiaFS 实例程序:ls -lrt /opt/sequoiadb/bin/sequoiafs
创建集合空间和集合:
创建逻辑域:db.createDomain("fs_domain", [ "group1", "group2", "group3" ], { AutoSplit: true } );
创建集合空间:db.createCS("fscs", { Domain: "fs_domain" } );
创建 fscl 集合,存储挂载目录下文件的内容:db.fscs.createCL("fscl", { "ShardingKey": { "_id": 1}, "ShardingType": "hash", "ReplSize": -1, "Compressed": true, "CompressionType": "lzw", "AutoSplit": true, "EnsureShardingIndex": false } );
创建挂载点及配置文件:
创建挂载点mountpoint:mkdir -p /opt/sequoiafs/mountpoint
创建 SequoiaFS 的配置文件目录和日志目录:
mkdir -p /opt/sequoiafs/conf/fscs_fscl/001/
mkdir -p /opt/sequoiafs/log/fscs_fscl/001/
产生一个空配置文件,SequoiaFS 服务在启动时会将指定的值写入该文件中,其他参数使用缺省值:touch /opt/sequoiafs/conf/fscs_fscl/001/sequoiafs.conf
sequoiafs /opt/sequoiafs/mountpoint -i localhost:11810 -l fscs.fscl --autocreate -c /opt/sequoiafs/conf/fscs_fscl/001/ --diagpath /opt/sequoiafs/log/fscs_fscl/001/ -o big_writes -o max_write=131072 -o max_read=131072
sequoiaFS实例参数配置
查看挂载目录:
本地 SequoiaFS 节点通过 mount 可以看到挂载信息。通过 sequoiafs 挂载上的 /opt/sequoiafs/mountpoint 目录,文件系统类型为 fuse.sequoiafs:mount
查看 SequoiaFS 实例在 SequoiaDB 存储引擎存储的挂载信息:db.sequoiafs.maphistory.find();
挂载目录下文件和目录操作:本章节将创建 fsdir 目录和 fsfile.txt 文件用于作为 SequoiaFS 实例的测试例子。
创建目录:
进入挂载目录:cd /opt/sequoiafs/mountpoint/
创建目录:mkdir fsdir
创建文件并写入内容:
进入新建的 fsdir 目录:cd /opt/sequoiafs/mountpoint/fsdir
使用echo重定向内容创建文件:echo 'hello, this is a fsfile!' >> fsfile.txt
查看文件内容是否存在:cat fsfile.txt