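Broker Load is mainly used to import files from HDFS into Doris. It is an asynchronous import method: the load statement returning successfully does not mean that all of the data has been imported.
Prerequisite: HDFS is running.
Example:
-- Create the table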
CREATE TABLE test_db.user_result(
id BIGINT,
name VARCHAR(50),
age INT,
gender INT,
province VARCHAR(50),
city VARCHAR(50),
region VARCHAR(50),
phone VARCHAR(50),
birthday VARCHAR(50),
hobby VARCHAR(50),
register_date VARCHAR(50)
)
DUPLICATE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 10
PROPERTIES("replication_num" = "1");
-- Create the load job
LOAD LABEL test_db.user_result
(
DATA INFILE("hdfs://node1:8020/datas/doris/user.csv")
INTO TABLE `user_result`
COLUMNS TERMINATED BY ","
FORMAT AS "csv"
(id, name, age, gender, province,city,region,phone,birthday,hobby,register_date)
)
WITH BROKER broker_name
(
"dfs.nameservices" = "my_cluster",
"dfs.ha.namenodes.my_cluster" = "nn1",
"dfs.namenode.rpc-address.my_cluster.nn1" = "node1:8020",
"dfs.client.failover.proxy.provider" = "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
)
PROPERTIES
(
"max_filter_ratio"="0.00002"
);
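The `max_filter_ratio` property above is easier to reason about with numbers. As a rough sketch, assuming Doris fails the job when filtered rows / (filtered rows + loaded rows) is greater than `max_filter_ratio`, a ratio of 0.00002 tolerates about 2 bad rows per 100,000. The helper name `within_filter_ratio` is hypothetical and only illustrates the arithmetic; it does not call Doris.

```shell
# Exit 0 if filtered / (filtered + loaded) is within max_filter_ratio, else exit 1.
# Arguments: <filtered_rows> <loaded_rows> <max_filter_ratio>
# Assumption: the job fails only when the ratio is strictly greater than the limit.
within_filter_ratio() {
  awk -v bad="$1" -v good="$2" -v max="$3" 'BEGIN {
    total = bad + good
    if (total == 0) exit 0          # nothing read, nothing filtered
    exit (bad / total > max) ? 1 : 0
  }'
}
```

For example, `within_filter_ratio 1 99999 0.00002` succeeds, while `within_filter_ratio 3 99997 0.00002` fails.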
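-- Prepare the data (the file is in the `/预习资料/02_资料/03_data/doris` directory)
-- First upload the file to /export/data/doris on Linux, then upload it to HDFS.
-- The file must already be on HDFS when the load job above is submitted.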
hadoop fs -mkdir -p /datas/doris
hadoop fs -put /export/data/doris/user.csv /datas/doris
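-- Check whether the load job has finished
show load\G;
If the output shows State: FINISHED, the job is complete: the data has been fully imported and can be queried normally.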
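Because Broker Load is asynchronous, a script usually has to poll the job state rather than read it once. The following is a minimal shell sketch, not part of the original walkthrough. The helper names `load_state` and `load_done` are hypothetical, and the usage comment assumes the FE MySQL port is the default 9030 and that the root password is 123456.

```shell
# Print the value of the first "State:" field found in `SHOW LOAD\G` output on stdin.
load_state() {
  awk '$1 == "State:" { print $2; exit }'
}

# Map a load state to an exit code: 0 = finished, 1 = cancelled, 2 = still in progress.
load_done() {
  case "$1" in
    FINISHED)  return 0 ;;
    CANCELLED) return 1 ;;
    *)         return 2 ;;   # e.g. PENDING, ETL, LOADING
  esac
}

# Usage (assumed host, port and credentials):
#   state="$(mysql -h node1 -P 9030 -uroot -p123456 \
#     -e 'SHOW LOAD FROM test_db WHERE LABEL = "user_result"\G' | load_state)"
#   load_done "$state" && echo "load finished"
```

A CANCELLED job will not recover on its own, so the sketch treats it separately from the in-progress states.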
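Stream Load mainly imports data from the local file system, or from a program's data stream, into Doris. It is a synchronous import method: the HTTP response itself reports the result of the import.
Syntax: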
curl --location-trusted -u user:passwd [-H ""...] -T data.file -XPUT http://fe_host:http_port/api/{db}/{table}/_stream_load
Example:
curl --location-trusted -u root:123456 -H "column_separator:," -T /export/data/doris/user.csv -X PUT http://node1:8030/api/test_db/user_result/_stream_load
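The curl command prints a JSON response, and a script should check it rather than rely on the HTTP call succeeding. The sketch below assumes the response contains a top-level `"Status"` field whose value is `Success` when the load succeeded; the helper name `stream_load_status` is hypothetical.

```shell
# Print the value of the "Status" field from a Stream Load JSON response on stdin.
# The leading quote in the pattern keeps it from matching keys that merely end in "Status".
stream_load_status() {
  sed -n 's/.*"Status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Usage (same command as above, with the response piped into the helper):
#   status="$(curl -s --location-trusted -u root:123456 -H "column_separator:," \
#     -T /export/data/doris/user.csv -X PUT \
#     http://node1:8030/api/test_db/user_result/_stream_load | stream_load_status)"
#   [ "$status" = "Success" ] || echo "stream load did not succeed: $status"
```

This uses only sed so that it runs without a JSON tool installed; where `jq` is available, parsing the response with it would be more robust.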
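Routine Load is a long-running job that continuously imports data from a specified data source (such as Kafka) into Doris. It is an asynchronous import method.
Example:
#1. Start ZooKeeper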
zkServer.sh start
#2. Start Kafka
nohup /export/server/kafka/bin/kafka-server-start.sh /export/server/kafka/config/server.properties > /tmp/kafka.log 2>&1 &
#3. Check whether the topic already exists
bin/kafka-topics.sh --list --zookeeper node1:2181
#4. Create a topic named test
bin/kafka-topics.sh --create \
--zookeeper node1:2181,node2:2181,node3:2181 \
--replication-factor 1 \
--partitions 1 \
--topic test
#5. Start the console producer and send data
bin/kafka-console-producer.sh --broker-list node1:9092 --topic test
{"id":1,"name":"zhangsan","age":30}
{"id":2,"name":"lisi","age":18}
#6. Create the table
create table student_kafka
(
id int,
name varchar(50),
age int
)
DUPLICATE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 10
PROPERTIES("replication_num" = "1");
#7. Create the Routine Load job
CREATE ROUTINE LOAD test_db.kafka_job1 on student_kafka
PROPERTIES
(
"desired_concurrent_number"="1",
"strict_mode"="false",
"format" = "json"
)
FROM KAFKA
(
"kafka_broker_list"= "node1:9092",
"kafka_topic" = "test",
"property.group.id" = "test_group_1",
"property.kafka_default_offsets" = "OFFSET_BEGINNING",
"property.enable.auto.commit" = "false"
);
#8. Query the table data
select * from student_kafka;
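Typing records by hand in step 5 is enough for a first check, but a larger batch makes it easier to see the Routine Load job keep consuming. The following sketch generates JSON lines that match the `student_kafka` columns; the function name `gen_students` and the generated values are made up for illustration.

```shell
# Print N JSON records, one per line, with the fields id, name and age.
gen_students() {
  n="$1"
  i=1
  while [ "$i" -le "$n" ]; do
    printf '{"id":%d,"name":"user%d","age":%d}\n' "$i" "$i" $((18 + i % 50))
    i=$((i + 1))
  done
}

# Usage: pipe the records into the console producer from step 5.
#   gen_students 100 | bin/kafka-console-producer.sh --broker-list node1:9092 --topic test
```

Running the query in step 8 again after a short wait should then show the new rows.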
Insert Into uses an ordinary INSERT statement; the syntax is similar to MySQL's.
Example:
insert into test_db.example_site_visit values(10005,'2017-10-03','广州',35,0,'2017-10-03 10:20:22',11,6,6);
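Export is a Doris feature for exporting data. It writes the data of a user-specified table or partition, in text format, to remote storage such as HDFS or BOS through the Broker process.
Example: export table data to HDFS
-- Check the data in the source table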
select * from example_site_visit;
-- Create the export job
EXPORT TABLE test_db.example_site_visit
TO "hdfs://node1:8020/datas/output"
WITH BROKER "broker_name" (
"username"="root",
"password"="123456"
);
-- Check the HDFS path to see whether the data has been written