A Detailed Guide to Writing SQL for ClickHouse, a Real-Time Big Data Analytics Engine

 

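Advantages of ClickHouse

  • Parallel processing of a single query (using multiple cores)
  • Distributed processing across multiple servers
  • Very fast scans, usable for real-time queries
  • Column storage works well with "wide"/"denormalized" tables (many columns)
  • Good compression
  • SQL support (with limitations)
  • A good feature set, including support for approximate calculations
  • Different storage engines (on-disk storage formats)
  • Well suited to structured log/event data and time-series data (the MergeTree engine requires a date field)
  • Index support (primary key only, and not in all storage engines)
  • A nice command-line interface with a user-friendly progress bar and formatting

The official ClickHouse documentation has the complete feature list.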

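Disadvantages of ClickHouse

  • No real delete/update support and no transactions (same as Spark and most big data systems)
  • No secondary keys (secondary indexes) (same as Spark and most big data systems)
  • Its own protocol (no MySQL protocol support)
  • Limited SQL support, and joins are implemented differently. If you are migrating from MySQL or Spark, you will probably have to rewrite every query that uses joins.
  • No window functions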

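The rest of this post covers ClickHouse SQL syntax in detail:

ClickHouse SQL queries are largely the same as MySQL and Presto SQL, with a few differences. This post only records the ClickHouse queries I write day to day, covering calculation functions, aggregate functions, join syntax, and a few things to watch out for.

ClickHouse official website: https://clickhouse.yandex/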

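I: Simple queries

Simple queries are no different from MySQL and other databases.

select * from database_name.table_name where condition

ex: select sno, sname, sage from student where sno = 1

Note: ClickHouse is a real-time big data analytics engine, so its tables usually hold a lot of data spread over many partitions. If a query would read too much data and the where condition filters out very little, add a limit clause to restrict the query. Otherwise the query fails with an error saying that it read more than XX GB of data.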
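
The note above can be tried without any user tables. The sketch below assumes only the built-in system.numbers table, which stands in for a very large table; the filter and the row count are arbitrary choices for illustration.

```sql
-- system.numbers is a built-in, effectively endless table of integers,
-- so it plays the role of a huge table here.
-- The WHERE clause alone does not bound the scan; LIMIT does.
SELECT number
FROM system.numbers
WHERE number % 7 = 0
LIMIT 10;
```

Without the LIMIT clause this query would keep scanning, which is the situation the note warns about on large production tables.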

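II: Join queries

First: ClickHouse does not support joining more than two tables directly in a single query, which is the multi-table join style common in MySQL and similar databases (an example appears later in this section).

Second: the join condition changes from on to using. The using column must have the same name in every table; if the names differ, unify them with select column as alias.

Third: the commonly used join keywords are the following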

1.ALL LEFT JOIN     

2.ANY LEFT JOIN

3.ALL FULL JOIN  
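
The difference between the ALL and ANY prefixes is easiest to see on a tiny data set. In the sketch below the tables are inline subqueries invented for illustration: the right side contains two rows for the same key, so ALL LEFT JOIN keeps both matches (the behavior of a standard SQL join), while ANY LEFT JOIN keeps only one of them.

```sql
-- Left side: one row with id = 1.
-- Right side: two rows with id = 1 (tags 'a' and 'b').
SELECT id, tag
FROM
(
    SELECT 1 AS id
)
ALL LEFT JOIN
(
    SELECT 1 AS id, arrayJoin(['a', 'b']) AS tag
)
USING id;
```

Replacing ALL with ANY in this query returns a single row instead of two. ANY is a reasonable choice when the right side is known to hold at most one row per key, or when any one match is good enough.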

 

A typical multi-table join in ordinary SQL looks like this

ex: select * from table as a
    left join table as b on a.id = b.aid
    left join table as c on a.id = c.aid

This syntax raises an error in ClickHouse. Joins across more than two tables can be handled with subqueries.

ex: first left join TABLEA with TABLEB, then left join that result with TABLEC, using using instead of on for the join condition

SELECT
    *
FROM
(
    SELECT
        *
    FROM
        (SELECT * FROM TABLEA)
    ALL LEFT JOIN
        (SELECT * FROM TABLEB) USING aid
)
ALL LEFT JOIN
    (SELECT * FROM TABLEC) USING aid
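
When the key column is named differently on the two sides, it has to be aliased before using can be applied. The following is a minimal sketch of that renaming step; the column names id, aid, name and order_total are hypothetical, and the rows are generated from system.numbers so that no real tables are needed.

```sql
SELECT id, name, order_total
FROM
(
    -- Left side: the key is already called id.
    SELECT number AS id, concat('user_', toString(number)) AS name
    FROM system.numbers
    LIMIT 3
)
ALL LEFT JOIN
(
    -- Right side: the key is called aid, so alias it to id for USING.
    SELECT aid AS id, order_total
    FROM
    (
        SELECT number AS aid, number * 10 AS order_total
        FROM system.numbers
        LIMIT 2
    )
)
USING id
ORDER BY id;
```

The left side has three keys and the right side only two, so one left row has no match and is still kept, as expected from a left join.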

 

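III: Common functions and expressions

1.sum(column)  sum

2.avg(column)  average

3.round(x, 2)  round to 2 decimal places; x can be a column, sum(column), or an expression such as a/b

4.case when column_b = 0 then 0 else round(column_a/column_b, 4) end  conditional expression: if the divisor is 0, return null or 0 (whichever suits the report) instead of dividing

5.toString()  convert to a string

6.concat(column value, 'content to append, e.g. %')

Example: concat(toString(round(round(a/b,4) * 100 ,2)),'%')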
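
The functions above are usually combined. The sketch below puts the zero-divisor guard and the percentage formatting side by side; the column names clicks and impressions are hypothetical, and the rows come from system.numbers, so the first row deliberately has a divisor of 0.

```sql
SELECT
    clicks,
    impressions,
    -- Guard the division: return 0 when the divisor is 0.
    CASE WHEN impressions = 0 THEN 0
         ELSE round(clicks / impressions, 4)
    END AS ctr,
    -- The same ratio as a percentage string, with the same guard.
    concat(
        toString(
            CASE WHEN impressions = 0 THEN 0
                 ELSE round(round(clicks / impressions, 4) * 100, 2)
            END
        ),
        '%'
    ) AS ctr_pct
FROM
(
    SELECT number * 3 AS clicks, number * 10 AS impressions
    FROM system.numbers
    LIMIT 3
);
```

The guard matters because dividing by zero does not raise an error here; without it, a meaningless value would flow into the rounded result and the formatted string.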

 

IV: What stays the same

where conditions, group by, order by, and limit are used the same way as in MySQL SQL.
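
As a quick illustration of those shared clauses, here is a small self-contained sketch. The data comes from system.numbers and the bucket grouping is invented for the example; the clauses themselves read exactly as they would in MySQL.

```sql
SELECT
    number % 3 AS bucket,   -- grouping key
    count() AS cnt,
    sum(number) AS total,
    avg(number) AS average
FROM
(
    SELECT number
    FROM system.numbers
    LIMIT 10
)
WHERE number > 0
GROUP BY bucket
ORDER BY bucket
LIMIT 0, 3;
```

The LIMIT offset, count form is the same one used at the end of the real-world query in the next part.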

 

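V: A real-world example:

The query below left joins the account table to per-account metrics for 2018-12-12 through 2018-12-15. The metrics come from a full join of the keyword report with the conversion statistics, summed per account and divided by 3.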

SELECT
	account_id,
	'2018-12-12~2018-12-15' AS date,
	account,
	ad_click
FROM
	(
		SELECT
			account_id,
			fr,
			fr_name,
			account,
			account_balance,
			account_budget,
			account_exclude_ip,
			account_budget_offline_time,
			account_status
		FROM
			marketing.sem_account_type
		WHERE
			1 = 1
		AND lower(fr) IN ('bd_sem')
		AND account_id IN (
			'18091503',
			'18091505',
			'18091501'
		)
	)
ALL LEFT JOIN (
	SELECT
		account_id,
		fr,
		round(sum(ad_cost) / 3, 2) AS ad_cost,
		round(sum(ad_cost_real) / 3, 2) AS ad_cost_real,
		round(sum(ad_impression) / 3, 2) AS ad_impression,
		round(sum(ad_click) / 3, 2) AS ad_click,
		round(sum(clue_all) / 3, 2) AS clue_all,
		round(sum(clue_all_new) / 3, 2) AS clue_all_new,
		round(
			sum(
				c1_kpi_daily_new_customer_amount
			) / 3,
			2
		) AS c1_kpi_daily_new_customer_amount,
		round(
			sum(c1_kpi_new_customer_amount) / 3,
			2
		) AS c1_kpi_new_customer_amount,
		round(
			sum(
				c2_kpi_daily_new_customer_amount
			) / 3,
			2
		) AS c2_kpi_daily_new_customer_amount,
		round(
			sum(c2_kpi_new_customer_amount) / 3,
			2
		) AS c2_kpi_new_customer_amount,
		round(sum(c2c_c1_create) / 3, 2) AS c2c_c1_create,
		round(sum(c2b_c1_create) / 3, 2) AS c2b_c1_create,
		round(sum(c2c_c1_onsite) / 3, 2) AS c2c_c1_onsite,
		round(sum(c2b_c1_onsite) / 3, 2) AS c2b_c1_onsite,
		round(sum(c2c_c1_onsale) / 3, 2) AS c2c_c1_onsale,
		round(sum(c2b_c1_onsale) / 3, 2) AS c2b_c1_onsale,
		round(sum(c2c_c2_appoint) / 3, 2) AS c2c_c2_appoint,
		round(sum(b2c_c2_appoint) / 3, 2) AS b2c_c2_appoint,
		round(sum(ssss_c2_appoint) / 3, 2) AS ssss_c2_appoint,
		round(
			sum(c2c_c2_finish_appoint) / 3,
			2
		) AS c2c_c2_finish_appoint,
		round(
			sum(b2c_c2_finish_appoint) / 3,
			2
		) AS b2c_c2_finish_appoint,
		round(
			sum(ssss_c2_finish_appoint) / 3,
			2
		) AS ssss_c2_finish_appoint,
		round(sum(c2c_c2_order) / 3, 2) AS c2c_c2_order,
		round(sum(b2c_c2_order) / 3, 2) AS b2c_c2_order,
		round(sum(weighting_number) / 3, 2) AS weighting_number,
		round(sum(ssss_c2_order) / 3, 2) AS ssss_c2_order
	FROM
		(
			SELECT
				fr,
				keyword_id,
				account_id,
				cost AS ad_cost,
				cost_real AS ad_cost_real,
				impression AS ad_impression,
				click AS ad_click
			FROM
				marketing.sem_keyword_report
			WHERE
				1 = 1
			AND the_day >= '2018-12-12'
			AND the_day <= '2018-12-15'
			AND lower(fr) IN ('bd_sem')
			AND account_id IN (
				'18091503',
				'18091505',
				'18091501'
			)
			AND (
				campaign_city IN (
					'上海',
					'东莞',
					'中山',
					'临沂',
					'乌鲁木齐',
					'伊犁',
					'佛山',
					'保定',
					'全国',
					'兰州',
					'包头',
					'北京',
					'南京',
					'南宁',
					'南昌',
					'南通',
					'南阳',
					'厦门',
					'合肥',
					'呼和浩特',
					'咸阳',
					'哈尔滨',
					'唐山',
					'嘉兴',
					'大同',
					'大连',
					'天津',
					'太原',
					'宁波',
					'宜昌',
					'宿迁',
					'常州',
					'广州',
					'廊坊',
					'徐州',
					'惠州',
					'成都',
					'扬州',
					'新乡',
					'无锡',
					'昆明',
					'杭州',
					'武汉',
					'沈阳',
					'泉州',
					'泰州',
					'泸州',
					'洛阳',
					'济南',
					'济宁',
					'淮安',
					'深圳',
					'温州',
					'澳门',
					'烟台',
					'珠海',
					'盐城',
					'石家庄',
					'福州',
					'绵阳',
					'芜湖',
					'苏州',
					'襄阳',
					'西安',
					'许昌',
					'贵阳',
					'达州',
					'郑州',
					'重庆',
					'金华',
					'银川',
					'镇江',
					'长春',
					'长沙',
					'青岛'
				)
			)
		)
	ALL FULL JOIN (
		SELECT
			fr,
			keyword_id,
			account_id,
			clue_all,
			clue_all_new,
			c1_kpi_daily_new_customer_amount,
			c1_kpi_new_customer_amount,
			c2_kpi_daily_new_customer_amount,
			c2_kpi_new_customer_amount,
			c2c_c1_create,
			c2b_c1_create,
			c2c_c1_onsite,
			c2b_c1_onsite,
			c2c_c1_onsale,
			c2b_c1_onsale,
			c2c_c2_appoint,
			b2c_c2_appoint,
			ssss_c2_appoint,
			c2c_c2_finish_appoint,
			b2c_c2_finish_appoint,
			ssss_c2_finish_appoint,
			c2c_c2_order,
			b2c_c2_order,
			weighting_number,
			ssss_c2_order
		FROM
			(
				SELECT
					fr,
					kid AS keyword_id,
					sum(clue_all) AS clue_all,
					sum(clue_all_new) AS clue_all_new,
					sum(
						c1_kpi_daily_new_customer_amount
					) AS c1_kpi_daily_new_customer_amount,
					sum(c1_kpi_new_customer_amount) AS c1_kpi_new_customer_amount,
					sum(
						c2_kpi_daily_new_customer_amount
					) AS c2_kpi_daily_new_customer_amount,
					sum(c2_kpi_new_customer_amount) AS c2_kpi_new_customer_amount,
					sum(c2c_c1_create) AS c2c_c1_create,
					sum(c2b_c1_create) AS c2b_c1_create,
					sum(c2c_c1_onsite) AS c2c_c1_onsite,
					sum(c2b_c1_onsite) AS c2b_c1_onsite,
					sum(c2c_c1_onsale) AS c2c_c1_onsale,
					sum(c2b_c1_onsale) AS c2b_c1_onsale,
					sum(c2c_c2_appoint) AS c2c_c2_appoint,
					sum(b2c_c2_appoint) AS b2c_c2_appoint,
					sum(ssss_c2_appoint) AS ssss_c2_appoint,
					sum(c2c_c2_finish_appoint) AS c2c_c2_finish_appoint,
					sum(b2c_c2_finish_appoint) AS b2c_c2_finish_appoint,
					sum(ssss_c2_finish_appoint) AS ssss_c2_finish_appoint,
					sum(c2c_c2_order) AS c2c_c2_order,
					sum(b2c_c2_order) AS b2c_c2_order,
					sum(weighting_number) AS weighting_number,
					sum(ssss_c2_order) AS ssss_c2_order
				FROM
					marketing.market_kid_stat_new_v5
				WHERE
					1 = 1
				AND dts >= '2018-12-12'
				AND dts <= '2018-12-15'
				AND lower(fr) IN ('bd_sem')
				AND (
					city IN (
						'上海',
						'东莞',
						'中山',
						'临沂',
						'乌鲁木齐',
						'伊犁',
						'佛山',
						'保定',
						'全国',
						'兰州',
						'包头',
						'北京',
						'南京',
						'南宁',
						'南昌',
						'南通',
						'南阳',
						'厦门',
						'合肥',
						'呼和浩特',
						'咸阳',
						'哈尔滨',
						'唐山',
						'嘉兴',
						'大同',
						'大连',
						'天津',
						'太原',
						'宁波',
						'宜昌',
						'宿迁',
						'常州',
						'广州',
						'廊坊',
						'徐州',
						'惠州',
						'成都',
						'扬州',
						'新乡',
						'无锡',
						'昆明',
						'杭州',
						'武汉',
						'沈阳',
						'泉州',
						'泰州',
						'泸州',
						'洛阳',
						'济南',
						'济宁',
						'淮安',
						'深圳',
						'温州',
						'澳门',
						'烟台',
						'珠海',
						'盐城',
						'石家庄',
						'福州',
						'绵阳',
						'芜湖',
						'苏州',
						'襄阳',
						'西安',
						'许昌',
						'贵阳',
						'达州',
						'郑州',
						'重庆',
						'金华',
						'银川',
						'镇江',
						'长春',
						'长沙',
						'青岛'
					)
				)
				GROUP BY
					keyword_id,
					fr
			)
		ANY LEFT JOIN (
			SELECT
				fr,
				keyword_id,
				account_id
			FROM
				marketing.sem_keyword_type
		) USING keyword_id, fr
	WHERE
		1 = 1
	AND lower(fr) IN ('bd_sem')
	AND account_id IN (
		'18091503',
		'18091505',
		'18091501'
	)
	) USING keyword_id, fr
GROUP BY
	account_id,
	fr
) USING account_id, fr
ORDER BY
	account_id
LIMIT 0, 50

 
