Using ClickHouse, Part 1

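ClickHouse is a column-oriented database management system (DBMS) for online analytical processing (OLAP), originally developed in Russia by Yandex. It is a full database management system rather than just a database: tables and databases can be created, data loaded, and queries run at runtime, without reconfiguring or restarting the server.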

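Components built on the Hadoop ecosystem, such as Druid and Kylin, can process large data volumes and answer queries in real time, which covers most real-time analytics scenarios. ClickHouse shares these strengths and also uses CPU resources efficiently: it processes queries and returns results with very low latency, without any preprocessing of the data.

SQL support: ClickHouse provides a declarative query language based on SQL that is compatible with the SQL standard in most cases. Supported constructs include GROUP BY, ORDER BY, IN, JOIN, and non-correlated subqueries. At the time of writing, window functions and correlated subqueries are not supported.

Real-time data updates: a table can define a primary key. To make range lookups on the primary key fast, data is stored in MergeTree tables incrementally and in sorted order. Data can be written to a table continuously and efficiently, and writes do not take any locks. This makes ClickHouse suitable for real-time analysis and statistics over very large data sets, with single-server throughput reaching hundreds of millions of rows per second.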

Installation:

Environment: CentOS 7, 64-bit

Add the official repository:

sudo yum install yum-utils
sudo rpm --import https://repo.yandex.ru/clickhouse/CLICKHOUSE-KEY.GPG
sudo yum-config-manager --add-repo https://repo.yandex.ru/clickhouse/rpm/stable/x86_64

Install the server and client:

sudo yum install clickhouse-server clickhouse-client

 

Start the service:

sudo service clickhouse-server start

Logs are written to the /var/log/clickhouse-server/ directory.

If the service does not start, check the configuration file /etc/clickhouse-server/config.xml.
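When the service fails to start, the quickest clue is usually at the end of the error log in the directory mentioned above. The following is a minimal sketch that assumes the default file name clickhouse-server.err.log; the LOG_DIR variable and the function name show_recent_errors are illustrative and not part of ClickHouse.

```shell
# Log directory from the text above; override LOG_DIR if your install differs.
LOG_DIR="${LOG_DIR:-/var/log/clickhouse-server}"

# Print the last N lines (default 20) of the server error log.
# Assumption: the error log uses the default name clickhouse-server.err.log.
show_recent_errors() {
    lines="${1:-20}"
    log_file="$LOG_DIR/clickhouse-server.err.log"
    if [ ! -r "$log_file" ]; then
        echo "cannot read $log_file" >&2
        return 1
    fi
    tail -n "$lines" "$log_file"
}
```

For example, show_recent_errors 50 prints the last 50 lines. Reading the log may require sudo, depending on file permissions.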

You can also start the server directly from the console:

clickhouse-server --config-file=/etc/clickhouse-server/config.xml

Connect to the server with the command-line client:

clickhouse-client

Check that it works:

[Figure 1: screenshot of the check in clickhouse-client]
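The same check can be scripted. The sketch below sends the trivial query SELECT 1 through clickhouse-client and succeeds only if the answer is 1. The CH_CLIENT variable and the function name check_server are illustrative assumptions.

```shell
# Client command; overridable so the check can be pointed at another binary.
CH_CLIENT="${CH_CLIENT:-clickhouse-client}"

# Return 0 if the server answers the trivial query "SELECT 1" with "1".
check_server() {
    result=$($CH_CLIENT --query "SELECT 1" 2>/dev/null) || return 1
    [ "$result" = "1" ]
}
```

A typical use is check_server && echo "server is up".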

 

Download the test data:

for s in `seq 1987 2018`
do
    for m in `seq 1 12`
    do
        wget https://transtats.bts.gov/PREZIP/On_Time_Reporting_Carrier_On_Time_Performance_1987_present_${s}_${m}.zip
    done
done
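The loop above fetches 32 years times 12 months, so 384 archives in total, and a single failed request is easy to miss. As a sketch, the URL generation can be separated from the download so the list can be inspected first and failures are reported. The function names ontime_urls and download_all are illustrative, and wget -c is used here to resume partially downloaded files.

```shell
# Base URL taken from the download loop above.
BASE_URL="https://transtats.bts.gov/PREZIP/On_Time_Reporting_Carrier_On_Time_Performance_1987_present"

# Print one archive URL per line for the given year range (default 1987 to 2018).
ontime_urls() {
    first="${1:-1987}"
    last="${2:-2018}"
    for year in $(seq "$first" "$last"); do
        for month in $(seq 1 12); do
            echo "${BASE_URL}_${year}_${month}.zip"
        done
    done
}

# Download every archive in the range; report URLs that fail on stderr.
download_all() {
    ontime_urls "$@" | while read -r url; do
        wget -c "$url" || echo "failed: $url" >&2
    done
}
```

Running ontime_urls 2018 2018 lists only the twelve archives for 2018, which is a convenient way to try the rest of the walkthrough on a smaller data set.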

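Create the table:

Start the client with clickhouse-client and run the CREATE TABLE statement below. Enter it as a single line, or start the client with the --multiline option: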

CREATE TABLE `ontime` (
  `Year` UInt16,
  `Quarter` UInt8,
  `Month` UInt8,
  `DayofMonth` UInt8,
  `DayOfWeek` UInt8,
  `FlightDate` Date,
  `UniqueCarrier` FixedString(7),
  `AirlineID` Int32,
  `Carrier` FixedString(2),
  `TailNum` String,
  `FlightNum` String,
  `OriginAirportID` Int32,
  `OriginAirportSeqID` Int32,
  `OriginCityMarketID` Int32,
  `Origin` FixedString(5),
  `OriginCityName` String,
  `OriginState` FixedString(2),
  `OriginStateFips` String,
  `OriginStateName` String,
  `OriginWac` Int32,
  `DestAirportID` Int32,
  `DestAirportSeqID` Int32,
  `DestCityMarketID` Int32,
  `Dest` FixedString(5),
  `DestCityName` String,
  `DestState` FixedString(2),
  `DestStateFips` String,
  `DestStateName` String,
  `DestWac` Int32,
  `CRSDepTime` Int32,
  `DepTime` Int32,
  `DepDelay` Int32,
  `DepDelayMinutes` Int32,
  `DepDel15` Int32,
  `DepartureDelayGroups` String,
  `DepTimeBlk` String,
  `TaxiOut` Int32,
  `WheelsOff` Int32,
  `WheelsOn` Int32,
  `TaxiIn` Int32,
  `CRSArrTime` Int32,
  `ArrTime` Int32,
  `ArrDelay` Int32,
  `ArrDelayMinutes` Int32,
  `ArrDel15` Int32,
  `ArrivalDelayGroups` Int32,
  `ArrTimeBlk` String,
  `Cancelled` UInt8,
  `CancellationCode` FixedString(1),
  `Diverted` UInt8,
  `CRSElapsedTime` Int32,
  `ActualElapsedTime` Int32,
  `AirTime` Int32,
  `Flights` Int32,
  `Distance` Int32,
  `DistanceGroup` UInt8,
  `CarrierDelay` Int32,
  `WeatherDelay` Int32,
  `NASDelay` Int32,
  `SecurityDelay` Int32,
  `LateAircraftDelay` Int32,
  `FirstDepTime` String,
  `TotalAddGTime` String,
  `LongestAddGTime` String,
  `DivAirportLandings` String,
  `DivReachedDest` String,
  `DivActualElapsedTime` String,
  `DivArrDelay` String,
  `DivDistance` String,
  `Div1Airport` String,
  `Div1AirportID` Int32,
  `Div1AirportSeqID` Int32,
  `Div1WheelsOn` String,
  `Div1TotalGTime` String,
  `Div1LongestGTime` String,
  `Div1WheelsOff` String,
  `Div1TailNum` String,
  `Div2Airport` String,
  `Div2AirportID` Int32,
  `Div2AirportSeqID` Int32,
  `Div2WheelsOn` String,
  `Div2TotalGTime` String,
  `Div2LongestGTime` String,
  `Div2WheelsOff` String,
  `Div2TailNum` String,
  `Div3Airport` String,
  `Div3AirportID` Int32,
  `Div3AirportSeqID` Int32,
  `Div3WheelsOn` String,
  `Div3TotalGTime` String,
  `Div3LongestGTime` String,
  `Div3WheelsOff` String,
  `Div3TailNum` String,
  `Div4Airport` String,
  `Div4AirportID` Int32,
  `Div4AirportSeqID` Int32,
  `Div4WheelsOn` String,
  `Div4TotalGTime` String,
  `Div4LongestGTime` String,
  `Div4WheelsOff` String,
  `Div4TailNum` String,
  `Div5Airport` String,
  `Div5AirportID` Int32,
  `Div5AirportSeqID` Int32,
  `Div5WheelsOn` String,
  `Div5TotalGTime` String,
  `Div5LongestGTime` String,
  `Div5WheelsOff` String,
  `Div5TailNum` String
) ENGINE = MergeTree
PARTITION BY Year
ORDER BY (Carrier, FlightDate)
SETTINGS index_granularity = 8192;
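Since the statement has to reach the client as a single line, one option is to keep it in a file and collapse the line breaks before passing it to clickhouse-client --query. This is only a sketch: the file name create_ontime.sql and the function name sql_one_line are hypothetical, and the approach assumes the file contains no "--" line comments, which would swallow the rest of the statement once the newlines are gone.

```shell
# Collapse a multi-line SQL file into a single line:
# newlines and tabs become spaces, runs of spaces are squeezed,
# and trailing spaces are removed.
# Assumption: the file has no "--" line comments.
sql_one_line() {
    tr '\n\t' '  ' < "$1" | tr -s ' ' | sed 's/ *$//'
}
```

With the statement saved in create_ontime.sql, the table could then be created non-interactively with clickhouse-client --query "$(sql_one_line create_ontime.sql)".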

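Load the data:

Exit the ClickHouse client and run the following in the shell. The host name example-perftest01j is a placeholder; replace it with your own server, or omit --host when the server runs locally:

for i in *.zip; do echo "$i"; unzip -cq "$i" '*.csv' | sed 's/\.00//g' | clickhouse-client --host=example-perftest01j --query="INSERT INTO ontime FORMAT CSVWithNames"; done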
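The load command is a three-stage pipeline: unzip -cq streams the CSV inside each archive to standard output, sed strips the ".00" suffix so values such as 12.00 can be parsed into the integer columns, and clickhouse-client inserts the rows using the CSVWithNames format, which treats the first line as column names. The sketch below splits these stages into functions; the names strip_decimals and load_archive and the CH_CLIENT variable are illustrative. Note that the sed expression removes every occurrence of ".00", wherever it appears in a line.

```shell
# Client command; add --host=... here if the server is remote.
CH_CLIENT="${CH_CLIENT:-clickhouse-client}"

# Remove every ".00" so values like 12.00 become 12 (same sed as above).
strip_decimals() {
    sed 's/\.00//g'
}

# Stream one archive into the ontime table.
# The return status is that of the client, the last stage of the pipeline.
load_archive() {
    unzip -cq "$1" '*.csv' | strip_decimals |
        $CH_CLIENT --query="INSERT INTO ontime FORMAT CSVWithNames"
}
```

A loop such as for i in *.zip; do load_archive "$i" || echo "failed: $i" >&2; done then reports which archives could not be loaded.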

Verify the data:

Count the total number of rows:

select count(*) from ontime

[Figure 2: screenshot of the row count result]

Compute the average number of flights per month:

SELECT avg(c1)
FROM
(
    SELECT Year, Month, count(*) AS c1
    FROM ontime
    GROUP BY Year, Month
);

[Figure 3: screenshot of the average query result]

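On an ordinary dual-core virtual machine, queries over tens of millions of rows return in milliseconds, which shows how fast ClickHouse queries are.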
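To measure query times on your own machine, clickhouse-client can print the execution time to stderr in non-interactive mode when the --time option is given. The sketch below discards the result rows and keeps only that timing; the CH_CLIENT variable and the function name time_query are illustrative.

```shell
# Client command; overridable for remote hosts or other binaries.
CH_CLIENT="${CH_CLIENT:-clickhouse-client}"

# Run a query, discard its result rows, and print what the client
# writes to stderr (with --time, the elapsed time in seconds).
time_query() {
    $CH_CLIENT --time --query "$1" 2>&1 >/dev/null
}
```

For example, time_query "SELECT count(*) FROM ontime" prints the elapsed seconds for the row count. Error messages also go to stderr, so a failing query prints its error text instead of a time.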
