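First we need an open sample dataset. We will use US civil flight data from 1987 to 2015. It is hard to call this sample big data (only 166 million rows, 63 GB uncompressed), but it lets us get started quickly.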
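Table creation statement (note that this uses the legacy MergeTree engine syntax; recent ClickHouse versions expect the PARTITION BY / ORDER BY form instead):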
CREATE TABLE `ontime` ( \
`Year` UInt16, \
`Quarter` UInt8, \
`Month` UInt8, \
`DayofMonth` UInt8, \
`DayOfWeek` UInt8, \
`FlightDate` Date, \
`UniqueCarrier` FixedString(7), \
`AirlineID` Int32, \
`Carrier` FixedString(2), \
`TailNum` String, \
`FlightNum` String, \
`OriginAirportID` Int32, \
`OriginAirportSeqID` Int32, \
`OriginCityMarketID` Int32, \
`Origin` FixedString(5), \
`OriginCityName` String, \
`OriginState` FixedString(2), \
`OriginStateFips` String, \
`OriginStateName` String, \
`OriginWac` Int32, \
`DestAirportID` Int32, \
`DestAirportSeqID` Int32, \
`DestCityMarketID` Int32, \
`Dest` FixedString(5), \
`DestCityName` String, \
`DestState` FixedString(2), \
`DestStateFips` String, \
`DestStateName` String, \
`DestWac` Int32, \
`CRSDepTime` Int32, \
`DepTime` Int32, \
`DepDelay` Int32, \
`DepDelayMinutes` Int32, \
`DepDel15` Int32, \
`DepartureDelayGroups` String, \
`DepTimeBlk` String, \
`TaxiOut` Int32, \
`WheelsOff` Int32, \
`WheelsOn` Int32, \
`TaxiIn` Int32, \
`CRSArrTime` Int32, \
`ArrTime` Int32, \
`ArrDelay` Int32, \
`ArrDelayMinutes` Int32, \
`ArrDel15` Int32, \
`ArrivalDelayGroups` Int32, \
`ArrTimeBlk` String, \
`Cancelled` UInt8, \
`CancellationCode` FixedString(1), \
`Diverted` UInt8, \
`CRSElapsedTime` Int32, \
`ActualElapsedTime` Int32, \
`AirTime` Int32, \
`Flights` Int32, \
`Distance` Int32, \
`DistanceGroup` UInt8, \
`CarrierDelay` Int32, \
`WeatherDelay` Int32, \
`NASDelay` Int32, \
`SecurityDelay` Int32, \
`LateAircraftDelay` Int32, \
`FirstDepTime` String, \
`TotalAddGTime` String, \
`LongestAddGTime` String, \
`DivAirportLandings` String, \
`DivReachedDest` String, \
`DivActualElapsedTime` String, \
`DivArrDelay` String, \
`DivDistance` String, \
`Div1Airport` String, \
`Div1AirportID` Int32, \
`Div1AirportSeqID` Int32, \
`Div1WheelsOn` String, \
`Div1TotalGTime` String, \
`Div1LongestGTime` String, \
`Div1WheelsOff` String, \
`Div1TailNum` String, \
`Div2Airport` String, \
`Div2AirportID` Int32, \
`Div2AirportSeqID` Int32, \
`Div2WheelsOn` String, \
`Div2TotalGTime` String, \
`Div2LongestGTime` String, \
`Div2WheelsOff` String, \
`Div2TailNum` String, \
`Div3Airport` String, \
`Div3AirportID` Int32, \
`Div3AirportSeqID` Int32, \
`Div3WheelsOn` String, \
`Div3TotalGTime` String, \
`Div3LongestGTime` String, \
`Div3WheelsOff` String, \
`Div3TailNum` String, \
`Div4Airport` String, \
`Div4AirportID` Int32, \
`Div4AirportSeqID` Int32, \
`Div4WheelsOn` String, \
`Div4TotalGTime` String, \
`Div4LongestGTime` String, \
`Div4WheelsOff` String, \
`Div4TailNum` String, \
`Div5Airport` String, \
`Div5AirportID` Int32, \
`Div5AirportSeqID` Int32, \
`Div5WheelsOn` String, \
`Div5TotalGTime` String, \
`Div5LongestGTime` String, \
`Div5WheelsOff` String, \
`Div5TailNum` String \
) ENGINE = MergeTree(FlightDate, (Year, FlightDate), 8192);
The data can be downloaded here: https://yadi.sk/d/pOZxpa42sDdgm
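Importing the data (this first command failed for me; the method further down worked):
xz -v -c -d < ontime.csv.xz | clickhouse-client --query="INSERT INTO ontime FORMAT CSV"
After a long wait not a single row had been inserted. I then decompressed ontime.csv.xz on Windows 10, copied the resulting CSV file (more than 60 GB) to Ubuntu, and tried again:
clickhouse-client --query="INSERT INTO ontime FORMAT CSV" < ontime.csv
This time the import died after a little over one million rows. My guess is that the file is simply too large: with the per-month files used below, the largest of which is only twenty-odd MB, the import works fine.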
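If the single huge CSV really is the problem, one possible workaround is to split it into smaller pieces and import them one at a time. The following is only a sketch under assumptions I have not verified against the real file: `import_in_chunks` and `import_chunk` are hypothetical helper names, the CSV is assumed to have no header row, and no field is assumed to contain an embedded newline (otherwise splitting by line count would cut a record in half).

```shell
#!/usr/bin/env bash

# Hypothetical helper: feeds one chunk (read from stdin) to ClickHouse.
import_chunk() {
    clickhouse-client --query="INSERT INTO ontime FORMAT CSV"
}

# Hypothetical helper: splits a headerless CSV into chunks of N lines
# and imports them in order, stopping at the first failure.
import_in_chunks() {
    local csv_file=$1 lines_per_chunk=$2 tmp_dir chunk
    tmp_dir=$(mktemp -d) || return 1
    split -l "$lines_per_chunk" "$csv_file" "$tmp_dir/chunk_"
    for chunk in "$tmp_dir"/chunk_*; do
        echo "importing $chunk" >&2
        # On failure the chunks are left in place so the import can be retried.
        import_chunk < "$chunk" || { echo "failed on $chunk" >&2; return 1; }
    done
    rm -rf "$tmp_dir"
}

# Usage (not executed here): import_in_chunks ontime.csv 1000000
```

Because each chunk is a separate INSERT, a failure partway through leaves the earlier chunks already loaded, so a retry should resume from the failed chunk rather than start over.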
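You can also download the data from the original source. The shell script for this is at https://github.com/Percona-Lab/ontime-airline-performance/blob/master/download.sh (the data it downloads now covers 1987 through 2017; because of the number and size of the files and my network, the download took me two full days, and the zip files total 6.3 GB):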
for s in `seq 1987 2017`
do
for m in `seq 1 12`
do
wget http://transtats.bts.gov/PREZIP/On_Time_On_Time_Performance_${s}_${m}.zip
done
done
Import the data:
for i in *.zip; do echo $i; unzip -cq $i '*.csv' | sed 's/\.00//g' | clickhouse-client --host=localhost --query="INSERT INTO ontime FORMAT CSVWithNames"; done
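The `sed 's/\.00//g'` step in the loop above is easy to overlook. It strips every `.00` from the CSV text, presumably so that values written as `15.00` can be parsed into the `Int32` columns. A small sketch with a made-up CSV line (`strip_dot00` is a hypothetical wrapper name) shows the effect, including one caveat:

```shell
#!/usr/bin/env bash

# Hypothetical wrapper around the same sed expression used in the import loop.
strip_dot00() {
    sed 's/\.00//g'
}

# Made-up sample row: the delay values lose their ".00" suffix.
printf '%s\n' '2015,"AA",12.00,-3.00' | strip_dot00
# prints: 2015,"AA",12,-3

# Caveat: the pattern is not anchored, so ".00" is removed anywhere it occurs.
printf '%s\n' '1.005' | strip_dot00
# prints: 15
```

For this dataset the unanchored pattern seems to be good enough, but it is worth keeping the caveat in mind before reusing it on other files.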
If you get an error like the one below, add the --no-check-certificate option after wget in the script:
Connecting to transtats.bts.gov (transtats.bts.gov)|204.68.194.70|:443... connected.
ERROR: cannot verify transtats.bts.gov's certificate, issued by ‘/C=US/O=Entrust, Inc./OU=See www.entrust.net/legal-terms/OU=(c) 2012 Entrust, Inc. - for authorized use only/CN=Entrust Certification Authority - L1K’:
Unable to locally verify the issuer's authority.
To connect to transtats.bts.gov insecurely, use `--no-check-certificate'.
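Putting this together, here is a sketch of the download loop with `--no-check-certificate` added, plus `--continue` so that an interrupted download can resume instead of starting over. The function names and the `DRY_RUN` switch are my own additions for illustration; the URL pattern is the one from the script above.

```shell
#!/usr/bin/env bash

# Hypothetical helper: builds the download URL for one year and month.
ontime_url() {
    printf 'http://transtats.bts.gov/PREZIP/On_Time_On_Time_Performance_%s_%s.zip\n' "$1" "$2"
}

# Hypothetical helper: downloads every month from first_year to last_year.
# With DRY_RUN set to a non-empty value it only prints the URLs.
download_ontime() {
    local first_year=$1 last_year=$2 year month
    for year in $(seq "$first_year" "$last_year"); do
        for month in $(seq 1 12); do
            if [ -n "${DRY_RUN:-}" ]; then
                ontime_url "$year" "$month"
            else
                wget --continue --no-check-certificate "$(ontime_url "$year" "$month")"
            fi
        done
    done
}

# Usage (not executed here): download_ontime 1987 2017
```

Running `DRY_RUN=1 download_ontime 1987 1987` first is a cheap way to check the generated URLs before starting a download that may take days.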
Total row count:
:) select count(*) from ontime;
SELECT count(*)
FROM ontime
┌───count()─┐
│ 176668654 │
└───────────┘
1 rows in set. Elapsed: 4.365 sec. Processed 176.67 million rows, 176.67 MB (40.47 million rows/s., 40.47 MB/s.)
Most popular routes (origin and destination city pairs) in 2015:
SELECT \
OriginCityName, \
DestCityName, \
count(*) AS flights, \
bar(flights, 0, 20000, 40) \
FROM ontime \
WHERE Year = 2015 \
GROUP BY \
OriginCityName, \
DestCityName \
ORDER BY flights DESC \
LIMIT 20;
┌─OriginCityName────┬─DestCityName──────┬─flights─┬─bar(count(), 0, 20000, 40)──────┐
│ San Francisco, CA │ Los Angeles, CA │ 15116 │ ██████████████████████████████▏ │
│ Los Angeles, CA │ San Francisco, CA │ 14799 │ █████████████████████████████▌ │
│ New York, NY │ Chicago, IL │ 14734 │ █████████████████████████████▍ │
│ Chicago, IL │ New York, NY │ 14632 │ █████████████████████████████▎ │
│ Boston, MA │ New York, NY │ 13201 │ ██████████████████████████▍ │
│ New York, NY │ Boston, MA │ 13201 │ ██████████████████████████▍ │
│ New York, NY │ Los Angeles, CA │ 13113 │ ██████████████████████████▏ │
│ Los Angeles, CA │ New York, NY │ 13106 │ ██████████████████████████▏ │
│ Chicago, IL │ Washington, DC │ 12509 │ █████████████████████████ │
│ Washington, DC │ Chicago, IL │ 12310 │ ████████████████████████▌ │
│ Atlanta, GA │ Chicago, IL │ 12213 │ ████████████████████████▍ │
│ Chicago, IL │ Atlanta, GA │ 12103 │ ████████████████████████▏ │
│ Los Angeles, CA │ Chicago, IL │ 11111 │ ██████████████████████▏ │
│ Atlanta, GA │ New York, NY │ 11004 │ ██████████████████████ │
│ New York, NY │ Atlanta, GA │ 10986 │ █████████████████████▊ │
│ Miami, FL │ New York, NY │ 10790 │ █████████████████████▌ │
│ New York, NY │ Miami, FL │ 10779 │ █████████████████████▌ │
│ Chicago, IL │ Los Angeles, CA │ 10755 │ █████████████████████▌ │
│ Las Vegas, NV │ Los Angeles, CA │ 10657 │ █████████████████████▎ │
│ Boston, MA │ Washington, DC │ 10655 │ █████████████████████▎ │
└───────────────────┴───────────────────┴─────────┴─────────────────────────────────┘
20 rows in set. Elapsed: 11.339 sec. Processed 7.18 million rows, 331.70 MB (633.25 thousand rows/s., 29.25 MB/s.)
Most popular departure cities:
SELECT \
OriginCityName, \
count(*) AS flights \
FROM ontime \
GROUP BY OriginCityName \
ORDER BY flights DESC \
LIMIT 20;
┌─OriginCityName────────┬──flights─┐
│ Chicago, IL │ 11151277 │
│ Atlanta, GA │ 9560972 │
│ Dallas/Fort Worth, TX │ 7921213 │
│ Houston, TX │ 6054671 │
│ Los Angeles, CA │ 5963597 │
│ New York, NY │ 5426917 │
│ Denver, CO │ 5351312 │
│ Phoenix, AZ │ 5006112 │
│ Washington, DC │ 4355229 │
│ San Francisco, CA │ 4141722 │
│ Detroit, MI │ 4109780 │
│ Las Vegas, NV │ 3923183 │
│ Minneapolis, MN │ 3830458 │
│ Newark, NJ │ 3717883 │
│ Charlotte, NC │ 3619757 │
│ Boston, MA │ 3292009 │
│ St. Louis, MO │ 3180881 │
│ Orlando, FL │ 3038619 │
│ Salt Lake City, UT │ 3020356 │
│ Seattle, WA │ 2969059 │
└───────────────────────┴──────────┘
20 rows in set. Elapsed: 21.299 sec. Processed 176.67 million rows, 3.92 GB (8.29 million rows/s., 183.82 MB/s.)
Departure cities with the most distinct destinations:
SELECT \
OriginCityName, \
uniq(Dest) As u \
FROM ontime \
GROUP BY OriginCityName \
ORDER BY u DESC \
LIMIT 20;
┌─OriginCityName────────┬───u─┐
│ Chicago, IL │ 213 │
│ Atlanta, GA │ 210 │
│ Dallas/Fort Worth, TX │ 190 │
│ Denver, CO │ 179 │
│ Minneapolis, MN │ 158 │
│ Houston, TX │ 152 │
│ Detroit, MI │ 147 │
│ Salt Lake City, UT │ 147 │
│ Cincinnati, OH │ 145 │
│ New York, NY │ 135 │
│ Los Angeles, CA │ 128 │
│ Washington, DC │ 127 │
│ Charlotte, NC │ 124 │
│ Newark, NJ │ 124 │
│ Orlando, FL │ 121 │
│ Phoenix, AZ │ 121 │
│ Las Vegas, NV │ 117 │
│ Pittsburgh, PA │ 114 │
│ Memphis, TN │ 113 │
│ San Francisco, CA │ 110 │
└───────────────────────┴─────┘
20 rows in set. Elapsed: 44.371 sec. Processed 176.67 million rows, 4.80 GB (3.98 million rows/s., 108.15 MB/s.)
Flight delays by day of the week (share of flights whose departure was delayed by more than 60 minutes):
SELECT \
DayOfWeek, \
count() AS c, \
avg(DepDelay > 60) AS delays \
FROM ontime \
GROUP BY DayOfWeek \
ORDER BY DayOfWeek ASC;
┌─DayOfWeek─┬────────c─┬───────────────delays─┐
│ 1 │ 26032980 │ 0.044869738308868215 │
│ 2 │ 25752217 │ 0.03884279167110156 │
│ 3 │ 25883344 │ 0.04181356937496175 │
│ 4 │ 25985675 │ 0.04855652200683646 │
│ 5 │ 26026260 │ 0.05150490312476706 │
│ 6 │ 22380078 │ 0.035844781238027854 │
│ 7 │ 24608100 │ 0.04395995627456 │
└───────────┴──────────┴──────────────────────┘
7 rows in set. Elapsed: 26.459 sec. Processed 176.67 million rows, 883.34 MB (6.68 million rows/s., 33.38 MB/s.)
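The `delays` column works because a comparison such as `DepDelay > 60` evaluates to 0 or 1 in ClickHouse, so averaging it gives the fraction of flights delayed by more than 60 minutes. The same idea can be sketched with awk on a handful of made-up delay values (`delay_fraction` is a hypothetical helper name, not part of ClickHouse):

```shell
#!/usr/bin/env bash

# Hypothetical helper: reads one delay value (in minutes) per line and prints
# the fraction of values strictly greater than the given threshold.
delay_fraction() {
    awk -v threshold="$1" '
        { n++; if ($1 > threshold) late++ }
        END { if (n) printf "%.2f\n", late / n }
    '
}

# Made-up delays: 75 and 120 exceed 60, so 2 of 4 flights count as delayed.
printf '%s\n' 5 75 120 -3 | delay_fraction 60
# prints: 0.50
```

Read this way, the value of about 0.0515 for day 5 in the table above means that roughly 5.2% of flights on that day departed more than an hour late.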
Departure cities whose flights are most often delayed by more than an hour:
SELECT \
OriginCityName, \
count() AS c, \
avg(DepDelay > 60) AS delays \
FROM ontime \
GROUP BY OriginCityName \
HAVING c > 100000 \
ORDER BY delays DESC \
LIMIT 20;
┌─OriginCityName──────┬────────c─┬───────────────delays─┐
│ Fayetteville, AR │ 185229 │ 0.06730047670721108 │
│ Newark, NJ │ 3717883 │ 0.06609971319699948 │
│ Chicago, IL │ 11151277 │ 0.0617198371092387 │
│ San Francisco, CA │ 4141722 │ 0.06033722205401521 │
│ Eugene, OR │ 114522 │ 0.05732523008679555 │
│ Santa Barbara, CA │ 199334 │ 0.0560968023518316 │
│ New York, NY │ 5426917 │ 0.05557851723179109 │
│ White Plains, NY │ 202042 │ 0.05521624216747013 │
│ Springfield, MO │ 140140 │ 0.05520907663764807 │
│ Burlington, VT │ 150360 │ 0.05497472732109603 │
│ Miami, FL │ 2096946 │ 0.05327461937503398 │
│ Philadelphia, PA │ 2864104 │ 0.052185604991997495 │
│ Monterey, CA │ 109122 │ 0.05193269918073349 │
│ Fort Lauderdale, FL │ 1643990 │ 0.05149970498604006 │
│ Columbia, SC │ 226198 │ 0.05121619112458996 │
│ Valparaiso, FL │ 109304 │ 0.05119666251921247 │
│ Juneau, AK │ 127035 │ 0.05069469043964262 │
│ Boston, MA │ 3292009 │ 0.050523859442668594 │
│ Moline, IL │ 121352 │ 0.049920891291449665 │
│ Akron, OH │ 148005 │ 0.049815884598493294 │
└─────────────────────┴──────────┴──────────────────────┘
20 rows in set. Elapsed: 30.723 sec. Processed 176.67 million rows, 4.62 GB (5.75 million rows/s., 150.44 MB/s.)
Routes with the longest average air time:
SELECT \
OriginCityName, \
DestCityName, \
count(*) AS flights, \
avg(AirTime) As duration \
FROM ontime \
GROUP BY \
OriginCityName, \
DestCityName \
ORDER BY duration DESC \
LIMIT 20;
┌─OriginCityName────────┬─DestCityName───┬─flights─┬───────────duration─┐
│ New York, NY │ Honolulu, HI │ 2074 │ 606.9170684667309 │
│ Newark, NJ │ Honolulu, HI │ 7219 │ 590.4765202936695 │
│ Washington, DC │ Honolulu, HI │ 1119 │ 579.7444146559428 │
│ Charlotte, NC │ Honolulu, HI │ 223 │ 563.4394618834081 │
│ Atlanta, GA │ Kahului, HI │ 173 │ 545.6184971098265 │
│ Cincinnati, OH │ Honolulu, HI │ 1176 │ 540.4447278911565 │
│ Detroit, MI │ Honolulu, HI │ 467 │ 535.8779443254818 │
│ Honolulu, HI │ New York, NY │ 2077 │ 525.0828117477131 │
│ Honolulu, HI │ Newark, NJ │ 7233 │ 518.7114613576663 │
│ Honolulu, HI │ Washington, DC │ 1119 │ 508.65415549597856 │
│ St. Louis, MO │ Kahului, HI │ 1093 │ 498.97987191216833 │
│ Dallas/Fort Worth, TX │ Lihue, HI │ 17 │ 497.52941176470586 │
│ Honolulu, HI │ Charlotte, NC │ 223 │ 492.3273542600897 │
│ Honolulu, HI │ Cincinnati, OH │ 1177 │ 484.4086661002549 │
│ Minneapolis, MN │ Honolulu, HI │ 5430 │ 477.9731123388582 │
│ Kahului, HI │ Atlanta, GA │ 173 │ 477.9364161849711 │
│ Honolulu, HI │ Detroit, MI │ 467 │ 469.3468950749465 │
│ Houston, TX │ Kahului, HI │ 761 │ 461.227332457293 │
│ Dallas/Fort Worth, TX │ Kona, HI │ 148 │ 461.0945945945946 │
│ Dallas/Fort Worth, TX │ Kahului, HI │ 6913 │ 460.49631129755534 │
└───────────────────────┴────────────────┴─────────┴────────────────────┘
20 rows in set. Elapsed: 62.137 sec. Processed 176.67 million rows, 8.54 GB (2.84 million rows/s., 137.39 MB/s.)
99th percentile of departure delay by carrier:
SELECT \
Carrier, \
count() AS c, \
round(quantileTDigest(0.99)(DepDelay), 2) AS q \
FROM ontime \
GROUP BY Carrier \
ORDER BY q DESC;
┌─Carrier─┬────────c─┬──────q─┐
│ B6 │ 2991782 │ 191.22 │
│ NK │ 412396 │ 190.94 │
│ EV │ 6222018 │ 187.23 │
│ XE │ 2145095 │ 179.55 │
│ VX │ 371390 │ 178.31 │
│ YV │ 1704176 │ 178.03 │
│ DH │ 693047 │ 165.18 │
│ F9 │ 1120723 │ 162.58 │
│ FL │ 2485709 │ 156.03 │
│ 9E │ 1342097 │ 153.94 │
│ TZ │ 208420 │ 152.52 │
│ OO │ 8583371 │ 151.77 │
│ OH │ 1765828 │ 148.35 │
│ RU │ 1314294 │ 147.34 │
│ MQ │ 6877396 │ 145.5 │
│ CO │ 8784850 │ 139.17 │
│ EA │ 880824 │ 131.67 │
│ AS │ 4270919 │ 124.17 │
│ NW │ 10473832 │ 118.52 │
│ HP │ 3587974 │ 118.39 │
│ TW │ 3692615 │ 117.15 │
│ US │ 16084998 │ 113.19 │
│ PI │ 833073 │ 105.93 │
│ ML │ 70622 │ 102.34 │
│ PA │ 302766 │ 98.72 │
│ PS │ 83617 │ 95.84 │
│ AL │ 455873 │ 77.94 │
│ UA │ 17357913 │ 71.09 │
│ HA │ 935934 │ 61.49 │
│ AQ │ 154381 │ 60.14 │
│ AA │ 20571665 │ 9.02 │
│ DL │ 23240979 │ 4 │
│ WN │ 26648077 │ 4 │
└─────────┴──────────┴────────┘
33 rows in set. Elapsed: 160.932 sec. Processed 176.67 million rows, 1.06 GB (1.10 million rows/s., 6.59 MB/s.)
Carriers that have stopped operating flights (no data from 2015 onward):
SELECT \
Carrier, \
min(Year), \
max(Year), \
count() \
FROM ontime \
GROUP BY Carrier \
HAVING max(Year) < 2015 \
ORDER BY count() DESC;
┌─Carrier─┬─min(Year)─┬─max(Year)─┬──count()─┐
│ NW │ 1987 │ 2009 │ 10473832 │
│ CO │ 1987 │ 2011 │ 8784850 │
│ TW │ 1987 │ 2001 │ 3692615 │
│ HP │ 1987 │ 2005 │ 3587974 │
│ FL │ 2003 │ 2014 │ 2485709 │
│ XE │ 2006 │ 2011 │ 2145095 │
│ OH │ 2004 │ 2010 │ 1765828 │
│ YV │ 2006 │ 2013 │ 1704176 │
│ 9E │ 2007 │ 2013 │ 1342097 │
│ RU │ 2003 │ 2006 │ 1314294 │
│ EA │ 1987 │ 1990 │ 880824 │
│ PI │ 1987 │ 1989 │ 833073 │
│ DH │ 2003 │ 2005 │ 693047 │
│ AL │ 1987 │ 1988 │ 455873 │
│ PA │ 1987 │ 1991 │ 302766 │
│ TZ │ 2003 │ 2006 │ 208420 │
│ AQ │ 2000 │ 2008 │ 154381 │
│ PS │ 1987 │ 1988 │ 83617 │
│ ML │ 1991 │ 1991 │ 70622 │
└─────────┴───────────┴───────────┴──────────┘
19 rows in set. Elapsed: 8.625 sec. Processed 176.67 million rows, 706.70 MB (20.48 million rows/s., 81.93 MB/s.)
Trending destination cities in 2015 (flight count growth relative to 2014):
SELECT \
DestCityName, \
sum(Year = 2014) AS c2014, \
sum(Year = 2015) AS c2015, \
c2015 / c2014 AS diff \
FROM ontime \
WHERE Year IN (2014, 2015) \
GROUP BY DestCityName \
HAVING (c2014 > 10000) AND (c2015 > 1000) AND (diff > 1) \
ORDER BY diff DESC;
┌─DestCityName───────────────────┬──c2014─┬──c2015─┬───────────────diff─┐
│ Dallas, TX │ 48294 │ 65633 │ 1.359030107259701 │
│ Fort Lauderdale, FL │ 64109 │ 79419 │ 1.2388120232728634 │
│ Minneapolis, MN │ 106202 │ 122751 │ 1.1558256906649593 │
│ Boise, ID │ 11228 │ 12819 │ 1.1416993231207695 │
│ Detroit, MI │ 105984 │ 118311 │ 1.116310009057971 │
│ Seattle, WA │ 108722 │ 121292 │ 1.1156159746877357 │
│ Kona, HI │ 10992 │ 12080 │ 1.098981077147016 │
│ Fort Myers, FL │ 26641 │ 29127 │ 1.0933148155099284 │
│ Orlando, FL │ 110409 │ 120028 │ 1.0871215208905072 │
│ Memphis, TN │ 15038 │ 16287 │ 1.083056257481048 │
│ West Palm Beach/Palm Beach, FL │ 22523 │ 24320 │ 1.0797851085556986 │
│ Oakland, CA │ 43266 │ 46325 │ 1.0707021679840985 │
│ Austin, TX │ 43095 │ 46092 │ 1.0695440306300035 │
│ Chicago, IL │ 375875 │ 402011 │ 1.0695337545726638 │
│ Boston, MA │ 110630 │ 118012 │ 1.0667269275964928 │
│ Las Vegas, NV │ 137058 │ 145900 │ 1.0645128339826935 │
│ Tampa, FL │ 64879 │ 69062 │ 1.0644738667365403 │
│ Cincinnati, OH │ 20769 │ 21944 │ 1.0565747026818817 │
│ New Orleans, LA │ 40490 │ 42472 │ 1.0489503581131143 │
│ Santa Ana, CA │ 39142 │ 40733 │ 1.040646875479025 │
│ Baltimore, MD │ 90845 │ 94105 │ 1.035885299135891 │
│ Anchorage, AK │ 16791 │ 17233 │ 1.0263236257518908 │
│ Atlanta, GA │ 369842 │ 379498 │ 1.0261084463095052 │
│ San Juan, PR │ 25900 │ 26529 │ 1.0242857142857142 │
│ Lihue, HI │ 11165 │ 11427 │ 1.0234661889834304 │
│ Kahului, HI │ 21953 │ 22461 │ 1.0231403452831047 │
│ Grand Rapids, MI │ 11513 │ 11767 │ 1.0220620168505168 │
│ Honolulu, HI │ 46310 │ 46937 │ 1.0135391923990498 │
│ New York, NY │ 207502 │ 210245 │ 1.0132191496949428 │
│ Newark, NJ │ 110221 │ 111486 │ 1.0114769417806044 │
│ Cleveland, OH │ 37478 │ 37801 │ 1.008618389455147 │
│ Buffalo, NY │ 18381 │ 18416 │ 1.0019041401447146 │
│ Providence, RI │ 12152 │ 12157 │ 1.0004114549045424 │
└────────────────────────────────┴────────┴────────┴────────────────────┘
33 rows in set. Elapsed: 5.572 sec. Processed 12.95 million rows, 312.16 MB (2.32 million rows/s., 56.02 MB/s.)
Most seasonal popular destination cities:
SELECT \
DestCityName, \
any(total), \
avg(abs((monthly * 12) - total) / total) AS avg_month_diff \
FROM \
( \
SELECT \
DestCityName, \
count() AS total \
FROM ontime \
GROUP BY DestCityName \
HAVING total > 100000 \
) ALL INNER JOIN \
( \
SELECT \
DestCityName, \
Month, \
count() AS monthly \
FROM ontime \
GROUP BY \
DestCityName, \
Month \
HAVING monthly > 10000 \
) USING (DestCityName) \
GROUP BY DestCityName \
ORDER BY avg_month_diff DESC \
LIMIT 20;
┌─DestCityName───────────────────┬─any(total)─┬───────avg_month_diff─┐
│ Juneau, AK │ 127029 │ 0.26276362090546257 │
│ Bozeman, MT │ 107007 │ 0.23356415935406094 │
│ Palm Springs, CA │ 241336 │ 0.23237312294891765 │
│ Fort Myers, FL │ 642191 │ 0.19487478543507045 │
│ Anchorage, AK │ 550641 │ 0.1817055032226078 │
│ Fairbanks, AK │ 131135 │ 0.13696318043746267 │
│ Valparaiso, FL │ 109145 │ 0.13496724540748545 │
│ Sarasota/Bradenton, FL │ 202931 │ 0.11884252939833408 │
│ Myrtle Beach, SC │ 120790 │ 0.11748607382351896 │
│ West Palm Beach/Palm Beach, FL │ 741018 │ 0.1156544105541296 │
│ Portland, ME │ 214450 │ 0.10123571928188389 │
│ Eugene, OR │ 114268 │ 0.09463716876115798 │
│ Seattle, WA │ 2968380 │ 0.07901751123508446 │
│ San Juan, PR │ 638327 │ 0.07704384534363527 │
│ Billings, MT │ 121895 │ 0.0706919890069322 │
│ Burlington, VT │ 149777 │ 0.06467615187912697 │
│ Fort Lauderdale, FL │ 1641563 │ 0.06102527489553147 │
│ Lihue, HI │ 181427 │ 0.06057992103343676 │
│ Savannah, GA │ 265786 │ 0.05747606470368392 │
│ Kona, HI │ 188276 │ 0.057143059480054104 │
└────────────────────────────────┴────────────┴──────────────────────┘
20 rows in set. Elapsed: 34.215 sec. Processed 353.34 million rows, 8.01 GB (10.33 million rows/s., 234.01 MB/s.)
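The `avg_month_diff` metric is not obvious at first glance. For each calendar month it compares twelve times that month's flight count with the city's total and averages the relative difference, so a city with perfectly even traffic scores 0 and a strongly seasonal one scores higher. Here is a simplified awk sketch on made-up monthly counts (`seasonality` is a hypothetical helper name; unlike the query, it applies no `HAVING` filters):

```shell
#!/usr/bin/env bash

# Hypothetical helper: reads one monthly flight count per line and prints
# the average of |monthly * 12 - total| / total over all months read.
seasonality() {
    awk '
        { monthly[NR] = $1; total += $1 }
        END {
            if (NR == 0 || total == 0) exit 1
            for (i = 1; i <= NR; i++) {
                diff = monthly[i] * 12 - total
                if (diff < 0) diff = -diff
                sum += diff / total
            }
            printf "%.4f\n", sum / NR
        }
    '
}

# Made-up city with a quiet half-year (100 flights/month) and a busy one (300).
printf '%s\n' 100 100 100 100 100 100 300 300 300 300 300 300 | seasonality
# prints: 0.5000
```

In this toy example the total is 2400, so every month deviates by 1200 from the even-traffic expectation, giving a score of 0.5. By comparison, Juneau's 0.26 in the table above is already a strongly seasonal pattern.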
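Note: as you can see, some of the queries above took around a minute, which does not feel fast. I ran them in a VMware virtual machine on Windows 10 (8 GB RAM), so these numbers probably do not reflect ClickHouse's real performance. People running on well-provisioned production servers report that queries like my one-minute ones finish in about one or two seconds.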
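References:
Official quick start: https://clickhouse.yandex/#quick-start
Official usage guide: https://clickhouse.yandex/docs/en/single/index.html#create-table
Changelog (Russian): https://github.com/yandex/ClickHouse/blob/master/CHANGELOG_RU.md
Changelog (English): https://github.com/yandex/ClickHouse/blob/master/CHANGELOG.md
ClickHouse, an open-source powerhouse from Russia: a high-performance columnar database suited to building quantitative backtesting research systems, part 1 (in Chinese): http://www.sohu.com/a/160303189_505915
ClickHouse, an open-source powerhouse from Russia: a high-performance columnar database suited to building quantitative backtesting research systems, part 2 (in Chinese): http://www.sohu.com/a/160527514_505915
Sina, Gao Peng, November 2017 (in Chinese): http://www.docin.com/p-2061139848.html?qq-pf-to=pcqq.temporaryc2c
ClickHouse, a formidable open-source analytical database (Zhihu, in Chinese): https://zhuanlan.zhihu.com/p/22165241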