BigQuery Tutorial
Google Sheets, BigQuery, and Public Datasets: Calculating Degree Days and K-Factor
Have you ever wondered how home heating fuel companies know when to make a delivery? Even if you don’t get propane or home heating oil deliveries, you can still estimate your fuel usage over time. All you need is the formula that energy companies use, the public weather datasets in BigQuery, and your past heating bills.
Much like tracking fuel economy in your vehicle (measured in miles per gallon), energy companies use a formula that relies on historical weather data (measured in degree days) and a calculated value called the K-Factor, which represents the number of degree days covered per gallon of fuel burned (degree days per gallon). In this exercise, we’ll explore how to:
- Calculate degree days for a period using public weather datasets
- Calculate the K-Factor for your home by looking at past energy bills
Assemble Past Delivery Data
Even if you don’t track your heating fuel deliveries, you should be able to download billing information from your provider. At a minimum, we’ll need:
- Date of delivery
- Number of gallons delivered
- Cost per gallon (optional)
Let’s put this data in a Google Sheet so we can then use it in BigQuery.
I started tracking this information in 2011 and have been using a Google Sheet to record the information above as well as a few other simple calculations (such as cost per month, gallons per day, etc.). While Google Sheets is great for these simple calculations, I’ve not been able to find an easy way to incorporate weather data and do complex calculations.
Query Google Sheet Data from BigQuery
Now that we’ve recorded delivery data in a Google Sheet, we need to make the Sheet available in BigQuery by querying it as Google Drive data. We’ll create a new table in the homedata dataset.
Click CREATE TABLE.
Under Source, click the Create table from: drop-down and select Drive.
Paste the share link from Google Sheets into the Select Drive URI field and select Google Sheets from the File format: drop-down. Complete the form by entering the following information:
- Table Name: heating_oil
- Schema: enter the Name, Type, and Mode for each of the 3 columns in the Google Sheet
- Advanced: Header rows to skip: 1
Let’s confirm that we can query the data from Google Drive:
SELECT * FROM `bq-jake.homedata.heating_oil`
Preparing Delivery Data
Since we’ll be calculating the number of days since the last delivery, let’s organize the delivery dates so that we can calculate consumption and degree days between each delivery and the one before it. To pair the previous delivery date (previous row) with the current delivery date, we’ll use the LAG function.
SELECT delivery_date,
LAG(delivery_date)
OVER (ORDER BY delivery_date) as previous_delivery,
FROM `bq-jake.homedata.heating_oil`
WHERE delivery_date is not null
ORDER BY delivery_date
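To make the LAG step concrete, here is a minimal Python sketch of the same pairing. It is not part of the original article: the sample dates are made up, and `pair_with_previous` is a hypothetical helper name used only for illustration.

```python
from datetime import date

def pair_with_previous(delivery_dates):
    """Pair each delivery date with the one before it, like LAG(...) OVER (ORDER BY ...)."""
    ordered = sorted(delivery_dates)
    # The first delivery has no predecessor, so it is paired with None (NULL in SQL).
    previous = [None] + ordered[:-1]
    return list(zip(ordered, previous))

# Hypothetical sample dates, not taken from the article's data.
sample = [date(2020, 3, 10), date(2020, 1, 15), date(2020, 2, 20)]
pairs = pair_with_previous(sample)
for current, prev in pairs:
    print(current, prev)
```

As in the query, the earliest delivery ends up with no previous delivery, which is why that row carries a NULL in the previous_delivery column.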
To project out to today’s date from the last delivery, let’s add today’s date to the list. We’ll need this later when we calculate a delivery for “today”.
with get_deliveries as (
SELECT delivery_date,
LAG(delivery_date)
OVER (ORDER BY delivery_date) as previous_delivery,
FROM `bq-jake.homedata.heating_oil`
WHERE delivery_date is not null
ORDER BY delivery_date
)
-- append Today's date to end of list so we can estimate for today
,append_today as (
SELECT CURRENT_DATE('America/New_York') as delivery_date,
max(delivery_date) as previous_delivery
from get_deliveries
)
SELECT * FROM get_deliveries
UNION ALL
SELECT * FROM append_today
Note: today’s date at the time of writing was 2020-09-20.
Finally, let’s calculate the number of days between deliveries and add it as an additional column. We’ll use the DATE_DIFF function to calculate the number of days between deliveries and then the GENERATE_DATE_ARRAY function to list all of the dates excluding the next delivery date (removed with DATE_SUB):
with get_deliveries as (
SELECT delivery_date,
LAG(delivery_date) OVER (ORDER BY delivery_date) as previous_delivery,
FROM `bq-jake.homedata.heating_oil`
WHERE delivery_date is not null
ORDER BY delivery_date
)
-- append Today's date to end of list so we can estimate for today
,append_today as (
SELECT CURRENT_DATE('America/New_York') as delivery_date,
max(delivery_date) as previous_delivery
from get_deliveries
)
,all_deliveries as (
SELECT * FROM get_deliveries
UNION ALL
SELECT * FROM append_today
)
SELECT delivery_date,
previous_delivery,
DATE_DIFF(delivery_date, previous_delivery, DAY) as CALC_days, -- Number of days between deliveries
GENERATE_DATE_ARRAY(previous_delivery,DATE_SUB(delivery_date, INTERVAL 1 DAY), INTERVAL 1 DAY) as CALC_dates, -- list of dates between deliveries (exclude next delivery date)
from all_deliveries
We now see the additional column CALC_days with the number of days between deliveries. We also have an additional column named CALC_dates, an array of dates that starts at the previous delivery date and runs through the day before the current delivery date (the delivery date itself is excluded with DATE_SUB).
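The two date calculations can be checked by hand with a small Python sketch. This is an illustration only, assuming a made-up delivery period; `days_between` and `dates_in_period` are hypothetical names that mirror what DATE_DIFF and GENERATE_DATE_ARRAY do in the query.

```python
from datetime import date, timedelta

def days_between(previous_delivery, delivery_date):
    """Number of days between two deliveries, like DATE_DIFF(delivery_date, previous_delivery, DAY)."""
    return (delivery_date - previous_delivery).days

def dates_in_period(previous_delivery, delivery_date):
    """Every date from the previous delivery through the day before the next one."""
    n = days_between(previous_delivery, delivery_date)
    return [previous_delivery + timedelta(days=i) for i in range(n)]

# Hypothetical sample period, not taken from the article's data.
prev, cur = date(2020, 1, 15), date(2020, 1, 19)
period = dates_in_period(prev, cur)
print(days_between(prev, cur))  # 4
print(period[0], period[-1])    # 2020-01-15 2020-01-18
```

Because the delivery date itself is excluded, the array has exactly as many entries as CALC_days, so no day is counted in two delivery periods.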
To join this data to the weather data more easily, let’s flatten the array of dates between deliveries (CALC_dates) into one row per date using the UNNEST operator.
with get_deliveries as (
SELECT delivery_date,
LAG(delivery_date) OVER (ORDER BY delivery_date) as previous_delivery,
FROM `bq-jake.homedata.heating_oil`
WHERE delivery_date is not null
ORDER BY delivery_date
)
-- append Today's date to end of list so we can estimate for today
,append_today as (
SELECT CURRENT_DATE('America/New_York') as delivery_date,
max(delivery_date) as previous_delivery
from get_deliveries
)
,all_deliveries as (
SELECT * FROM get_deliveries
UNION ALL
SELECT * FROM append_today
)
,calc_days as (
SELECT delivery_date,
previous_delivery,
DATE_DIFF(delivery_date, previous_delivery, DAY) as CALC_days, -- Number of days between deliveries
-- GENERATE_DATE_ARRAY(previous_delivery,delivery_date, INTERVAL 1 DAY) as CALC_dates, -- list of dates between deliveries
GENERATE_DATE_ARRAY(previous_delivery,DATE_SUB(delivery_date, INTERVAL 1 DAY), INTERVAL 1 DAY) as CALC_dates, -- list of dates between deliveries (exclude next delivery date)
from all_deliveries
)
-- unnest days by delivery date so we can join to weather data
SELECT delivery_date, previous_delivery, dates_flattened
from calc_days, UNNEST(CALC_dates) as dates_flattened
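As a rough picture of what the flattening produces, the following Python sketch expands each array into one row per date. The rows are hypothetical and only shaped like the calc_days result; nothing here comes from the article’s data.

```python
from datetime import date

# Hypothetical rows shaped like calc_days: (delivery_date, previous_delivery, CALC_dates).
calc_days = [
    (date(2020, 1, 3), date(2020, 1, 1), [date(2020, 1, 1), date(2020, 1, 2)]),
    (date(2020, 1, 4), date(2020, 1, 3), [date(2020, 1, 3)]),
]

# One output row per array element, which is what the join against UNNEST yields.
flattened = [
    (delivery, previous, day)
    for delivery, previous, days in calc_days
    for day in days
]

for row in flattened:
    print(row)
```

A row with an empty array contributes no output rows in this sketch, so a delivery without a usable date range simply drops out of the flattened result.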
Find a Nearby Weather Station
We’ll start by gathering historical weather data from a nearby weather station. I’ve outlined these steps in the following Medium article:
Finding the Closest Weather Stations — BigQuery Public Datasets
Note the following identifiers:
- stn: Station number (WMO/DATSAV3 number). Ex: 725037
- wban: Historical “Weather Bureau Air Force Navy” number. Ex: 94745
Once you’ve identified a nearby weather station with quality data, we’ll need to construct a query to gather the daily mean temperature reading for the duration of our analysis. Since the historical weather data is a large dataset, broken across several large tables (by year), we’ll need to be careful about how and how often we query this data.
To query multiple years at once, we’ll take advantage of querying wildcard tables (tables of similar name and structure). For this query, we’ll only include the tables for the years 2011–2020 using the _TABLE_SUFFIX option in the WHERE clause:
SELECT CAST(CONCAT(year,'-',mo,'-',da) as DATE) as DATE,
temp
FROM
`bigquery-public-data.noaa_gsod.gsod20*`
WHERE
temp != 9999.9 # code for a missing mean temperature
AND _TABLE_SUFFIX BETWEEN '11'
AND '20'
AND stn = '725037'
AND wban = '94745'
ORDER BY DATE
The query quickly returned the more than 3,500 rows (one value per day) that we were expecting, but it processed 1.7GB of data. Running it every time would get expensive, so let’s store the results as a table by clicking the SAVE RESULTS button, then BigQuery Table. I saved this data in a dataset named temp as a table named gsod_dates.
Calculating Degree Days
A degree day is calculated by taking the mean temperature for a given day and subtracting it from the base temperature of 65 degrees. For example, on a winter day where the mean temperature was 40 degrees, the number of degree days would be 65 - 40 = 25. Days with a mean temperature at or above 65 count as zero degree days.
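The formula is small enough to check by hand. Here is a minimal Python sketch of it, assuming temperatures in degrees Fahrenheit; `degree_days` is a hypothetical function name, and the base of 65 is the one used in this article.

```python
BASE_TEMP_F = 65.0  # base temperature used in the article

def degree_days(mean_temp_f, base=BASE_TEMP_F):
    """Heating degree days for a single day; never negative."""
    return max(base - mean_temp_f, 0.0)

print(degree_days(40.0))  # 25.0
print(degree_days(72.0))  # 0.0
```

Clamping at zero matters for warm days: without it, a 72-degree day would subtract from the total instead of contributing nothing.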
To complicate things a bit further, there is an adjustment for systems that also produce hot water. Since the mean temperature in summer months is typically above 65 (resulting in zero degree days), we need to compensate for hot water production from the heating system on those days. In the CASE statement below, we compensate for hot water production based on a published conversion chart.
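The hot water adjustment is a simple threshold lookup. The Python sketch below copies its thresholds from the CASE statement in this article’s query; the function name `hot_water_allowance` is hypothetical, and temperatures are assumed to be in degrees Fahrenheit.

```python
# (minimum mean temperature, extra degree days) pairs, copied from the article's CASE statement.
THRESHOLDS = [(62, 6), (58, 5), (54, 4), (50, 3), (46, 2), (43, 1)]

def hot_water_allowance(mean_temp_f):
    """Extra degree days credited for hot water production on a given day."""
    # Thresholds are checked from warmest to coldest, so the first match wins,
    # just as the first matching WHEN branch wins in a CASE expression.
    for floor, allowance in THRESHOLDS:
        if mean_temp_f >= floor:
            return allowance
    return 0

print(hot_water_allowance(70))  # 6
print(hot_water_allowance(40))  # 0
```

The order of the checks is what makes this work: testing the lowest threshold first would return 1 for every warm day.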
with get_deliveries as (
SELECT delivery_date,
LAG(delivery_date) OVER (ORDER BY delivery_date) as previous_delivery,
FROM `bq-jake.homedata.heating_oil`
WHERE delivery_date is not null
ORDER BY delivery_date
)
-- append Today's date to end of list so we can estimate for today
,append_today as (
SELECT CURRENT_DATE('America/New_York') as delivery_date,
max(delivery_date) as previous_delivery
from get_deliveries
)
,all_deliveries as (
SELECT * FROM get_deliveries
UNION ALL
SELECT * FROM append_today
)
,calc_days as (
SELECT delivery_date,
previous_delivery,
DATE_DIFF(delivery_date, previous_delivery, DAY) as CALC_days, -- Number of days between deliveries
-- GENERATE_DATE_ARRAY(previous_delivery,delivery_date, INTERVAL 1 DAY) as CALC_dates, -- list of dates between deliveries
GENERATE_DATE_ARRAY(previous_delivery,DATE_SUB(delivery_date, INTERVAL 1 DAY), INTERVAL 1 DAY) as CALC_dates, -- list of dates between deliveries (exclude next delivery date)
from all_deliveries
)
,flatten_dates as (
-- unnest days by delivery date so we can join to weather data
SELECT delivery_date, previous_delivery, dates_flattened
from calc_days, UNNEST(CALC_dates) as dates_flattened
)
,GSOD_dates as (
SELECT * FROM `bq-jake.temp.gsod_dates`
)
SELECT
delivery_date,
previous_delivery,
CAST(temp as Numeric) as mean_temp,
CASE
WHEN 65-CAST(temp as Numeric) > 0 THEN 65-CAST(temp as Numeric)
ELSE 0
END as CALC_degree_days,
CAST(CASE
WHEN CAST(temp as Numeric) >= 62 THEN 6
WHEN CAST(temp as Numeric) >= 58 THEN 5
WHEN CAST(temp as Numeric) >= 54 THEN 4
WHEN CAST(temp as Numeric) >= 50 THEN 3
WHEN CAST(temp as Numeric) >= 46 THEN 2
WHEN CAST(temp as Numeric) >= 43 THEN 1
ELSE 0.0
END as Numeric)as CALC_hot_water
-- http://www.degreeday.com/faqs.aspx
FROM flatten_dates
LEFT JOIN GSOD_dates
ON DATE = dates_flattened
Let’s wrap up this data by combining the calculated degree days and compensation for hot water production and then summing across delivery dates.
with get_deliveries as (
SELECT delivery_date,
LAG(delivery_date) OVER (ORDER BY delivery_date) as previous_delivery,
FROM `bq-jake.homedata.heating_oil`
WHERE delivery_date is not null
ORDER BY delivery_date
)
-- append Today's date to end of list so we can estimate for today
,append_today as (
SELECT CURRENT_DATE('America/New_York') as delivery_date,
max(delivery_date) as previous_delivery
from get_deliveries
)
,all_deliveries as (
SELECT * FROM get_deliveries
UNION ALL
SELECT * FROM append_today
)
,calc_days as (
SELECT delivery_date,
previous_delivery,
DATE_DIFF(delivery_date, previous_delivery, DAY) as CALC_days, -- Number of days between deliveries
-- GENERATE_DATE_ARRAY(previous_delivery,delivery_date, INTERVAL 1 DAY) as CALC_dates, -- list of dates between deliveries
GENERATE_DATE_ARRAY(previous_delivery,DATE_SUB(delivery_date, INTERVAL 1 DAY), INTERVAL 1 DAY) as CALC_dates, -- list of dates between deliveries (exclude next delivery date)
from all_deliveries
)
,flatten_dates as (
-- unnest days by delivery date so we can join to weather data
SELECT delivery_date, previous_delivery, dates_flattened
from calc_days, UNNEST(CALC_dates) as dates_flattened
)
,GSOD_dates as (
SELECT * FROM `bq-jake.temp.gsod_dates`
)
,get_degree_days as (
SELECT
delivery_date,
previous_delivery,
CAST(temp as Numeric) as mean_temp,
CASE
WHEN 65-CAST(temp as Numeric) > 0 THEN 65-CAST(temp as Numeric)
ELSE 0
END as CALC_degree_days,
CAST(CASE
WHEN CAST(temp as Numeric) >= 62 THEN 6
WHEN CAST(temp as Numeric) >= 58 THEN 5
WHEN CAST(temp as Numeric) >= 54 THEN 4
WHEN CAST(temp as Numeric) >= 50 THEN 3
WHEN CAST(temp as Numeric) >= 46 THEN 2
WHEN CAST(temp as Numeric) >= 43 THEN 1
ELSE 0.0
END as Numeric)as CALC_hot_water
-- http://www.degreeday.com/faqs.aspx
FROM flatten_dates
LEFT JOIN GSOD_dates
ON DATE = dates_flattened
)
SELECT delivery_date, previous_delivery, SUM(CALC_degree_days + CALC_hot_water) as degree_days
from get_degree_days
GROUP BY delivery_date, previous_delivery
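The grouping step above adds up the daily values for each delivery period. A minimal Python sketch of that aggregation, using made-up daily rows rather than real weather data, looks like this:

```python
from collections import defaultdict
from datetime import date

# Hypothetical per-day rows: (delivery_date, degree_days, hot_water_allowance).
daily = [
    (date(2020, 1, 3), 25.0, 0),  # mean temperature 40
    (date(2020, 1, 3), 23.0, 0),  # mean temperature 42
    (date(2020, 1, 4), 2.0, 6),   # mean temperature 63
]

# Sum degree days plus the hot water allowance per delivery, like SUM(...) with GROUP BY.
totals = defaultdict(float)
for delivery, dd, hot_water in daily:
    totals[delivery] += dd + hot_water

for delivery, total in sorted(totals.items()):
    print(delivery, total)
```

Each delivery date ends up with a single total covering every day since the previous delivery.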
Calculating the K-Factor
Now that we’ve summed the degree days between deliveries, we need to add back in the gallons delivered (from our Google Sheet brought into BigQuery). To determine the K-Factor for a given delivery, all we need to do is divide the number of degree days since the last delivery by the number of gallons delivered: degree_days/gallons_delivered.
Additionally, the CASE statement fills in today’s date for the row after the last delivery, which has no match in the Google Sheet, so that we can estimate for today.
with get_deliveries as (
SELECT delivery_date,
LAG(delivery_date) OVER (ORDER BY delivery_date) as previous_delivery,
FROM `bq-jake.homedata.heating_oil`
WHERE delivery_date is not null
ORDER BY delivery_date
)
-- append Today's date to end of list so we can estimate for today
,append_today as (
SELECT CURRENT_DATE('America/New_York') as delivery_date,
max(delivery_date) as previous_delivery
from get_deliveries
)
,all_deliveries as (
SELECT * FROM get_deliveries
UNION ALL
SELECT * FROM append_today
)
,calc_days as (
SELECT delivery_date,
previous_delivery,
DATE_DIFF(delivery_date, previous_delivery, DAY) as CALC_days, -- Number of days between deliveries
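-- GENERATE_DATE_ARRAY(previous_delivery,delivery_date, INTERVAL 1 DAY) as CALC_dates, -- list of dates between deliveries
GENERATE_DATE_ARRAY(previous_delivery,DATE_SUB(delivery_date, INTERVAL 1 DAY), INTERVAL 1 DAY) as CALC_dates, -- list of dates between deliveries (exclude next delivery date)
from all_deliveries
)
,flatten_dates as (
-- unnest days by delivery date so we can join to weather data
SELECT delivery_date, previous_delivery, dates_flattened
from calc_days, UNNEST(CALC_dates) as dates_flattened
)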
,GSOD_dates as (
SELECT * FROM `bq-jake.temp.gsod_dates`
)
,get_degree_days as (
SELECT
delivery_date,
previous_delivery,
CAST(temp as Numeric) as mean_temp,
CASE
WHEN 65-CAST(temp as Numeric) > 0 THEN 65-CAST(temp as Numeric)
ELSE 0
END as CALC_degree_days,
CAST(CASE
WHEN CAST(temp as Numeric) >= 62 THEN 6
WHEN CAST(temp as Numeric) >= 58 THEN 5
WHEN CAST(temp as Numeric) >= 54 THEN 4
WHEN CAST(temp as Numeric) >= 50 THEN 3
WHEN CAST(temp as Numeric) >= 46 THEN 2
WHEN CAST(temp as Numeric) >= 43 THEN 1
ELSE 0.0
END as Numeric)as CALC_hot_water
-- http://www.degreeday.com/faqs.aspx
FROM flatten_dates
LEFT JOIN GSOD_dates
ON DATE = dates_flattened
)
,sum_degree_days as (
SELECT delivery_date, previous_delivery, SUM(CALC_degree_days + CALC_hot_water) as degree_days
from get_degree_days
GROUP BY delivery_date, previous_delivery
)
SELECT CASE
WHEN a.delivery_date is NULL THEN CURRENT_DATE('America/New_York') -- so we can estimate for today's date
ELSE a.delivery_date
END as delivery_date,
a.gallons,a.cost_per_gallon,
c.previous_delivery,
b.degree_days,
degree_days/gallons as Kfactor
FROM sum_degree_days b, all_deliveries c
LEFT JOIN `bq-jake.homedata.heating_oil` a
ON a.delivery_date = b.delivery_date
WHERE c.previous_delivery = b.previous_delivery
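The K-Factor calculation itself is a single division. Here is a minimal Python sketch with made-up numbers; `k_factor` is a hypothetical helper, and returning None for a missing gallons value is this sketch’s stand-in for the NULL that the query produces on the “today” row.

```python
def k_factor(degree_days, gallons):
    """Degree days per gallon for one delivery period; None when gallons are unknown."""
    # Assumption of this sketch: a zero or missing gallons value yields None
    # rather than raising an error.
    if gallons is None or gallons == 0:
        return None
    return degree_days / gallons

# Hypothetical delivery: 900 degree days since the last fill, 150 gallons delivered.
print(k_factor(900.0, 150.0))  # 6.0
print(k_factor(900.0, None))   # None
```

A higher K-Factor means more degree days covered per gallon, in the same way that a higher miles-per-gallon figure means better fuel economy.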
To project “today’s” delivery, we’ll calculate a mean K-Factor based on our delivery and usage history. We’ll simply sum all of the K-Factors that we’ve calculated per delivery and divide by the number of deliveries to get a mean K-Factor to associate with “today’s” delivery.
As a bonus, we’ll create a comments field to explain what’s happening in each row.
with get_deliveries as (
SELECT delivery_date,
LAG(delivery_date) OVER (ORDER BY delivery_date) as previous_delivery,
FROM `bq-jake.homedata.heating_oil`
WHERE delivery_date is not null
ORDER BY delivery_date
)
-- append Today's date to end of list so we can estimate for today
,append_today as (
SELECT CURRENT_DATE('America/New_York') as delivery_date,
max(delivery_date) as previous_delivery
from get_deliveries
)
,all_deliveries as (
SELECT * FROM get_deliveries
UNION ALL
SELECT * FROM append_today
)
,calc_days as (
SELECT delivery_date,
previous_delivery,
DATE_DIFF(delivery_date, previous_delivery, DAY) as CALC_days, -- Number of days between deliveries
-- GENERATE_DATE_ARRAY(previous_delivery,delivery_date, INTERVAL 1 DAY) as CALC_dates, -- list of dates between deliveries
GENERATE_DATE_ARRAY(previous_delivery,DATE_SUB(delivery_date, INTERVAL 1 DAY), INTERVAL 1 DAY) as CALC_dates, -- list of dates between deliveries (exclude next delivery date)
from all_deliveries
)
,flatten_dates as (
-- unnest days by delivery date so we can join to weather data
SELECT delivery_date, previous_delivery, dates_flattened
from calc_days, UNNEST(CALC_dates) as dates_flattened
)
,GSOD_dates as (
SELECT * FROM `bq-jake.temp.gsod_dates`
)
,get_degree_days as (
SELECT
delivery_date,
previous_delivery,
CAST(temp as Numeric) as mean_temp,
CASE
WHEN 65-CAST(temp as Numeric) > 0 THEN 65-CAST(temp as Numeric)
ELSE 0
END as CALC_degree_days,
CAST(CASE
WHEN CAST(temp as Numeric) >= 62 THEN 6
WHEN CAST(temp as Numeric) >= 58 THEN 5
WHEN CAST(temp as Numeric) >= 54 THEN 4
WHEN CAST(temp as Numeric) >= 50 THEN 3
WHEN CAST(temp as Numeric) >= 46 THEN 2
WHEN CAST(temp as Numeric) >= 43 THEN 1
ELSE 0.0
END as Numeric)as CALC_hot_water
-- http://www.degreeday.com/faqs.aspx
FROM flatten_dates
LEFT JOIN GSOD_dates
ON DATE = dates_flattened
)
,sum_degree_days as (
SELECT delivery_date, previous_delivery, SUM(CALC_degree_days + CALC_hot_water) as degree_days
from get_degree_days
GROUP BY delivery_date, previous_delivery
)
,all_history as (
SELECT CASE
WHEN a.delivery_date is NULL THEN CURRENT_DATE('America/New_York') -- so we can estimate for today's date
ELSE a.delivery_date
END as delivery_date,
a.* EXCEPT(delivery_date),
c.previous_delivery,
b.* EXCEPT(delivery_date, previous_delivery),
degree_days/gallons as Kfactor
FROM sum_degree_days b, all_deliveries c
LEFT JOIN `bq-jake.homedata.heating_oil` a
ON a.delivery_date = b.delivery_date
WHERE c.previous_delivery = b.previous_delivery
)
,calc_kfactor as (
SELECT CURRENT_DATE('America/New_York') as DATE,
SUM(Kfactor) / COUNT(Kfactor) as mean_Kfactor
from all_history
)
,put_it_all_together as (
SELECT delivery_date,
previous_delivery,
CASE
WHEN gallons is not null THEN gallons
ELSE degree_days / mean_Kfactor
END as gallons,
cost_per_gallon,
degree_days,
CASE
WHEN Kfactor > 0 THEN Kfactor
WHEN Kfactor is null THEN mean_Kfactor
ELSE null
END as Kfactor,
CASE
WHEN company is null THEN CONCAT('Kfactor estimated from mean Kfactor, gallons calculated to be delivered on ', delivery_date)
ELSE CONCAT(gallons,' gallons were delivered on ',delivery_date,' with ',degree_days,' degree days since last delivery on ',previous_delivery)
END as Comments
FROM all_history
LEFT JOIN calc_kfactor
ON DATE = delivery_date
)
SELECT * FROM put_it_all_together
ORDER BY delivery_date DESC
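The projection in the query above boils down to two steps: average the known K-Factors, then divide the degree days since the last delivery by that average. A minimal Python sketch of those two steps, with made-up history values and hypothetical function names, is:

```python
def mean_k_factor(k_factors):
    """Average of the known K-Factors, skipping missing values like SUM(...) / COUNT(...)."""
    known = [k for k in k_factors if k is not None]
    return sum(known) / len(known)

def estimate_gallons(degree_days, mean_k):
    """Gallons expected to be needed after the given number of degree days."""
    return degree_days / mean_k

# Hypothetical history; None stands for the "today" row with no delivery yet.
history = [6.0, 5.0, None, 7.0]
mean_k = mean_k_factor(history)
estimate = estimate_gallons(300.0, mean_k)
print(mean_k)    # 6.0
print(estimate)  # 50.0
```

With a mean K-Factor of 6 degree days per gallon, 300 degree days since the last fill suggests roughly 50 gallons are due.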
Conclusion
And there you have it: a guide to estimating your fuel usage over time! Now, the next time the fuel oil delivery truck shows up, I can wow my delivery driver by accurately predicting the number of gallons he or she is about to deliver!
Full code on GitHub.
Translated from: https://codeburst.io/using-bigquery-to-track-and-estimate-home-heating-oil-deliveries-2caa6a15eead