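In the previous post we introduced the SELECT statement. Next, we will use the WHERE clause to filter data.
Running SQL statements in Jupyter Notebook requires installing ipython-sql.
%sql and %%sql are the magics for running SQL statements in a notebook; they are not needed in the SQLite command line or in SQLiteStudio.
Load the SQL extension and connect to SQLite: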
%load_ext sql
%sql sqlite:///DataBase/weather_stations.db
'Connected: @DataBase/weather_stations.db'
This post uses the weather_stations.db database, which contains the STATION_DATA table.
First, look at the data in the STATION_DATA table:
%sql select * from station_data limit 0,10; -- first ten rows
* sqlite:///DataBase/weather_stations.db
Done.
station_number | report_code | year | month | day | dew_point | station_pressure | visibility | wind_speed | temperature | precipitation | snow_depth | fog | rain | hail | thunder | tornado |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
143080 | 34DDA7 | 2002 | 12 | 21 | 33.8 | 987.4 | 3.4 | 0.2 | 36 | 0 | None | 1 | 1 | 1 | 1 | 1 |
766440 | 39537B | 1998 | 10 | 1 | 72.7 | 1014.6 | 5.9 | 6.7 | 83.3 | 0 | None | 0 | 0 | 0 | 0 | 0 |
176010 | C3C6D5 | 2001 | 5 | 18 | 55.7 | None | 7.3 | 4.3 | 69.1 | 0 | None | 0 | 0 | 0 | 0 | 0 |
125600 | 145150 | 2007 | 10 | 14 | 33 | None | 6.9 | 2.5 | 39.7 | 0 | None | 0 | 0 | 0 | 0 | 0 |
470160 | EF616A | 1967 | 7 | 29 | 65.6 | None | 9.2 | 1.2 | 72.4 | 0.04 | None | 0 | 0 | 0 | 0 | 0 |
821930 | 1F8A7B | 1953 | 6 | 18 | 72.8 | 1007.1 | 12.4 | 3.6 | 81.3 | 0 | None | 0 | 0 | 0 | 0 | 0 |
478070 | D028D8 | 1981 | 6 | 27 | 73.4 | None | 7.9 | 3 | 77 | 1.93 | None | 0 | 0 | 0 | 0 | 0 |
719200 | C74611 | 1978 | 2 | 5 | -4.4 | 962.9 | 14.9 | 13.3 | 1.6 | 0 | 9.8 | 0 | 0 | 0 | 0 | 0 |
477460 | 737090 | 1962 | 8 | 14 | 72.3 | 1009.6 | 24.1 | 5.1 | 84.5 | 0 | None | 0 | 0 | 0 | 0 | 0 |
598550 | C5C66E | 2006 | 10 | 15 | 72.9 | None | 14.2 | 1.7 | 82 | 0 | None | 0 | 0 | 0 | 0 | 0 |
Suppose we are only interested in the 2010 data in the STATION_DATA table. A WHERE clause is a very direct way to get it. The following query returns only the records whose year equals 2010:
%%sql
select * from station_data
where year == 2010
limit 0,3; -- too many rows, so we show only the first 3 records
* sqlite:///DataBase/weather_stations.db
Done.
station_number | report_code | year | month | day | dew_point | station_pressure | visibility | wind_speed | temperature | precipitation | snow_depth | fog | rain | hail | thunder | tornado |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
719160 | BAB974 | 2010 | 1 | 22 | -22.8 | 1014.2 | None | 10.2 | -18.5 | 0 | 9.4 | 0 | 0 | 0 | 0 | 0 |
766870 | 7C0938 | 2010 | 3 | 22 | 48 | 871.2 | 4.4 | 1.5 | 50.8 | 0.11 | None | 1 | 1 | 1 | 1 | 1 |
134624 | 11CEA1 | 2010 | 2 | 17 | 46 | None | 3.4 | 2.6 | 46 | None | None | 0 | 0 | 0 | 0 | 0 |
Similarly, you can use != or <> to select the records that are not equal to a value:
%%sql
select * from station_data
where year != 2010
limit 0,3; -- too many rows, so we show only the first 3 records
* sqlite:///DataBase/weather_stations.db
Done.
station_number | report_code | year | month | day | dew_point | station_pressure | visibility | wind_speed | temperature | precipitation | snow_depth | fog | rain | hail | thunder | tornado |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
143080 | 34DDA7 | 2002 | 12 | 21 | 33.8 | 987.4 | 3.4 | 0.2 | 36 | 0 | None | 1 | 1 | 1 | 1 | 1 |
766440 | 39537B | 1998 | 10 | 1 | 72.7 | 1014.6 | 5.9 | 6.7 | 83.3 | 0 | None | 0 | 0 | 0 | 0 | 0 |
176010 | C3C6D5 | 2001 | 5 | 18 | 55.7 | None | 7.3 | 4.3 | 69.1 | 0 | None | 0 | 0 | 0 | 0 | 0 |
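As a quick check that != and <> behave the same way, here is a minimal sketch using Python's built-in sqlite3 module. The in-memory table and its four rows are made up for illustration and are not taken from weather_stations.db; the helper name station_numbers is also hypothetical.

```python
import sqlite3

# Hypothetical sample rows in an in-memory database (not the real weather_stations.db).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE station_data (station_number INTEGER, year INTEGER)")
conn.executemany(
    "INSERT INTO station_data VALUES (?, ?)",
    [(1, 2002), (2, 2010), (3, 1998), (4, 2010)],
)

def station_numbers(condition):
    """Return the station numbers matching a WHERE condition, in ascending order."""
    sql = (
        "SELECT station_number FROM station_data "
        f"WHERE {condition} ORDER BY station_number"
    )
    return [row[0] for row in conn.execute(sql)]

not_equal_bang = station_numbers("year != 2010")
not_equal_angle = station_numbers("year <> 2010")

# Both spellings of "not equal" select the same rows.
print(not_equal_bang, not_equal_angle)
conn.close()
```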
We can also use a BETWEEN condition to filter on a range:
%%sql
select * from station_data
where year between 2005 and 2010
limit 0,3; -- too many rows, so we show only the first 3 records
* sqlite:///DataBase/weather_stations.db
Done.
station_number | report_code | year | month | day | dew_point | station_pressure | visibility | wind_speed | temperature | precipitation | snow_depth | fog | rain | hail | thunder | tornado |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
125600 | 145150 | 2007 | 10 | 14 | 33 | None | 6.9 | 2.5 | 39.7 | 0 | None | 0 | 0 | 0 | 0 | 0 |
598550 | C5C66E | 2006 | 10 | 15 | 72.9 | None | 14.2 | 1.7 | 82 | 0 | None | 0 | 0 | 0 | 0 | 0 |
941830 | 229317 | 2007 | 4 | 19 | 66.5 | 994.9 | None | 4 | 76.3 | 0 | None | 0 | 0 | 0 | 0 | 0 |
A BETWEEN condition is inclusive at both ends: it is equivalent to combining greater-than-or-equal and less-than-or-equal with AND:
%%sql
select * from station_data
where year >=2005 and year <=2010
limit 0,3; -- too many rows, so we show only the first 3 records
* sqlite:///DataBase/weather_stations.db
Done.
station_number | report_code | year | month | day | dew_point | station_pressure | visibility | wind_speed | temperature | precipitation | snow_depth | fog | rain | hail | thunder | tornado |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
125600 | 145150 | 2007 | 10 | 14 | 33 | None | 6.9 | 2.5 | 39.7 | 0 | None | 0 | 0 | 0 | 0 | 0 |
598550 | C5C66E | 2006 | 10 | 15 | 72.9 | None | 14.2 | 1.7 | 82 | 0 | None | 0 | 0 | 0 | 0 | 0 |
941830 | 229317 | 2007 | 4 | 19 | 66.5 | 994.9 | None | 4 | 76.3 | 0 | None | 0 | 0 | 0 | 0 | 0 |
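To see that BETWEEN includes both endpoints, the sketch below compares the two forms on a tiny in-memory table. The five year values are hypothetical and chosen so that the boundaries 2005 and 2010 are both present.

```python
import sqlite3

# Hypothetical in-memory table; only the column name mirrors station_data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE station_data (year INTEGER)")
conn.executemany(
    "INSERT INTO station_data (year) VALUES (?)",
    [(2004,), (2005,), (2007,), (2010,), (2011,)],
)

between_rows = conn.execute(
    "SELECT year FROM station_data "
    "WHERE year BETWEEN 2005 AND 2010 ORDER BY year"
).fetchall()
range_rows = conn.execute(
    "SELECT year FROM station_data "
    "WHERE year >= 2005 AND year <= 2010 ORDER BY year"
).fetchall()

# The boundary years 2005 and 2010 appear in both results.
print(between_rows)
print(between_rows == range_rows)
conn.close()
```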
You can also filter records with OR conditions:
%%sql
select * from station_data
where Month==3
or Month==6
or Month==9
or Month==12
limit 0,3;
* sqlite:///DataBase/weather_stations.db
Done.
station_number | report_code | year | month | day | dew_point | station_pressure | visibility | wind_speed | temperature | precipitation | snow_depth | fog | rain | hail | thunder | tornado |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
143080 | 34DDA7 | 2002 | 12 | 21 | 33.8 | 987.4 | 3.4 | 0.2 | 36 | 0 | None | 1 | 1 | 1 | 1 | 1 |
821930 | 1F8A7B | 1953 | 6 | 18 | 72.8 | 1007.1 | 12.4 | 3.6 | 81.3 | 0 | None | 0 | 0 | 0 | 0 | 0 |
478070 | D028D8 | 1981 | 6 | 27 | 73.4 | None | 7.9 | 3 | 77 | 1.93 | None | 0 | 0 | 0 | 0 | 0 |
This is a bit tedious. We can use IN to select the same records:
%%sql
select * from station_data
where Month in (3,6,9,12)
limit 0,3;
* sqlite:///DataBase/weather_stations.db
Done.
station_number | report_code | year | month | day | dew_point | station_pressure | visibility | wind_speed | temperature | precipitation | snow_depth | fog | rain | hail | thunder | tornado |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
143080 | 34DDA7 | 2002 | 12 | 21 | 33.8 | 987.4 | 3.4 | 0.2 | 36 | 0 | None | 1 | 1 | 1 | 1 | 1 |
821930 | 1F8A7B | 1953 | 6 | 18 | 72.8 | 1007.1 | 12.4 | 3.6 | 81.3 | 0 | None | 0 | 0 | 0 | 0 | 0 |
478070 | D028D8 | 1981 | 6 | 27 | 73.4 | None | 7.9 | 3 | 77 | 1.93 | None | 0 | 0 | 0 | 0 | 0 |
Or write it like this, using the modulo operator:
%%sql
select * from station_data
where Month % 3 == 0
limit 0,3;
* sqlite:///DataBase/weather_stations.db
Done.
station_number | report_code | year | month | day | dew_point | station_pressure | visibility | wind_speed | temperature | precipitation | snow_depth | fog | rain | hail | thunder | tornado |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
143080 | 34DDA7 | 2002 | 12 | 21 | 33.8 | 987.4 | 3.4 | 0.2 | 36 | 0 | None | 1 | 1 | 1 | 1 | 1 |
821930 | 1F8A7B | 1953 | 6 | 18 | 72.8 | 1007.1 | 12.4 | 3.6 | 81.3 | 0 | None | 0 | 0 | 0 | 0 | 0 |
478070 | D028D8 | 1981 | 6 | 27 | 73.4 | None | 7.9 | 3 | 77 | 1.93 | None | 0 | 0 | 0 | 0 | 0 |
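The three conditions above (chained OR, IN, and modulo) should select the same months. A minimal sketch to confirm this, assuming a hypothetical in-memory table that holds one row for each month from 1 to 12:

```python
import sqlite3

# Hypothetical table with one row per month (1 to 12).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE station_data (month INTEGER)")
conn.executemany(
    "INSERT INTO station_data (month) VALUES (?)",
    [(m,) for m in range(1, 13)],
)

conditions = {
    "or": "month = 3 OR month = 6 OR month = 9 OR month = 12",
    "in": "month IN (3, 6, 9, 12)",
    "modulo": "month % 3 = 0",
}

results = {}
for name, condition in conditions.items():
    sql = f"SELECT month FROM station_data WHERE {condition} ORDER BY month"
    results[name] = [row[0] for row in conn.execute(sql)]

# Each spelling returns months 3, 6, 9 and 12.
print(results)
conn.close()
```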
If you do not want the data for months 3, 6, 9 and 12, use NOT IN:
%%sql
select * from station_data
where Month not in (3,6,9,12)
limit 0,3;
* sqlite:///DataBase/weather_stations.db
Done.
station_number | report_code | year | month | day | dew_point | station_pressure | visibility | wind_speed | temperature | precipitation | snow_depth | fog | rain | hail | thunder | tornado |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
766440 | 39537B | 1998 | 10 | 1 | 72.7 | 1014.6 | 5.9 | 6.7 | 83.3 | 0 | None | 0 | 0 | 0 | 0 | 0 |
176010 | C3C6D5 | 2001 | 5 | 18 | 55.7 | None | 7.3 | 4.3 | 69.1 | 0 | None | 0 | 0 | 0 | 0 | 0 |
125600 | 145150 | 2007 | 10 | 14 | 33 | None | 6.9 | 2.5 | 39.7 | 0 | None | 0 | 0 | 0 | 0 | 0 |
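IN and NOT IN split the rows into two complementary groups when the column has no missing values. The sketch below illustrates this on a hypothetical in-memory table with one row per month; it is not run against the real database.

```python
import sqlite3

# Hypothetical table with one row per month (1 to 12), no NULLs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE station_data (month INTEGER)")
conn.executemany(
    "INSERT INTO station_data (month) VALUES (?)",
    [(m,) for m in range(1, 13)],
)

in_months = [
    row[0]
    for row in conn.execute(
        "SELECT month FROM station_data "
        "WHERE month IN (3, 6, 9, 12) ORDER BY month"
    )
]
not_in_months = [
    row[0]
    for row in conn.execute(
        "SELECT month FROM station_data "
        "WHERE month NOT IN (3, 6, 9, 12) ORDER BY month"
    )
]

# Together the two lists cover every month exactly once.
print(in_months, not_in_months)
conn.close()
```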
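So far we have used WHERE on numeric fields. Text fields work in much the same way, and you can likewise use =, AND, OR and IN. The difference is that text values must be wrapped in single quotes: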
%%sql
select * from station_data
where report_code == '513A63';
* sqlite:///DataBase/weather_stations.db
Done.
station_number | report_code | year | month | day | dew_point | station_pressure | visibility | wind_speed | temperature | precipitation | snow_depth | fog | rain | hail | thunder | tornado |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
702223 | 513A63 | 2010 | 1 | 22 | -23.1 | None | 10 | 0.8 | -15.6 | 0 | None | 0 | 0 | 0 | 0 | 0 |
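Note that report_code is of type text (not number), so the value needs single quotes: '513A63'. Without the quotes, SQL does not read 513A63 as a text value (it may be taken for a column name rather than a value), which causes an error.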
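The effect of leaving out the quotes can be sketched as follows. This example uses a hypothetical one-row in-memory table and the code EF616A, which starts with a letter, so SQLite tries to read the unquoted form as a column name. The exact error text depends on the value and on the SQLite version, so the sketch only captures the message instead of assuming its wording.

```python
import sqlite3

# Hypothetical one-row table; the values are for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE station_data (station_number INTEGER, report_code TEXT)"
)
conn.execute("INSERT INTO station_data VALUES (470160, 'EF616A')")

# With single quotes, EF616A is a text value and the row is found.
quoted_rows = conn.execute(
    "SELECT station_number FROM station_data WHERE report_code = 'EF616A'"
).fetchall()

# Without quotes, EF616A is not a text value and the statement fails.
try:
    conn.execute(
        "SELECT station_number FROM station_data WHERE report_code = EF616A"
    )
    error_message = None
except sqlite3.Error as exc:
    error_message = str(exc)

print(quoted_rows)
print(error_message)
conn.close()
```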
Single quotes apply to all text operations, including IN:
%%sql
select * from station_data
where report_code in ('513A63', '1F8A7B', 'EF616A')
limit 0,3;
* sqlite:///DataBase/weather_stations.db
Done.
station_number | report_code | year | month | day | dew_point | station_pressure | visibility | wind_speed | temperature | precipitation | snow_depth | fog | rain | hail | thunder | tornado |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
470160 | EF616A | 1967 | 7 | 29 | 65.6 | None | 9.2 | 1.2 | 72.4 | 0.04 | None | 0 | 0 | 0 | 0 | 0 |
821930 | 1F8A7B | 1953 | 6 | 18 | 72.8 | 1007.1 | 12.4 | 3.6 | 81.3 | 0 | None | 0 | 0 | 0 | 0 | 0 |
702223 | 513A63 | 2010 | 1 | 22 | -23.1 | None | 10 | 0.8 | -15.6 | 0 | None | 0 | 0 | 0 | 0 | 0 |
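When using WHERE and SELECT, there are also some handy text operations and functions. For example, the length() function returns the number of characters in a value, so the following query returns the records whose report_code is not 6 characters long: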
%%sql
select * from station_data
where length(report_code) != 6;
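Since no output is shown for this query, here is a small sketch of what length() does in a WHERE clause. The table is in-memory and the two codes that are not 6 characters long ('B6C2' and 'EF616A77') are invented for the example.

```python
import sqlite3

# Hypothetical codes: two of length 6, one shorter, one longer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE station_data (report_code TEXT)")
conn.executemany(
    "INSERT INTO station_data (report_code) VALUES (?)",
    [("513A63",), ("A79DEC",), ("B6C2",), ("EF616A77",)],
)

odd_length_codes = [
    row[0]
    for row in conn.execute(
        "SELECT report_code FROM station_data "
        "WHERE length(report_code) != 6 ORDER BY report_code"
    )
]

# Only the codes that are not exactly 6 characters long remain.
print(odd_length_codes)
conn.close()
```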
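Another common operation is the LIKE expression combined with wildcards: % matches any number of characters, and _ matches exactly one character. To find the records whose report_code starts with "A", use 'A%':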
%%sql
select * from station_data
where report_code like 'A%'
limit 0,3;
* sqlite:///DataBase/weather_stations.db
Done.
station_number | report_code | year | month | day | dew_point | station_pressure | visibility | wind_speed | temperature | precipitation | snow_depth | fog | rain | hail | thunder | tornado |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
484750 | A38C90 | 1988 | 6 | 24 | 72.6 | None | 8.7 | 3.1 | 87.5 | 0 | None | 0 | 0 | 0 | 0 | 0 |
985310 | A79DEC | 2007 | 7 | 31 | 77.6 | None | 11.8 | 3.4 | 82.5 | 0 | None | 0 | 0 | 0 | 0 | 0 |
724505 | A49553 | 2005 | 4 | 28 | 42.7 | None | 6.8 | 11.2 | 55.4 | 0.42 | None | 0 | 0 | 0 | 0 | 0 |
To find the records that start with "B" and have "C" as the third character, use an underscore (_) as a placeholder for the second position:
%%sql
select * from station_data
where report_code like 'B_C%'
limit 0,3;
* sqlite:///DataBase/weather_stations.db
Done.
station_number | report_code | year | month | day | dew_point | station_pressure | visibility | wind_speed | temperature | precipitation | snow_depth | fog | rain | hail | thunder | tornado |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
999999 | B6C2DE | 1966 | 2 | 8 | 38.8 | 992.9 | 15.2 | 5.5 | 52.5 | 0 | None | 0 | 0 | 0 | 0 | 0 |
60110 | B8CB27 | 1997 | 1 | 20 | 41.7 | 1008.3 | 13.1 | 19.1 | 44.7 | 0.04 | None | 0 | 0 | 0 | 0 | 0 |
64080 | BECB51 | 1982 | 8 | 8 | 59 | None | 2.5 | 11 | 65.5 | 0 | None | 0 | 0 | 0 | 0 | 0 |
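The two wildcard patterns above can be tried on a small sample. The sketch below assumes a hypothetical in-memory table; a few of the codes are copied from the outputs above, while 'B12345' and 'CB1C00' are made up to show rows that should not match 'B_C%'.

```python
import sqlite3

# Hypothetical sample of report codes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE station_data (report_code TEXT)")
conn.executemany(
    "INSERT INTO station_data (report_code) VALUES (?)",
    [("A38C90",), ("A79DEC",), ("B6C2DE",), ("BECB51",), ("B12345",), ("CB1C00",)],
)

def codes_like(pattern):
    """Return the report codes matching a LIKE pattern, sorted."""
    rows = conn.execute(
        "SELECT report_code FROM station_data "
        "WHERE report_code LIKE ? ORDER BY report_code",
        (pattern,),
    )
    return [row[0] for row in rows]

starts_with_a = codes_like("A%")   # any code beginning with A
b_then_c = codes_like("B_C%")      # B, any one character, C, then anything

print(starts_with_a)
print(b_then_c)
conn.close()
```

Passing the pattern as a bound parameter (the ? placeholder) is a design choice of this sketch; writing the pattern directly in the SQL text, as in the queries above, works as well.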
Booleans are true/false values. Some databases use 1 and 0 in place of true and false, while others (such as MySQL) let you write true and false directly, for example:
%%sql
select * from station_data
where tornado == true and hail == true
limit 0,3;
* sqlite:///DataBase/weather_stations.db
Done.
station_number | report_code | year | month | day | dew_point | station_pressure | visibility | wind_speed | temperature | precipitation | snow_depth | fog | rain | hail | thunder | tornado |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
143080 | 34DDA7 | 2002 | 12 | 21 | 33.8 | 987.4 | 3.4 | 0.2 | 36 | 0 | None | 1 | 1 | 1 | 1 | 1 |
724320 | 207979 | 1988 | 3 | 4 | 33.1 | 999.4 | 3.1 | 9.3 | 35.1 | 0.23 | None | 1 | 1 | 1 | 1 | 1 |
743920 | 2ABE7D | 1996 | 5 | 21 | 57.6 | None | 5.8 | 7.5 | 70 | 0 | None | 1 | 1 | 1 | 1 | 1 |
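Older versions of SQLite did not accept true and false and only worked with 1 and 0. The TRUE and FALSE keywords were added in SQLite 3.23.0, so the query above works in current versions.
In SQLite you can also write it this way: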
%%sql
select * from station_data
where tornado == 1 and hail == 1
limit 0,3;
* sqlite:///DataBase/weather_stations.db
Done.
station_number | report_code | year | month | day | dew_point | station_pressure | visibility | wind_speed | temperature | precipitation | snow_depth | fog | rain | hail | thunder | tornado |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
143080 | 34DDA7 | 2002 | 12 | 21 | 33.8 | 987.4 | 3.4 | 0.2 | 36 | 0 | None | 1 | 1 | 1 | 1 | 1 |
724320 | 207979 | 1988 | 3 | 4 | 33.1 | 999.4 | 3.1 | 9.3 | 35.1 | 0.23 | None | 1 | 1 | 1 | 1 | 1 |
743920 | 2ABE7D | 1996 | 5 | 21 | 57.6 | None | 5.8 | 7.5 | 70 | 0 | None | 1 | 1 | 1 | 1 | 1 |
If you are looking for true values, you can even drop the == 1, because the column already holds a boolean value. So you can also write:
%%sql
select * from station_data
where tornado and hail
limit 0,3;
* sqlite:///DataBase/weather_stations.db
Done.
station_number | report_code | year | month | day | dew_point | station_pressure | visibility | wind_speed | temperature | precipitation | snow_depth | fog | rain | hail | thunder | tornado |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
143080 | 34DDA7 | 2002 | 12 | 21 | 33.8 | 987.4 | 3.4 | 0.2 | 36 | 0 | None | 1 | 1 | 1 | 1 | 1 |
724320 | 207979 | 1988 | 3 | 4 | 33.1 | 999.4 | 3.1 | 9.3 | 35.1 | 0.23 | None | 1 | 1 | 1 | 1 | 1 |
743920 | 2ABE7D | 1996 | 5 | 21 | 57.6 | None | 5.8 | 7.5 | 70 | 0 | None | 1 | 1 | 1 | 1 | 1 |
Likewise, there are two ways to express false:
%%sql
select * from station_data
where tornado == 0 and hail == 1;
* sqlite:///DataBase/weather_stations.db
Done.
station_number | report_code | year | month | day | dew_point | station_pressure | visibility | wind_speed | temperature | precipitation | snow_depth | fog | rain | hail | thunder | tornado |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
%%sql
select * from station_data
where not tornado and hail;
* sqlite:///DataBase/weather_stations.db
Done.
station_number | report_code | year | month | day | dew_point | station_pressure | visibility | wind_speed | temperature | precipitation | snow_depth | fog | rain | hail | thunder | tornado |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
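Both queries above return no rows on this data set, so the equivalence of the two spellings is hard to see from the output. The sketch below uses a hypothetical in-memory table with every combination of the tornado and hail flags to compare the explicit form (== 1, == 0) with the shorthand form (bare column, NOT).

```python
import sqlite3

# Hypothetical rows covering all four combinations of the two flags.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE station_data (station_number INTEGER, tornado INTEGER, hail INTEGER)"
)
conn.executemany(
    "INSERT INTO station_data VALUES (?, ?, ?)",
    [(1, 1, 1), (2, 1, 0), (3, 0, 1), (4, 0, 0)],
)

def station_numbers(condition):
    """Return the station numbers matching a WHERE condition, sorted."""
    sql = (
        "SELECT station_number FROM station_data "
        f"WHERE {condition} ORDER BY station_number"
    )
    return [row[0] for row in conn.execute(sql)]

both_explicit = station_numbers("tornado = 1 AND hail = 1")
both_shorthand = station_numbers("tornado AND hail")
hail_only_explicit = station_numbers("tornado = 0 AND hail = 1")
hail_only_shorthand = station_numbers("NOT tornado AND hail")

print(both_explicit, both_shorthand)
print(hail_only_explicit, hail_only_shorthand)
conn.close()
```

Note that NOT binds more tightly than AND, so "NOT tornado AND hail" is read as "(NOT tornado) AND hail".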
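You may have noticed that some columns contain missing values (null values), such as station_pressure and snow_depth. NULL cannot be found with ==; you need IS NULL or IS NOT NULL to surface missing values. So, to find all the records that have no snow_depth data, you can use the following query: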
%%sql
select * from station_data
where snow_depth IS NULL
limit 0,3;
* sqlite:///DataBase/weather_stations.db
Done.
station_number | report_code | year | month | day | dew_point | station_pressure | visibility | wind_speed | temperature | precipitation | snow_depth | fog | rain | hail | thunder | tornado |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
143080 | 34DDA7 | 2002 | 12 | 21 | 33.8 | 987.4 | 3.4 | 0.2 | 36 | 0 | None | 1 | 1 | 1 | 1 | 1 |
766440 | 39537B | 1998 | 10 | 1 | 72.7 | 1014.6 | 5.9 | 6.7 | 83.3 | 0 | None | 0 | 0 | 0 | 0 | 0 |
176010 | C3C6D5 | 2001 | 5 | 18 | 55.7 | None | 7.3 | 4.3 | 69.1 | 0 | None | 0 | 0 | 0 | 0 | 0 |
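The difference between comparing with NULL and testing with IS NULL can be sketched like this. The table is in-memory and the four snow_depth values are hypothetical; two of them are missing.

```python
import sqlite3

# Hypothetical rows: two with a snow depth, two with NULL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE station_data (station_number INTEGER, snow_depth REAL)"
)
conn.executemany(
    "INSERT INTO station_data VALUES (?, ?)",
    [(1, 9.8), (2, None), (3, None), (4, 0.0)],
)

def station_numbers(condition):
    """Return the station numbers matching a WHERE condition, sorted."""
    sql = (
        "SELECT station_number FROM station_data "
        f"WHERE {condition} ORDER BY station_number"
    )
    return [row[0] for row in conn.execute(sql)]

equals_null = station_numbers("snow_depth = NULL")   # comparison with NULL is never true
is_null = station_numbers("snow_depth IS NULL")
is_not_null = station_numbers("snow_depth IS NOT NULL")

print(equals_null, is_null, is_not_null)
conn.close()
```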
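If you want to replace missing values with a specific value, use the coalesce() function. For example, treat missing precipitation as 0 and keep the records where the result is less than or equal to 0.5: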
%%sql
select * from station_data
where coalesce(precipitation, 0) <= 0.5
limit 0,3;
* sqlite:///DataBase/weather_stations.db
Done.
station_number | report_code | year | month | day | dew_point | station_pressure | visibility | wind_speed | temperature | precipitation | snow_depth | fog | rain | hail | thunder | tornado |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
143080 | 34DDA7 | 2002 | 12 | 21 | 33.8 | 987.4 | 3.4 | 0.2 | 36 | 0 | None | 1 | 1 | 1 | 1 | 1 |
766440 | 39537B | 1998 | 10 | 1 | 72.7 | 1014.6 | 5.9 | 6.7 | 83.3 | 0 | None | 0 | 0 | 0 | 0 | 0 |
176010 | C3C6D5 | 2001 | 5 | 18 | 55.7 | None | 7.3 | 4.3 | 69.1 | 0 | None | 0 | 0 | 0 | 0 | 0 |
The coalesce() function can be used not only in the WHERE clause but also in the SELECT statement, for example:
%%sql
select report_code, coalesce(precipitation, 0) as rainfall
from station_data
limit 0,3;
* sqlite:///DataBase/weather_stations.db
Done.
report_code | rainfall |
---|---|
34DDA7 | 0 |
39537B | 0 |
C3C6D5 | 0 |
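To make the effect of coalesce() visible, the sketch below runs the same filter with and without it on a hypothetical in-memory table in which one precipitation value is missing. The rows are invented for the example.

```python
import sqlite3

# Hypothetical rows; station 2 has no precipitation reading.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE station_data (station_number INTEGER, precipitation REAL)"
)
conn.executemany(
    "INSERT INTO station_data VALUES (?, ?)",
    [(1, 0.0), (2, None), (3, 0.04), (4, 1.93)],
)

def station_numbers(condition):
    """Return the station numbers matching a WHERE condition, sorted."""
    sql = (
        "SELECT station_number FROM station_data "
        f"WHERE {condition} ORDER BY station_number"
    )
    return [row[0] for row in conn.execute(sql)]

# Without coalesce, the row with NULL precipitation is filtered out.
without_coalesce = station_numbers("precipitation <= 0.5")
# With coalesce, NULL is treated as 0 and the row is kept.
with_coalesce = station_numbers("coalesce(precipitation, 0) <= 0.5")

# In a SELECT list, coalesce replaces NULL in the returned values.
rainfall = [
    row[0]
    for row in conn.execute(
        "SELECT coalesce(precipitation, 0) AS rainfall "
        "FROM station_data ORDER BY station_number"
    )
]

print(without_coalesce, with_coalesce, rainfall)
conn.close()
```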
[1] Thomas Nield. Getting Started with SQL[M]. US: O'Reilly, 2016: 29-37.