notebook_sql

这是我自己学习SQL的笔记,教程是来自B站的视频,主讲人是在Youtube非常火的MOSH

课程链接

1.1 What is SQL

A DATABASE is a collection of data stored in a format that can easily be accessed.

image-20201213161550381
1.2 TYPE OF DATABASE
Relational database & No-SQL

Relational Database Management System: MySQL, SQL Server, Oracle

image-20201213162609879

No-SQL: no relationship between tables

NoSQL system don’t understand SQL

2.1 Create the databases
-- USE sql_store; 使用哪个数据库
SELECT * FROM customers
WHERE customer_id =1
ORDER BY first_name;
2.2 THE SELECT Statement
SELECT
      last_name,
      first_name,
      points,
      (points + 10) * 100 AS discount_factor
                      -- 遵循数学表达式的优先性
                      -- AS "discount facotr" 如果字母间有空格必须加上""
FROM customers;
2.3 THE SELECT CLAUSE
SELECT DISTINCT state 
                -- DISTINCT 表示在state中不显示重复的
FROM customers;
EXERCISE
SELECT 
      name,
      unit_price,
      unit_price * 1.1 AS new_price
FROM customers;
2.4 THE WHERE CLAUSE
SELECT *
FROM customers
WHERE birth_date > '1990-01-01';
              -- state!= 'VA'
              -- points > 3000
              -- =/<=/=/!=
EXERCISE
-- 查看今年的订单
SELECT *
FROM orders
WHERE order_date >= '2019-01-01';
2.5 THE AND/OR/NOT
SELECT *
FROM customers
-- WHERE birth_date > '1990-01-01' OR (points > 1000 AND state = 'VA')
                             -- OR points > 1000
WHERE NOT (birth_date > '1990-01-01' OR points > 1000);
EXERCISE
-- exercise
SELECT *
FROM order_items
WHERE order_id = 6 AND (unit_price * quantity) >30;
2.6 THE IN OPERATOR
SELECT *
FROM customers
              -- 在SQL语言中我们不能将字符穿和布尔表达式进行结合 
              --  OR 'GA' OR 'FL'
WHERE state = 'VA' OR state = 'GA' OR state = 'FL';  
-- WHERE state NOT IN ('VA','FL','GA')
EXERCISE
-- EXERCISE 
SELECT *
FROM products
WHERE quantity_in_stock IN (49,38,72);
2.7 THE BETWEEN OPERATOR
SELECT *
FROM customers
-- WHERE points>= 1000 AND points<= 3000
-- 用between语法来表示
WHERE points BETWEEN 1000 AND 3000;
EXERCISE
SELECT *
FROM customers
WHERE birth_date BETWEEN  '1990-01-01'AND '2000-01-01';
2.8 THE LIKE OPERATOR
SELECT *
FROM customers
WHERE last_name LIKE -- "%b%"  -- %来表示任意字符
                            -- 名字中有
					"_____y"  -- 名字中第_____个字符是y
EXERCISE
-- exercise address匹配
SELECT *
FROM customers
WHERE address LIKE '%TRAIL%'  -- BTWEEN IN 是精准定位而like表示模糊定位
      OR address LIKE '%avenue%'
-- EXERCISE
SELECT *
FROM customers
WHERE phone LIKE '%9';
2.9 THE REGEXP OPERATOR
SELECT *
FROM customers
WHERE last_name LIKE '%field%';
SELECT *
FROM customers
WHERE last_name REGEXP 'field$|mac|rose'; 

– ^表示在字符串开头
– $ 表示在字符串结尾
– | 可以同时匹配两个字符串,是或的意思

SELECT *
FROM customers
WHERE last_name REGEXP '[gim]e';

Output: ge/ie/me

SELECT *
FROM customers
WHERE last_name REGEXP '[a-h]e';

Output: ae/be/…/he

EXERCISE

FIRSTNAME中带ELKA或AMBUR

-- EXERCISE REGEXP1
SELECT *
FROM customers
WHERE FIRST_name REGEXP 'ELKA|AMBUR';

LASTNAME中以EY或ON结尾

-- EXERCISE REGEXP2
SELECT *
FROM customers
WHERE last_name REGEXP 'EY$|ON$';

lastname中以my开头并且里面带se

-- EXERCISE REGEXP3
SELECT *
FROM customers
WHERE last_name REGEXP '^MY|SE';

名字中带BR或BU

-- EXERCISE REGEXP4
SELECT *
FROM customers
WHERE last_name REGEXP 'B[RU]';

正则表达式

模式 描述
^ 匹配输入字符串的开始位置。如果设置了 RegExp 对象的 Multiline 属性,^ 也匹配 ‘\n’ 或 ‘\r’ 之后的位置。
$ 匹配输入字符串的结束位置。如果设置了RegExp 对象的 Multiline 属性,$ 也匹配 ‘\n’ 或 ‘\r’ 之前的位置。
. 匹配除 “\n” 之外的任何单个字符。要匹配包括 ‘\n’ 在内的任何字符,请使用像 ‘[.\n]’ 的模式。
[…] 字符集合。匹配所包含的任意一个字符。例如, ‘[abc]’ 可以匹配 “plain” 中的 ‘a’。
[^…] 负值字符集合。匹配未包含的任意字符。例如, ‘[^abc]’ 可以匹配 “plain” 中的’p’。
p1|p2|p3 匹配 p1 或 p2 或 p3。例如,‘z|food’ 能匹配 “z” 或 “food”。’(z|f)ood’ 则匹配 “zood” 或 “food”。
* 匹配前面的子表达式零次或多次。例如,zo* 能匹配 “z” 以及 “zoo”。* 等价于{0,}。
+ 匹配前面的子表达式一次或多次。例如,‘zo+’ 能匹配 “zo” 以及 “zoo”,但不能匹配 “z”。+ 等价于 {1,}。
{n} n 是一个非负整数。匹配确定的 n 次。例如,‘o{2}’ 不能匹配 “Bob” 中的 ‘o’,但是能匹配 “food” 中的两个 o。
{n,m} m 和 n 均为非负整数,其中n <= m。最少匹配 n 次且最多匹配 m 次。
\ 将下一个字符标记为或特殊字符、或原义字符、或向后引用、或八进制转义符。例如, ‘n’ 匹配字符 ‘n’。’\n’ 匹配换行符。序列 ‘\’ 匹配 “”,而 ‘(’ 则匹配 “(”。
2.10 THE IS NULL OPERATOR
SELECT *
FROM customers
WHERE phone IS NOT NULL;
EXERCISE
-- EXERCISE 
SELECT *
FROM orders
WHERE shipped_date IS NULL;
2.11 THE ORDER BY CLAUSE

MYSQL中可以用任何列排列数据,不管那列是不是在select子句中

SELECT first_name, last_name
FROM customers
ORDER BY birth_date; 

排序可以根据自己起的列排序

SELECT first_name, last_name, 10 AS points
FROM customers
ORDER BY points, first_name; 

1–represents first_name

2–represents last_name

最好避免使用

SELECT first_name, last_name, 10 AS points
FROM customers
ORDER BY 12; 
image-20201213171656506
exercise

order_id=2 并且根据每一项目的总价格进行排序

image-20201213172120335
SELECT *
FROM order_items
WHERE order_id =2 
ORDER BY quantity * unit_price DESC;
SELECT 
      order_id, 
      product_id, 
      quantity, 
      unit_price,
      (quantity * unit_price) AS Total_price
FROM order_items
WHERE order_id =2 
ORDER BY Total_price DESC;
2.12 THE LIMIT CLAUSE
SELECT *
FROM customers
LIMIT 3

OFFSET

– page 1: 1-3

–page 2: 4-6

–page 3: 7-9

如果想直接获取第三页的数据,需要跳过前面的6条记录并选择3条记录

SELECT *
FROM customers
LIMIT 6,3;

LIMIT 6,3 --跳过前面的6条记录,并选择6条记录后的3条记录

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-MWpalSSO-1610167958984)(C:\Users\Changnie\AppData\Roaming\Typora\typora-user-images\image-20201213173050617.png)]

EXERCISE

查找积分最多的前三名用户

-- Get the top three loyal customers
SELECT *
FROM customers
ORDER BY points DESC
LIMIT 3;
3.1 INNER JOINS
SELECT *
FROM orders
INNER JOIN customers  
	ON customers.customer_id = orders.customer_id;
SELECT order_id, first_name, last_name
FROM orders
INNER JOIN customers  
	ON customers.customer_id = orders.customer_id;

Error Code: 1052. Column ‘customer_id’ in field list is ambiguous 0.000 sec

SELECT order_id, customer_id, first_name, last_name
FROM orders
INNER JOIN customers  
	ON customers.customer_id = orders.customer_id;

Reason: orders & customers表格都有customer_id 这个列,MySQL无法确定我们想从哪张表选取这列

How to solve it?

SELECT order_id, orders.customer_id, first_name, last_name
FROM orders
INNER JOIN customers  
	ON customers.customer_id = orders.customer_id;

在customer_id前面标注是哪个表中的columns

用o代替orders,用c代替customers

SELECT order_id, o.customer_id, first_name, last_name
FROM orders o
INNER JOIN customers c  
	ON c.customer_id = o.customer_id;
EXERCISE
SELECT order_id,p.product_id,name,quantity, o.unit_price
FROM order_items o
JOIN products p
      ON o.product_id = p.product_id;

Output

image-20201213203609127
3.2 Joining Across Databases

将分散在多个数据库的表中的列合并起来

image-20201213203821972

尝试将sql_inventory中的products与sql_store order_items链接到一起

SELECT *
FROM order_items o
JOIN sql_inventory.products p
     ON o.product_id = p.product_id;

因为是USE sql_store所以sql_inventory.products p

Another Situation(USE sql_inventory)

USE sql_inventory;
SELECT *
FROM products p
JOIN sql_store.order_items oi
    ON  p.product_id= oi.product_id;
3.3 Self Joins

使表自己和自己进行连结,查看每个员工和他的manager

image-20201213205057236
USE sql_hr;

SELECT *
FROM employees e
JOIN employees m  -- m for manager
     ON e.reports_to = m.employee_id;

Output

employee_id first_name last_name job_title salary reports_to office_id employee_id first_name last_name job_title salary reports_to office_id
33391 D’arcy Nortunen Account Executive 62871 37270 1 37270 Yovonnda Magrannell Executive Secretary 63996 10
37851 Sayer Matterson Statistician III 98926 37270 1 37270 Yovonnda Magrannell Executive Secretary 63996 10
40448 Mindy Crissil Staff Scientist 94860 37270 1 37270 Yovonnda Magrannell Executive Secretary 63996 10
56274 Keriann Alloisi VP Marketing 110150 37270 1 37270 Yovonnda Magrannell Executive Secretary 63996 10
63196 Alaster Scutchin Assistant Professor 32179 37270 2 37270 Yovonnda Magrannell Executive Secretary 63996 10
67009 North de Clerc VP Product Management 114257 37270 2 37270 Yovonnda Magrannell Executive Secretary 63996 10
67370 Elladine Rising Social Worker 96767 37270 2 37270 Yovonnda Magrannell Executive Secretary 63996 10
68249 Nisse Voysey Financial Advisor 52832 37270 2 37270 Yovonnda Magrannell Executive Secretary 63996 10
72540 Guthrey Iacopetti Office Assistant I 117690 37270 3 37270 Yovonnda Magrannell Executive Secretary 63996 10
72913 Kass Hefferan Computer Systems Analyst IV 96401 37270 3 37270 Yovonnda Magrannell Executive Secretary 63996 10
75900 Virge Goodrum Information Systems Manager 54578 37270 3 37270 Yovonnda Magrannell Executive Secretary 63996 10
76196 Mirilla Janowski Cost Accountant 119241 37270 3 37270 Yovonnda Magrannell Executive Secretary 63996 10
80529 Lynde Aronson Junior Executive 77182 37270 4 37270 Yovonnda Magrannell Executive Secretary 63996 10
80679 Mildrid Sokale Geologist II 67987 37270 4 37270 Yovonnda Magrannell Executive Secretary 63996 10
84791 Hazel Tarbert General Manager 93760 37270 4 37270 Yovonnda Magrannell Executive Secretary 63996 10
95213 Cole Kesterton Pharmacist 86119 37270 4 37270 Yovonnda Magrannell Executive Secretary 63996 10
96513 Theresa Binney Food Chemist 47354 37270 5 37270 Yovonnda Magrannell Executive Secretary 63996 10
98374 Estrellita Daleman Staff Accountant IV 70187 37270 5 37270 Yovonnda Magrannell Executive Secretary 63996 10
115357 Ivy Fearey Structural Engineer 92710 37270 5 37270 Yovonnda Magrannell Executive Secretary 63996 10

在已连接的表中选取特定列

SELECT 
      e.employee_id,
      e.first_name,
      m.first_name AS manager
FROM employees e
JOIN employees m  -- m for manager
     ON e.reports_to = m.employee_id;
3.4 Joining Multiple Tables

对多张表进行

USE sql_store;

SELECT 
     o.order_id,
     o.order_date,
     c.first_name,
     c.last_name,
     os.name AS status
FROM orders o
JOIN customers c
     ON o.customer_id = c.customer_id 
JOIN order_statuses os
     ON o.status = os.order_status_id;

Output

image-20201213210921335
EXERCISE
USE sql_invoicing;

SELECT 
      p.date,
      p.amount,
      c.name,
      c.address,
      c.city,
      c.phone,
      pm.name as payment_method
FROM payments p
JOIN clients c
	ON p.client_id = c.client_id
JOIN payment_methods pm
    ON p.payment_method = pm.payment_method_id;

Output

image-20201213212041444
3.5 Compound Join Conditions

使用两个都含有重复值的columns一起唯一识别每一个订单项目

image-20201213212414449

在表设计过程中,order_id & product_id被设计成复合主键

USE sql_store;

SELECT *
FROM order_items oi
JOIN order_item_notes oin
     ON oi.order_id = oin.order_id
     AND oi.product_id = oin.product_id;
3.6 隐式连接语法Implicit Join Syntax
SELECT *
FROM orders o
JOIN customers c
     ON o.customer_id = c.customer_id;

-- IMPLICIT JOIN SYNTAX(NOT SUGGEST)
SELECT *
FROM orders o,customers c
WHERE o.customer_id = c.customer_id;

尽量使用显示连接法

3.7 Outer Joins
LEFT JOIN
SELECT 
	c.customer_id,
    c.first_name,
    o.order_id
FROM customers c
LEFT JOIN orders o
     ON c.customer_id = o.customer_id
ORDER BY c.customer_id;

当使用left join时,所有左表的记录都会被返回,不管条件正确还是错误

Output

image-20201214211238756
RIGH JOIN
SELECT 
	c.customer_id,
    c.first_name,
    o.order_id
FROM customers c
RIGHT JOIN orders o
     ON c.customer_id = o.customer_id
ORDER BY c.customer_id;

Output

image-20201214211052254
DIFFERENCE BETWEEN LEFT JOIN & RIGHT JOIN

LEFT JOIN: 以左表为目标进行索引

RIGHT JOIN: 以右表为目标进行索引

EXERCISE

对product和order进行外连接,要求:即使product_id = 7美欧quantity也要返回查询

SELECT 
      p.product_id,
      p.name,
      oi.quantity
FROM products p
LEFT JOIN order_items oi
        ON p.product_id = oi.product_id;
image-20201214212240075
3.8 OUTER JOIN BETWEEN MULTIPLE TABLES

多张表的外连接

LEFT JOIN
LEFT JOIN
customers
orders
shippers
SELECT 
	c.customer_id,
    c.first_name,
    o.order_id,
    sh.name AS shipper
FROM customers c
LEFT JOIN orders o
     ON c.customer_id = o.customer_id
LEFT JOIN shippers sh
     ON o.shipper_id = sh.shipper_id
ORDER BY c.customer_id;

customers

image-20201214214133008

orders

image-20201214214203670

shipper

image-20201214214304205

Output

image-20201214214046524
EXERCISE
INNER JOIN
LEFT JOIN
LEFT JOIN
orders
customers
shippers
orders_statuses

因为customers和orders只有customer_id&first_name两列相同,所以可以INNER JOIN

SELECT 
      o.order_date,
      o.order_id,
      c.first_name,
      sh.name AS shipper,
      os.order_status_id AS status
FROM orders o
JOIN customers c
          ON o.customer_id = c.customer_id
LEFT JOIN shippers sh
         ON o.shipper_id = sh.shipper_id
LEFT JOIN order_statuses os
         ON sh.shipper_id = os.order_status_id;
image-20201214220856228
3.9 Self Outer Joins

表子链接:要求能看到他们的manager同时也显示manager

employee_id first_name last_name job_title salary reports_to office_id
33391 D’arcy Nortunen Account Executive 62871 37270 1
37270 Yovonnda Magrannell Executive Secretary 63996 10
37851 Sayer Matterson Statistician III 98926 37270 1
40448 Mindy Crissil Staff Scientist 94860 37270 1
56274 Keriann Alloisi VP Marketing 110150 37270 1
63196 Alaster Scutchin Assistant Professor 32179 37270 2
67009 North de Clerc VP Product Management 114257 37270 2
67370 Elladine Rising Social Worker 96767 37270 2
68249 Nisse Voysey Financial Advisor 52832 37270 2
72540 Guthrey Iacopetti Office Assistant I 117690 37270 3
72913 Kass Hefferan Computer Systems Analyst IV 96401 37270 3
75900 Virge Goodrum Information Systems Manager 54578 37270 3
76196 Mirilla Janowski Cost Accountant 119241 37270 3
80529 Lynde Aronson Junior Executive 77182 37270 4
80679 Mildrid Sokale Geologist II 67987 37270 4
84791 Hazel Tarbert General Manager 93760 37270 4
95213 Cole Kesterton Pharmacist 86119 37270 4
96513 Theresa Binney Food Chemist 47354 37270 5
98374 Estrellita Daleman Staff Accountant IV 70187 37270 5
115357 Ivy Fearey Structural Engineer 92710 37270 5
SELECT 
      e.employee_id,
      e.first_name,
      m.first_name AS manager
FROM employees e
LEFT JOIN employees m
     ON e.reports_to = m.employee_id;

Output

image-20201215221119208
3.10 The Using Clause

如果两个内连接的表中,有列名称是完全一样的,可以用一个更简洁的USING子句替换ON子句

SELECT 
      o.order_id,
      c.first_name
FROM orders o
JOIN customers c
    ON o.customer_id = c.customer_id;
SELECT 
      o.order_id,
      c.first_name,
      sh.name AS shipper
FROM orders o
JOIN customers c
    USING (customer_id)

USING key word只能在不同表中的列名字完全一样的情况下使用

双主键表格
SELECT *
FROM order_items oi
JOIN order_item_notes oin
     on oi.order_id = oin.order_id 
     AND oi.product_id = oin.product_id;

USING语句优化后

SELECT *
FROM order_items oi
JOIN order_item_notes oin
     USING (order_id,product_id);
EXERCISE
SELECT 
      p.date,
      c.name AS client,
      p.amount,
      pm.name AS payment_method
FROM clients c
JOIN payments p
     USING client_id
JOIN payment_methods pm
     on p.payment_method = pm.payment_method_id;
3.11 Natural Joins

使用自然连接,不具体打上列名,数据引擎会自动基于共同的列连接

不建议使用,因为有时候会产生意外结果

SELECT 
     o.order_id,
     c.first_name
FROM orders o
NATURAL JOIN customers c
3.12 Cross Joins

对于cross join连接来说,其实使用的就是笛卡尔连接;

如果不带WHERE条件子句,它将会返回被连接的两个表的笛卡尔积,返回结果的行数等于两个表行数的乘积;

Cross Joins是不需要ON和USING这种关键字的。

Cross Joins

显式语法

SELECT 
      c.first_name AS customer,
      p.name AS product
FROM customers c
CROSS JOIN products p
ORDER BY c.first_name 
Example

有一个型号表,比如小/中/大,还有一个颜色表,比如红/蓝/绿,想要将所有的型号和所有的颜色组合,此时就可以用交叉连接

隐式语法

SELECT 
      c.first_name AS customer,
      p.name AS product
FROM customers c, product p
ORDER BY c.first_name;
EXERCISE

Do a cross join between shippers and products

explicit syntax

SELECT *
FROM customers c
CROSS JOIN products p
ORDER BY c.first_name

implicit syntax

SELECT *
FROM customers c, products p
ORDER BY c.first_name;
3.13 Unions

将两个不同的查询语句进行联结

SELECT 
      order_id,
      order_date,
      'Activity' AS status
FROM orders
WHERE order_date >= '2019-01-01'
UNION
SELECT 
      order_id,
      order_date,
      'Archived' AS status
FROM orders
WHERE order_date < '2019-01-01';
SELECT first_name
FROM customers
UNION
SELECT name 
FROM shippers;

Error Code: 1222. The used SELECT statements have a different number of columns

SELECT first_name,last_name
FROM customers
UNION
SELECT name 
FROM shippers;
EXERCISE

将积分为2k以下的顾客挑出来为Bronze, >2K的顾客挑出来成为Silver

SELECT 
      customer_id,
      first_name,
      points,
      'Bronze' AS type
FROM customers
WHERE points < 2000
UNION
SELECT 
      customer_id,
      first_name,
      points,
      'Silver' AS type
FROM customers
WHERE points > 2000;
image-20201216212205025
4.1 COLUMN ATTRIBUTION

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-YBOhSDUQ-1610167958986)(C:\Users\Changnie\AppData\Roaming\Typora\typora-user-images\image-20210103220415085.png)]

VARCHAR(50)–可变字符缩写,最多可以放50个字符variable不会自动填充

CHAR(50)–当输入字符不够50时,MySQL会再插入45个空格符填满

PK–Primary key 这列的值唯一标识了每一位顾客

NN–Not Null

AI–自动递增,通常被用在主键列;每次我们在表中插入一条新记录,我们让MySQL或者数据库引擎在列中插入一个值,其实就是在最后一行加入一个顾客id,就会在新纪录的同时将顾客id增加1

4.2 Inserting a Single Row
INSERT INTO customers (
      first_name,
      last_name,
      birth_date,
      address,
      city,
      state)
VALUES (
        'John',
        'Smith',
        NULL,
        'ADDRESS',
        'city',
        'CA')

我们用默认关键字,让MySQL生成一个顾客id的唯一值

在SQL里字符串和日期值都要带上引号

可以以任何的顺序去排列,只要两个()中的值对应

4.2 Inserting Multiple Rows
INSERT INTO shippers(name)
VALUES ('Shipper1'),
       ('Shipper2'),
	   ('Shipper3')

Output

image-20210108234257145
EXERCISE

–Insert three rows in the products table

INSERT INTO products(
   name,
   quantity_in_stock,
   unit_price
)
VALUES(
    'zy',
    18,
    27
)
image-20210108234556291
4.4 Inserting Hierarchical Rows

如何往多张表插入数据

一个订单可以有多个产品,orders表是描述有什么订单,order_items表是描述订单有哪些产品

怎么插入一笔订单以及它对应的所有项目

INSERT INTO orders(customer_id,order_date,status)
VALUES (1,'2019-01-02',1)
5.1 AGGREGATE FUNCTIONS

MySQL中内置的一些函数:

MAX()
MIN()
AVG()
SUM()
COUNT()
SELECT 
      MAX(invoice_total) AS highest,
      MIN(invoice_total) AS lowest,
      AVG(invoice_total) AS average,
      SUM(invoice_total) AS total,
      COUNT(invoice_total) AS number_of_invoices,
      COUNT(payment_date) AS count_of_payments,
      -- COUNT()只会返回非空payment_dates的记录数
      COUNT(*) AS total_records
      -- 得到表格中所有记录条目,不管是不是空值
FROM invoices;

排除重复的记录,需要在函数前面+DISTINCT

SELECT 
      MAX(invoice_total) AS highest,
      MIN(invoice_total) AS lowest,
      AVG(invoice_total) AS average,
      SUM(invoice_total) AS total,
      COUNT(invoice_total) AS number_of_invoices,
      COUNT(client_id) AS total_number_clients,
      COUNT(DISTINCT client_id) AS number_of_client
FROM invoices;
EXERCISE
SELECT 
      'First half of 2019' AS date_range,
      SUM(invoice_total) AS total_sales, 
      SUM(payment_total) AS total_payments, 
      SUM(invoice_total - payment_total) AS what_we_expect
      -- ()里面的必须是表格中有的,不能是自己起的
FROM invoices i 
WHERE invoice_date BETWEEN '2019-01-01' AND '2019-06-30'
UNION
SELECT 
      'First half of 2019' AS date_range,
      SUM(invoice_total) AS total_sales, 
      SUM(payment_total) AS total_payments, 
      SUM(invoice_total - payment_total) AS what_we_expect
FROM invoices i 
WHERE invoice_date BETWEEN '2019-07-01' AND '2019-12-31'
UNION
SELECT 
      'First half of 2019' AS date_range,
      SUM(invoice_total) AS total_sales, 
      SUM(payment_total) AS total_payments, 
      SUM(invoice_total - payment_total) AS what_we_expect
FROM invoices i 
WHERE invoice_date BETWEEN '2019-01-01' AND '2019-12-31'
5.2 THE GROUP BY CLAUSE

对一列或者多列进行数据分组

SELECT 
	client_id,
    SUM(invoice_total) AS total_sales
FROM invoices
GROUP BY client_id

Output

默认状态下,数据是按照group by子句中指定的列排序的

image-20201219193857667

使用order by进行排列

SELECT 
	client_id,
    SUM(invoice_total) AS total_sales
FROM invoices
GROUP BY client_id
ORDER BY total_sales ASC

使用where增加筛选

SELECT 
	client_id,
    SUM(invoice_total) AS total_sales
FROM invoices
WHERE invoice_date > '2019-07-01'
GROUP BY client_id
ORDER BY total_sales ASC;

group by子句永远在from和where子句之后,在order by之前

补充:SQL语句查询顺序

(8) SELECT (9)DISTINCT<Select_list>
(1) FROM <left_table> (3) <join_type>JOIN<right_table>
(2) ON<join_condition>
(4) WHERE<where_condition>
(5) GROUP BY<group_by_list>
(6) WITH {CUBE|ROLLUP}
(7) HAVING<having_condtion>
(10) ORDER BY<order_by_list>
(11) LIMIT<limit_number>

我们可以看出,SELECT子句是必选的,其它子句如WHERE子句、GROUP BY子句等是可选的。
一个SELECT语句中,子句的顺序是固定的。必须严格按照上述的顺序书写。

所有的查询语句都是从FROM开始执行的,在执行过程中,每个步骤都会为下一个步骤生成一个虚拟表,这个虚拟表将作为下一个执行步骤的输入。

  1. 先对FROM子句中的两个表执行一个笛卡尔乘,此时生成虚拟表 virtual table 1(选择相对小的表做基础表)
  2. 然后是应用ON条件筛选器,将ON中的逻辑表达式将应用到 virtual table 1中的各个行,筛选出满足ON中的逻辑表达式的行,生成虚拟表 virtual table 2
  3. 根据连接方式进行进一步的操作。如果是OUTER JOIN,那么这一步就将添加外部行
  • LEFT OUTER JOIN就把左表在第二步中筛选掉的行添加进来
  • RIGHT OUTER JOIN就将右表在第二步中筛选掉的行添加进来 这样生成虚拟表 virtual table 3

如果 FROM子句中的表数目大于2,那么就将virtual table 3和第三个表连接从而计算笛卡尔积,生成虚拟表,该过程就是一个重复1-3的步骤,最终得到一个新的虚拟表virtual table 3

  1. 应用WHERE筛选器,对上一步生产的virtual table 3用WHERE筛选器筛选,生成虚拟表virtual table 4
    在这有个比较重要的细节需要提一下,如果我们有一个condition需要去筛选,应该在在ON条件筛选器还是用WHERE筛选器指定condition逻辑表达式呢?
    ON和WHERE的最大区别在于,如果在ON应用逻辑表达式那么在第三步OUTER JOIN中还可以把移除的行再次添加回来,而WHERE的移除的不可挽回的
  2. GROUP BY子句将具有相同属性的row组合成为一组,得到虚拟表virtual table 5
    如果应用了GROUP BY,那么后面的所有步骤都只能得到的virtual table 5的列或者是聚合函数,并且分组的时候是将列中唯一的值分成一组,同时只为每一组返回一行记录,这一点请牢记。
  3. 应用CUBE或者ROLLUP选项,为virtual table 5生成超组,生成virtual table 6. 这个暂时还没了解,先放到这里吧。
  4. 应用HAVING筛选器,生成virtual table 7;HAVING筛选器是唯一一个用来筛选组的筛选器
  5. 处理SELECT子句。将virtual table 7中的并且在Select_list中的列筛选出来,生成virtual table 8
  6. 应用DISTINCT子句,virtual table 8中移除相同的行,生成virtual table 9
    事实上如果应用了GROUP BY子句,那么DISTINCT是多余的,原因同样在于,分组的时候是将列中唯一的值分成一组,同时只为每一组返回一行记录,那么所以的记录都将是不相同的。
  7. 应用ORDER BY子句。按照order_by_condition排序virtual table 10,此时返回的一个游标,而不是虚拟表。SQL是基于集合的,集合不会预先对行进行排序,它只是成员的逻辑集合,成员的顺序是无关紧要的。对表进行排序的查询可以返回一个对象,这个对象包含特定的物理顺序的逻辑组织。这个对象就叫游标。正因为返回值是游标,那么使用ORDER BY子句查询不能应用于表达式。

分组数据以查看每个state和city的总销量

SELECT 
	state,
    city,
    SUM(invoice_total) AS total_sales
FROM invoices i
JOIN clients s USING(client_id)
GROUP BY state,city;

Output

image-20201219200738099
EXERCISE
SELECT 
      date,
      pm.name,
      SUM(amount) AS total_payments
FROM payments p
LEFT JOIN payment_methods pm
ON p.payment_method = pm.payment_method_id 
GROUP BY date,payment_method
ORDER BY date ASC

Output

image-20201219201954069
5.3 THE HAVING CLAUSE

HAVING–对分组后的数据进行筛选

–为什么不用WHERE

–因为WHERE必须在group by之前,但是此时还没有对数据进行分组

SELECT 
     client_id,
     SUM(invoice_total) AS total_sales
FROM invoices
GROUP BY client_id

Output

image-20201219202652063

Error Code: 1054. Unknown column ‘invoice_sales’ in ‘where clause’ 0.016 sec

SELECT 
     client_id,
     SUM(invoice_total) AS total_sales
FROM invoices
WHERE invoice_sales > 500
GROUP BY client_id
5.4 THE ROLLUP OPERATOR

rollup运算符只能应用于聚合值的列 (仅在MySQL中存在)

SELECT 
     state,
     city,
     SUM(invoice_total) AS total_sales
FROM invoices i
JOIN clients c USING (client_id)
GROUP BY state,city WITH ROLLUP

Output

image-20201224102735348

ROLLUP运算符对每一组的总计做了计算,当进行分列分组并运用rollup运算符,会看到每个组及整个结果集的汇总值

EXERCISE
SELECT 
      pm.name,
      SUM(amount) AS total
FROM payments p
JOIN payment_methods pm
     ON p.payment_method = pm.payment_method_id
GROUP BY payment_method WITH ROLLUP

ERROR: 使用rollup运算符的时候,不能在group by子句中使用列别名,必须输入列的实际名称

SELECT 
      pm.name,
      SUM(amount) AS total
FROM payments p
JOIN payment_methods pm
     ON p.payment_method = pm.payment_method_id
GROUP BY pm.name WITH ROLLUP

Output

image-20201224104902582
6.1 WRITING COMPLEX QUERIES

子查询是在另一个SQL语句中的选择语句

重置数据库删除之前添加的记录

6.2 SUBQUERIES

– Find products that are more
– expensive than Lettuce (id = 3)

SELECT *
FROM products
WHERE unit_price > (
    SELECT unit_price
    FROM products
    WHERE product_id = 3
)

获取id=3的价格

WHERE unit_price > (
    SELECT unit_price
    FROM products
    WHERE product_id = 3
)

也可以在FROM子句或者SELECT子句中编写子查询

EXERCISE

–在sql_hr数据库中,找到收入高于平均收入的演员

SELECT *
FROM employees
WHERE salary > (
    SELECT 
    avg(salary)
    FROM employees
)
6.3 THE IN OPERATOR

在store数据库中查找没有被订购过的产品

SELECT *
FROM products
WHERE product_id NOT IN(
      SELECT DISTINCT product_id
      FROM order_items
)
EXERCISE

–Find clients without invoices

SELECT *
FROM clients
WHERE client_id NOT IN (
      SELECT DISTINCT client_id
      FROM invoices
)
6.4 SUBQUERIES VS JOINS

–Find clients without invoices

1.Subqueries

SELECT *
FROM clients
WHERE client_id NOT IN (
      SELECT DISTINCT client_id
      FROM invoices
)

2.Joins

SELECT *
FROM clients
LEFT JOIN invoices USING (client_id)
WHERE invoice_id IS NULL
EXERCISE

– Find customers who have ordered lettuce (id = 3)

– Select customer_id, first_name, last_name

Output

image-20201224142430030

1.Subqueries

SELECT 
      customer_id,
      first_name,
      last_name
FROM customers
WHERE customer_id IN (
    SELECT o.customer_id
    FROM order_items oi
    JOIN orders o USING (order_id)
    WHERE product_id = 3
)

2.Joins

SELECT 
      DISTINCT customer_id,
      first_name,
      last_name
FROM customers 
LEFT JOIN orders USING (customer_id)
LEFT JOIN order_items USING (order_id)
WHERE product_id =3 
6.5 THE ALL KEYWORD

The All Keyword

SELECT *
FROM invoices
WHERE invoice_total > ALL(150,130,167,...)

如果发票总额>ALL括号里的所有值,那么那行的值才会被返回;如果不添加ALL,MySQL无法比对多个值

– Select invoices larger than all invoices of

– Client 3

SELECT *
FROM invoices
WHERE invoice_total> (
      SELECT MAX(invoice_total)
      FROM invoices
      WHERE client_id = 3
)
6.6 THE ANY KEYWORD

使用这些关键字可以得到发票总额高于这段子查询返回的任意值的行

– Select clients with at least two invoices

SELECT *
FROM clients
WHERE client_id IN(
   SELECT  client_id
   FROM invoices 
   GROUP BY client_id
   HAVING COUNT(*)>=2
)

–Use ANY keyword

SELECT *
FROM clients
WHERE client_id = ANY(
   SELECT  client_id
   FROM invoices 
   GROUP BY client_id
   HAVING COUNT(*)>=2
)

如果客户id等于这段查询返回值里的任何一个,那位客户就会被返回到最终结果

6.7 CORRELATED SUBQUERIES

– Select employees whose salary is above the average in their office

Solutions:

– for each employee

– calculate the avg salary for employee.office

– return the employee if salary > avg

SELECT *
FROM employees
WHERE salary > (
    SELECT AVG(salary)
    FROM employees
    WHERE office_id = e.office_id
)

内外查询都用到了employees,为了获取在同一部门的员工,给office_id列加了一个表名的前缀

EXERCISE

–GET invoices that are larger than the client’s average invoice amount

SELECT invoice_id
FROM invoices i
WHERE invoice_total > (
     SELECT AVG(invoice_total)
     FROM invoices
     WHERE client_id = i.client_id
)
6.8 THE EXISTS OPERATOR

– Select clients that have an invoice

使用exists运算符来查看发票里是否存在符合这个条件的行

SELECT *
FROM clients c
WHERE EXISTS (
     SELECT client_id
     FROM invoices
     WHERE client_id = c.client_id 
)

当我们使用EXISTS运算符,子查询并没有给外查询返回一个结果,它会返回一个指令说明这个子查询中是否有符合这个搜索条件的行;只要它找到这个表中有一条匹配这个条件的记录,它就会返回TRUE给EXISTS运算符,然后这个EXISTS运算符就会在最终结果里添加当前记录也就是当前客户

EXERCISE

–Find product which has never been purchased

SELECT *
FROM products p
WHERE NOT EXISTS (
     SELECT product_id
     FROM order_items
     WHERE product_id = p.product_id
)
6.9 SUBQUERIES IN THE SELECTIONN CLAUSE
SELECT 
    invoice_id,
    invoice_total,
    (SELECT AVG(invoice_total)
	 FROM invoices) AS invoice_average,
     invoice_total - (SELECT invoice_average) AS difference
FROM invoices

如果只用AVG(invoice_total)则只会显示一行

image-20201230155703988
EXERCISE
SELECT 
      c.client_id,
      name,
      (SELECT SUM(invoice_total)
        FROM invoices
        WHERE client_id = c.client_id) AS total_sales,
	  (SELECT AVG(invoice_total) FROM invoices) AS average,
	  (SELECT total_sales - average) AS difference
FROM clients c
image-20201230164026712
6.10 Subqueries in the FROM Clause

将上节课中EXERCISE中的表格当成是新表格

FROM (
  SELECT 
      c.client_id,
      name,
      (SELECT SUM(invoice_total)
        FROM invoices
        WHERE client_id = c.client_id) AS total_sales,
	  (SELECT AVG(invoice_total) FROM invoices) AS average,
	  (SELECT total_sales - average) AS difference
  FROM clients c
)AS total_summary
WHERE total_sales is NOT NULL

使用FROM语句可以让查询变得更加复杂,我们可以将其作为视图存储在数据库中,然后我们可以把这个视图取名为销售信息汇总,它会大大简化我们的查询

7.1 NUMERIC FUNCTIONS
ROUND(3.145,1)        --保留1位小数
TRUNCATE(5.7345,4)    --用来截断数字
CEILING(5.2)          --返回大于或等于这个数字的最小整数6
FLOOR(5.2)            --返回小于或等于这个数字的最大整数5
ABS(-5.5)             --用来计算绝对值
RAND()                --用来生成介于0-1之间的随机值

further information: search on web

mysql numeric functions

7.2 String Functions
SELECT LENGTH('sky'),
	   UPPER('sky'),
       LOWER('SKY'),
       LTRIM('      sky'), --移除字符串左侧的空白字符或其他预定义字符
       RTRIM('sky     '),  --移除字符串右侧的空白字符或其他预定义字符
       TRIM('   sky   ')   --它会删除所有前导或者尾随空格
       LEFT('Kingdergarten', 4), --提取出前4个字符
       RIGHT('Kingdergarten', 6), --提取出后6个字符
       substring('Kingdergarten',3,5), --从第3个位置往后取5个字符ngder
       LOCATE('n','Kingdergarten'), --查找kindergarten里n的位置/searching不区分大小写
       --如果查找不存在的字符,返回值为0(与其他编程语言返回值为-1不同)
       REPLACE('Kingdergarten','garten','garden'), --kindergarden
       CONCAT('first','last') --合并两个字符firstlast

further information: search on web

mysql string functions

7.3 Date Functions
SELECT NOW(),CURDATE(),CURTIME()
image-20201231122804768
SELECT YEAR(NOW()),MONTH(NOW()),DAY(NOW()),HOUR(NOW()),minute(NOW())
image-20201231123026692
SELECT MONTHNAME(NOW()),DAYNAME(NOW())
image-20201231123217379
SELECT EXTRACT(DAY FROM NOW())
image-20201231124137093
EXERCISE

获取当前年份的订单

SELECT *
FROM orders
WHERE YEAR(order_date) = YEAR(NOW())
7.4 Formatting Dates and Times
SELECT DATE_FORMAT(NOW(),'%M %D %Y'),
	   DATE_FORMAT(NOW(),'%m %d %y'),
       TIME_FORMAT(NOW(), '%H:%I %PM')
image-20201231131158596

Further information: mysql date format string

7.5 Calculating Dates and Times

在日期基础上增加一天或者一个小时或者计算两个日期的间隔

SELECT DATE_ADD(NOW(),INTERVAL 1 DAY),
       DATE_ADD(NOW(),INTERVAL -1 DAY),
       DATE_ADD(NOW(),INTERVAL 1 YEAR)
image-20201231133842990

计算两个日期的间隔DATEDIFF

SELECT DATEDIFF('2019-01-05','2019-01-01')
image-20201231134622338

注意:这个函数只返回天数的间隔,而不是小时或分钟

返回从零点计算的秒数TIME_TO_SEC

SELECT TIME_TO_SEC('09:00'),
       TIME_TO_SEC('09:00') - TIME_TO_SEC('09:02')
image-20201231135613504
7.6 The IFNULL and COALESCE Functions

IFNULL()

SELECT 
      order_id,
      IFNULL(shipper_id,'Not assigned') AS shipper
FROM orders

如果shipper_id为null则返回‘Not assigned’后面的字符

COALESCE()

SELECT 
      order_id,
      COALESCE(shipper_id,comments,'Not assigned') AS shipper
FROM orders

DIFFERENCE

IFNULL函数里可以用其他内容替换空值;而coalesce函数里我们提供一堆值,这个函数会返回这堆值中的第一个非空值

EXERCISE

将first_name 和 last_name 进行拼接

SELECT 
      CONCAT(first_name,' ',last_name) AS customer,
      IFNULL(phone,'Unknown')
FROM customers
7.7 The IF Function

有时会需要测试条件,并根据条件的成立与否返回不同值

IF(expression,first,second)

如果这个表达式被判定为真,这个函数会返回first,否则就会返回second;这可以是任何值,可以是字符串,可以是空值,可以是数字,可以是日期

SELECT 
     order_id,
     order_date,
     IF(
        YEAR(order_date) = YEAR(NOW()),
        'Active',
        'Archived'
     ) AS status
FROM orders
EXERCISE
image-20210101111701304
SELECT 
     product_id,
     name,
     COUNT(*) AS orders,
     IF(
        COUNT(*)>1,
        'Many times',
        'Once'
     ) AS 'frequency'
FROM products
JOIN order_items USING (product_id)
GROUP BY product_id
7.8 The CASE Operator

在IF()函数中,只有两种情况,但是在CASE()函数中,可以有无数种情况

SELECT 
     order_id,
     order_date,
     CASE
         WHEN YEAR(order_date) = YEAR(NOW()) THEN 'Active'
         WHEN YEAR(order_date) = YEAR(NOW()) -1 THEN 'Last Year'
         WHEN YEAR(order_date) < YEAR(NOW()) -1 THEN 'Archived'
         ELSE 'Future'
	 END AS category
FROM orders
EXERCISE

根据customer的分数对他们进行等级划分

SELECT 
      CONCAT(first_name,' ',last_name),
      points,
      CASE
          WHEN points > 3000 THEN 'Gold'
          WHEN points>2000& points <3000 THEN 'Silver'
          WHEN points <2000 THEN 'Bronze'
	  END AS 'category'
FROM customers

der_id,
IFNULL(shipper_id,‘Not assigned’) AS shipper
FROM orders


如果shipper_id为null则返回‘Not assigned’后面的字符

**COALESCE()**

```sql
SELECT 
      order_id,
      COALESCE(shipper_id,comments,'Not assigned') AS shipper
FROM orders

DIFFERENCE

IFNULL函数里可以用其他内容替换空值;而coalesce函数里我们提供一堆值,这个函数会返回这堆值中的第一个非空值

EXERCISE

将first_name 和 last_name 进行拼接

SELECT 
      CONCAT(first_name,' ',last_name) AS customer,
      IFNULL(phone,'Unknown')
FROM customers
7.7 The IF Function

有时会需要测试条件,并根据条件的成立与否返回不同值

IF(expression,first,second)

如果这个表达式被判定为真,这个函数会返回first,否则就会返回second;这可以是任何值,可以是字符串,可以是空值,可以是数字,可以是日期

SELECT 
     order_id,
     order_date,
     IF(
        YEAR(order_date) = YEAR(NOW()),
        'Active',
        'Archived'
     ) AS status
FROM orders
EXERCISE
image-20210101111701304
SELECT 
     product_id,
     name,
     COUNT(*) AS orders,
     IF(
        COUNT(*)>1,
        'Many times',
        'Once'
     ) AS 'frequency'
FROM products
JOIN order_items USING (product_id)
GROUP BY product_id
7.8 The CASE Operator

在IF()函数中,只有两种情况,但是在CASE()函数中,可以有无数种情况

SELECT 
     order_id,
     order_date,
     CASE
         WHEN YEAR(order_date) = YEAR(NOW()) THEN 'Active'
         WHEN YEAR(order_date) = YEAR(NOW()) -1 THEN 'Last Year'
         WHEN YEAR(order_date) < YEAR(NOW()) -1 THEN 'Archived'
         ELSE 'Future'
	 END AS category
FROM orders
EXERCISE

根据customer的分数对他们进行等级划分

SELECT 
      CONCAT(first_name,' ',last_name),
      points,
      CASE
          WHEN points > 3000 THEN 'Gold'
          WHEN points>2000& points <3000 THEN 'Silver'
          WHEN points <2000 THEN 'Bronze'
	  END AS 'category'
FROM customers

你可能感兴趣的:(python数据分析,mysql)