SQL Basics Review

Write a SQL query that returns the first 10 rows from recent_grads.

SELECT * FROM recent_grads LIMIT 10

Write a SQL query that returns the majors where females were a minority.
Only return the Major and ShareWomen columns (in that order) and don't limit the number of rows returned.

SELECT Major,ShareWomen FROM recent_grads 
WHERE ShareWomen < 0.5

SELECT * FROM recent_grads
WHERE Major_category = 'Engineering' AND ShareWomen > 0.5

Write a SQL query that returns all majors that:

  • have a female majority and
  • have a median salary greater than 50000.

Only include the following columns in the results and in this order:

  • Major
  • Major_category
  • Median
  • ShareWomen
SELECT Major ,Major_category,Median,ShareWomen FROM recent_grads
WHERE Median > 50000 AND ShareWomen > 0.5

Write a SQL query that returns the first 20 majors that either:

  • have a Median salary greater than or equal to 10,000, or
  • have 1,000 or fewer Unemployed people

Only include the following columns in the results and in this order:

  • Major
  • Median
  • Unemployed
SELECT Major, Median, Unemployed FROM recent_grads
WHERE Median >= 10000 OR Unemployed <= 1000
LIMIT 20

Run the query we explored above, which returns all majors that:

  • fell under the category of Engineering and
  • either
    • had mostly women graduates
    • or had an unemployment rate below 5.1%, which was the rate in August 2015

Only include the following columns in the results and in this order:

  • Major
  • Major_category
  • ShareWomen
  • Unemployment_rate
SELECT Major,Major_category,ShareWomen,Unemployment_rate FROM recent_grads
WHERE (Major_category = 'Engineering') AND (Unemployment_rate < 0.051 OR ShareWomen > 0.5)
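The parentheses around the OR condition in the query above matter, because SQL evaluates AND before OR. The following is a minimal sketch using Python's built-in sqlite3 module; the four rows are made-up sample data, not the real recent_grads table.

```python
import sqlite3

# Hypothetical sample rows for illustration only (not the real dataset).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE recent_grads "
    "(Major TEXT, Major_category TEXT, ShareWomen REAL, Unemployment_rate REAL)"
)
conn.executemany(
    "INSERT INTO recent_grads VALUES (?, ?, ?, ?)",
    [
        ("A ENGINEERING", "Engineering", 0.20, 0.03),
        ("B ENGINEERING", "Engineering", 0.60, 0.09),
        ("C ENGINEERING", "Engineering", 0.30, 0.08),
        ("D NURSING", "Health", 0.90, 0.04),
    ],
)

# Engineering AND (low unemployment OR mostly women)
with_parens = conn.execute(
    "SELECT Major FROM recent_grads "
    "WHERE Major_category = 'Engineering' "
    "AND (Unemployment_rate < 0.051 OR ShareWomen > 0.5) "
    "ORDER BY Major"
).fetchall()

# Without parentheses this reads as:
# (Engineering AND low unemployment) OR mostly women
without_parens = conn.execute(
    "SELECT Major FROM recent_grads "
    "WHERE Major_category = 'Engineering' "
    "AND Unemployment_rate < 0.051 OR ShareWomen > 0.5 "
    "ORDER BY Major"
).fetchall()

print(with_parens)
print(without_parens)
```

In this sample, dropping the parentheses lets the non-Engineering row "D NURSING" into the result, because it satisfies the ShareWomen condition on its own.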

ordering (order by)

Write a query that returns all majors where:

  • ShareWomen is greater than 0.3
  • and Unemployment_rate is less than .1

Only include the following columns in the results and in this order:

  • Major,
  • ShareWomen,
  • Unemployment_rate

Order the results in descending order by the ShareWomen column.

SELECT Major,
ShareWomen,Unemployment_rate 
FROM recent_grads
WHERE ShareWomen > 0.3 
AND Unemployment_rate < 0.1
ORDER BY ShareWomen DESC

Write a query that returns the Engineering or Physical Sciences majors in ascending order of unemployment rates.

  • The results should only contain the Major_category, Major, and Unemployment_rate columns.

SELECT Major_category,
Major,
Unemployment_rate 
FROM recent_grads
WHERE 
Major_category = 'Engineering' OR
 Major_category = 'Physical Sciences'
ORDER BY Unemployment_rate

count

Write a query that returns the number of majors with mostly male students.
Use all caps in the SELECT clause so our answer checking will match - COUNT(Major).

SELECT COUNT(Major) FROM recent_grads WHERE ShareWomen < 0.5

min&max

Write a query that returns the Engineering major with the lowest median salary.

  • We only want the Major, Major_category, and MIN(Median) columns in the result.
SELECT Major, Major_category, MIN(Median) FROM recent_grads
WHERE Major_category = 'Engineering'
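The query above relies on SQLite-specific behavior: when MIN() or MAX() is the aggregate in a query, the other (bare) columns take their values from the row that holds the minimum or maximum. Many other databases reject such a query, so ORDER BY with LIMIT 1 is a more portable way to get the same row. A small sketch with Python's sqlite3 module, using made-up rows and hypothetical major names:

```python
import sqlite3

# Hypothetical sample rows for illustration only (not the real dataset).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE recent_grads (Major TEXT, Major_category TEXT, Median INTEGER)"
)
conn.executemany(
    "INSERT INTO recent_grads VALUES (?, ?, ?)",
    [
        ("ENG MAJOR 1", "Engineering", 70000),
        ("ENG MAJOR 2", "Engineering", 40000),
        ("ENG MAJOR 3", "Engineering", 55000),
        ("ART MAJOR 1", "Arts", 30000),
    ],
)

# SQLite fills the bare columns from the row that contains the minimum.
min_row = conn.execute(
    "SELECT Major, Major_category, MIN(Median) FROM recent_grads "
    "WHERE Major_category = 'Engineering'"
).fetchone()

# Portable alternative: sort by Median and keep the first row.
sorted_row = conn.execute(
    "SELECT Major, Major_category, Median FROM recent_grads "
    "WHERE Major_category = 'Engineering' "
    "ORDER BY Median LIMIT 1"
).fetchone()

print(min_row)
print(sorted_row)
```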

sum&avg

Write a query that computes the sum of the Total column. Return only the total number of students as an integer value.

SELECT SUM(Total) FROM recent_grads

Write a query that computes the average of the Total column, the minimum of the Men column, and the maximum of the Women column, in that specific order.

  • Make sure that all of the aggregate functions are capitalized (SUM() not sum(), etc), so our results match yours.
SELECT AVG(Total),
MIN(Men),
MAX(Women)
FROM recent_grads

as

We can specify an arbitrary phrase as a string using quotation marks:

SELECT COUNT(*) as "Total Students" FROM recent_grads

Even better, we can drop AS entirely and just add the name next to the original column:

SELECT COUNT(*) "Total Students" FROM recent_grads

Write a query that returns, in the following order:

  • the number of rows as Number of Students
  • the maximum value of Unemployment_rate as Highest Unemployment Rate
SELECT COUNT(*) "Number of Students",
MAX(Unemployment_rate) "Highest Unemployment Rate"
FROM recent_grads

distinct (unique)

We can return all of the unique values in a column using the DISTINCT keyword.

SELECT DISTINCT Major_category FROM recent_grads

Write a query that returns the number of unique values in the Major, Major_category, and Major_code columns. Use the following aliases in the following order:

  • For the unique value count of the Major column, use the alias unique_majors.
  • For the unique value count of the Major_category column, use the alias unique_major_categories.
  • For the unique value count of the Major_code column, use the alias unique_major_codes
SELECT
COUNT(DISTINCT Major) unique_majors,
COUNT(DISTINCT Major_category) unique_major_categories,
COUNT(DISTINCT Major_code) unique_major_codes
FROM recent_grads

arithmetic

SQL supports the standard arithmetic operators: *, +, -, and /

Write a query that computes the difference between the 25th and 75th percentile of salaries for all majors.

  • Return the Major column first, using the default column name.
  • Return the Major_category column second, using the default column name.
  • Return the computed difference between the 25th and 75th percentile third, using the alias quartile_spread.
  • Order the results from lowest to highest and only return the first 20 results.
SELECT Major,
Major_category,
(P75th-P25th) quartile_spread
FROM recent_grads
ORDER BY quartile_spread
LIMIT 20


group by

Use the SELECT statement to select the following columns and aggregates in a query:

  • Major_category
  • AVG(ShareWomen)

Use the GROUP BY statement to group the query by the Major_category column.

SELECT Major_category, AVG(ShareWomen)
FROM recent_grads
GROUP BY Major_category


having

Sometimes we want to select a subset of rows after performing a GROUP BY query. On the last screen, for instance, we may have wanted to select only those rows where share_employed is greater than .8. We can't use the WHERE clause to do this because share_employed isn't a column in recent_grads; it's a computed column produced by the aggregation, and WHERE filters rows before the groups are formed.

When we want to filter on a column generated by a GROUP BY query, we can use the HAVING clause. Here's an example:

SELECT Major_category, AVG(Employed) / AVG(Total) AS share_employed 
FROM recent_grads 
GROUP BY Major_category 
HAVING share_employed > .8;

Find all of the major categories where the share of graduates with low-wage jobs is greater than .1.

  • Use the SELECT statement to select the following columns and aggregates in a query:
    • Major_category
    • AVG(Low_wage_jobs) / AVG(Total) as share_low_wage
  • Use the GROUP BY statement to group the query by the Major_category column.
  • Use the HAVING statement to restrict the selection to rows where share_low_wage is greater than .1.
SELECT Major_category,
AVG(Low_wage_jobs) / AVG(Total) AS share_low_wage
FROM recent_grads
GROUP BY Major_category
HAVING share_low_wage > .1
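To see why HAVING is needed here, the sketch below runs the same kind of query through Python's sqlite3 module and then tries to put the aggregate condition in WHERE instead. The table contents are made-up sample rows, and the variable names are only for illustration.

```python
import sqlite3

# Hypothetical sample rows for illustration only (not the real dataset).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE recent_grads "
    "(Major_category TEXT, Total INTEGER, Low_wage_jobs INTEGER)"
)
conn.executemany(
    "INSERT INTO recent_grads VALUES (?, ?, ?)",
    [
        ("Arts", 100, 20),
        ("Arts", 300, 40),
        ("Engineering", 200, 10),
        ("Engineering", 200, 10),
    ],
)

# HAVING filters the grouped rows after the aggregates are computed.
having_rows = conn.execute(
    "SELECT Major_category, "
    "AVG(Low_wage_jobs) / AVG(Total) AS share_low_wage "
    "FROM recent_grads "
    "GROUP BY Major_category "
    "HAVING share_low_wage > .1"
).fetchall()

# WHERE runs before grouping, so an aggregate there is an error.
try:
    conn.execute(
        "SELECT Major_category FROM recent_grads "
        "WHERE AVG(Low_wage_jobs) / AVG(Total) > .1 "
        "GROUP BY Major_category"
    )
    where_failed = False
except sqlite3.Error:
    where_failed = True

print(having_rows)
print(where_failed)
```

With this sample data only the Arts group passes the HAVING filter (share of 30/200), while Engineering (10/200) is dropped.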

round() function

On the last screen, the percentages in our results were very long and hard to read (e.g., 0.16833085991095678). We can use the SQL ROUND function in our query to round them.

Write a SQL query that returns the following columns of recent_grads (in the same order):

  • ShareWomen rounded to 4 decimal places
  • Major_category

Limit the results to 10 rows.

SELECT ROUND(ShareWomen, 4),
Major_category 
FROM recent_grads 
LIMIT 10

nesting functions

Use the SELECT statement to select the following columns and aggregates in a query:

  • Major_category
  • AVG(College_jobs) / AVG(Total) as share_degree_jobs
    • Use the ROUND function to round share_degree_jobs to 3 decimal places.

Group the query by the Major_category column.
Only select rows where share_degree_jobs is less than .3.

SELECT Major_category,
ROUND(AVG(College_jobs) / AVG(Total), 3) AS share_degree_jobs
FROM recent_grads
GROUP BY Major_category
HAVING share_degree_jobs < .3

casting

We can use the PRAGMA TABLE_INFO() statement by itself to return the type, along with some other information, for each column:

PRAGMA TABLE_INFO(recent_grads)

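When both operands of a division are integers, SQLite performs integer division and discards the fractional part. To get a decimal result, we instead need to use the CAST() function to convert the columns to the Float type:

SELECT CAST(Women AS Float) / CAST(Total AS Float) FROM recent_grads LIMIT 5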
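The effect of the cast is easy to check on a tiny table. This is a sketch using Python's sqlite3 module, with a single made-up row and both columns declared as INTEGER (an assumption about how the real table is typed):

```python
import sqlite3

# Hypothetical one-row table; both columns are assumed to be integers.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE recent_grads (Women INTEGER, Total INTEGER)")
conn.execute("INSERT INTO recent_grads VALUES (30, 120)")

# First column: integer division. Second column: division after casting.
int_div, float_div = conn.execute(
    "SELECT Women / Total, "
    "CAST(Women AS Float) / CAST(Total AS Float) "
    "FROM recent_grads"
).fetchone()

print(int_div)    # integer division drops the fraction
print(float_div)  # cast version keeps it
```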

Write a query that divides the sum of the Women column by the sum of the Total column, aliased as SW.

Group the results by Major_category and order by SW.

The results should only contain the Major_category and SW columns, in that order.

SELECT Major_category,
CAST(SUM(Women) AS Float) / CAST(SUM(Total) AS Float) SW
FROM recent_grads
GROUP BY Major_category
ORDER BY SW

Which rows are above the average for the ShareWomen column?

SELECT * FROM recent_grads
WHERE ShareWomen > (SELECT AVG(ShareWomen) FROM recent_grads)

subqueries

To determine which majors are above the average for the ShareWomen column, we need to:

  • first determine the average value for the ShareWomen column
  • then select and filter the rows that are greater than the average value

The designers of SQL, a declarative programming language, wanted its users to focus on expressing computations rather than explicitly defining, setting, and juggling variables.

A subquery is a query nested within another query. In the exercises below, the subquery resides in the WHERE clause. A few rules apply:

  • The subquery is run first.
  • A subquery needs to be a full query (contain SELECT and FROM clauses, etc.).
  • A subquery must always be contained within parentheses ().

Write a query that returns the majors that are below the average for Unemployment_rate. The results should:

  • only contain the Major and Unemployment_rate columns
  • be sorted in ascending order by Unemployment_rate
SELECT Major,Unemployment_rate 
FROM recent_grads
WHERE Unemployment_rate < (
    SELECT AVG(Unemployment_rate) 
    FROM recent_grads
)
ORDER BY  Unemployment_rate
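The subquery replaces what would otherwise be two separate steps: compute the average, then filter with it. The sketch below, using Python's sqlite3 module and four made-up rows with hypothetical major names, runs both versions to show they return the same rows.

```python
import sqlite3

# Hypothetical sample rows for illustration only (not the real dataset).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE recent_grads (Major TEXT, Unemployment_rate REAL)")
conn.executemany(
    "INSERT INTO recent_grads VALUES (?, ?)",
    [("M1", 0.02), ("M2", 0.04), ("M3", 0.10), ("M4", 0.12)],
)

# Two steps: fetch the average into a variable, then pass it back in.
avg_rate = conn.execute(
    "SELECT AVG(Unemployment_rate) FROM recent_grads"
).fetchone()[0]
two_step = conn.execute(
    "SELECT Major, Unemployment_rate FROM recent_grads "
    "WHERE Unemployment_rate < ? "
    "ORDER BY Unemployment_rate",
    (avg_rate,),
).fetchall()

# One step: the subquery computes the average inside the same statement.
one_step = conn.execute(
    "SELECT Major, Unemployment_rate FROM recent_grads "
    "WHERE Unemployment_rate < ("
    "    SELECT AVG(Unemployment_rate) FROM recent_grads"
    ") "
    "ORDER BY Unemployment_rate"
).fetchall()

print(two_step)
print(one_step)
```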

Subquery In SELECT

Write a SQL statement that computes the proportion (as a float value) of rows that contain above average values for the ShareWomen.

The results should only return the proportion, aliased as proportion_abv_avg.

SELECT CAST(COUNT(*) AS Float) /
       CAST((SELECT COUNT(*) FROM recent_grads) AS Float) AS proportion_abv_avg
FROM recent_grads
WHERE ShareWomen > (SELECT AVG(ShareWomen) FROM recent_grads)
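The query above uses two subqueries: one in WHERE to find the average, and one in SELECT to count all rows regardless of the filter. A quick check with Python's sqlite3 module on four made-up ShareWomen values (a hypothetical sample, not the real data) shows how the pieces combine:

```python
import sqlite3

# Hypothetical sample values for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE recent_grads (ShareWomen REAL)")
conn.executemany(
    "INSERT INTO recent_grads VALUES (?)",
    [(0.2,), (0.4,), (0.6,), (0.9,)],
)

# COUNT(*) in the outer query only sees rows that pass the WHERE filter;
# the subquery in SELECT counts every row in the table.
proportion = conn.execute(
    "SELECT CAST(COUNT(*) AS Float) / "
    "CAST((SELECT COUNT(*) FROM recent_grads) AS Float) AS proportion_abv_avg "
    "FROM recent_grads "
    "WHERE ShareWomen > (SELECT AVG(ShareWomen) FROM recent_grads)"
).fetchone()[0]

print(proportion)
```

Here two of the four sample values (0.6 and 0.9) exceed the average of 0.525, so the result is 2 / 4.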

Returning Multiple Results In Subqueries

Using the IN operator, we can specify a list of values that we want to match against in the WHERE clause.
All rows that match exactly will be returned. The following query returns the rows where Major_category equals either Business or Engineering:

SELECT Major, Major_category FROM recent_grads
WHERE Major_category IN ('Business', 'Engineering')
LIMIT 7

Instead of returning the rows where Major_category equals one of 2 specific values, we can write a subquery that returns the 5 Major_category values with the highest group-level sums for the Total column:

SELECT Major_category FROM recent_grads
GROUP BY Major_category
ORDER BY SUM(Total) DESC
LIMIT 5

Write a query that returns the Major and Major_category columns for the rows where:

  • Major_category is one of the 5 categories with the highest group-level sums for the Total column
SELECT Major, Major_category
FROM recent_grads
WHERE Major_category IN
(SELECT Major_category
 FROM recent_grads
 GROUP BY Major_category
 ORDER BY SUM(Total) DESC
 LIMIT 5)
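When a subquery feeds IN with a "top N" list, the sort direction decides which categories are kept, since ORDER BY sorts ascending unless DESC is given. The sketch below uses Python's sqlite3 module with made-up rows and LIMIT 2 instead of 5 to keep the sample small; the major names are hypothetical.

```python
import sqlite3

# Hypothetical sample rows for illustration only (not the real dataset).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE recent_grads (Major TEXT, Major_category TEXT, Total INTEGER)"
)
conn.executemany(
    "INSERT INTO recent_grads VALUES (?, ?, ?)",
    [
        ("B1", "Business", 500),
        ("B2", "Business", 300),
        ("E1", "Engineering", 400),
        ("E2", "Engineering", 100),
        ("H1", "Health", 200),
        ("A1", "Arts", 50),
    ],
)

query = (
    "SELECT Major FROM recent_grads "
    "WHERE Major_category IN ("
    "    SELECT Major_category FROM recent_grads "
    "    GROUP BY Major_category "
    "    ORDER BY SUM(Total) {direction} "
    "    LIMIT 2"
    ") "
    "ORDER BY Major"
)

# DESC keeps the two categories with the largest totals.
largest = conn.execute(query.format(direction="DESC")).fetchall()
# ASC (the default direction) keeps the two smallest instead.
smallest = conn.execute(query.format(direction="ASC")).fetchall()

print(largest)
print(smallest)
```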

Write a query that returns the average ratio (Sample_size/Total) for all of the majors.

  • You'll need to cast both columns to the float type.
  • Use the alias avg_ratio for the average ratio.
SELECT AVG(CAST(Sample_size AS Float) / CAST(Total AS Float)) avg_ratio
FROM recent_grads

Practice Integrating A Subquery With The Outer Query

Now that we have a subquery that calculates the average ratio (of Sample_size to Total), we can return the rows that exceed this average.

Write a query that:

  • selects the Major, Major_category, and the computed ratio columns
  • filters to just the rows where ratio is greater than avg_ratio:
    • recall that this value is the result of the subquery from the last screen: select AVG(cast(Sample_size as float)/cast(Total as float)) avg_ratio from recent_grads
SELECT Major,
Major_category,
CAST(Sample_size AS Float) / CAST(Total AS Float) ratio
FROM recent_grads
WHERE ratio > (SELECT AVG(CAST(Sample_size AS Float) / CAST(Total AS Float)) avg_ratio
               FROM recent_grads)
