List Comprehensions

Learn about list comprehensions and the None type while finding common names for U.S. legislators.

1. The Data Set

In the previous mission, we worked with legislators.csv, which contains information on every person who has served in the U.S. Congress. We cleaned up some missing data and added a column for birth year.

We'll continue to work with the same data set in this mission. Here's a preview of it in CSV format:


last_name,first_name,birthday,gender,type,state,party,birth_year
Bassett,Richard,1745-04-02,M,sen,DE,Anti-Administration,1745
Bland,Theodorick,1742-03-21,M,rep,VA,,1742
Burke,Aedanus,1743-06-16,M,rep,SC,1743
Carroll,Daniel,1730-07-22,M,rep,MD,1730

In this mission, we'll use the data to find the most common names among U.S. legislators of each gender. Before diving into this, we'll explore some critical concepts, such as enumeration.


2. Enumerate

There are many situations where we'll need to iterate over multiple lists in tandem, such as this one:

animals = ["Dog", "Tiger", "SuperLion", "Cow", "Panda"]
viciousness = [1, 5, 10, 10, 1]
for animal in animals:
    print("Animal")
    print(animal)
    print("Viciousness")

In the example above, we have two lists. The second list describes the viciousness of the animals in the first list. A Dog has a viciousness level of 1, and a SuperLion has a viciousness level of 10. We want to retrieve the position of the item in animals the loop is currently on, so we can use it to look up the corresponding value in the viciousness list.

Unfortunately, we can't just loop through animals and then tap into the second list, because the loop gives us each item but not its position. Python has an enumerate() function that can help us with this, though. The enumerate() function allows us to have two variables in the body of a for loop: an index and a value.


for i, animal in enumerate(animals):
    print('animal index')
    print(i)
    print('animal')
    print(animal)
    

On every iteration of the loop, i takes the value of the index in animals that corresponds to that iteration, and animal takes the value in animals at index i.
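To make the pairing concrete, here is a minimal sketch of what enumerate() produces. It relies only on the built-in enumerate() and list() functions; the variable name pairs is just an illustrative choice.

```python
animals = ["Dog", "Tiger", "SuperLion", "Cow", "Panda"]

# enumerate() yields (index, value) tuples, one per item.
pairs = list(enumerate(animals))
print(pairs[0])  # (0, 'Dog')
print(pairs[2])  # (2, 'SuperLion')

# Writing "for i, animal in ..." unpacks each tuple into two variables.
for i, animal in pairs:
    print(i, animal)
```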

Here's another example of how we can use the enumerate() function to iterate over multiple lists in tandem:

animals = ["Dog", "Tiger", "SuperLion", "Cow", "Panda"]
viciousness = [1, 5, 10, 10, 1]
for i, animal in enumerate(animals):
    print("Animal")
    print(animal)
    print("Viciousness")
    print(viciousness[i])

In this example, we use the index variable i to index the viciousness list, and print the viciousness value that corresponds to the same index in animals.
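As an aside, Python's built-in zip() function can also pair up items from two lists, which avoids indexing altogether. This is not the approach this mission uses; it is shown only as a sketch of an alternative, and the names level and paired are illustrative.

```python
animals = ["Dog", "Tiger", "SuperLion", "Cow", "Panda"]
viciousness = [1, 5, 10, 10, 1]

# zip() pairs the first items, then the second items, and so on.
for animal, level in zip(animals, viciousness):
    print(animal, level)

# Collecting the pairs into a list makes them easy to inspect.
paired = list(zip(animals, viciousness))
```

Note that zip() stops at the end of the shorter list, so it is best suited to lists of equal length.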


  • Enumerate the ships list using a for loop and the enumerate() function.
  • For each iteration of the loop:
    • Print the item from ships at the current index.
    • Print the item from cars at the current index.

ships = ["Andrea Doria", "Titanic", "Lusitania"]
cars = ["Ford Edsel", "Ford Pinto", "Yugo"]

for i, ship in enumerate(ships):
    print(ship)
    print(cars[i])


3. Adding Columns

We can even use the enumerate() function to add columns to lists of lists. For example, here's some starter code:


door_count = [4, 4]
cars = [
        ["black", "honda", "accord"],
        ["red", "toyota", "corolla"]
       ]

We can add a column to cars by appending a value to each inner list:


for i, car in enumerate(cars):
    car.append(door_count[i])

In the code above, we:

  • Use the enumerate() function to loop across each item in cars.
  • Find the corresponding value in door_count that has the index i (the same index as the current item in cars).
  • Add the value in door_count with index i to car.
  • After the code runs, each row in cars will have a door_count column.
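A quick way to check the result is to print cars after the loop. The sketch below repeats the starter code so that it runs on its own.

```python
door_count = [4, 4]
cars = [
    ["black", "honda", "accord"],
    ["red", "toyota", "corolla"],
]

# append() changes each inner list in place, so cars itself is updated.
for i, car in enumerate(cars):
    car.append(door_count[i])

print(cars)
# [['black', 'honda', 'accord', 4], ['red', 'toyota', 'corolla', 4]]
```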

Let's reinforce what we've learned by completing an exercise.


  • Loop through each row in things using the enumerate() function.
  • Append the item in trees that has the same index (as the current thing) to the end of each row in things.
  • After the code runs, things should have an extra column.

things = [
    ["apple", "monkey"],
    ["orange", "dog"],
    ["banana", "cat"],
]

trees = ["cedar", "maple", "fig"]

for i, thing in enumerate(things):
    thing.append(trees[i])

print(things)

* * *

4. List Comprehensions

We've written many short for loops to manipulate lists. Here's an example:


animals = ["Dog", "Tiger", "SuperLion", "Cow", "Panda"]

animal_lengths = []
for animal in animals:
    animal_lengths.append(len(animal))

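It takes three lines to calculate the length of each string in animals this way. We can condense the loop into a single line with a list comprehension:

animal_lengths = [len(animal) for animal in animals]

This comprehension consists of the expression len(animal), which is applied to each element; the loop variable animal; and the list that we're iterating over, animals.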

Logically, the list comprehension:

  • Loops through each element in the animals list and assigns the current element to animal
  • Finds the length of each animal string
  • Generates a new list that contains all of the lengths as elements
  • Assigns the new list to animal_lengths
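The steps above can be sketched side by side with the loop version. Both produce the same list; loop_lengths is a name introduced here only for the comparison.

```python
animals = ["Dog", "Tiger", "SuperLion", "Cow", "Panda"]

# Loop version: build the list one append at a time.
loop_lengths = []
for animal in animals:
    loop_lengths.append(len(animal))

# Comprehension version: same result in one expression.
animal_lengths = [len(animal) for animal in animals]

print(animal_lengths)                  # [3, 5, 9, 3, 5]
print(animal_lengths == loop_lengths)  # True
```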

List comprehensions are a much more compact notation, and can save space when you need to write many short for loops.


  • Use a list comprehension to create a new list called apple_prices_doubled, where you multiply each item in apple_prices by 2.
  • Use a list comprehension to create a new list called apple_prices_lowered, where you subtract 100 from each item in apple_prices.

apple_prices = [100, 101, 102, 105]

apple_prices_doubled = [price * 2 for price in apple_prices]
apple_prices_lowered = [price - 100 for price in apple_prices]
print(apple_prices_doubled)
print(apple_prices_lowered)


5. Counting Female Names

Let's count how many times each female first name occurs in legislators. To limit our count to names from the modern era, we'll only look at legislators born after 1940. While names like Theodorick were common prior to 1940, they're rare today.

We'll store the counts in a dictionary. Here's a preview of what it will look like:


{
    'Nancy': 1, 
    'Sandy': 1, 
    'Carolyn': 1, 
    'Melissa': 2, 
    'Jo Ann': 2,
    ...
}

Now, let's work on creating it!


  • Create an empty dictionary called name_counts.
  • Loop through each row in legislators.
  • If the gender column of the row equals F and the birth_year column is greater than 1940:
    • Assign the first_name column of the row to the variable name.
    • If name is in name_counts:
      • Add 1 to the value associated with name in name_counts.
    • If name isn't in name_counts:
      • Set the value associated with name in name_counts to 1.
  • When the loop finishes, name_counts should contain each unique first name from the matching rows of legislators as a key, and the number of times it appeared as the value.
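Because the steps above depend on the CSV file, here is a minimal sketch that runs them on a few made-up rows in the same column order. The rows in sample_rows are invented for illustration and are not taken from the data set; sample_counts is likewise an illustrative name.

```python
# Hypothetical rows, in the column order:
# last_name, first_name, birthday, gender, type, state, party, birth_year
sample_rows = [
    ["Doe", "Ann", "1950-01-01", "F", "rep", "CA", "Democrat", "1950"],
    ["Roe", "Ann", "1962-05-12", "F", "sen", "NY", "Republican", "1962"],
    ["Poe", "Mary", "1935-03-03", "F", "rep", "TX", "Democrat", "1935"],
    ["Low", "John", "1955-07-07", "M", "rep", "OH", "Republican", "1955"],
]

sample_counts = {}
for row in sample_rows:
    # Keep only women born after 1940.
    if row[3] == "F" and int(row[7]) > 1940:
        name = row[1]
        if name in sample_counts:
            sample_counts[name] += 1
        else:
            sample_counts[name] = 1

print(sample_counts)  # {'Ann': 2}
```

Mary is skipped because of her birth year, and John is skipped because of the gender check.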


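import csv

# Load every row of the CSV file into a list of lists.
with open('legislators_add_year.csv') as f:
    legislators = list(csv.reader(f))

name_counts = {}

for row in legislators:
    # row[3] is gender, row[7] is birth_year, row[1] is first_name.
    if row[3] == 'F' and int(row[7]) > 1940:
        name = row[1]
        if name in name_counts:
            name_counts[name] += 1
        else:
            name_counts[name] = 1

print(name_counts)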

6. None

Let's say we're trying to find the maximum value in a list. We might write some code that looks like this:

values = [50, 60, 70]
max_value = 0
for i in values:
    if i > max_value:
        max_value = i

We set max_value to a low value so that everything in the list is greater than it. But what if we changed the values list slightly?


values = [-50, -80, -100]
max_value = 0
for i in values:
    if i > max_value:
        max_value = i

In the above scenario, max_value is 0 when the loop finishes. This is wrong, because 0 isn't in values; it's just a placeholder we used to initialize max_value.

We can resolve this kind of issue using the None object, which has a special data type called NoneType.

The None object indicates that the variable has no value. Rather than using the normal double equals sign (==) to check whether a value equals None, we use the variable is None syntax.

The is comparison operator checks for object identity, that is, whether two names refer to the very same object. Using is instead of == prevents some custom classes from resolving to True when compared with None. We'll explore how to use operators with the None object in greater depth during a later mission. For now, let's see what the variable is None syntax looks like:

values = [-50, -80, -100]
max_value = None
for i in values:
    if max_value is None or i > max_value:
        max_value = i

In the example above, we:

  • Initialize max_value to None.
  • Loop through each item in values.
  • Check whether max_value is None using the max_value is None syntax.
  • If max_value is None, or if i > max_value, then we assign the value of i to max_value.
  • At the end of the loop, max_value will equal -50, which is the largest value in values.
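To illustrate why is is preferred over == for None checks, the sketch below defines a hypothetical class, AlwaysEqual, whose __eq__ method claims equality with everything. It is a contrived example, not something from the data set or from any library.

```python
class AlwaysEqual:
    """Hypothetical class that reports equality with any object."""

    def __eq__(self, other):
        return True


thing = AlwaysEqual()

print(thing == None)  # True, because __eq__ always says so
print(thing is None)  # False, because thing is not the None object
```

Since is cannot be overridden by a class, it gives a reliable answer even for objects like this one.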

7. Comparing with None

Comparing a value to None with an ordering operator such as > or < raises a TypeError in Python 3. This is actually helpful when we're writing code, because it exposes variables that are unexpectedly None instead of letting them slip through silently. For example, this code will cause an error:


a = None
a > 10

Therefore, when a value could potentially be None, and we want to compare it to another value, we should always include code that checks whether it actually is None first.

We can use two Boolean statements joined by or to do this. Here's an example:

max_value is None or i > max_value

The Python interpreter will evaluate the two statements in order. If the first statement is True, it won't evaluate the second one. This saves time, since when one statement is True, the whole or conditional is True.
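The short-circuit behavior can be seen directly. In this sketch, the unguarded comparison raises a TypeError (assuming Python 3), while the guarded version never reaches the comparison. The names raised and should_update are illustrative.

```python
max_value = None
i = -50

# Unguarded: Python 3 refuses to order an int against None.
try:
    i > max_value
    raised = False
except TypeError:
    raised = True
print(raised)  # True

# Guarded: "max_value is None" is True, so "i > max_value" is never evaluated.
should_update = max_value is None or i > max_value
print(should_update)  # True
```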

The following code will assign True to b if a is None, or if a is greater than 10:

a = None
b = a is None or a > 10

The same logic applies to an and statement. Because both conditions have to be True, if the first one is False, the Python interpreter won't evaluate the second one. The example below shows how to write an and statement involving None that won't raise an error. It will assign True to b if a is not None and a is greater than 10:


a = None
b = a is not None and a > 10

Let's give this a try in our next exercise!


  • Loop through each value in values.
  • Check whether the value is not None, and if it's greater than 30.
  • Append the result of the check to checks.
  • When finished, checks should be a list of Booleans indicating whether or not the corresponding items in values are not None and greater than 30.

values = [None, 10, 20, 30, None, 50]
checks = []

checks = [a is not None and a > 30 for a in values]
checks


8. Highest Female Name Count

name_counts is a dictionary where the keys are female first names from legislators, and the values are the number of times each name occurred among legislators born after 1940.

In order to extract the most common names from this dictionary, we need to determine the highest total in name_counts. Once we know that total, we can find the keys that have it.

We can iterate through all of the keys in a dictionary like this:

fruits = {
    "apple": 2,
    "orange": 5,
    "melon": 10
}

for fruit in fruits:
    rating = fruits[fruit]

In the loop above, we iterate through each key in fruits. We can access the corresponding value using fruits[fruit].

Let's identify the highest totals in the next exercise.


  • Set max_value to None.
  • Loop through the keys in name_counts.
    • Assign the value associated with the key to count.
    • If max_value is None, or count is greater than max_value:
      • Assign count to max_value.
  • At the end of the loop, max_value will contain the largest value in name_counts.

max_value = None
for name in name_counts:
    count = name_counts[name]
    if max_value is None or count > max_value:
        max_value = count

print(max_value)



9. The Items Method

The code we used on the previous screen to access the keys and values in a dictionary was slightly awkward. We can simplify this process with the items() method, which allows us to iterate through keys and values at the same time.


fruits = {
    "apple": 2,
    "orange": 5,
    "melon": 10
}

for fruit, rating in fruits.items():
    print(rating)

The items() method makes our code clearer and more compact.


  • Use the items() method to iterate through the keys and values in plant_types.
  • Print each key in plant_types.
  • Print each value in plant_types.

plant_types = {"orchid": "flower", "cedar": "tree", "maple": "tree"}

for plant, plant_type in plant_types.items():
    print(plant)
    print(plant_type)
    print(plant + '!')

10. Finding the Most Common Female Names

As we learned on a previous screen, the most common female names occur two times in name_counts. Therefore, we want to extract any keys in name_counts that have the value 2.

  • Loop through the keys in name_counts.
  • If any value in name_counts equals 2, append its key to top_female_names.
  • When you're finished, top_female_names will be a list of the most common names of female legislators.
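The value 2 comes from the previous screen. As a sketch of how to avoid hard-coding it, the built-in max() function can compute the highest count first. The dictionary below is a small made-up stand-in for name_counts, and the names example_counts, highest, and top_names are illustrative.

```python
# Hypothetical stand-in for name_counts.
example_counts = {"Nancy": 1, "Sandy": 1, "Melissa": 2, "Jo Ann": 2}

# Highest count among all values.
highest = max(example_counts.values())

# Keep every name whose count matches the highest one.
top_names = [name for name, count in example_counts.items() if count == highest]

print(highest)    # 2
print(top_names)  # ['Melissa', 'Jo Ann']
```

Note that max() raises a ValueError on an empty dictionary, so the None-based loop from the earlier screen is the safer choice when the counts might be empty.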

# Method 1: explicit loop
top_female_names = []

for name, count in name_counts.items():
    if count == 2:
        top_female_names.append(name)
print(top_female_names)

# Method 2: list comprehension
top_female_names_1 = [
    name
    for name, count in name_counts.items()
    if count == 2
]

print(top_female_names_1)

11. Finding the Most Common Male Names

Now that we know how to find the most common female names, we can repeat the same process for male names.


  • Create a dictionary called male_name_counts.
  • Loop through legislators.
    • Count how many times each name with "M" in the gender column and a birth year after 1940 occurs.
    • Store the results in male_name_counts.
  • Find the highest value in male_name_counts and assign it to highest_male_count.
  • Append any keys from male_name_counts with a value equal to highest_male_count to top_male_names.

male_name_counts = {}
top_male_names = []

# Count first names of men born after 1940.
for row in legislators:
    if row[3] == 'M' and int(row[7]) > 1940:
        name = row[1]
        if name in male_name_counts:
            male_name_counts[name] += 1
        else:
            male_name_counts[name] = 1

# Find the highest count.
highest_male_count = None
for name, count in male_name_counts.items():
    if highest_male_count is None or count > highest_male_count:
        highest_male_count = count

# Collect every name that reaches the highest count.
for name, count in male_name_counts.items():
    if count == highest_male_count:
        top_male_names.append(name)

print(male_name_counts)
print(top_male_names)
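Since the female and male versions differ only in the gender value, the whole process could be wrapped in a function. This is only a sketch: most_common_names and its parameters are hypothetical names, and the rows are invented for illustration rather than taken from the data set.

```python
def most_common_names(rows, gender, min_year=1940):
    """Return (names, count) for the most frequent first names.

    Assumes each row follows the column order used above:
    first_name at index 1, gender at index 3, birth_year at index 7.
    """
    counts = {}
    for row in rows:
        if row[3] == gender and int(row[7]) > min_year:
            counts[row[1]] = counts.get(row[1], 0) + 1

    # Start from None so that an empty result is handled cleanly.
    highest = None
    for count in counts.values():
        if highest is None or count > highest:
            highest = count

    names = [name for name, count in counts.items() if count == highest]
    return names, highest


# Made-up rows for illustration only.
rows = [
    ["A", "John", "1950-01-01", "M", "rep", "CA", "X", "1950"],
    ["B", "John", "1960-01-01", "M", "sen", "NY", "Y", "1960"],
    ["C", "Mike", "1970-01-01", "M", "rep", "TX", "X", "1970"],
    ["D", "Ann", "1955-01-01", "F", "rep", "OH", "Y", "1955"],
]

print(most_common_names(rows, "M"))  # (['John'], 2)
print(most_common_names(rows, "F"))  # (['Ann'], 1)
```

If no rows match, the function returns an empty list and None, which mirrors how the None-initialized loop behaves.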
