Learn about list comprehensions and the None type while finding common names for U.S. legislators.
1. The Data Set
In the previous mission, we worked with legislators.csv, which contains information on every person who has served in the U.S. Congress. We cleaned up some missing data and added a column for birth year.
We'll continue to work with the same data set in this mission. Here's a preview of it in CSV format:
last_name,first_name,birthday,gender,type,state,party,birth_year
Bassett,Richard,1745-04-02,M,sen,DE,Anti-Administration,1745
Bland,Theodorick,1742-03-21,M,rep,VA,,1742
Burke,Aedanus,1743-06-16,M,rep,SC,,1743
Carroll,Daniel,1730-07-22,M,rep,MD,,1730
In this mission, we'll use the data to find the most common names among U.S. legislators of each gender. Before diving into this, we'll explore some critical concepts, such as enumeration.
2. Enumerate
There are many situations where we'll need to iterate over multiple lists in tandem, such as this one:
animals = ["Dog", "Tiger", "SuperLion", "Cow", "Panda"]
viciousness = [1, 5, 10, 10, 1]
for animal in animals:
    print("Animal")
    print(animal)
    print("Viciousness")
In the example above, we have two lists. The second list describes the viciousness of the animals in the first list. A Dog has a viciousness level of 1, and a SuperLion has a viciousness level of 10. We want to retrieve the position of the item in animals the loop is currently on, so we can use it to look up the corresponding value in the viciousness list.
Unfortunately, we can't just loop through animals and then tap into the second list, because the loop gives us each item but not its position. Python has an enumerate() function that can help us with this, though. The enumerate() function allows us to have two variables in the body of a for loop: an index and the value.
for i, animal in enumerate(animals):
    print('animal index')
    print(i)
    print('animal')
    print(animal)
On every iteration of the loop, i will take on the index in animals that corresponds to that iteration, and animal will take on the value in animals at index i.
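To make the index and value pairing concrete, here is a small sketch that collects what enumerate() produces into a list so it can be inspected. The variable name pairs is only for illustration.

```python
animals = ["Dog", "Tiger", "SuperLion", "Cow", "Panda"]

# enumerate() yields (index, value) tuples; list() collects them for inspection.
pairs = list(enumerate(animals))
print(pairs)
# [(0, 'Dog'), (1, 'Tiger'), (2, 'SuperLion'), (3, 'Cow'), (4, 'Panda')]

# A for loop with two variables unpacks each tuple into index and value.
for i, animal in pairs:
    print(i, animal)
```

Each tuple holds the position first and the item second, which is why the loop variables are written in the order i, animal.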
Here's another example of how we can use the enumerate() function to iterate over multiple lists in tandem:
animals = ["Dog", "Tiger", "SuperLion", "Cow", "Panda"]
viciousness = [1, 5, 10, 10, 1]
for i, animal in enumerate(animals):
    print("Animal")
    print(animal)
    print("Viciousness")
    print(viciousness[i])
In this example, we use the index variable i to index the viciousness list, and print the viciousness value that corresponds to the same index in animals.
- Enumerate the ships list using a for loop and the enumerate() function.
- For each iteration of the loop:
  - Print the item from ships at the current index.
  - Print the item from cars at the current index.
ships = ["Andrea Doria", "Titanic", "Lusitania"]
cars = ["Ford Edsel", "Ford Pinto", "Yugo"]
for i, ship in enumerate(ships):
    print(ship)
    print(cars[i])
3. Adding Columns
We can even use the enumerate() function to add columns to lists of lists. For example, here's some starter code:
door_count = [4, 4]
cars = [
    ["black", "honda", "accord"],
    ["red", "toyota", "corolla"]
]
We can add a column to cars by appending a value to each inner list:
for i, car in enumerate(cars):
    car.append(door_count[i])
In the code above, we:
- Use the enumerate() function to loop across each item in cars.
- Find the corresponding value in door_count that has the index i (the same index as the current item in cars).
- Add the value in door_count with index i to car.
- After the code runs, each row in cars will have a door_count column.
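The steps above can be checked with a short, self-contained run. This sketch repeats the starter data and prints cars afterwards to show that appending to each inner list changes cars itself.

```python
door_count = [4, 4]
cars = [
    ["black", "honda", "accord"],
    ["red", "toyota", "corolla"],
]

for i, car in enumerate(cars):
    # car is the same list object that is stored inside cars,
    # so append() modifies the row in place.
    car.append(door_count[i])

print(cars)
# [['black', 'honda', 'accord', 4], ['red', 'toyota', 'corolla', 4]]
```

No new list is built here: the loop variable refers to each existing row, which is why the change is visible in cars after the loop.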
Let's reinforce what we've learned by completing an exercise.
- Loop through each row in things using the enumerate() function.
- Append the item in trees that has the same index (as the current thing) to the end of each row in things.
- After the code runs, things should have an extra column.
things = [
    ["apple", "monkey"],
    ["orange", "dog"],
    ["banana", "cat"]
]
trees = ["cedar", "maple", "fig"]
for i, thing in enumerate(things):
    thing.append(trees[i])
print(things)
* * *
4. List Comprehensions
We've written many short for loops to manipulate lists. Here's an example:
animals = ["Dog", "Tiger", "SuperLion", "Cow", "Panda"]
animal_lengths = []
for animal in animals:
    animal_lengths.append(len(animal))
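We can condense the code above into a single line using a list comprehension:
animal_lengths = [len(animal) for animal in animals]
This comprehension consists of the list operation len(animal), the loop variable animal, and the list that we're iterating over, animals.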
Logically, the list comprehension:
- Loops through each element in the animals list and assigns the current element to animal
- Finds the length of each animal string
- Generates a new list that contains all of the lengths as elements
- Assigns the new list to animal_lengths
List comprehensions are a much more compact notation, and can save space when you would otherwise need to write several short for loops.
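As a quick check that the two forms are interchangeable, the sketch below builds the list of lengths both ways and compares the results. The names loop_lengths and comp_lengths are only for illustration.

```python
animals = ["Dog", "Tiger", "SuperLion", "Cow", "Panda"]

# Loop version: start with an empty list and append one length at a time.
loop_lengths = []
for animal in animals:
    loop_lengths.append(len(animal))

# Comprehension version: operation, loop variable, and source list on one line.
comp_lengths = [len(animal) for animal in animals]

print(comp_lengths)                  # [3, 5, 9, 3, 5]
print(loop_lengths == comp_lengths)  # True
```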
- Use list comprehension to create a new list called apple_prices_doubled, where you multiply each item in apple_prices by 2.
- Use list comprehension to create a new list called apple_prices_lowered, where you subtract 100 from each item in apple_prices.
apple_prices = [100, 101, 102, 105]
apple_prices_doubled = [price*2 for price in apple_prices]
apple_prices_lowered = [price-100 for price in apple_prices]
print(apple_prices_doubled)
print(apple_prices_lowered)
5. Counting Female Names
Let's count how many times each female first name occurs in legislators. To limit our count to names from the modern era, we'll only look at those that appear after 1940. While names like Theodorick were common prior to 1940, they're rare today.
Here's a preview of what this dictionary will look like:
{
'Nancy': 1,
'Sandy': 1,
'Carolyn': 1,
'Melissa': 2,
'Jo Ann': 2,
...
}
Now, let's work on creating it!
- Create an empty dictionary called name_counts.
- Loop through each row in legislators.
- If the gender column of the row equals F and the year column is greater than 1940:
  - Assign the first_name column of the row to the variable name.
  - If name is in name_counts:
    - Add 1 to the value associated with name in name_counts.
  - If name isn't in name_counts:
    - Set the value associated with name in name_counts to 1.
- When the loop finishes, name_counts should contain each unique name in the first_name column of legislators as a key, and the corresponding number of times it appeared as the value.
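import csv

# Read every row of the file into a list of lists.
with open('legislators_add_year.csv') as f:
    legislators = list(csv.reader(f))

name_counts = {}
for row in legislators:
    # Index 3 is gender, index 7 is birth year, index 1 is first name.
    if row[3] == 'F' and int(row[7]) > 1940:
        name = row[1]
        if name in name_counts:
            name_counts[name] += 1
        else:
            name_counts[name] = 1
name_counts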
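The counting pattern can be tried without the CSV file. The sketch below uses a few made-up rows (not real legislators) laid out in the same column order as the preview, so the same indexes apply.

```python
# Made-up rows for illustration only, in the column order
# last_name, first_name, birthday, gender, type, state, party, birth_year.
sample_rows = [
    ["Doe", "Ann", "1950-01-01", "F", "rep", "CA", "Democrat", "1950"],
    ["Roe", "Ann", "1962-05-12", "F", "sen", "NY", "Republican", "1962"],
    ["Poe", "Beth", "1930-03-03", "F", "rep", "TX", "Democrat", "1930"],
    ["Moe", "Carl", "1955-07-07", "M", "rep", "OH", "Republican", "1955"],
]

sample_counts = {}
for row in sample_rows:
    # Keep only female rows with a birth year after 1940.
    if row[3] == "F" and int(row[7]) > 1940:
        name = row[1]
        if name in sample_counts:
            sample_counts[name] += 1
        else:
            sample_counts[name] = 1

print(sample_counts)  # {'Ann': 2}
```

Beth is skipped because her birth year is before 1940, and Carl is skipped because the gender column is M.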
6. None
Let's say we're trying to find the maximum value in a list. We might write some code that looks like this:
values = [50,60,70]
max_value = 0
for i in values:
    if i > max_value:
        max_value = i
We set max_value to a low value so that everything's greater than it. But what if we changed the values list slightly?
values = [-50, -80, -100]
max_value = 0
for i in values:
    if i > max_value:
        max_value = i
In the above scenario, max_value is 0 when the loop finishes. This is wrong, because 0 isn't in values; it's just a placeholder we used to initialize max_value.
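To see the problem side by side, the sketch below wraps the same loop in a function (the name max_with_zero_start is hypothetical) and runs it on both lists.

```python
def max_with_zero_start(values):
    # Mirrors the loop above: 0 is the starting placeholder.
    max_value = 0
    for i in values:
        if i > max_value:
            max_value = i
    return max_value

print(max_with_zero_start([50, 60, 70]))      # 70
print(max_with_zero_start([-50, -80, -100]))  # 0, which is not in the list
```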
We can resolve this kind of issue using the None object, which has a special data type called NoneType.
The None object indicates that the variable has no value. Rather than using the normal double equals sign (==) to check whether a value equals None, we use the variable is None syntax.
The is comparison operator checks for object identity, meaning whether both sides refer to the very same object. Using is instead of == prevents some custom classes from resolving to True when compared with None. We'll explore how to use operators with the None object in greater depth during a later mission. For now, let's see what the variable is None syntax looks like:
values = [-50, -80, -100]
max_value = None
for i in values:
    if max_value is None or i > max_value:
        max_value = i
In the example above, we:
- Initialize max_value to None.
- Loop through each item in values.
- Check whether max_value is None using the max_value is None syntax.
- If max_value is None, or if i > max_value, then we assign the value of i to max_value.
- At the end of the loop, max_value will equal -50, which is the largest value in values.
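The steps above can be sketched as a small function. The name find_max is hypothetical; the body is the same loop with None as the starting value.

```python
def find_max(values):
    # None means "no value seen yet", so no placeholder number is needed.
    max_value = None
    for i in values:
        if max_value is None or i > max_value:
            max_value = i
    return max_value

print(find_max([-50, -80, -100]))  # -50
print(find_max([50, 60, 70]))      # 70
print(find_max([]))                # None
```

A side benefit of this approach is that an empty list gives back None rather than a misleading number.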
7. Comparing with None
In Python 3, an ordering comparison such as > or < between None and a number raises a TypeError. This is actually helpful when we're writing code, because an unexpected None surfaces immediately instead of silently producing a wrong result. For example, this code will cause an error:
a = None
a > 10
Therefore, when a value could potentially be None, and we want to compare it to another value, we should always include code that checks whether it actually is None first.
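Here is a minimal sketch of what that error looks like, assuming Python 3. It catches the exception so the script keeps running; the variable name caught is only for illustration.

```python
a = None

try:
    a > 10
except TypeError as error:
    # Python 3 refuses to order None against an int.
    caught = True
    print("Comparison failed:", error)
else:
    caught = False

# Equality checks do not raise; they simply evaluate to False here.
print(a == 10)  # False
```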
We can use two Boolean statements joined by or to do this. Here's an example:
max_value is None or i > max_value
The Python interpreter will evaluate the two statements in order. If the first statement is True, it won't evaluate the second one. This saves time, since when one statement is True, the whole or conditional is True. It also means the comparison with None in the second statement never runs when max_value is None.
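One way to observe this short-circuit behavior is to record which side of the or actually runs. The helper record below is hypothetical and exists only for this demonstration.

```python
calls = []

def record(label, result):
    # Note that this side of the expression was evaluated, then return result.
    calls.append(label)
    return result

outcome = record("first", True) or record("second", False)

print(outcome)  # True
print(calls)    # ['first']
```

Only "first" appears in calls, so the second statement was skipped once the first one turned out to be True.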
The following code will assign True to b if a is None, or if a is greater than 10:
a = None
b = a is None or a > 10
The same logic applies to an and statement. Because both conditions have to be True, if the first one is False, the Python interpreter won't evaluate the second one. The example below shows how to write an and statement involving None that won't raise an error. It will assign True to b if a is not None and a is greater than 10:
a = None
b = a is not None and a > 10
Let's give this a try in our next exercise!
- Loop through each value in values.
- Check whether the value is not None, and if it's greater than 30.
- Append the result of the check to checks.
- When finished, checks should be a list of Booleans indicating whether or not the corresponding items in values are not None and greater than 30.
values = [None, 10, 20, 30, None, 50]
checks = []
checks = [a is not None and a > 30 for a in values]
checks
8. Highest Female Name Count
name_counts is a dictionary where the keys are female first names from legislators, and the values are the number of times the names occurred after 1940.
In order to extract the most common names from this dictionary, we need to determine the highest totals in name_counts. Once we know the totals, we can find the keys for them.
We can iterate through all of the keys in a dictionary like this:
fruits = {
    "apple": 2,
    "orange": 5,
    "melon": 10
}
for fruit in fruits:
    rating = fruits[fruit]
In the loop above, we iterate through each key in fruits. We can access the corresponding value using fruits[fruit].
Let's identify the highest totals in the next exercise.
- Set max_value to None.
- Loop through the keys in name_counts.
  - Assign the value associated with the key to count.
  - If max_value is None, or count is greater than max_value:
    - Assign count to max_value.
- At the end of the loop, max_value will contain the largest value in name_counts.
max_value = None
for name in name_counts:
    count = name_counts[name]
    if max_value is None or count > max_value:
        max_value = count
max_value
9. The Items Method
The code we used on the previous screen to access the keys and values in a dictionary was slightly awkward. We can simplify this process with the items() method, which allows us to iterate through keys and values at the same time.
fruits = {
    "apple": 2,
    "orange": 5,
    "melon": 10
}
for fruit, rating in fruits.items():
    print(rating)
The items() method makes our code clearer and more compact.
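For comparison, the sketch below walks the same dictionary both ways and collects (key, value) pairs, which shows that items() yields the same information with less indexing. The names by_key and by_items are only for illustration, and the printed order assumes Python 3.7 or later, where dictionaries keep insertion order.

```python
fruits = {"apple": 2, "orange": 5, "melon": 10}

# Keys only: look each value up by indexing the dictionary.
by_key = []
for fruit in fruits:
    by_key.append((fruit, fruits[fruit]))

# items(): the key and the value arrive together.
by_items = []
for fruit, rating in fruits.items():
    by_items.append((fruit, rating))

print(by_items)            # [('apple', 2), ('orange', 5), ('melon', 10)]
print(by_key == by_items)  # True
```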
- Use the items() method to iterate through the keys and values in plant_types.
- Print each key in plant_types.
- Print each value in plant_types.
plant_types = {"orchid": "flower", "cedar": "tree", "maple": "tree"}
for plant, types in plant_types.items():
    print(types)
    print(plant)
    print(plant + '!')
10. Finding the Most Common Female Names
As we learned on a previous screen, the most common female names occur two times in name_counts. Therefore, we want to extract any keys in name_counts that have the value 2.
- Loop through the keys in name_counts.
- If any value in name_counts equals 2, append its key to top_female_names.
- When you're finished, top_female_names will be a list of the most common names of female legislators.
# Method 1: a for loop
top_female_names = []
for name, count in name_counts.items():
    if count == 2:
        top_female_names.append(name)
print(top_female_names)

# Method 2: a list comprehension with a condition
top_female_names_1 = [
    name
    for name, count in name_counts.items()
    if count == 2
]
print(top_female_names_1)
11. Finding the Most Common Male Names
Now that we know how to find the most common female names, we can repeat the same process for male names.
- Create a dictionary called male_name_counts.
- Loop through legislators.
  - Count how many times each name with "M" in the gender column and a birth year after 1940 occurs.
  - Store the results in male_name_counts.
- Find the highest value in male_name_counts and assign it to highest_male_count.
- Append any keys from male_name_counts with a value equal to highest_male_count to top_male_names.
male_name_counts = {}
top_male_names = []

# Count first names of male legislators born after 1940.
for row in legislators:
    if row[3] == 'M' and int(row[7]) > 1940:
        name = row[1]
        if name in male_name_counts:
            male_name_counts[name] += 1
        else:
            male_name_counts[name] = 1

# Find the highest count, starting from None.
highest_male_count = None
for name, count in male_name_counts.items():
    if highest_male_count is None or count > highest_male_count:
        highest_male_count = count

# Collect every name that reaches the highest count.
for name, count in male_name_counts.items():
    if count == highest_male_count:
        top_male_names.append(name)

print(male_name_counts)
print(top_male_names)
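Since the female and male versions differ only in the gender value, the whole process could be folded into one function. The sketch below is one possible way to do that, not part of the original exercise; the function name most_common_names and the sample rows are made up, and the rows are assumed to follow the same column order as legislators.

```python
def most_common_names(rows, gender, min_year):
    """Return (top_names, highest_count) for first names in rows.

    Assumes first name at index 1, gender at index 3, birth year at index 7.
    """
    counts = {}
    for row in rows:
        if row[3] == gender and int(row[7]) > min_year:
            name = row[1]
            if name in counts:
                counts[name] += 1
            else:
                counts[name] = 1

    # Find the highest count, starting from None.
    highest = None
    for name, count in counts.items():
        if highest is None or count > highest:
            highest = count

    top_names = [name for name, count in counts.items() if count == highest]
    return top_names, highest


# Made-up rows for illustration only.
sample_rows = [
    ["Doe", "John", "1950-01-01", "M", "rep", "CA", "Democrat", "1950"],
    ["Roe", "John", "1961-02-02", "M", "sen", "NY", "Republican", "1961"],
    ["Poe", "Mark", "1948-03-03", "M", "rep", "TX", "Democrat", "1948"],
    ["Moe", "Ann", "1955-04-04", "F", "rep", "OH", "Republican", "1955"],
]

print(most_common_names(sample_rows, "M", 1940))  # (['John'], 2)
print(most_common_names(sample_rows, "F", 1940))  # (['Ann'], 1)
```

If no row matches, the function returns an empty list and None, which follows from starting the highest count at None.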