#2.1.10 ★Guided Project: Analyzing Thanksgiving Dinner.md

1. Introducing Thanksgiving Dinner Data

Instructions

  • Import the pandas package.

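  • Use the pandas.read_csv() function to read the thanksgiving.csv
    file.

  • Make sure to specify the keyword argument encoding="Latin-1", as the CSV file isn't in the default UTF-8 encoding.

  • Assign the result to the variable data.

  • Display the first few rows of data to see what the rows and columns look like.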

  • In a separate notebook cell, display all of the column names to get a sense of what the data consists of.

    • You can use the pandas.DataFrame.columns attribute to display the column names.

import pandas as pd
data = pd.read_csv("thanksgiving.csv", encoding="Latin-1")
data.head()
data.columns  # columns is an attribute, not a method, so no parentheses
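
The loading step can be tried without the real file. The sketch below is only an illustration: it builds a tiny Latin-1 encoded CSV in memory (the column names and rows are made up, not taken from thanksgiving.csv) and reads it back with the same encoding argument.

```python
import io
import pandas as pd

# Hypothetical two-row CSV, encoded as Latin-1 so that "é" is stored as a single byte.
raw_bytes = "Main dish,Side\nTurkey,Purée\nHam/Pork,Rolls\n".encode("latin-1")

# read_csv accepts a file-like object; encoding tells pandas how to decode the bytes.
sample = pd.read_csv(io.BytesIO(raw_bytes), encoding="Latin-1")

print(sample.head())     # first rows of the DataFrame
print(sample.columns)    # column labels, accessed as an attribute
```

Reading the same bytes with the default UTF-8 decoding would be expected to fail on the accented character, which is why the encoding argument matters here.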

3. Using value_counts To Explore Main Dishes

input
print(data['What is typically the main dish at your Thanksgiving dinner?'].value_counts())
output
Turkey                    859
Other (please specify)     35
Ham/Pork                   29
Tofurkey                   20
Chicken                    12
Roast beef                 11
I don't know                5
Turducken                   3
Name: What is typically the main dish at your Thanksgiving dinner?, dtype: int64
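
Raw counts are easier to compare as shares. The following is a minimal sketch on made-up answers (the Series below is not the survey data) showing how value_counts() treats missing values and how normalize=True turns counts into fractions.

```python
import pandas as pd

# Made-up answers standing in for the main-dish column; None is a skipped question.
dishes = pd.Series(["Turkey", "Turkey", "Ham/Pork", "Turkey", None, "Tofurkey"])

counts = dishes.value_counts()                    # missing values are dropped by default
shares = dishes.value_counts(normalize=True)      # fractions of the non-missing answers
with_missing = dishes.value_counts(dropna=False)  # also counts the missing answer

print(counts)
print(shares)
```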

4. Figuring Out What Pies People Eat

input
apple_isnull = pd.isnull(data['Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Apple'])
pumpkin_isnull = pd.isnull(data['Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pumpkin'])
pecan_isnull = pd.isnull(data['Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pecan'])
ate_pies = apple_isnull & pumpkin_isnull & pecan_isnull
print(ate_pies.value_counts())
output
False    876
True     182
dtype: int64
# True means all three pie columns are null, so 182 respondents selected none of the three pies.
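
Because the mask is built from isnull(), True means "no pie selected", not "ate pie". A small sketch on a hypothetical four-row frame (the short column names are placeholders for the long survey questions) makes the logic visible:

```python
import pandas as pd

# Hypothetical pie columns: a filled cell means the box was ticked, None means it was not.
pies = pd.DataFrame({
    "Apple":   ["Apple", None, None, "Apple"],
    "Pumpkin": [None, "Pumpkin", None, "Pumpkin"],
    "Pecan":   [None, None, None, "Pecan"],
})

# True only where all three cells are missing, i.e. none of the three pies was selected.
no_pie = pies["Apple"].isnull() & pies["Pumpkin"].isnull() & pies["Pecan"].isnull()

# Equivalent form that scales to more columns.
no_pie_alt = pies[["Apple", "Pumpkin", "Pecan"]].isnull().all(axis=1)

print(no_pie.value_counts())
```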

5. Converting Age To Numeric

input

print(data['Age'].value_counts())

output

45 - 59    286
60+        264
30 - 44    259
18 - 29    216
Name: Age, dtype: int64

input

def str_to_int(age_str):
    # Use the isnull() function to check if the value is null. If it is, return None.
    if pd.isnull(age_str):
        return None
    # Split the string on the space character and keep the first item of the resulting list.
    age_str = age_str.split(' ')[0]
    # Replace the + character in the result with an empty string to remove it.
    age_str = age_str.replace('+', '')
    # Use int() to convert the result to an integer.
    return int(age_str)

# Use the pandas.Series.apply() method to apply the function to each value in the Age column of data.
data['int_age'] = data['Age'].apply(str_to_int)

# Call the pandas.Series.describe() method on the int_age column of data, and display the result.
data['int_age'].describe()

output

count    1025.000000
mean       39.383415
std        15.398493
min        18.000000
25%        30.000000
50%        45.000000
75%        60.000000
max        60.000000
Name: int_age, dtype: float64
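
The same conversion can also be written without a helper function. This is a sketch of one alternative, not the approach used in the project: it pulls the first run of digits out of each bracket with Series.str.extract(). The sample values mirror the four age brackets plus one missing answer.

```python
import pandas as pd

# The four age brackets from the survey plus one missing answer.
ages = pd.Series(["18 - 29", "30 - 44", "45 - 59", "60+", None])

# Capture the first group of digits; rows with no match or no value become NaN.
lower_bound = ages.str.extract(r"(\d+)", expand=False).astype(float)

print(lower_bound)
```

Either way, each respondent is assigned the lower bound of their bracket, so the mean reported by describe() likely understates the true average age. The column is float64 rather than int because missing answers are stored as NaN.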

6. Converting Income To Numeric

input
print(data['How much total combined money did all members of your HOUSEHOLD earn last year?'].value_counts())
output
$25,000 to $49,999      180
Prefer not to answer    136
$50,000 to $74,999      135
$75,000 to $99,999      133
$100,000 to $124,999    111
$200,000 and up          80
$10,000 to $24,999       68
$0 to $9,999             66
$125,000 to $149,999     49
$150,000 to $174,999     40
$175,000 to $199,999     27
Name: How much total combined money did all members of your HOUSEHOLD earn last year?, dtype: int64
input
def income_to_int(income_str):
    if pd.isnull(income_str):  # Use the isnull() function to check if the value is null. If it is, return None.
        return None
    income_str = income_str.split(' ')[0] # Split the string on the space character, and extract the first item of the resulting list.
    if income_str == 'Prefer':
        return None
    income_str = income_str.replace('$', '')
    income_str = income_str.replace(',', '')
    return int(income_str)

data['int_income'] = data['How much total combined money did all members of your HOUSEHOLD earn last year?'].apply(income_to_int)
print(data['int_income'].describe())
output
count       889.000000
mean      74077.615298
std       59360.742902
min           0.000000
25%       25000.000000
50%       50000.000000
75%      100000.000000
max      200000.000000
Name: int_income, dtype: float64
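
As a cross-check on income_to_int, the cleaning steps can be sketched with vectorized string methods. This is an alternative illustration, not the project's solution; the sample values are copied from the income categories above, plus one missing answer.

```python
import pandas as pd

incomes = pd.Series([
    "$25,000 to $49,999",
    "$200,000 and up",
    "Prefer not to answer",
    "$0 to $9,999",
    None,
])

# Keep the text before the first space, e.g. "$25,000" or "Prefer".
first_token = incomes.str.split(" ").str[0]

# Drop dollar signs and thousands separators.
cleaned = first_token.str.replace(r"[$,]", "", regex=True)

# Anything that is not a number ("Prefer", missing values) becomes NaN.
int_income = pd.to_numeric(cleaned, errors="coerce")

print(int_income)
```

As with age, the result is the lower bound of each bracket, and "Prefer not to answer" is treated as missing.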

7. Correlating Travel Distance And Income

input
print(data[data['int_income'] < 150000]['How far will you travel for Thanksgiving?'].value_counts())
print('--------------------------------------------------')
print(data[data['int_income'] > 150000]['How far will you travel for Thanksgiving?'].value_counts())
output
Thanksgiving is happening at my home--I won't travel at all                           281
Thanksgiving is local--it will take place in the town I live in                       203
Thanksgiving is out of town but not too far--it's a drive of a few hours or less      150
Thanksgiving is out of town and far away--I have to drive several hours or fly         55
Name: How far will you travel for Thanksgiving?, dtype: int64
--------------------------------------------------
Thanksgiving is happening at my home--I won't travel at all                            49
Thanksgiving is local--it will take place in the town I live in                        25
Thanksgiving is out of town but not too far--it's a drive of a few hours or less       16
Thanksgiving is out of town and far away--I have to drive several hours or fly         12
Name: How far will you travel for Thanksgiving?, dtype: int64
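
The two groups differ a lot in size (689 and 102 answers), so shares are easier to compare than raw counts: 281/689 is about 41% staying home in the lower-income group, against 49/102, about 48%, in the higher-income group. The sketch below shows the same comparison on a made-up frame (the travel labels and incomes are illustrative only), and also shows that rows with int_income exactly 150000 match neither strict comparison.

```python
import pandas as pd

# Made-up rows; "travel" stands in for the long travel-distance question.
toy = pd.DataFrame({
    "int_income": [25000, 50000, 150000, 175000, 200000, 200000],
    "travel": ["home", "local", "home", "home", "far", "home"],
})

# Share of each travel answer within each income group.
low = toy[toy["int_income"] < 150000]["travel"].value_counts(normalize=True)
high = toy[toy["int_income"] > 150000]["travel"].value_counts(normalize=True)

# Rows at exactly 150000 are excluded by both "<" and ">".
in_neither = toy[toy["int_income"] == 150000]

print(low)
print(high)
print(len(in_neither))
```

If the $150,000 to $174,999 bracket should count as high income, using >= 150000 for the second filter would include it.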

8. Linking Friendship And Age

input
data.pivot_table(
    index = "Have you ever tried to meet up with hometown friends on Thanksgiving night?",
    columns = 'Have you ever attended a "Friendsgiving?"',
    values = 'int_age'
)
output

[Image: pivot table output for int_age]

input

data.pivot_table(
    index = 'Have you ever tried to meet up with hometown friends on Thanksgiving night?',
    columns = 'Have you ever attended a "Friendsgiving?"',
    values = 'int_income'
)
output

[Image: pivot table output for int_income]
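
To show what these pivot tables compute, here is a minimal sketch on a made-up five-row frame (the short column names are placeholders for the two survey questions). With no aggfunc given, pivot_table() averages the values column within each index/column combination.

```python
import pandas as pd

# Made-up answers; the real columns hold "Yes"/"No" answers to the two questions.
toy = pd.DataFrame({
    "met_friends":   ["Yes", "Yes", "No", "No", "No"],
    "friendsgiving": ["Yes", "No", "Yes", "No", "No"],
    "int_age":       [18, 30, 30, 45, 60],
})

# Rows: met_friends, columns: friendsgiving, cells: mean int_age of that group.
table = toy.pivot_table(index="met_friends", columns="friendsgiving", values="int_age")

print(table)
```

In this toy data the ("No", "No") cell averages 45 and 60, giving 52.5.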
