avengers

Dataset

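The dataset used in this article is avengers.csv. The Avengers are a famous and much-loved team of Marvel superheroes whose comic series began in the 1960s. This article mainly explores the deaths and returns of these Avengers, since writers like to kill Avengers off and then bring them back.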

  • Attributes: URL, Name/Alias, Appearances, Current?, Gender, Probationary Introl,
    Full/Reserve Avengers Intro, Year, Years since joining, Honorary, Death1, Return1, Death2, Return2, Death3, Return3, Death4, Return4, Death5, Return5, Notes

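  • Filter out bad data, such as rows whose Year is earlier than 1960: the comic series started in the 1960s, so such years cannot be valid. Plotting a histogram of Year makes the problem visible.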

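import pandas as pd
import matplotlib.pyplot as plt

avengers = pd.read_csv("avengers.csv")
# Plot the join-year histogram; years before 1960 stand out as bad data
avengers['Year'].hist()
plt.show()
# Keep rows from 1960 onward; .copy() lets us add columns later without SettingWithCopyWarning
true_avengers = avengers[avengers["Year"] > 1959].copy()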

(Figure 1: histogram of Year)
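To see how many rows the year filter actually discards, you can count them before filtering. The sketch below uses a small made-up DataFrame as a stand-in for avengers.csv; the names and years are hypothetical, not taken from the dataset. With the real data the same two filters would be applied to `avengers`.

```python
import pandas as pd

# Hypothetical stand-in for avengers.csv; values are invented for illustration
avengers = pd.DataFrame({
    "Name/Alias": ["Hero A", "Hero B", "Hero C", "Hero D"],
    "Year": [1900, 1963, 1987, 2013],
})

# Rows with an implausible join year (before the comics existed)
bad_rows = avengers[avengers["Year"] < 1960]
print(len(bad_rows), "row(s) with Year < 1960")

# Same filter as in the article; .copy() makes later column assignments safe
true_avengers = avengers[avengers["Year"] > 1959].copy()
print(len(true_avengers), "row(s) kept")
```

For integer years the two conditions are complementary, so every row lands in exactly one of the two groups.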

  • Count how many times each Avenger died:
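def clean_deaths(row):
    """Return how many of the Death1..Death5 columns are marked 'YES'."""
    num_deaths = 0
    columns = ['Death1', 'Death2', 'Death3', 'Death4', 'Death5']

    for c in columns:
        death = row[c]
        # A missing value (NaN) or 'NO' means no death was recorded in this slot
        if pd.isnull(death) or death == 'NO':
            continue
        elif death == 'YES':
            num_deaths += 1
    return num_deaths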

# axis=1 passes each row to clean_deaths; the lambda wrapper is unnecessary
true_avengers['Deaths'] = true_avengers.apply(clean_deaths, axis=1)
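`apply` with `axis=1` calls a Python function once per row. A vectorized alternative, sketched here under the assumption that the Death columns contain only 'YES', 'NO', or missing values, compares the five columns with 'YES' and sums the matches across each row. The rows are hypothetical, and a condensed row-wise function (equivalent to `clean_deaths` because it also counts only 'YES') is included so the two results can be compared.

```python
import numpy as np
import pandas as pd

DEATH_COLS = ["Death1", "Death2", "Death3", "Death4", "Death5"]


def clean_deaths(row):
    # Condensed row-wise count: only 'YES' adds a death; NaN and 'NO' add nothing
    return sum(1 for c in DEATH_COLS if row[c] == "YES")


# Hypothetical rows; dtype=object keeps the columns as text, like YES/NO columns from a CSV
df = pd.DataFrame(
    {
        "Death1": ["YES", "NO", "YES"],
        "Death2": ["YES", np.nan, np.nan],
        "Death3": [np.nan, np.nan, np.nan],
        "Death4": [np.nan, np.nan, np.nan],
        "Death5": [np.nan, np.nan, np.nan],
    },
    dtype=object,
)

# Vectorized count: NaN == "YES" is False, so missing slots contribute 0
df["Deaths"] = (df[DEATH_COLS] == "YES").sum(axis=1)
# Row-wise reference result for comparison
df["Deaths_apply"] = df.apply(clean_deaths, axis=1)
print(df[["Deaths", "Deaths_apply"]])
```

Both columns should agree; the vectorized form avoids a Python-level loop, which matters more as the number of rows grows.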
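  • Check whether the joining years are consistent: the code below assumes the data was collected in 2015, so "Years since joining" should equal 2015 - Year.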
# Keep the rows where "Years since joining" agrees with 2015 - Year, then count them
correct_joined_years = true_avengers[true_avengers['Years since joining'] == (2015 - true_avengers['Year'])]
joined_accuracy_count = len(correct_joined_years)
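The count alone does not show which rows disagree. A minimal sketch, on hypothetical toy rows, that builds the same boolean mask, counts the consistent rows, and lists the mismatches for inspection (2015 is the reference year the article's own check uses):

```python
import pandas as pd

# Hypothetical rows; the last one is deliberately inconsistent
true_avengers = pd.DataFrame({
    "Name/Alias": ["Hero A", "Hero B", "Hero C"],
    "Year": [1963, 1990, 2013],
    "Years since joining": [52, 25, 5],
})

# Boolean mask: True where the stored value matches 2015 - Year
matches = true_avengers["Years since joining"] == (2015 - true_avengers["Year"])
joined_accuracy_count = int(matches.sum())
print(joined_accuracy_count, "of", len(true_avengers), "rows are consistent")

# Rows that fail the check, for manual inspection
print(true_avengers.loc[~matches, ["Name/Alias", "Year", "Years since joining"]])
```

Summing the mask gives the same number as `len()` of the filtered frame, and keeping the mask around makes the failing rows easy to pull out.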
