Calculating Errors

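  1. Given the actual and predicted columns of each dataset, use jud_data to find the proportion of errors in the dataset, as well as the percentage of errors of each type.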

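Quiz hint: An error is any case where the predicted value does not match the actual value. Beyond the overall error rate, there are also Type I and Type II errors to consider. We also know that we can minimize one type of error at the cost of maximizing the other. If we predict that every individual is innocent, how many guilty people are mislabeled? Likewise, if we predict that everyone is guilty, how many innocent people are mislabeled?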

  1. Use par_data to find the proportion of errors in the dataset, as well as the percentage of errors of each type.
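The trade-off mentioned in the quiz hint can be sketched on a small example. The five-row frame `toy` and the helper `error_rates` below are hypothetical and only illustrate the idea; they are not taken from judicial_dataset_predictions.csv. The sketch follows this post's convention that Type I means predicted guilty but actually innocent, and Type II means predicted innocent but actually guilty.

```python
import pandas as pd

# Hypothetical toy data, NOT taken from judicial_dataset_predictions.csv.
toy = pd.DataFrame({
    "actual":    ["innocent", "innocent", "guilty", "guilty",   "guilty"],
    "predicted": ["innocent", "guilty",   "guilty", "innocent", "guilty"],
})


def error_rates(actual, predicted):
    """Return (Type I rate, Type II rate) as fractions of all rows."""
    type_1 = ((actual == "innocent") & (predicted == "guilty")).mean()
    type_2 = ((actual == "guilty") & (predicted == "innocent")).mean()
    return float(type_1), float(type_2)


# Rates for the predictions stored in the frame.
model_rates = error_rates(toy["actual"], toy["predicted"])

# Predict everyone innocent: no Type I errors are possible,
# but every guilty person becomes a Type II error.
all_innocent = pd.Series("innocent", index=toy.index)
all_innocent_rates = error_rates(toy["actual"], all_innocent)

# Predict everyone guilty: no Type II errors are possible,
# but every innocent person becomes a Type I error.
all_guilty = pd.Series("guilty", index=toy.index)
all_guilty_rates = error_rates(toy["actual"], all_guilty)

print(model_rates, all_innocent_rates, all_guilty_rates)
```

Driving one error type to zero this way pushes the other up to the full share of the opposite class, which is the point the hint is making.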
import numpy as np
import pandas as pd

jud_data = pd.read_csv('judicial_dataset_predictions.csv')
par_data = pd.read_csv('parachute_dataset.csv')
jud_data.head()
par_data.head()
  1. Above, you can see the actual and predicted columns for each of the datasets. Using the jud_data, find the proportion of errors for the dataset, and furthermore, the percentage of errors of each type. Use the results to answer the questions in quiz 1 below.
jud_data[jud_data['actual'] != jud_data['predicted']].shape[0] / jud_data.shape[0]  # Proportion of predictions that are errors
jud_data.query("actual == 'innocent' and predicted == 'guilty'").shape[0] / jud_data.shape[0]  # Type I: predicted guilty, actually innocent
jud_data.query("actual == 'guilty' and predicted == 'innocent'").shape[0] / jud_data.shape[0]  # Type II: predicted innocent, actually guilty
# If everyone were predicted guilty, every actually innocent
# person would be a Type I error.
# Type I = predicted guilty, but actually innocent
jud_data[jud_data['actual'] == 'innocent'].shape[0]/jud_data.shape[0]
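As a cross-check on the three separate calculations above, a normalized cross-tabulation gives all four actual/predicted combinations in one table. This is a minimal sketch on a hypothetical five-row frame `toy` standing in for jud_data; the values are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data standing in for jud_data.
toy = pd.DataFrame({
    "actual":    ["innocent", "innocent", "guilty", "guilty",   "guilty"],
    "predicted": ["innocent", "guilty",   "guilty", "innocent", "guilty"],
})

# normalize="all" divides every cell by the total number of rows,
# so each cell is a proportion of the whole dataset.
table = pd.crosstab(toy["actual"], toy["predicted"], normalize="all")

type_1 = table.loc["innocent", "guilty"]   # predicted guilty, actually innocent
type_2 = table.loc["guilty", "innocent"]   # predicted innocent, actually guilty
error_rate = type_1 + type_2               # the off-diagonal cells are the errors

# The same overall error rate computed directly from the columns.
direct_error_rate = (toy["actual"] != toy["predicted"]).mean()

print(table)
```

One caveat with this approach: if a label never appears in one of the columns, the corresponding row or column is missing from the table and the `.loc` lookup raises a `KeyError`.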
  1. Above, you can see the actual and predicted columns for each of the datasets. Using the par_data, find the proportion of errors for the dataset, and furthermore, the percentage of errors of each type. Use the results to answer the questions in quiz 2 below.
par_data[par_data['actual'] != par_data['predicted']].shape[0] / par_data.shape[0]  # Proportion of predictions that are errors
par_data.query("actual == 'fails' and predicted == 'opens'").shape[0] / par_data.shape[0]  # Type I: predicted opens, actually fails
par_data.query("actual == 'opens' and predicted == 'fails'").shape[0] / par_data.shape[0]  # Type II: predicted fails, actually opens
# If every parachute were predicted to fail, what proportion
# of the predictions would be Type II errors?

# This is simply the share of parachutes that actually open,
# since each of them would be labeled as failing.

# Type II = predicted fails, but actually opens
par_data[par_data['actual'] == 'opens'].shape[0]/par_data.shape[0]
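The "predict one label for everything" questions reduce to the class shares of the actual column, which `value_counts(normalize=True)` returns directly. The sketch below uses a hypothetical four-row frame `toy_par` rather than parachute_dataset.csv, and keeps this post's convention for which error is Type I and which is Type II.

```python
import pandas as pd

# Hypothetical toy data standing in for par_data.
toy_par = pd.DataFrame({
    "actual":    ["opens", "opens", "opens", "fails"],
    "predicted": ["opens", "fails", "opens", "opens"],
})

# Share of each actual outcome in the dataset.
shares = toy_par["actual"].value_counts(normalize=True)

# Predict "fails" for every parachute: each one that actually
# opens becomes a Type II error.
type_2_if_all_fails = shares["opens"]

# Predict "opens" for every parachute: each one that actually
# fails becomes a Type I error.
type_1_if_all_opens = shares["fails"]

print(shares)
```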
