DS Interview Question--Missing Values

Q: During analysis, how do you treat missing values?

A: 

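First, we need to understand the mechanism behind the missing data:
1. Missing completely at random (MCAR): whether a value is missing is unrelated to any variable, observed or unobserved. This is the most favorable situation.
2. Missing at random (MAR): missingness depends only on other observed variables, not on the missing values themselves, so it can be accounted for using the observed data.
3. Missing not at random (MNAR): missingness depends on the unobserved values themselves (for example, high earners declining to report their income). This is the hardest case to correct.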
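
To make the three mechanisms concrete, here is a small simulation sketch using only the Python standard library. The data-generating process (income rising with age), the 30%/60%/90% missingness rates and the function name `simulate` are hypothetical choices for illustration. It compares the true mean with the mean of the observed values, which is what you get by simply dropping the missing entries.

```python
import random
import statistics

def simulate(n=10_000, seed=0):
    """Return the true mean income and the complete-case mean under MCAR, MAR and MNAR.

    Hypothetical data: income rises linearly with age, plus Gaussian noise.
    """
    rng = random.Random(seed)
    age = [rng.uniform(20, 65) for _ in range(n)]
    income = [30_000 + 800 * (a - 20) + rng.gauss(0, 5_000) for a in age]
    true_mean = statistics.fmean(income)

    # MCAR: each income is hidden with probability 0.3, regardless of any variable.
    mcar = [y for y in income if rng.random() >= 0.3]

    # MAR: respondents under 35 skip the question 60% of the time;
    # missingness depends only on age, which is observed.
    mar = [y for a, y in zip(age, income) if not (a < 35 and rng.random() < 0.6)]

    # MNAR: incomes above 55,000 are hidden 90% of the time;
    # missingness depends on the unobserved income itself.
    mnar = [y for y in income if not (y > 55_000 and rng.random() < 0.9)]

    return (true_mean, statistics.fmean(mcar),
            statistics.fmean(mar), statistics.fmean(mnar))

true_mean, mcar_mean, mar_mean, mnar_mean = simulate()
print(f"true {true_mean:,.0f} | MCAR {mcar_mean:,.0f} | "
      f"MAR {mar_mean:,.0f} | MNAR {mnar_mean:,.0f}")
```

Under MCAR the complete-case mean stays close to the true mean. Under MAR it drifts upward because young, lower-income respondents are under-represented, although conditioning on the observed age can correct this. Under MNAR it is biased downward, and the observed data alone cannot tell you by how much.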

Then we can choose an appropriate method to handle the missing values:

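Deletion: If we have enough observations and the data are MCAR, we can drop the observations with missing values (listwise deletion) without introducing bias. The cost is a smaller sample and less statistical power; under MAR or MNAR, deletion can bias the results.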
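
A minimal sketch of listwise (complete-case) deletion in plain Python, assuming each observation is a dict and missing values are stored as `None`. The function name `listwise_delete` and the sample rows are hypothetical. In pandas, the usual equivalent is `DataFrame.dropna()`, whose `subset` parameter plays the role of `columns` here.

```python
def listwise_delete(rows, columns=None):
    """Keep only rows with no missing (None) value in `columns` (default: all keys)."""
    kept = []
    for row in rows:
        cols = columns if columns is not None else row.keys()
        if all(row.get(c) is not None for c in cols):
            kept.append(row)
    return kept


rows = [
    {"age": 34, "income": 52_000},
    {"age": None, "income": 61_000},
    {"age": 45, "income": None},
    {"age": 29, "income": 48_000},
]
complete = listwise_delete(rows)
print(len(complete), "of", len(rows), "rows kept")  # 2 of 4 rows kept
```

Restricting `columns` to the variables a given analysis actually uses avoids discarding rows whose only gaps are in irrelevant fields.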

Imputation: 1. Replace missing values with the mean, median, or mode, or with a default value; 2. Predict the missing values with a model (e.g., regression or KNN).
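
Both kinds of imputation can be sketched in plain Python, assuming missing values are stored as `None`. `simple_impute` covers the single-statistic strategies; `knn_impute` is a hand-rolled KNN imputer that fills a cell with the average of that column over the k nearest rows, measuring distance only on columns observed in both rows. All function names are hypothetical. Features are not standardized here; in practice you would usually scale them first so that, for example, income does not dominate age. Libraries such as scikit-learn provide production versions (`SimpleImputer` and `KNNImputer` in `sklearn.impute`).

```python
import math
import statistics

def simple_impute(values, strategy="mean", fill_value=None):
    """Replace None entries with one statistic computed from the observed values."""
    observed = [v for v in values if v is not None]
    if strategy == "mean":
        fill = statistics.fmean(observed)
    elif strategy == "median":
        fill = statistics.median(observed)
    elif strategy == "mode":  # also works for categorical values
        fill = statistics.mode(observed)
    elif strategy == "constant":  # a default value chosen by the analyst
        fill = fill_value
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return [fill if v is None else v for v in values]


def nan_euclidean(a, b, skip):
    """Euclidean distance over columns observed in both rows (ignoring column `skip`),
    scaled by total/shared columns so pairs with fewer shared columns stay comparable."""
    shared = [(x, y) for i, (x, y) in enumerate(zip(a, b))
              if i != skip and x is not None and y is not None]
    if not shared:
        return math.inf
    total = len(a) - 1  # number of candidate columns, excluding `skip`
    return math.sqrt(total / len(shared) * sum((x - y) ** 2 for x, y in shared))


def knn_impute(rows, k=2):
    """Fill each missing cell with the mean of that column over the k nearest donor rows."""
    imputed = [list(row) for row in rows]
    for r, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors are rows where column j is observed; distances use the original data.
            candidates = sorted(
                (nan_euclidean(row, other, j), other[j])
                for other in rows
                if other[j] is not None
            )
            nearest = [v for d, v in candidates[:k] if d != math.inf]
            if nearest:  # otherwise leave the cell missing
                imputed[r][j] = statistics.fmean(nearest)
    return imputed


print(simple_impute([52_000, None, 48_000, 61_000, None], "median"))
# [52000, 52000, 48000, 61000, 52000]

rows = [[25, 40_000], [30, 45_000], [50, 80_000], [55, 85_000], [28, None]]
print(knn_impute(rows, k=2)[-1])  # [28, 42500.0]
```

Mean, median, and mode imputation are fast but shrink the variance of the imputed column; the KNN version preserves more of the relationship between columns at a higher computational cost.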

Others: More advanced methods, such as Multiple Imputation (MI) and hot-deck imputation.
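
The two methods can be combined in a short sketch. Random hot-deck imputation fills each gap with the value of a randomly chosen donor from the same group. Multiple imputation repeats a stochastic imputation m times, analyzes each completed dataset, and pools the results with Rubin's rules. Using hot deck as the MI engine (with an approximate Bayesian bootstrap of the donor pool) is an illustrative choice here; MI is more often driven by model-based imputation. The function names, groups and data are hypothetical.

```python
import random
import statistics

def hot_deck(values, groups, rng, bootstrap=False):
    """Random hot-deck imputation within groups (adjustment cells).

    Each None is replaced by the value of a randomly drawn donor from the same
    group. With bootstrap=True the donor pool is first resampled with replacement
    (approximate Bayesian bootstrap), so repeated runs reflect uncertainty about
    the donor distribution. Assumes every group has at least one observed value.
    """
    donors = {}
    for v, g in zip(values, groups):
        if v is not None:
            donors.setdefault(g, []).append(v)
    if bootstrap:
        donors = {g: rng.choices(pool, k=len(pool)) for g, pool in donors.items()}
    return [rng.choice(donors[g]) if v is None else v for v, g in zip(values, groups)]


def multiple_impute_mean(values, groups, m=20, seed=0):
    """Estimate the mean from m hot-deck imputations, pooled with Rubin's rules."""
    rng = random.Random(seed)
    estimates, variances = [], []
    for _ in range(m):
        completed = hot_deck(values, groups, rng, bootstrap=True)
        estimates.append(statistics.fmean(completed))
        # Squared standard error of the mean within this completed dataset.
        variances.append(statistics.variance(completed) / len(completed))
    q_bar = statistics.fmean(estimates)   # pooled point estimate
    w = statistics.fmean(variances)       # within-imputation variance
    b = statistics.variance(estimates)    # between-imputation variance
    return q_bar, w + (1 + 1 / m) * b     # Rubin's total variance


income = [40, None, 45, 80, None, 85, 42, None]
group = ["young", "young", "young", "old", "old", "old", "young", "old"]
print(hot_deck(income, group, random.Random(1)))
est, total_var = multiple_impute_mean(income, group)
print(f"pooled mean {est:.1f}, standard error {total_var ** 0.5:.1f}")
```

The between-imputation term `b` is what single imputation leaves out. It makes the pooled standard error honest about the extra uncertainty caused by the missing values.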

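Ignoring: Some models can handle missing values internally, so no explicit treatment is needed; for example, some random forest and gradient-boosting implementations support missing values natively.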
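
One common way a tree learner copes with missing values during training, used by gradient-boosting libraries such as XGBoost, is to learn a default direction for missing values at each split. The sketch below shows that idea on a single split (a decision stump) in plain Python. It is not a random forest, and `fit_stump` and the toy data are hypothetical.

```python
def fit_stump(x, y):
    """Fit a one-split regression stump on feature x, where None means missing.

    For each candidate threshold, the rows with missing x are tried on the left
    and on the right; whichever gives lower squared error becomes the learned
    default direction for missing values.
    """
    def sse(vals):
        mu = sum(vals) / len(vals)
        return sum((v - mu) ** 2 for v in vals)

    observed = sorted({v for v in x if v is not None})
    if len(observed) < 2:
        raise ValueError("need at least two distinct observed values")
    missing_y = [t for v, t in zip(x, y) if v is None]
    best = None
    for lo, hi in zip(observed, observed[1:]):
        thr = (lo + hi) / 2
        left = [t for v, t in zip(x, y) if v is not None and v <= thr]
        right = [t for v, t in zip(x, y) if v is not None and v > thr]
        for missing_left in (True, False):
            l = left + missing_y if missing_left else left
            r = right if missing_left else right + missing_y
            err = sse(l) + sse(r)
            if best is None or err < best[0]:
                best = (err, thr, missing_left, sum(l) / len(l), sum(r) / len(r))

    _, thr, missing_left, left_mean, right_mean = best

    def predict(v):
        # Missing inputs follow the learned default direction.
        goes_left = missing_left if v is None else v <= thr
        return left_mean if goes_left else right_mean

    return predict


x = [1, 2, 3, 10, 11, 12, None, None]
y = [5, 5, 5, 20, 20, 20, 20, 20]
predict = fit_stump(x, y)
print(predict(None), predict(2), predict(11))  # 20.0 5.0 20.0
```

Because the rows with missing x look like the high-x rows in this toy data, the stump learns to send missing values to the right branch without any imputation step.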


Interview questions are from DataAppLab (WeChat: Datalaus).

Jun. 27th, 2017, Seattle
