basic data analysis

screening the dataset

Two purposes: 1. check for missing data

2. check for odd and erroneous data

What counts as odd data?

consistency check: answers that contradict each other

filler questions: what are these?

extreme data: what counts as extreme?

How to do it?

1. analyze frequencies: check for missing data and extreme data

2. scatter plot: check consistency

*Still need to learn in SPSS: scatter plot, select cases
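The frequency check can be sketched outside SPSS as well. This is a minimal illustration in plain Python; the answers list, the valid range 1 to 5, and the helper names are assumptions made up for the example, not part of the course material.

```python
from collections import Counter

# Hypothetical answers to a 1-5 rating question; None marks a missing answer.
answers = [3, 4, None, 5, 2, 4, 99, 3, None, 4]

def frequency_table(values):
    """Count how often each value occurs; missing answers are counted under None."""
    return Counter(values)

def out_of_range(values, low, high):
    """Return the non-missing values that fall outside the valid range [low, high]."""
    return [v for v in values if v is not None and not (low <= v <= high)]

freq = frequency_table(answers)
n_missing = freq[None]
suspicious = out_of_range(answers, 1, 5)

# Print the table with observed values first and the missing category last.
for value, count in sorted(freq.items(), key=lambda kv: (kv[0] is None, kv[0] or 0)):
    print(value, count)
print("missing:", n_missing, "out of range:", suspicious)
```

A value such as 99 on a 1 to 5 scale shows up immediately in the table, which is the point of running frequencies first.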

What to do with 'bad data'?

do nothing

collect more data

assign a missing value

for non-key variables, substitute a neutral value, usually the mean

impute values (fill in based on nearby values)

delete the case

The decision mainly depends on how many good respondents there are.
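Two of the options above, mean substitution and deletion, can be sketched as follows. The data and function names are hypothetical; this only illustrates the idea and says nothing about which option is right for a given dataset.

```python
from statistics import mean

def fill_with_mean(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to compute a mean from")
    m = mean(observed)
    return [m if v is None else v for v in values]

def drop_missing(values):
    """Delete missing entries instead of filling them."""
    return [v for v in values if v is not None]

# Hypothetical non-key variable with two missing answers.
ages = [20, None, 30, 40, None]
filled = fill_with_mean(ages)
kept = drop_missing(ages)
print(filled)
print(kept)
```

Mean substitution keeps the sample size but shrinks the variance of the variable, which is one reason it is reserved for non-key variables.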


analyzing dataset

levels of measurement 

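assigning numbers (SPSS: Values)

In SPSS, "Scale" means metric data, which covers interval and ratio.

nominal: categories

ordinal: rank order

interval: ratings and the like, e.g. 1 to 10

ratio: numbers with a true zero, so ratios are meaningful (e.g. household size)

Which statistical test to use depends on the level of measurement of the variable.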

types of statistical analyses

1. descriptive analysis: summarize the sample, e.g. frequency analysis

2. inferential analysis: infer from the sample to the population; hypothesis tests and confidence intervals (there may be an underlying model); one-sample tests

3. differences analysis: compare the means of two or more groups

4. associative analysis: examine the strength and direction of a relationship; cross-tabulations and correlations

5. predictive analysis: regressions

descriptive analysis

summarize data

How to summarize, and what? (In general: do these numbers mean anything?)

- the usual descriptive statistics:

1. location: mode, median, mean

2. variability: (interquartile) range, variance, standard deviation (why both? the standard deviation is in the same units as the data), coefficient of variation = standard deviation / mean

3. shape: skewness, kurtosis

*Note: which descriptive statistics are meaningful depends on the level of measurement
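The three groups of descriptive statistics above can be computed with Python's standard library. The scores are made up for illustration, and the skewness and kurtosis here use simple moment-based formulas, which is an assumption: SPSS reports bias-adjusted versions that differ slightly in small samples.

```python
import statistics as st

# Hypothetical metric data.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

def describe(values):
    """Return location, variability and shape statistics for metric data."""
    n = len(values)
    mean = st.mean(values)
    sd = st.stdev(values)                   # sample standard deviation
    q1, _, q3 = st.quantiles(values, n=4)   # quartiles
    # Central moments for the moment-based shape measures.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "mode": st.mode(values),
        "median": st.median(values),
        "mean": mean,
        "range": max(values) - min(values),
        "iqr": q3 - q1,
        "variance": st.variance(values),    # sample variance
        "sd": sd,
        "cv": sd / mean,                    # coefficient of variation
        "skewness": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3,
    }

summary = describe(scores)
for name, value in summary.items():
    print(name, round(value, 4))
```

For nominal data only the mode is meaningful, and for ordinal data the mode and median; the mean and everything below it assume metric data.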

adjusting data

re-specifying variables: meaning?

transforming scales: standardizing to z-scores

weighting cases/respondents (not used often): to account for representativeness
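Standardizing to z-scores means subtracting the mean and dividing by the standard deviation, so the result has mean 0 and standard deviation 1. A minimal sketch with made-up data (the function name is hypothetical):

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize values: subtract the mean, divide by the sample standard deviation."""
    m = mean(values)
    s = stdev(values)
    if s == 0:
        raise ValueError("all values are identical; z-scores are undefined")
    return [(v - m) / s for v in values]

# Hypothetical ratings.
ratings = [1, 2, 3, 4, 5]
z = z_scores(ratings)
print([round(v, 3) for v in z])
```

After the transformation, variables measured on different scales can be compared directly.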

hypothesis testing

1. two-sided tests (equal or not equal)

H0: the parameter (mean, proportion) of the variable is equal to some value

H1: the parameter of the variable is different from that value

2. one-sided tests (greater than / less than)

H0: ≥ or ≤

H1: < or >

The result can be read in two ways: from the test statistic or from the p-value. (The larger the test statistic in absolute value, the smaller the p-value, and the stronger the evidence against H0.) See figure.

So: |test statistic| > critical value → H0 is rejected

p-value < 0.05 → H0 is rejected

In SPSS, the p-value is shown as "Sig."

p ≤ 0.05: H0 is rejected → the parameter is significantly different from xx.

0.05 < p ≤ 0.1: H0 is rejected only at the 10% level → the parameter is marginally significantly different from xx.

p > 0.1: H0 is not rejected → the parameter is not significantly different from xx.
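The two decision rules give the same answer. The sketch below shows this for a two-sided test whose statistic follows a standard normal distribution; that distributional choice and the function name are assumptions made for the illustration (a t-test would use the t distribution instead).

```python
from statistics import NormalDist

def two_sided_decision(z, alpha=0.05):
    """Compare the critical-value rule and the p-value rule for a z statistic."""
    std_normal = NormalDist()
    critical = std_normal.inv_cdf(1 - alpha / 2)       # about 1.96 for alpha = 0.05
    p_value = 2 * (1 - std_normal.cdf(abs(z)))         # two-sided p-value
    return {
        "critical": critical,
        "p_value": p_value,
        "reject_by_statistic": abs(z) > critical,
        "reject_by_p": p_value < alpha,
    }

strong = two_sided_decision(2.5)
weak = two_sided_decision(1.0)
print(strong)
print(weak)
```

Both rules reject for z = 2.5 and both fail to reject for z = 1.0, because the p-value drops below alpha exactly when the statistic passes the critical value.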

diagram 'when to use which test?'

[Figure: diagram 'when to use which test?']

How to use this diagram? 3 questions:

1. what is the dependent variable?

2. what is the measurement level of the dependent variable?

3. what and how many samples does the hypothesis involve?

- one sample: compare a parameter of one group with a given value

- independent samples: compare a parameter between two groups, e.g. men/women, branded/unbranded

- related samples: compare the responses of the same individuals with each other, i.e. the same sample answering different questions

inferential analysis: one-sample tests, representativeness

Test whether the sample is representative by comparing it with a given value.

H0: the mean in the population the sample came from = 2.28

First (required step): DV = household size, DV measurement level = ratio, sample: one sample

So (from the diagram), use a one-sample t-test
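A one-sample t-test can be sketched by hand as follows. The household sizes are invented for the illustration; only the hypothesized mean 2.28 comes from the notes. The critical value 2.262 is the two-sided 5% value from a t table for 9 degrees of freedom, so it only fits a sample of 10.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(values, mu0):
    """Return the t statistic for H0: population mean = mu0."""
    n = len(values)
    standard_error = stdev(values) / sqrt(n)
    return (mean(values) - mu0) / standard_error

# Hypothetical household sizes of 10 respondents.
household_sizes = [1, 2, 2, 3, 4, 2, 1, 3, 5, 2]

t = one_sample_t(household_sizes, mu0=2.28)
T_CRITICAL_DF9 = 2.262   # two-sided, alpha = 0.05, df = n - 1 = 9 (from a t table)
reject = abs(t) > T_CRITICAL_DF9
print(round(t, 3), reject)
```

With this made-up sample the statistic stays well below the critical value, so H0 is not rejected and the sample mean is not significantly different from 2.28.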

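eg 2: test whether the household proportions in the sample match the official statistics

First: DV = sample household proportion, DV measurement level = ordinal, sample: one sample

So use the one-sample Kolmogorov-Smirnov test (by hand or in Excel)

For each category, compute the absolute difference between the cumulative proportion in the total population and the observed cumulative proportion in the sample (as proportions, 0 to 1)

test statistic = the largest difference → K = xx

critical value at 5% = 1.36 / √n (n = sample size) = aa

K > aa → H0 is rejected: significantly different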
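The by-hand Kolmogorov-Smirnov steps above can be sketched like this. The counts and population proportions are made up; the 1.36 / √n critical value is the one given in the notes.

```python
from itertools import accumulate
from math import sqrt

def ks_one_sample(observed_counts, expected_props):
    """Return (K, critical value, reject?) for ordered categories."""
    n = sum(observed_counts)
    observed_cum = [c / n for c in accumulate(observed_counts)]
    expected_cum = list(accumulate(expected_props))
    # K is the largest absolute gap between the two cumulative distributions.
    k = max(abs(o - e) for o, e in zip(observed_cum, expected_cum))
    critical = 1.36 / sqrt(n)   # 5% critical value
    return k, critical, k > critical

# Hypothetical: four ordered household categories, sample of 100.
sample_counts = [30, 40, 20, 10]
population_props = [0.35, 0.33, 0.14, 0.18]

k, critical, reject = ks_one_sample(sample_counts, population_props)
print(round(k, 3), round(critical, 3), reject)
```

Here K = 0.08 is below the critical value 0.136, so H0 is not rejected and the sample distribution does not differ significantly from the population.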

Testing the proportion of a dichotomous variable (yes/no)

use a Z-test (by hand)
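A by-hand Z-test for a proportion might look like the sketch below. The counts and the hypothesized proportion are invented, and the normal approximation it relies on assumes a reasonably large sample.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z_test(successes, n, p0):
    """Return (z, two-sided p-value) for H0: population proportion = p0."""
    p_hat = successes / n
    standard_error = sqrt(p0 * (1 - p0) / n)   # standard error under H0
    z = (p_hat - p0) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical: 60 of 100 respondents answered "yes"; H0 says 50%.
z_stat, p_val = proportion_z_test(60, 100, 0.5)
print(round(z_stat, 3), round(p_val, 4))
```

With z = 2.0 the two-sided p-value is just under 0.05, so H0 would be rejected at the 5% level in this made-up case.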

differences analysis: two or more independent or related samples

For how to apply the diagram, see OneNote

associative analysis: correlations

relationships between variables

when there are 2 variables:

both are metric (interval/ratio) and the relationship is linear: use the Pearson correlation coefficient

one or both are ordinal: use the Spearman rank correlation coefficient

r ∈ [-1, 1]
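Both coefficients can be sketched in plain Python; Spearman is simply Pearson applied to the ranks. The data and helper names are made up for the illustration, and tied values get their average rank.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical data: y increases with x, but not linearly.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
print(round(r_pearson, 3), round(r_spearman, 3))
```

The relationship is perfectly monotonic, so Spearman reaches 1, while Pearson stays below 1 because the relationship is not linear.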

significant vs. substantive results

Significance depends on 1. the strength (magnitude) of the difference or correlation, and 2. the sample size

Significance is the first step; relevance is a subjective judgment

A significant difference or correlation does not imply that it is substantive or relevant

magnitude of the difference = % change in the response of one group relative to that of the comparison group
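The magnitude formula is a plain percentage change. A small sketch with invented group means (the function name is hypothetical):

```python
def percent_difference(group_mean, comparison_mean):
    """Percentage change of one group's response relative to the comparison group."""
    if comparison_mean == 0:
        raise ValueError("comparison mean is zero; percentage change is undefined")
    return (group_mean - comparison_mean) / comparison_mean * 100

# Hypothetical mean ratings: branded 4.2 vs. unbranded 4.0.
magnitude = percent_difference(4.2, 4.0)
print(round(magnitude, 1))
```

A 5% difference like this one could be statistically significant in a very large sample and still be too small to matter in practice, which is the distinction this section draws.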
