Simple is best.
Key points:
1.
Individuals are the objects described in a set of data. Individuals are
sometimes people. When the objects that we want to study are not people, we often call them cases.
A variable is any characteristic of an individual. A variable can take different
values for different individuals.
2.
CATEGORICAL AND QUANTITATIVE VARIABLES
A categorical variable places an individual into one of two or more
groups or categories.
A quantitative variable takes numerical values for which arithmetic operations
such as adding and averaging make sense.
The distribution of a variable tells us what values it takes and how often
it takes these values.
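A minimal sketch of what "distribution" means for a categorical variable: tabulate each value and how often it occurs. The list `majors` is made-up data used only for illustration.

```python
from collections import Counter

# Hypothetical data: one categorical variable recorded for eight individuals.
majors = ["math", "physics", "math", "biology",
          "math", "physics", "biology", "math"]

# The distribution: which values the variable takes and how often.
counts = Counter(majors)
n = len(majors)
distribution = {value: count / n for value, count in counts.items()}

for value, count in counts.most_common():
    print(f"{value:8s} count={count} proportion={distribution[value]:.3f}")
```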
3. rate
An interesting example:
Accidents for passenger cars and motorcycles. The government’s
Fatal Accident Reporting System says that 27,102 passenger cars were involved
in fatal accidents in 2002. Only 3339 motorcycles had fatal accidents
that year. Does this mean that motorcycles are safer than cars? Not at all:
there are many more cars than motorcycles, so we expect cars to have a
higher count of fatal accidents.
A better measure of the dangers of driving is a rate, the number of fatal
accidents divided by the number of vehicles on the road. In 2002, passenger
cars had about 21 fatal accidents for each 100,000 vehicles registered. There
were about 67 fatal accidents for each 100,000 motorcycles registered. The
rate for motorcycles is more than three times the rate for cars. Motorcycles
are, as we might guess, much more dangerous than cars.
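The count-versus-rate comparison can be sketched in a few lines. The accident counts are the ones quoted above; the registration totals are NOT given in the text, so the values below are rough figures backed out of the quoted rates and should be read as assumptions, not official data.

```python
def rate_per_100k(count, registered):
    """Fatal accidents per 100,000 registered vehicles."""
    return count / registered * 100_000

# Counts quoted in the text (2002).
car_accidents = 27_102
motorcycle_accidents = 3_339

# ASSUMPTION: approximate registration totals, inferred from the quoted
# rates of about 21 and 67 per 100,000. They are not stated in the text.
cars_registered = 129_000_000
motorcycles_registered = 5_000_000

car_rate = rate_per_100k(car_accidents, cars_registered)
motorcycle_rate = rate_per_100k(motorcycle_accidents, motorcycles_registered)
ratio = motorcycle_rate / car_rate

print(f"cars: {car_rate:.1f}, motorcycles: {motorcycle_rate:.1f}, ratio: {ratio:.2f}")
```

Cars have the larger count but the smaller rate, which is the whole point of the example.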
4. Graphs
1. bar graph
2. pie chart
Pie charts require that you include all the categories that make up a whole.
3. stemplot
Stemplots do not work well for large data sets, where each stem must hold a large number of leaves.
--------------------------------------------------------------------------------------------------------------------------------------------------------
1. mean x̄: To find the mean x̄ of a set of observations, add their values and divide
by the number of observations. If the n observations are x1, x2, ..., xn,
their mean is
x̄ = (x1 + x2 + ... + xn)/n
Property: the mean is sensitive to the influence of a few extreme observations.
2. resistant measure:
Its value does not respond strongly to changes in a few observations, no matter
how large those changes may be.
3. mean and median
The mean is the average value; the median is the typical value.
outliers: extreme values
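A small sketch of why the median is resistant and the mean is not. The numbers are made up for illustration; one value is replaced by an extreme one and both summaries are recomputed.

```python
import statistics

# Hypothetical observations (say, incomes in thousands).
incomes = [30, 32, 35, 38, 40]
# Same data with the largest value replaced by an extreme observation.
skewed = incomes[:-1] + [400]

mean_before = statistics.mean(incomes)
mean_after = statistics.mean(skewed)
median_before = statistics.median(incomes)
median_after = statistics.median(skewed)

print("mean:  ", mean_before, "->", mean_after)      # moves a lot
print("median:", median_before, "->", median_after)  # does not move
```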
4. Five-number summary: Minimum, Q1, M, Q3, Maximum. A boxplot is a graph of the five-number summary.
5. THE 1.5 × IQR RULE FOR OUTLIERS
The interquartile range IQR is the distance between the quartiles, IQR = Q3 − Q1.
Call an observation a suspected outlier if it falls more than 1.5 × IQR
above the third quartile or below the first quartile.
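The five-number summary and the 1.5 × IQR rule can be sketched as follows. This assumes the quartile convention where Q1 and Q3 are the medians of the lower and upper halves of the sorted data (the overall median is left out of both halves when n is odd); statistical software often uses other conventions and may give slightly different quartiles. The data are hypothetical.

```python
import statistics

def five_number_summary(values):
    """Return (Minimum, Q1, M, Q3, Maximum).

    ASSUMPTION: Q1 and Q3 are the medians of the lower and upper halves
    of the sorted data, excluding the overall median when n is odd.
    """
    xs = sorted(values)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return (xs[0], statistics.median(lower), statistics.median(xs),
            statistics.median(upper), xs[-1])

def suspected_outliers(values):
    """Observations more than 1.5 x IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in values if x < low or x > high]

data = [5, 7, 8, 9, 10, 11, 12, 13, 40]  # hypothetical observations
summary = five_number_summary(data)
outliers = suspected_outliers(data)
print(summary, outliers)
```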
6. standard deviation s
s, like the mean x̄, is not resistant. A few outliers can make s very large.
The use of squared deviations renders s even more sensitive than x̄ to a few extreme observations.
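To see the effect of squared deviations, here is a sketch that computes s by hand, using the usual sample formula with divisor n − 1 (the formula itself is not written out in these notes), and compares how much x̄ and s grow when one observation becomes extreme. The data are made up.

```python
import math
import statistics

def sample_sd(values):
    """Sample standard deviation: sqrt of sum of squared deviations / (n - 1)."""
    n = len(values)
    xbar = sum(values) / n
    return math.sqrt(sum((x - xbar) ** 2 for x in values) / (n - 1))

data = [10, 12, 11, 13, 14]            # hypothetical observations
with_outlier = [10, 12, 11, 13, 140]   # one value made extreme

mean_growth = statistics.mean(with_outlier) / statistics.mean(data)
sd_growth = sample_sd(with_outlier) / sample_sd(data)

print(f"mean grows by a factor of {mean_growth:.1f}")
print(f"s grows by a factor of {sd_growth:.1f}")
```

In this example s is inflated far more than x̄ by the same single outlier.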
7. How to choose
CHOOSING A SUMMARY
The five-number summary is usually better than the mean and standard
deviation for describing a skewed distribution or a distribution with
strong outliers. Use x̄ and s only for reasonably symmetric distributions
that are free of outliers.
8. Linear transformations do not change the shape of a distribution.
9. Linear transformations of distributions
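A sketch of a linear transformation x_new = a + b·x, using Celsius to Fahrenheit (a = 32, b = 1.8) on made-up temperatures. It checks that the mean is transformed like the data, that s is multiplied by b (here b > 0), and that the standardized values are unchanged, which is one way to see that the shape stays the same.

```python
import statistics

def linear_transform(values, a, b):
    """Apply x_new = a + b * x to every observation."""
    return [a + b * x for x in values]

def standardize(values):
    """Express each observation in standard deviations from the mean."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return [(x - m) / s for x in values]

celsius = [12.0, 15.0, 15.5, 18.0, 30.0]   # hypothetical temperatures
fahrenheit = linear_transform(celsius, a=32, b=1.8)

print(statistics.mean(celsius), statistics.mean(fahrenheit))
print(statistics.stdev(celsius), statistics.stdev(fahrenheit))
print(standardize(celsius))
print(standardize(fahrenheit))
```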
------------------------------------------------------------------------------
1. Density curves
One way to think of a density curve is as a smooth approximation to the irregular
bars of a histogram.
A density curve is a curve that
• is always on or above the horizontal axis and
• has area exactly 1 underneath it.
Information that can be read from a density curve:
mode: A mode of a distribution described by a density curve is a peak point of
the curve, the location where the curve is highest.
median: the median is the point with half the total area on each side.
mean: the balance point (center of mass) of the curve.
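These definitions can be checked numerically. The sketch below uses a made-up right-skewed density, f(x) = 2(1 − x) on [0, 1], chosen only because its area, median and mean are easy to verify by hand (1, 1 − √0.5 ≈ 0.293, and 1/3). Areas are approximated by a midpoint sum.

```python
def density(x):
    """Hypothetical right-skewed density: 2(1 - x) on [0, 1], zero elsewhere."""
    return 2 * (1 - x) if 0 <= x <= 1 else 0.0

N = 100_000
width = 1 / N
midpoints = [(i + 0.5) * width for i in range(N)]

# Total area under the curve (should be 1 for a density curve).
area = sum(density(x) * width for x in midpoints)

# Mean: the balance point, i.e. the area-weighted average of x.
mean = sum(x * density(x) * width for x in midpoints)

# Median: the point where the accumulated area first reaches one half.
cumulative = 0.0
median = None
for x in midpoints:
    cumulative += density(x) * width
    if cumulative >= 0.5:
        median = x
        break

# Mode: the location where the curve is highest.
mode = max(midpoints, key=density)

print(f"area={area:.4f} mode={mode:.4f} median={median:.4f} mean={mean:.4f}")
```

For this skewed curve the mean lies farther out in the long tail than the median, and the mode sits at the peak.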
A passage I once struggled with:
A density curve is an idealized description of a distribution of data. For
example, the symmetric density curve in Figure 1.25 is exactly symmetric,
but the histogram of vocabulary scores is only approximately symmetric. We
therefore need to distinguish between the mean and standard deviation of the
density curve and the numbers x and s computed from the actual observations.
The usual notation for the mean of an idealized distribution is μ (the Greek
letter mu). We write the standard deviation of a density curve as σ (the Greek
letter sigma).
2. Normal distributions
The curve with the larger standard deviation is more spread out.
How to estimate σ by eye: The points at which this change of curvature takes place are located at distance
σ on either side of the mean μ (the inflection points).
THE 68–95–99.7 RULE
In the Normal distribution with mean μ and standard deviation σ:
• Approximately 68% of the observations fall within σ of the mean μ.
• Approximately 95% of the observations fall within 2σ of μ.
• Approximately 99.7% of the observations fall within 3σ of μ.
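The three percentages can be checked with the standard library's `statistics.NormalDist` (Python 3.8+). The values μ = 100 and σ = 15 are arbitrary choices for illustration; any other μ and σ > 0 should give the same proportions.

```python
from statistics import NormalDist

def share_within(dist, k):
    """Proportion of a Normal distribution within k standard deviations of its mean."""
    lo = dist.mean - k * dist.stdev
    hi = dist.mean + k * dist.stdev
    return dist.cdf(hi) - dist.cdf(lo)

# ASSUMPTION: mu = 100 and sigma = 15 are arbitrary illustrative values.
dist = NormalDist(mu=100, sigma=15)

for k in (1, 2, 3):
    print(f"within {k} sigma: {share_within(dist, k):.4f}")
```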
In fact, all Normal distributions are the same if we measure in units of size
σ about the mean μ as center.
Observations larger than the mean are positive when standardized, and observations smaller than the mean
are negative.
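Measuring "in units of size σ about the mean μ" amounts to computing z = (x − μ)/σ. A minimal sketch, with μ = 64.5 and σ = 2.5 chosen as hypothetical values for illustration:

```python
def standardize(x, mu, sigma):
    """Standardized value: how many standard deviations x lies from the mean."""
    return (x - mu) / sigma

# ASSUMPTION: hypothetical mean and standard deviation (e.g. heights in inches).
mu, sigma = 64.5, 2.5

z_above = standardize(68.0, mu, sigma)   # observation larger than the mean
z_below = standardize(60.0, mu, sigma)   # observation smaller than the mean
z_at_mean = standardize(mu, mu, sigma)   # observation equal to the mean

print(z_above, z_below, z_at_mean)
```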