[Statistics - Ch. 2]

1. ASSOCIATION BETWEEN VARIABLES

Two variables measured on the same cases are associated if knowing the
value of one of the variables tells you something about the values of the
other variable that you would not know without this information.


2. SCATTERPLOT

A scatterplot shows the relationship between two quantitative variables
measured on the same individuals.


3.

Form: Linear relationships, where the points show a straight-line pattern, are
an important form of relationship between two variables. Curved relationships
and clusters are other forms to watch for.
Direction: If the relationship has a clear direction, we speak of either positive
association (high values of the two variables tend to occur together) or negative
association (high values of one variable tend to occur with low values of
the other variable).
Strength: The strength of a relationship is determined by how close the points
in the scatterplot lie to a simple form such as a line.


4. The standardized height says how many standard deviations above or below the mean
a person’s height lies.

The formula for correlation helps us see that r is positive when there is a positive
association between the variables. Height and weight, for example, have
a positive association. People who are above average in height tend to also be
above average in weight. Both the standardized height and the standardized
weight for such a person are positive. People who are below average in height
tend also to have below-average weight. Then both standardized height and
standardized weight are negative.
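
The notes mention the formula for r without writing it out. A minimal sketch, assuming the usual definition (the sum of the products of standardized values, divided by n - 1), is below. The heights and weights are made-up numbers for illustration, and the function names are my own.

```python
from statistics import mean, stdev

def standardize(values):
    """Return z-scores: (value - mean) / sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def correlation(xs, ys):
    """r = sum of products of standardized values, divided by n - 1."""
    zx, zy = standardize(xs), standardize(ys)
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)

# Made-up heights (cm) and weights (kg), for illustration only.
heights = [160, 165, 170, 175, 180]
weights = [55, 60, 66, 70, 79]

r = correlation(heights, weights)
print(round(r, 3))  # positive: tall people here also tend to be heavy
```

Each product of z-scores is positive when a person is above average on both variables or below average on both, which is why the sum comes out positive for a positive association.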

5.

Correlation requires that both variables be quantitative, so that it makes sense to do the arithmetic indicated by the formula for r.

Positive r indicates positive association between the variables, and negative r indicates negative association.

Correlation measures the strength of only the linear relationship between two variables. Correlation does not describe curved relationships between variables, no matter how strong they are.

Like the mean and standard deviation, the correlation is not resistant: r is strongly affected by a few outlying observations.
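
Two of these facts are easy to check numerically. The sketch below uses made-up data and a hypothetical helper `pearson_r`. A perfect curved relationship (y = x^2 on symmetric x values) gives r = 0, and a single outlier added to perfectly linear data drags r from 1 down to about 0.14.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Correlation from sums of squared and cross deviations."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# A perfect but curved relationship: r does not see it.
xs = [-3, -2, -1, 0, 1, 2, 3]
r_curved = pearson_r(xs, [x ** 2 for x in xs])

# Perfectly linear data, then the same data plus one outlier at (6, 0).
r_clean = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
r_outlier = pearson_r([1, 2, 3, 4, 5, 6], [2, 4, 6, 8, 10, 0])

print(r_curved, r_clean, round(r_outlier, 3))
```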


6. REGRESSION LINE

A regression line is a straight line that describes how a response variable
y changes as an explanatory variable x changes. We often use a regression
line to predict the value of y for a given value of x. Regression,
unlike correlation, requires that we have an explanatory variable and a
response variable.


7. LEAST-SQUARES REGRESSION LINE

The least-squares regression line of y on x is the line that makes the
sum of the squares of the vertical distances of the data points from the
line as small as possible.


8.

$$\hat{y} = b_0 + b_1 x$$

$$b_1 = r \frac{s_y}{s_x}$$

$$b_0 = \bar{y} - b_1 \bar{x}$$
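
A minimal sketch of these formulas on made-up data, to check that the resulting line really has the smallest sum of squared vertical distances. The function names and the size of the nudges are my own choices for illustration.

```python
from statistics import mean, stdev

def least_squares(xs, ys):
    """Return (b0, b1) using b1 = r * sy / sx and b0 = ybar - b1 * xbar."""
    xbar, ybar = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    n = len(xs)
    r = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / ((n - 1) * sx * sy)
    b1 = r * sy / sx
    b0 = ybar - b1 * xbar
    return b0, b1

def sum_sq_residuals(xs, ys, b0, b1):
    """Sum of squared vertical distances from the points to the line."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = least_squares(xs, ys)
best = sum_sq_residuals(xs, ys, b0, b1)

# Nudge the intercept or slope a little: the sum of squares gets worse.
nudges = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05)]
worse = [sum_sq_residuals(xs, ys, b0 + d0, b1 + d1) for d0, d1 in nudges]

print(round(b0, 4), round(b1, 4), round(best, 4))
print([round(w, 4) for w in worse])
```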

The slope and intercept of the least-squares line depend on the units of measurement, so you can’t conclude anything
from their size.

We can describe regression entirely
in terms of the basic descriptive measures $\bar{x}$, $s_x$, $\bar{y}$, $s_y$, and r. If both x and y are
standardized variables, so that their means are 0 and their standard deviations
are 1, then the regression line has slope r and passes through the origin.
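
A quick numerical check of the standardized case, on made-up data. The helper `fit_line` is my own; it computes the slope as the sum of cross deviations over the sum of squared x deviations, which is algebraically the same as r times sy over sx.

```python
from statistics import mean, stdev

def standardize(values):
    """Return z-scores: (value - mean) / sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def fit_line(xs, ys):
    """Least-squares (intercept, slope) of ys on xs."""
    xbar, ybar = mean(xs), mean(ys)
    slope = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
             / sum((x - xbar) ** 2 for x in xs))
    return ybar - slope * xbar, slope

# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

zx, zy = standardize(xs), standardize(ys)
r = sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)

# Regress standardized y on standardized x.
intercept, slope = fit_line(zx, zy)
print(round(slope, 6), round(r, 6))  # the two values agree
```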


The square of the correlation, r^2, is the fraction of the variation in the
values of y that is explained by the least-squares regression of y on x. (Informally: how much of the data follows this relationship.)
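
One way to see this statement is to compare the variation of the fitted values around the mean of y with the total variation of y. The sketch below does that on made-up data; the helper `fit_line` and the variable names are my own.

```python
from math import sqrt
from statistics import mean

def fit_line(xs, ys):
    """Least-squares (intercept, slope) of ys on xs."""
    xbar, ybar = mean(xs), mean(ys)
    slope = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
             / sum((x - xbar) ** 2 for x in xs))
    return ybar - slope * xbar, slope

# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(xs, ys)
xbar, ybar = mean(xs), mean(ys)
fitted = [b0 + b1 * x for x in xs]

# Variation in y overall, and the part reproduced by the fitted values.
total = sum((y - ybar) ** 2 for y in ys)
explained = sum((f - ybar) ** 2 for f in fitted)

# Correlation computed directly, then squared.
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
sxx = sum((x - xbar) ** 2 for x in xs)
r_squared = (sxy / sqrt(sxx * total)) ** 2

print(round(explained / total, 6), round(r_squared, 6))  # the two agree
```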

The residuals from the least-squares line have a special property: the mean of the
least-squares residuals is always zero.
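
A small check of this property on made-up data, taking a residual to be observed y minus predicted y (the helper `fit_line` is my own):

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares (intercept, slope) of ys on xs."""
    xbar, ybar = mean(xs), mean(ys)
    slope = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
             / sum((x - xbar) ** 2 for x in xs))
    return ybar - slope * xbar, slope

# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(xs, ys)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
mean_residual = mean(residuals)

print(abs(mean_residual) < 1e-9)  # zero up to floating-point rounding
```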


An outlier is an observation that lies outside the overall pattern of the
other observations. Points that are outliers in the y direction of a scatterplot
have large regression residuals, but other outliers need not have
large residuals.


We did not need the distinction between outliers and influential observations
in Chapter 1. A single large salary that pulls up the mean salary $\bar{x}$ for a
group of workers is an outlier because it lies far above the other salaries. It is
also influential because the mean changes when it is removed. In the regression
setting, however, not all outliers are influential. Because influential observations
draw the regression line toward themselves, we may not be able to spot
them by looking for large residuals: a point that has pulled the line close to itself can end up with a small residual.
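
A sketch of this effect on made-up data. Five points lie close to a line with slope about 1; adding one point far out in the x direction, at (15, 3), flattens the line almost completely, yet that point's own residual is smaller than some of the others. The names are my own.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares (intercept, slope) of ys on xs."""
    xbar, ybar = mean(xs), mean(ys)
    slope = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
             / sum((x - xbar) ** 2 for x in xs))
    return ybar - slope * xbar, slope

# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.0, 2.9, 4.1, 5.0, 6.0]

# Fit without, then with, an extra point far out in the x direction.
_, slope_without = fit_line(xs, ys)
xs_all, ys_all = xs + [15], ys + [3.0]
b0, slope_with = fit_line(xs_all, ys_all)

residuals = [y - (b0 + slope_with * x) for x, y in zip(xs_all, ys_all)]
influential_residual = residuals[-1]
largest_other = max(abs(e) for e in residuals[:-1])

print(round(slope_without, 3), round(slope_with, 3))
print(round(influential_residual, 3), round(largest_other, 3))
```

Removing the point changes the slope dramatically, so it is influential, but a residual plot alone would not flag it.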


Correlation measures only linear association, and fitting a straight line makes
sense only when the overall pattern of the relationship is linear.


Correlation and least-squares regression are not resistant.


LURKING VARIABLE
A lurking variable is a variable that is not among the explanatory or
response variables in a study and yet may influence the interpretation of
relationships among those variables.


2.

When both variables are categorical, the raw data are summarized in a
two-way table that gives counts of observations for each combination of
values of the two categorical variables.
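
A minimal sketch of building such a table from raw records. The two variables (gender and smoking status) and all the records are made up for illustration.

```python
from collections import Counter

# Hypothetical raw data: one (gender, smoker) pair per individual.
records = [
    ("female", "yes"), ("female", "no"), ("male", "no"),
    ("male", "yes"), ("female", "no"), ("male", "no"),
]

# Count how often each combination of the two values occurs.
counts = Counter(records)
rows = sorted({gender for gender, _ in records})
cols = sorted({smoker for _, smoker in records})

# Two-way table as a nested dict: table[row value][column value] = count.
table = {r: {c: counts[(r, c)] for c in cols} for r in rows}

for r in rows:
    print(r, [table[r][c] for c in cols])
```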
