R Language Operations (UDA) -- Part 5: Diamonds and Price Prediction

5.1 Create a scatterplot to check whether carat and price are linearly related

Let's start by examining two variables in the data set.
The scatterplot is a powerful tool to help you understand
the relationship between two continuous variables.

We can quickly see if the relationship is linear or not.
In this case, we can use a variety of diamond
characteristics to help us figure out whether
the price advertised for any given diamond is
reasonable or a rip-off.

Let's consider the price of a diamond and its carat weight.
Create a scatterplot of price (y) vs carat weight (x).

Limit the x-axis and y-axis to omit the top 1% of values.


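library(ggplot2)  # provides ggplot() and the diamonds data set

# Scatterplot of price vs. carat, with both axes capped at the 99th percentile
ggplot(aes(x = carat, y = price), data = diamonds) +
  geom_point() +
  xlim(0, quantile(diamonds$carat, 0.99)) +
  ylim(0, quantile(diamonds$price, 0.99))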
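To see what the two axis limits in the plot above actually cut off, the thresholds can be computed directly. This is a minimal sketch, assuming the diamonds data set that ships with ggplot2; the names carat_cap, price_cap, and kept are introduced here for illustration only.

```r
library(ggplot2)  # provides the diamonds data set

# 99th percentiles used as the upper axis limits
carat_cap <- quantile(diamonds$carat, 0.99)
price_cap <- quantile(diamonds$price, 0.99)

# Rows that fall inside both limits; points outside are dropped by xlim()/ylim()
kept <- diamonds$carat <= carat_cap & diamonds$price <= price_cap

carat_cap
price_cap
mean(kept)  # share of diamonds that remain in the plot
```

Because each limit is computed on its own variable, a diamond is dropped if it exceeds either threshold, so the share removed can be somewhat more than 1%.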


[Figure 1: scatterplot of price vs. carat, top 1% of values omitted]

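The plot shows that the relationship between carat and price is nonlinear and looks roughly exponential. The points also become more dispersed as carat increases.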
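The growing dispersion can also be checked numerically rather than by eye. The following is a rough sketch, not part of the original exercise: it groups diamonds into half-carat bins (an arbitrary choice made here) and computes the standard deviation of price within each bin. The names carat_bin and price_sd are hypothetical.

```r
library(ggplot2)  # provides the diamonds data set

# Group diamonds into half-carat bins (bin edges are an arbitrary choice)
carat_bin <- cut(diamonds$carat, breaks = seq(0, 2.5, by = 0.5))

# Standard deviation of price within each bin; diamonds above 2.5 carats
# fall outside the breaks, get an NA bin, and are ignored by tapply()
price_sd <- tapply(diamonds$price, carat_bin, sd)
price_sd
```

If the visual impression is right, the spread should be smaller in the lowest bins than in the higher ones.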

Let's add a linear trend line and look again:
# Same plot with styled points and a linear trend line;
# the redundant plain geom_point() layer has been removed
ggplot(aes(x = carat, y = price), data = diamonds) +
  geom_point(fill = '#F79420', color = 'black', shape = 21) +
  xlim(0, quantile(diamonds$carat, 0.99)) +
  ylim(0, quantile(diamonds$price, 0.99)) +
  stat_smooth(method = 'lm')


[Figure 2: scatterplot of price vs. carat with a linear trend line]
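The trend line drawn by stat_smooth(method = 'lm') is an ordinary least-squares fit of price on carat. Because xlim() and ylim() discard out-of-range points before the statistic is computed, the line is fit only to the points that remain in the plot. A minimal sketch of the same fit done by hand, assuming that trimming behavior; the names trimmed and fit are introduced here for illustration:

```r
library(ggplot2)  # provides the diamonds data set

# Keep only the points inside the plot limits, mirroring xlim()/ylim()
trimmed <- subset(
  diamonds,
  carat <= quantile(carat, 0.99) & price <= quantile(price, 0.99)
)

# Ordinary least-squares line of price on carat
fit <- lm(price ~ carat, data = trimmed)

coef(fit)               # intercept and slope of the line
summary(fit)$r.squared  # share of price variance explained by the line
```

Looking at the coefficients and R-squared alongside the plot helps judge how far the straight line departs from the curved pattern in the points.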
