R and RStudio
With increased computing power comes increased access to large amounts of freely accessible data. People are tracking their lives with productivity, calorie, fitness and sleep trackers. Governments are publishing survey data left and right, and companies conduct audience testing that needs analyzing. There’s a lot of data out there even now, ready to be grabbed and looked at.
In this tutorial, we’ll look at the basics of the R programming language – a language built solely for statistical computing. I won’t bore you with Wikipedia definitions – instead, let’s dive right into it. In this introduction, we’ll cover installing the language and its most popular IDE, and then walk through R’s data types.
R is both a programming language and a software environment, which means it’s fully self-contained. There are two steps to getting it installed:
download and install the latest R: http://www.r-project.org/
download and install RStudio, the R IDE: http://www.rstudio.com/
Both are free, both open source. R will be installed as the underlying engine that powers RStudio’s computations, while RStudio will provide command autocompletion, help files, and an effective interface for getting things done quickly. You could write R code in simple text files as in most other languages, but that’s really not recommended given how many commands there are and how quickly things can get complex.
After you’ve installed the tools, launch RStudio.
Let’s briefly explain the GUI. There are four main parts. I’ll explain the default order, though note that this can be changed in Settings/Preferences -> Pane Layout.
The top left quadrant is the editor. It’s where you write R code you want to keep for later – functions, classes, packages, etc. This is, for all intents and purposes, identical to every other code editor’s main window. Apart from some self-explanatory buttons, and others that needn’t concern you at this starting point, there is also a “Source on Save” checkbox. This means “Load contents of file into my console’s runtime every time I save the file”. You should have this on at all times; it makes your development flow faster by one click.
The lower left quadrant is the console. It’s a REPL for R in which you can test out your ideas, datasets, filters, and functions. This is where you’ll be spending most of your time in the beginning – it’s here that you verify an idea you had works before copying it over into the editor above. This is also the environment into which your R files will be sourced on Save (see above), so whenever you develop a new function in an R file above, it automatically becomes available in this REPL. We’ll be spending a lot of time in the REPL in the remainder of this article.
The top right quadrant has two tabs: environment and history.
Environment refers to the console environment (see above) and will list, in detail, every single symbol you defined in the console (whether via sourcing or directly). That is, if you have a function available in the REPL, it will be listed in the environment. If you have a variable, or a dataset, it will be listed there. This is where you can also import custom datasets manually and make them instantly available in the console, if you don’t feel like typing out the commands to do so. You can also inspect the environment of other packages you installed and loaded (more on packages at a later time). Play around with it – you can’t break anything.
History lists every single console command you executed since the last project started. It is saved into a hidden .Rhistory file in your project’s folder. If you don’t choose to save your environment after a session, the history won’t be saved.
The bottom right panel is the misc panel, and contains five separate tabs. The first one, Files, is self-explanatory. The Plots tab will contain the graphs you generated with R. It is there you can zoom, export, configure and inspect your charts and plots. The Packages tab lets you install additional packages into R. A brief description is next to each available package, though there are many more than those listed there. We’ll go through package repositories in a later post. The Help tab lets you search the incredibly extensive help directory and will automatically open whenever you call help on a command in the console (help is called by prepending a command name with a question mark, like so: ?data.frame). Finally, the Viewer is essentially RStudio’s built-in browser. Yes, you can develop web apps with R and even launch locally hosted web apps within it.
In the text below, whenever I mention using a command, assume that this means punching it into the console. So, if I say “We look at the help for data frames with ?data.frame”, you do this:
> ?data.frame
R comes with some datasets for new users to play around with. To use a built-in dataset, we load it with the data function, and supply an argument corresponding to the set we want. To see all the available built-in sets, punch in data(), without an argument.
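If the pop-up list is awkward to scan, the same information can also be read from code. This is only a minimal sketch, assuming the standard datasets package that ships with R; the variable name available is just an illustrative choice of mine.

```r
# data() returns an object whose $results component is a character matrix
# with one row per dataset and columns that include "Item" and "Title".
available <- data(package = "datasets")$results

# Show the names and descriptions of the first few built-in datasets.
head(available[, c("Item", "Title")])

# Check whether a particular dataset is on the list.
"women" %in% available[, "Item"]
```

This comes in handy once you install packages that bring their own datasets, since the package argument lets you list just those.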
Looking at the list of available datasets, let’s load a very small one for starters:
data("women")
You should see the women variable appear in the Environment panel, though its second field says <Promise>. A promise in this case merely means “The data will be there when you actually need it”. We told R to load this set, but we haven’t actually used it anywhere, so it didn’t feel the need to load it fully into memory. Let’s tell R we need it. In the console, print out the entire set by simply calling:
women
This is equivalent to:
print(women)
Note: We’ll be using the former approach, simply because it’s less typing. Remember – when you type an expression at the console without assigning its result to anything, R auto-prints the value; an assignment on its own prints nothing.
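To make the auto-printing rule concrete, here is a small sketch you can paste into the console. The variable names are made up for the illustration; withVisible is a base R function that reports whether a value would be auto-printed.

```r
x <- 5        # assignment: the value is stored, nothing is printed
x             # a bare expression: its value is printed
(y <- x + 1)  # wrapping an assignment in parentheses prints the result too

# withVisible() tells us whether R would auto-print an expression's value.
printed_assign <- withVisible(z <- 10)$visible   # FALSE
printed_sum    <- withVisible(2 + 3)$visible     # TRUE
```

The parentheses trick is a handy shortcut when you want to store a value and look at it in one go.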
The numbers will be produced in the console, and the Environment entry for women should change. You should be able to see the data in the Environment panel now, too, by clicking the blue expand arrow next to the variable name.
This set only has 15 entries and as such offers nothing of value, but it’s good enough for playing around in.
To further study the set you’re dealing with, there are several functions to keep in mind (demonstrations follow the explanations):
nrow / ncol will list the number of rows / columns respectively.
summary will output a summary of the set’s columns. In the case of the women set, we have two numeric columns (in other words, each column is a numeric vector – more on data types and vectors later), and R knows that when you ask it to summarize a numeric vector, it should give you the typical values for such collections: the minimum, the first quartile (the value below which a quarter of the observations fall), the median (the middle value), the mean (the average of all values), the third quartile (the value below which three quarters of the observations fall), and the maximum. It does this for both height and weight. For different types of vectors (like ones where every element is a word instead of a number) the output is different.
str is a different kind of summary. In fact, str stands for “structure” and it outputs a summary of a data set’s structure. In our case, it will tell us that it is a “data.frame” (a special data type we’ll explain later) with 15 obs (observations or rows) and 2 variables (or columns). It then proceeds to list all the columns in the data frame with some (but not all) of their values, just so we get a grasp on the kind of values we’re dealing with.
dim gives you the dimensions of a data set. Calling dim(women) gives us 15 2, which means 15 rows and 2 columns. length counts the elements of an object: for vectors (see below) this is the number of values, while for data sets like women it is the number of columns.
> nrow(women)
[1] 15
> ncol(women)
[1] 2
> summary(women)
     height         weight
 Min.   :58.0   Min.   :115.0
 1st Qu.:61.5   1st Qu.:124.5
 Median :65.0   Median :135.0
 Mean   :65.0   Mean   :136.7
 3rd Qu.:68.5   3rd Qu.:148.0
 Max.   :72.0   Max.   :164.0
> str(women)
'data.frame': 15 obs. of 2 variables:
$ height: num 58 59 60 61 62 63 64 65 66 67 ...
$ weight: num 115 117 120 123 126 129 132 135 139 142 ...
> dim(women)
[1] 15 2
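The figures summary prints can also be computed one at a time, and length was not part of the transcript above, so here is a short sketch covering both. It assumes only the built-in women dataset; n_values and n_columns are names I picked for the example.

```r
data("women")

# The figures summary() prints for a numeric column can be computed directly.
quantile(women$height)   # minimum, quartiles, median and maximum
mean(women$height)       # arithmetic mean

# length() counts elements: values for a vector, columns for a data frame.
n_values  <- length(women$height)
n_columns <- length(women)
```

For this set, n_values is 15 and n_columns is 2, matching what dim reported.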
You’ll be using these functions a lot, so I recommend you get familiar with them. Load some of the other data sets and inspect them like this. No need to know them by heart – this post and the help files will always be around for reference, but it’s nice to be fluent in them anyway.
R has some typical atomic data types you already know about from other languages, but also provides some more statistics-inclined ones. Let’s briefly go through them. While explaining these types, I’ll talk about assigning them. Assigning in R is done with the “left arrow” operator or <-, as in:
myString <- "Hello World"
R is, however, very forgiving and will let you use the = assignment operator in top level environments like the console, if you don’t feel like typing out the arrow every time.
myString = "Hello World"
I suggest you get used to the arrow, though. You won’t get very far without it.
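One place where the two operators behave differently is inside a function call. The sketch below is only an illustration of that difference; result1, result2 and y_vals are names invented for it.

```r
# Inside a call, '=' names an argument; it does not create a variable.
result1 <- mean(x = c(2, 4, 6))

# Inside a call, '<-' assigns first and then passes the value along.
result2 <- mean(y_vals <- c(2, 4, 6))

result1   # 4
result2   # 4
y_vals    # 2 4 6: this variable now exists in the environment
```

Both calls compute the same mean, but only the second one leaves a new variable behind, which is one reason the arrow is the safer habit for assignment.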
To check the type (or class) of a variable, the class function can be used (though str from above does almost the same thing): class(myString).
Atomic classes are basic types from which others are constructed.
The character class is your typical string, a sequence of letters, digits, or other symbols.
> myString <- "Hello World"
> class(myString)
[1] "character"
The [1] will be explained below, when we get to vectors.
The numeric class corresponds to “float” in other languages – it holds numeric values like 10, 15.6, -48792.5498982749879 and so on.
> myNum <- 5.983904798274987298
> class(myNum)
[1] "numeric"
You can coerce (change type of) numeric string values into numeric types, like so:
> myString <- "5.60"
> class(myString)
[1] "character"
> myNumber <- as.numeric(myString)
> myNumber
[1] 5.6
> class(myNumber)
[1] "numeric"
There is also a special number Inf which represents infinity. It can be used in calculations:
> 1/0
[1] Inf
Another “number” is NaN, which stands for “Not a Number”. This is what you get when you do something like 0/0.
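Because these special values behave oddly in comparisons, R has dedicated functions for detecting them. A quick sketch, with variable names of my own choosing:

```r
pos_inf <- 1 / 0    # Inf
neg_inf <- -1 / 0   # -Inf
not_num <- 0 / 0    # NaN

is.finite(pos_inf)  # FALSE: infinities are not finite
is.nan(not_num)     # TRUE
is.na(not_num)      # TRUE: NaN also counts as missing
not_num == not_num  # NA: comparing NaN never yields TRUE, so use is.nan()
```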
Integers are whole numbers, though a whole number typed without any suffix is still stored as a numeric:
> myInt <- 209173987
> class(myInt)
[1] "numeric"
To actually force them to be integers, we need to invoke a function that manually coerces them, called as.integer:
> myInt <- as.integer(myInt)
> class(myInt)
[1] "integer"
You can skip the coercion step by writing integers with an L suffix:
> myInt = 5L
> class(myInt)
[1] "integer"
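If you want to see the difference between the two kinds of numbers for yourself, the following sketch compares a plain literal with a suffixed one. The names a and b are placeholders.

```r
a <- 5     # no suffix: stored as a double ("numeric")
b <- 5L    # L suffix: stored as an integer

is.integer(a)    # FALSE
is.integer(b)    # TRUE
typeof(a)        # "double"
typeof(b)        # "integer"

a == b           # TRUE: the values are equal ...
identical(a, b)  # FALSE: ... but the storage types differ
```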
Note that if you give R a whole number larger than the biggest value its integer type can hold (2147483647), it stores it as a numeric instead, even if you put L at the end:
> myInt <- 2479827498237498723498729384
> class(myInt)
[1] "numeric"
> myInt
[1] 2.479827e+27
But if you then try to coerce that number into an integer, R will discard it because it simply cannot make integers that big. Instead of a number, you get “NA”, a special value in R indicating “Not Available”, also known as a missing value.
> myIntCoerced <- as.integer(myInt)
Warning message:
NAs introduced by coercion
> myIntCoerced
[1] NA
> class(myIntCoerced)
[1] "integer"
The NA is still of type “integer”, but it carries no value.
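The integer limit and the resulting NA can both be inspected from code. This is a rough sketch; too_big and coerced are illustrative names, and suppressWarnings is used only to keep the output quiet.

```r
.Machine$integer.max   # 2147483647, the largest integer R can store

too_big <- .Machine$integer.max + 1               # integer + numeric gives a numeric
coerced <- suppressWarnings(as.integer(too_big))  # out of range, so NA

is.na(coerced)   # TRUE
class(coerced)   # "integer"

# NA is contagious in arithmetic unless you ask for it to be skipped.
sum(c(1L, NA, 3L))                # NA
sum(c(1L, NA, 3L), na.rm = TRUE)  # 4
```

The na.rm argument shows up in many of R’s summary functions, so it is worth remembering early.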
Note that when coercing numerics into integers, decimal places get lost. The same applies to coercing from numeric decimal strings:
> myString <- "5.60"
> myNumeric <- 5.6
> myInteger1 <- as.integer(myString)
> myInteger2 <- as.integer(myNumeric)
> myInteger1 == myInteger2
[1] TRUE
> myInteger1
[1] 5
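It is worth stressing that as.integer cuts the fraction off rather than rounding. A small sketch of the difference, with rounded_int as a made-up name:

```r
as.integer(5.6)    # 5: the fraction is dropped, not rounded
as.integer(-5.6)   # -5: truncation is toward zero
round(5.6)         # 6: round() goes to the nearest whole number instead

rounded_int <- as.integer(round(5.6))  # round first, then convert
```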
Explaining complex numbers is a bit outside the scope of this tutorial, particularly if you weren’t exposed to them in school, but if you’re curious, you can find out more here. They take the form of a + bi where a and b are real numbers and i is the imaginary unit. In R, they’re constructed with a special complex function.
> myComplex <- complex(1, 3292, 8974892)
> myComplex
[1] 3292+8974892i
> class(myComplex)
[1] "complex"
You won’t be needing those nearly as often as you might need the other types, but if you want to know more about the complex function just call for help on it: ?complex.
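For the curious, here is a brief sketch of what you can do with a complex value once you have one. It uses named arguments to complex, which I find easier to read than positional ones; z is just an example name.

```r
z <- complex(real = 3, imaginary = 4)   # same as 3+4i

Re(z)    # real part: 3
Im(z)    # imaginary part: 4
Mod(z)   # modulus (distance from zero): 5
Conj(z)  # complex conjugate: 3-4i

sqrt(-1)              # NaN with a warning: -1 is treated as a real number
sqrt(as.complex(-1))  # 0+1i
```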
Logical types (booleans) are the same as in most other languages and can be two things – either true, or false. True can be represented with TRUE or T while false is, predictably, FALSE or F.
> TRUE == T
[1] TRUE
> myBool <- TRUE
> myBool == T
[1] TRUE
> myComparison <- 5 > 6
> myComparison == FALSE
[1] TRUE
> class(myComparison)
[1] "logical"
Whenever you create an expression the result of which is a “yes” or “no” value, you get a TRUE or FALSE – like in the case of 5 > 6 above. 5 is not greater than 6, so the expression becomes FALSE. Comparing myComparison to FALSE thus yields TRUE because the myComparison variable indeed contains a value of FALSE.
When needed by a function, logical values will be coerced into numerics. This means that if I write 1 + TRUE the console will produce 2, whereas 1 + FALSE gives 1. Likewise, we can easily coerce other types into logicals (as.logical(myVariable)). Any numeric or integer with a value not equal to 0 or NA will give TRUE. 0 and 0L will give FALSE. Strings like “True”, “TRUE”, “true” and “T” will be turned into TRUE, while “False”, “FALSE”, “false” and “F” will be FALSE. Any other string will coerce into a logical NA value.
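The coercion of logicals into numbers is more useful than it may look, since it lets you count and average yes/no answers. A small sketch, with passed as a hypothetical vector of test results:

```r
passed <- c(TRUE, FALSE, TRUE, TRUE)

sum(passed)    # 3: each TRUE counts as 1
mean(passed)   # 0.75: the share of TRUE values

as.logical(c("TRUE", "true", "T", "yes"))  # TRUE TRUE TRUE NA
as.logical(c(0, 1, -2.5))                  # FALSE TRUE TRUE
```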
Higher types are types composed of the lower ones.
The most essential of all, the vector, is a collection of elements of the same type. In our earlier dataset example, one column of the women dataset was a numeric vector, meaning it was a collection with only numeric values in it. A vector can only have elements of the exact same type. Vectors can be created with the vector function, but are usually created with the shorthand c (concatenate) function:
> myVector <- c("Hello", "World", "Third Element")
> class(myVector)
[1] "character"
> myVector
[1] "Hello" "World" "Third Element"
We can see here that the vector’s class is “character”, meaning that it contains only character type values. If we print it out by just calling its variable name, we get all three elements and a [1]. The [1] means literally: “I am outputting the contents of your vector. The first element on this line is element number 1 in the vector”. What you see in [] depends entirely on the size of your console panel and the length of the vector. For example:
> myVector <- c("One", "Two", "Three", "Four", "Five", "Six", "Seven", "Eight", "Nine", "Ten", "Eleven", "Twelve", "Thirteen", "Fourteen", "Fifteen")
> myVector
[1] "One" "Two" "Three" "Four" "Five" "Six" "Seven"
[8] "Eight" "Nine" "Ten" "Eleven" "Twelve" "Thirteen" "Fourteen"
[15] "Fifteen"
The number in square brackets simply means “The element after me is Nth in the set”. It is there purely to make the output more readable, and does not affect actual data.
Note that vectors are strictly one-dimensional. You cannot add another vector as an element inside an existing vector – their elements get merged into one:
> v1 <- c("a", "b", "c")
> v2 <- c("d", "e", "f")
> v3 <- c(v1, v2)
> v3
[1] "a" "b" "c" "d" "e" "f"
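The same-type rule mentioned earlier has a practical consequence: if you mix types inside c, R quietly converts everything to the most flexible type present instead of complaining. A minimal sketch, with invented variable names:

```r
mixed1 <- c(1, TRUE, FALSE)   # logical + numeric -> numeric
mixed2 <- c(1, "two", TRUE)   # anything + string -> character

class(mixed1)  # "numeric"
class(mixed2)  # "character"
mixed2         # "1" "two" "TRUE"
```

This is a common source of surprises when a single stray string turns a whole column of numbers into text.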
You can generate entire vectors of whole numbers by specifying a range:
> myRange <- c(1:10)
> myRange
[1] 1 2 3 4 5 6 7 8 9 10
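The colon operator only counts in steps of one. When you need a different step, the base function seq is the usual tool. The sketch below shows a few variations; the variable names are mine.

```r
class(1:10)   # "integer": the colon operator produces integers here

odds    <- seq(1, 9, by = 2)          # 1 3 5 7 9
quarter <- seq(0, 1, length.out = 5)  # 0.00 0.25 0.50 0.75 1.00
down    <- 5:1                        # 5 4 3 2 1
```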
Lists are just like vectors, only they don’t have the limitation of being able to hold elements of the same type exclusively. They are built with the list function, or with the c function if one of the elements you’re adding is a list:
> myList <- list(5, "Hello", "Worlds", TRUE)
> class(myList)
[1] "list"
> myList
[[1]]
[1] 5
[[2]]
[1] "Hello"
[[3]]
[1] "Worlds"
[[4]]
[1] TRUE
Combining lists with c works as it does with vectors: the elements get merged into one flat list. Unlike a vector, though, a list can also hold another list as a single element if you build it with list.
The [[N]] output means “The first element of this list is a vector with a single element 5, the second element of this list is a vector with a single element Hello… etc.” Appending [[N]] to the variable that holds the list returns the element itself. We can check this easily:
> class(myList[[1]])
[1] "numeric"
> class(myList[[2]])
[1] "character"
> class(myList[[3]])
[1] "character"
> class(myList[[4]])
[1] "logical"
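Lists get much more convenient once their elements have names. The following sketch uses a made-up person list to show named access, the difference between single and double brackets, and how c and list differ when combining lists.

```r
person <- list(name = "Annie", age = 31, languages = c("R", "SQL"))

person$name             # "Annie"
person[["languages"]]   # the character vector itself
class(person["age"])    # "list": single brackets return a shorter list
class(person[["age"]])  # "numeric": double brackets return the element

first  <- list(1, "a")
second <- list(TRUE, 2L)
length(c(first, second))     # 4: c() merges the two lists
length(list(first, second))  # 2: list() keeps them as nested elements
```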
Dataframes are essentially tables with rows and columns, much like spreadsheets. The women dataset we loaded above was a dataframe. You can access individual columns of dataframes by using the $ operator on the variable, followed by the column name:
> women$height
[1] 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
The result gets returned as a numeric vector. Check its class with class(women$height).
If the column name contains spaces, you can wrap it in quotation marks (women$"Female Height"), but you can also access a column by its numeric position in the list of columns. For example, we know height is the first column:
> head(women[1])
height
1 58
2 59
3 60
4 61
5 62
6 63
> head(women[[1]])
[1] 58 59 60 61 62 63
The head function tells R to only return the first 6 results – this is so we keep our console nice and scroll-free, excellent for brief looks into datasets without printing them out in their entirety. (The opposite can be achieved with tail, which prints out the last 6 results.)
We can see here that when accessing the column with the single-square-bracket [1] we get a dataframe containing just that one column. If, however, we access it with the double-square-bracket [[1]], we get a numeric vector of heights. The data returned by using the single bracket stays the same type as the parent data, while the double bracket accessor targets the specific values in that column and returns them in their most rudimentary form, a numeric vector.
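The different ways of reaching into a dataframe can be compared side by side. This sketch assumes only the built-in women dataset; the by_* names are illustrative, and the last two lines show row selection, which follows the same bracket logic.

```r
data("women")

by_dollar <- women$height
by_name   <- women[["height"]]
by_index  <- women[[1]]
by_matrix <- women[, "height"]   # row/column style: all rows, one column

identical(by_dollar, by_name)    # TRUE
class(women[1])                  # "data.frame"
class(women[[1]])                # "numeric"

women[3, ]                       # the third row, still a data frame
women[women$height > 70, ]       # only the rows where height exceeds 70
```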
We create dataframes with the data.frame function:
> men <- data.frame(height = c(50:65), weight = c(150:165))
> head(men)
height weight
1 50 150
2 51 151
3 52 152
4 53 153
5 54 154
6 55 155
Here we created a sample men dataset not unlike the women set from before.
If we want to get just the names of the columns, we use the names function. We can even assign a value to it and thus change the column names:
> names(men)
[1] "height" "weight"
> names(men) <- c("Male Height", "Male Weight")
> head(men)
Male Height Male Weight
1 50 150
2 51 151
3 52 152
4 53 153
5 54 154
6 55 155
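Column names with spaces need a little extra care when you read them back. The sketch below rebuilds the men set from above so it runs on its own, then shows two ways of accessing such a column and how to rename a single one.

```r
men <- data.frame(height = 50:65, weight = 150:165)
names(men) <- c("Male Height", "Male Weight")

men$`Male Height`      # backticks quote a name that is not syntactically valid
men[["Male Weight"]]   # double brackets take the name as an ordinary string

names(men)[2] <- "weight"   # rename just one column
names(men)
```

In practice, most people avoid spaces in column names altogether and save the pretty labels for plots and reports.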
Matrices are two-dimensional vectors. They are like dataframes, but can only contain values of the same type. They are created with the matrix function and need the number of rows and columns as parameters, and the values to place into these slots:
> m <- matrix(nrow = 4, ncol = 5, 1:20)
> m
[,1] [,2] [,3] [,4] [,5]
[1,] 1 5 9 13 17
[2,] 2 6 10 14 18
[3,] 3 7 11 15 19
[4,] 4 8 12 16 20
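As the output shows, values are filled in column by column. The sketch below contrasts that default with the byrow argument of matrix and shows how to pick out cells, rows and columns. The names by_col and by_row are mine.

```r
by_col <- matrix(1:20, nrow = 4, ncol = 5)                # filled column by column
by_row <- matrix(1:20, nrow = 4, ncol = 5, byrow = TRUE)  # filled row by row

by_col[2, 3]   # row 2, column 3 -> 10
by_row[2, 3]   # row 2, column 3 -> 8
by_col[, 1]    # the whole first column: 1 2 3 4
by_col[4, ]    # the whole fourth row: 4 8 12 16 20
t(by_col)      # transpose: a 5 x 4 matrix
```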
What happens if the number of values provided doesn’t match the number of cells?
> m <- matrix(nrow = 4, ncol = 5, 1:25)
Warning message:
In matrix(nrow = 4, ncol = 5, 1:25) :
data length [25] is not a sub-multiple or multiple of the number of rows [4]
> m
[,1] [,2] [,3] [,4] [,5]
[1,] 1 5 9 13 17
[2,] 2 6 10 14 18
[3,] 3 7 11 15 19
[4,] 4 8 12 16 20
We get a warning, but the matrix is created anyway with all the extra values simply discarded. If there are fewer values than cells, the values provided get recycled until the matrix is full:
> m <- matrix(nrow = 4, ncol = 5, 1:16)
Warning message:
In matrix(nrow = 4, ncol = 5, 1:16) :
data length [16] is not a sub-multiple or multiple of the number of columns [5]
> m
[,1] [,2] [,3] [,4] [,5]
[1,] 1 5 9 13 1
[2,] 2 6 10 14 2
[3,] 3 7 11 15 3
[4,] 4 8 12 16 4
Much like data frames have the names attribute/function, matrices have a dim (dimension) one. Changing this property can mutate the form of a matrix:
> m <- 1:15
> m
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
> dim(m)
NULL
> dim(m) <- c(3,5)
> m
[,1] [,2] [,3] [,4] [,5]
[1,] 1 4 7 10 13
[2,] 2 5 8 11 14
[3,] 3 6 9 12 15
> dim(m)
[1] 3 5
> dim(m) <- c(5,3)
> m
[,1] [,2] [,3]
[1,] 1 6 11
[2,] 2 7 12
[3,] 3 8 13
[4,] 4 9 14
[5,] 5 10 15
> dim(m)
[1] 5 3
> dim(m) <- NULL
> m
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Here we created a vector of 15 sequential whole numbers (in such cases, we can omit the c). The dim attribute did not exist on it, as evidenced by the dim function returning NULL, so we set it by assigning a numeric vector of 3, 5 to it. This rearranged the elements to fit into the newly constructed matrix. We then changed the dimensions again by swapping the number of rows and columns to 5, 3, which again produced a different matrix. Finally, nullifying the dimensions gave us back the vector from the beginning.
It is also possible to combine / expand / alter vectors, dataframes and matrices by using cbind and rbind:
> v <- 1:5
> x <- 6:10
> bound <- cbind(v, x)
> bound
v x
[1,] 1 6
[2,] 2 7
[3,] 3 8
[4,] 4 9
[5,] 5 10
> bound <- rbind(v, x)
> bound
[,1] [,2] [,3] [,4] [,5]
v 1 2 3 4 5
x 6 7 8 9 10
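The example above binds plain vectors, but the same two functions work on dataframes. Here is a sketch using the built-in women set; the ratio column and the extra observation are made up purely for illustration.

```r
data("women")

# cbind() adds a column; here a derived weight-to-height ratio.
with_ratio <- cbind(women, ratio = women$weight / women$height)

# rbind() adds rows; the new rows need the same column names.
extra    <- data.frame(height = 73, weight = 170)  # made-up observation
extended <- rbind(women, extra)

dim(with_ratio)  # 15 3
dim(extended)    # 16 2
```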
Factors are vectors with labels. This is different from character vectors or even numeric ones because they’re what R folks call “self describing”, allowing R’s functions to automatically make more sense of them than they would of the other types. They’re built with the factor function, which needs to be fed a vector as an argument:
> f <- factor(c("Hello", "World", "Hello", "Annie", "Hello", "World"))
> f
[1] Hello World Hello Annie Hello World
Levels: Annie Hello World
> table(f)
f
Annie Hello World
1 3 2
Levels lists out the unique elements in the factor. The table function grabs a factor and builds a table consisting of the various levels in the factor, and the number of their occurrences in the factor.
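By default the levels are sorted alphabetically, which is not always the order you want. The sketch below passes an explicit levels argument to factor; the sizes data is invented for the example.

```r
sizes <- factor(c("low", "high", "medium", "low"),
                levels = c("low", "medium", "high"))

levels(sizes)       # "low" "medium" "high" (the order we asked for)
nlevels(sizes)      # 3
as.integer(sizes)   # 1 3 2 1: each value stored as its level's position
table(sizes)        # counts per level, in level order
```

Under the hood a factor is a vector of integer codes plus the list of labels, which is why as.integer returns positions rather than the original strings.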
I personally haven’t found a use for factors yet in my projects, but I’m learning about them.
In this article, we covered the basic data types in R and the essentials of using RStudio. You are now armed with all the knowledge you need to start some basic data operations, something we’ll look into in the very next post. Remember that all the functions we covered above are fully searchable in the help files with ?function where “function” is the function name.
When learning a programming language, it is customary to teach data types first, logical operators second and control structures third before moving into advanced things like functions and classes, but in this case, I believe we’ve laid a decent enough foundation to jump straight into the fire and learn by example. We’ll cover all that on real, practical data – but only if you’re interested.
Let us know what you thought about this post – comment below, reshare it, or just “heart” it in the forums – if there’s plenty of interest in learning R in our SitePoint audience, we’d be more than glad to go into detail.
Translated from: https://www.sitepoint.com/introduction-r-rstudio/