########R Built-in Data Sets#########
# Package: {datasets} (attached by default in a standard R session)
# > ls("package:datasets")
# [1] "ability.cov" "airmiles" "AirPassengers"
# [4] "airquality" "anscombe" "attenu"
# [7] "attitude" "austres" "beaver1"
# [10] "beaver2" "BJsales" "BJsales.lead"
# [13] "BOD" "cars" "ChickWeight"
# [16] "chickwts" "co2" "CO2"
# [19] "crimtab" "discoveries" "DNase"
# [22] "esoph" "euro" "euro.cross"
# [25] "eurodist" "EuStockMarkets" "faithful"
# [28] "fdeaths" "Formaldehyde" "freeny"
# [31] "freeny.x" "freeny.y" "HairEyeColor"
# [34] "Harman23.cor" "Harman74.cor" "Indometh"
# [37] "infert" "InsectSprays" "iris"
# [40] "iris3" "islands" "JohnsonJohnson"
# [43] "LakeHuron" "ldeaths" "lh"
# [46] "LifeCycleSavings" "Loblolly" "longley"
# [49] "lynx" "mdeaths" "morley"
# [52] "mtcars" "nhtemp" "Nile"
# [55] "nottem" "npk" "occupationalStatus"
# [58] "Orange" "OrchardSprays" "PlantGrowth"
# [61] "precip" "presidents" "pressure"
# [64] "Puromycin" "quakes" "randu"
# [67] "rivers" "rock" "Seatbelts"
# [70] "sleep" "stack.loss" "stack.x"
# [73] "stackloss" "state.abb" "state.area"
# [76] "state.center" "state.division" "state.name"
# [79] "state.region" "state.x77" "sunspot.month"
# [82] "sunspot.year" "sunspots" "swiss"
# [85] "Theoph" "Titanic" "ToothGrowth"
# [88] "treering" "trees" "UCBAdmissions"
# [91] "UKDriverDeaths" "UKgas" "USAccDeaths"
# [94] "USArrests" "UScitiesD" "USJudgeRatings"
# [97] "USPersonalExpenditure" "uspop" "VADeaths"
# [100] "volcano" "warpbreaks" "women"
# [103] "WorldPhones" "WWWusage"
# Reference: http://www.sthda.com/english/wiki/r-built-in-data-sets#preleminary-tasks
# Preliminary tasks
# List of pre-loaded data
# Loading a built-in R data
# Most used R built-in data sets
# mtcars: Motor Trend Car Road Tests
# iris
# ToothGrowth
# PlantGrowth
# USArrests
# Summary
#R comes with several built-in data sets, which are generally used as demo data for experimenting with R functions.
#In this article, we’ll first describe how to load and use R built-in data sets.
#Next, we’ll describe some of the most used R demo data sets: mtcars, iris, ToothGrowth, PlantGrowth and USArrests.
#Preliminary tasks
#Launch RStudio and set up your working directory, as described in the STHDA article "Running RStudio and setting up your working directory".
#List of pre-loaded data
#To see the list of pre-loaded data, type the function data():
data()
#The output is as follows:
# Data sets in package ‘datasets’:
# AirPassengers Monthly Airline Passenger Numbers 1949-1960
# BJsales Sales Data with Leading Indicator
# BJsales.lead (BJsales) Sales Data with Leading Indicator
# BOD Biochemical Oxygen Demand
# CO2 Carbon Dioxide Uptake in Grass Plants
# ChickWeight Weight versus age of chicks on different diets
# DNase Elisa assay of DNase
# EuStockMarkets Daily Closing Prices of Major European Stock Indices, 1991-1998
# Formaldehyde Determination of Formaldehyde
# HairEyeColor Hair and Eye Color of Statistics Students
# Harman23.cor Harman Example 2.3
# Harman74.cor Harman Example 7.4
# Indometh Pharmacokinetics of Indomethacin
# InsectSprays Effectiveness of Insect Sprays
# JohnsonJohnson Quarterly Earnings per Johnson & Johnson Share
# LakeHuron Level of Lake Huron 1875-1972
# LifeCycleSavings Intercountry Life-Cycle Savings Data
# Loblolly Growth of Loblolly pine trees
# Nile Flow of the River Nile
# Orange Growth of Orange Trees
# OrchardSprays Potency of Orchard Sprays
# PlantGrowth Results from an Experiment on Plant Growth
# Puromycin Reaction Velocity of an Enzymatic Reaction
# Seatbelts Road Casualties in Great Britain 1969-84
# Theoph Pharmacokinetics of Theophylline
# Titanic Survival of passengers on the Titanic
# ToothGrowth The Effect of Vitamin C on Tooth Growth in Guinea Pigs
# UCBAdmissions Student Admissions at UC Berkeley
# UKDriverDeaths Road Casualties in Great Britain 1969-84
# UKgas UK Quarterly Gas Consumption
# USAccDeaths Accidental Deaths in the US 1973-1978
# USArrests Violent Crime Rates by US State
# USJudgeRatings Lawyers' Ratings of State Judges in the US Superior Court
# USPersonalExpenditure Personal Expenditure Data
# UScitiesD Distances Between European Cities and Between US Cities
# VADeaths Death Rates in Virginia (1940)
# WWWusage Internet Usage per Minute
# WorldPhones The World's Telephones
# ability.cov Ability and Intelligence Tests
# airmiles Passenger Miles on Commercial US Airlines, 1937-1960
# airquality New York Air Quality Measurements
# anscombe Anscombe's Quartet of 'Identical' Simple Linear Regressions
# attenu The Joyner-Boore Attenuation Data
# attitude The Chatterjee-Price Attitude Data
# austres Quarterly Time Series of the Number of Australian Residents
# beaver1 (beavers) Body Temperature Series of Two Beavers
# beaver2 (beavers) Body Temperature Series of Two Beavers
# cars Speed and Stopping Distances of Cars
# chickwts Chicken Weights by Feed Type
# co2 Mauna Loa Atmospheric CO2 Concentration
# crimtab Student's 3000 Criminals Data
# discoveries Yearly Numbers of Important Discoveries
# esoph Smoking, Alcohol and (O)esophageal Cancer
# euro Conversion Rates of Euro Currencies
# euro.cross (euro) Conversion Rates of Euro Currencies
# eurodist Distances Between European Cities and Between US Cities
# faithful Old Faithful Geyser Data
# fdeaths (UKLungDeaths) Monthly Deaths from Lung Diseases in the UK
# freeny Freeny's Revenue Data
# freeny.x (freeny) Freeny's Revenue Data
# freeny.y (freeny) Freeny's Revenue Data
# infert Infertility after Spontaneous and Induced Abortion
# iris Edgar Anderson's Iris Data
# iris3 Edgar Anderson's Iris Data
# islands Areas of the World's Major Landmasses
# ldeaths (UKLungDeaths) Monthly Deaths from Lung Diseases in the UK
# lh Luteinizing Hormone in Blood Samples
# longley Longley's Economic Regression Data
# lynx Annual Canadian Lynx trappings 1821-1934
# mdeaths (UKLungDeaths) Monthly Deaths from Lung Diseases in the UK
# morley Michelson Speed of Light Data
# mtcars Motor Trend Car Road Tests
# nhtemp Average Yearly Temperatures in New Haven
# nottem Average Monthly Temperatures at Nottingham, 1920-1939
# npk Classical N, P, K Factorial Experiment
# occupationalStatus Occupational Status of Fathers and their Sons
# precip Annual Precipitation in US Cities
# presidents Quarterly Approval Ratings of US Presidents
# pressure Vapor Pressure of Mercury as a Function of Temperature
# quakes Locations of Earthquakes off Fiji
# randu Random Numbers from Congruential Generator RANDU
# rivers Lengths of Major North American Rivers
# rock Measurements on Petroleum Rock Samples
# sleep Student's Sleep Data
# stack.loss (stackloss) Brownlee's Stack Loss Plant Data
# stack.x (stackloss) Brownlee's Stack Loss Plant Data
# stackloss Brownlee's Stack Loss Plant Data
# state.abb (state) US State Facts and Figures
# state.area (state) US State Facts and Figures
# state.center (state) US State Facts and Figures
# state.division (state) US State Facts and Figures
# state.name (state) US State Facts and Figures
# state.region (state) US State Facts and Figures
# state.x77 (state) US State Facts and Figures
# sunspot.month Monthly Sunspot Data, from 1749 to "Present"
# sunspot.year Yearly Sunspot Data, 1700-1988
# sunspots Monthly Sunspot Numbers, 1749-1983
# swiss Swiss Fertility and Socioeconomic Indicators (1888) Data
# treering Yearly Treering Data, -6000-1979
# trees Diameter, Height and Volume for Black Cherry Trees
# uspop Populations Recorded by the US Census
# volcano Topographic Information on Auckland's Maunga Whau Volcano
# warpbreaks The Number of Breaks in Yarn during Weaving
# women Average Heights and Weights for American Women
# Use ‘data(package = .packages(all.available = TRUE))’
# to list the data sets in all *available* packages.
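# The listing above is meant for reading. To work with it programmatically, a minimal
# sketch (assuming only base R and the utils package) is to take the "results" matrix
# of the object returned by data(package = ...). The variable name ds is illustrative.
ds <- data(package = "datasets")$results   # one row per data set: Package, LibPath, Item, Title
print(nrow(ds))                            # how many data sets the package ships
print(head(ds[, c("Item", "Title")]))      # name and title of the first few entries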
#Loading a built-in R data
#Load and print mtcars data as follows:
# Loading
data(mtcars)#A data frame with 32 observations on 11 (numeric) variables.
# [, 1] mpg Miles/(US) gallon
# [, 2] cyl Number of cylinders
# [, 3] disp Displacement (cu.in.)
# [, 4] hp Gross horsepower
# [, 5] drat Rear axle ratio
# [, 6] wt Weight (1000 lbs)
# [, 7] qsec 1/4 mile time
# [, 8] vs Engine (0 = V-shaped, 1 = straight)
# [, 9] am Transmission (0 = automatic, 1 = manual)
# [,10] gear Number of forward gears
# [,11] carb Number of carburetors
# Print the first 6 rows
head(mtcars, 6)
#                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
# Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
# Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
# Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
# Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
# Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
# Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
#If you want to learn more about the mtcars data set, type this:
?mtcars
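# Built-in data sets are lazy-loaded, so mtcars can be used even without data(mtcars).
# A minimal sketch of what data() actually changes, using base R's find() to list
# where on the search path an object named "mtcars" is visible:
print(find("mtcars"))              # in a fresh session: "package:datasets" only
data(mtcars)                       # copies mtcars into the global environment
print(find("mtcars"))              # ".GlobalEnv" is now listed before "package:datasets"
rm("mtcars", envir = .GlobalEnv)   # dropping the copy leaves the package version available
print(find("mtcars"))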
#Most used R built-in data sets
mtcars#Motor Trend Car Road Tests
#The data was extracted from the 1974 Motor Trend US magazine and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models).
#View the content of the mtcars data set:
# 1. Loading
data("mtcars")
# 2. Print
head(mtcars)
#It contains 32 observations and 11 variables:
# Number of rows (observations)
nrow(mtcars)
# [1] 32
# Number of columns (variables)
ncol(mtcars)
# [1] 11
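# nrow() and ncol() can be combined: dim() returns both numbers at once. A short sketch
# that also checks the "11 (numeric) variables" description by asking each column for its class:
dims <- dim(mtcars)
print(dims)                           # 32 11
col_classes <- sapply(mtcars, class)
print(col_classes)                    # "numeric" for every column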
#Description of variables:
# mpg: Miles/(US) gallon
# cyl: Number of cylinders
# disp: Displacement (cu.in.), i.e., engine displacement
# hp: Gross horsepower (horsepower is a unit of power)
# drat: Rear axle ratio
# wt: Weight (1000 lbs)
# qsec: 1/4 mile time
# vs: Engine shape (0 = V-shaped, 1 = straight)
# am: Transmission (0 = automatic, 1 = manual); "transmission" here means the gearbox
# gear: Number of forward gears
# carb: Number of carburetors
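# vs and am are stored as 0/1 numbers. A minimal sketch, working on a copy named mt
# (an illustrative name) so that mtcars itself stays unchanged, turns them into labelled
# factors using the codes listed above:
mt <- mtcars
mt$vs <- factor(mt$vs, levels = c(0, 1), labels = c("V-shaped", "straight"))
mt$am <- factor(mt$am, levels = c(0, 1), labels = c("automatic", "manual"))
print(table(mt$am))                   # number of cars per transmission type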
#If you want to learn more about mtcars, type this:
?mtcars
iris
?iris
#iris is a data frame with 150 cases (rows) and 5 variables (columns) named Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, and Species.
#iris data set gives the measurements in centimeters of the variables sepal length, sepal width, petal length and petal width, respectively, for 50 flowers from each of 3 species of iris. The species are Iris setosa, versicolor, and virginica.
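# sepal (n., botany): a sepal, one of the parts of the calyx
# petal (n.): a flower petal
# The species names are Latin: setosa ("bristly"), versicolor (adj. "variegated, multicolored, changing in color") and virginica.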
data("iris")
head(iris)
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1          5.1         3.5          1.4         0.2  setosa
# 2          4.9         3.0          1.4         0.2  setosa
# 3          4.7         3.2          1.3         0.2  setosa
# 4          4.6         3.1          1.5         0.2  setosa
# 5          5.0         3.6          1.4         0.2  setosa
# 6          5.4         3.9          1.7         0.4  setosa
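# The description says 50 flowers from each of 3 species. A quick sketch to confirm the
# group sizes and compare the species, using base R's table() and aggregate():
counts <- table(iris$Species)
print(counts)                                        # 50 per species
species_means <- aggregate(. ~ Species, data = iris, FUN = mean)
print(species_means)                                 # mean of each measurement by species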
iris3
class(iris3)
# [1] "array"
str(iris3)
#  num [1:50, 1:4, 1:3] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
#  - attr(*, "dimnames")=List of 3
#   ..$ : NULL
#   ..$ : chr [1:4] "Sepal L." "Sepal W." "Petal L." "Petal W."
#   ..$ : chr [1:3] "Setosa" "Versicolor" "Virginica"
#iris3 gives the same data arranged as a 3-dimensional array of size 50 by 4 by 3, as represented by S-PLUS. The first dimension gives the case number within the species subsample, the second the measurements with names Sepal L., Sepal W., Petal L., and Petal W., and the third the species.
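# Because iris3 is an array, a value is addressed as [case, measurement, species].
# A small sketch: pull one measurement for one species, then use apply() over the
# measurement and species dimensions (2 and 3) to get a 4 x 3 table of means.
setosa_petal <- iris3[, "Petal L.", "Setosa"]        # 50 petal lengths
print(length(setosa_petal))
means_by_species <- apply(iris3, c(2, 3), mean)      # rows: measurements, columns: species
print(means_by_species)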
?iris3
#Examples
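dni3 <- dimnames(iris3)
# Reorder iris3 to case x species x measurement, flatten it to a 150 x 4 matrix, and
# rename the columns from the "Sepal L." style to the "Sepal.Length" style used by iris
ii <- data.frame(matrix(aperm(iris3, c(1, 3, 2)), ncol = 4,
                        dimnames = list(NULL, sub(" L.", ".Length",
                                                  sub(" W.", ".Width", dni3[[2]])))),
                 # Species factor: 3 levels of 50 rows each, lower-cased to match iris
                 Species = gl(3, 50, labels = sub("S", "s", sub("V", "v", dni3[[3]]))))
all.equal(ii, iris) # TRUE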
ToothGrowth
?ToothGrowth
#The ToothGrowth data set contains the results of an experiment studying the effect of vitamin C on tooth growth in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods: orange juice (coded as OJ) or ascorbic acid (a form of vitamin C, coded as VC).
# Format
# A data frame with 60 observations on 3 variables.
# [,1] len   numeric  Tooth length
# [,2] supp  factor   Supplement type (VC or OJ).
# [,3] dose  numeric  Dose in milligrams/day
data("ToothGrowth")
head(ToothGrowth)
#    len supp dose
# 1  4.2   VC  0.5
# 2 11.5   VC  0.5
# 3  7.3   VC  0.5
# 4  5.8   VC  0.5
# 5  6.4   VC  0.5
# 6 10.0   VC  0.5
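# With 60 animals, 2 supplement types and 3 doses, a balanced design puts 10 animals in
# each supp/dose combination. A sketch that checks this and summarises mean tooth length
# per combination with aggregate():
cell_sizes <- table(ToothGrowth$supp, ToothGrowth$dose)
print(cell_sizes)                                    # animals per supp/dose cell
mean_len <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
print(mean_len)                                      # 6 rows, one per combination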
PlantGrowth
#Results obtained from an experiment to compare yields (as measured by dried weight of plants) obtained under a control and two different treatment conditions.
data("PlantGrowth")
PlantGrowth
head(PlantGrowth)
#   weight group
# 1   4.17  ctrl
# 2   5.58  ctrl
# 3   5.18  ctrl
# 4   6.11  ctrl
# 5   4.50  ctrl
# 6   4.61  ctrl
?PlantGrowth
#PlantGrowth {datasets}
# Description
# Results from an experiment to compare yields (as measured by dried weight of plants) obtained under a control and two different treatment conditions.
# Usage
# PlantGrowth
# Format
# A data frame of 30 cases on 2 variables.
# [, 1] weight numeric
# [, 2] group factor
# The levels of group are ‘ctrl’, ‘trt1’, and ‘trt2’.
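# The point of the experiment is to compare the three groups. A minimal sketch: group
# means with tapply(), then a one-way analysis of variance with aov(), which is one
# common (not the only) way to test whether mean weight differs between the groups.
group_means <- tapply(PlantGrowth$weight, PlantGrowth$group, mean)
print(group_means)                                   # one mean per level: ctrl, trt1, trt2
fit <- aov(weight ~ group, data = PlantGrowth)
print(summary(fit))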
USArrests
#This data set contains statistics about violent crime rates by US state.
data("USArrests")
head(USArrests)
#            Murder Assault UrbanPop Rape
# Alabama      13.2     236       58 21.2
# Alaska       10.0     263       48 44.5
# Arizona       8.1     294       80 31.0
# Arkansas      8.8     190       50 19.5
# California    9.0     276       91 40.6
# Colorado      7.9     204       78 38.7
# Murder: Murder arrests (per 100,000)
# Assault: Assault arrests (per 100,000)
# UrbanPop: Percent urban population
# Rape: Rape arrests (per 100,000)
?USArrests
#Violent Crime Rates by US State
#Description
#This data set contains statistics, in arrests per 100,000 residents for assault, murder, and rape in each of the 50 US states in 1973. Also given is the percent of the population living in urban areas.
#Usage
# USArrests
#Format
#A data frame with 50 observations on 4 variables.
# [,1] Murder numeric Murder arrests (per 100,000)
# [,2] Assault numeric Assault arrests (per 100,000)
# [,3] UrbanPop numeric Percent urban population
# [,4] Rape numeric Rape arrests (per 100,000)
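# State names are stored as row names, not as a column. A short sketch that ranks the
# states by murder arrest rate and averages each variable across the 50 states
# (by_murder is an illustrative name):
by_murder <- USArrests[order(USArrests$Murder, decreasing = TRUE), ]
print(head(by_murder, 5))                            # five highest murder arrest rates
print(colMeans(USArrests))                           # mean of each variable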
#Summary
# data("dataset_name")   # Load a built-in R data set (replace dataset_name with, e.g., mtcars)
# head(dataset_name)     # Inspect the first rows of the data set
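# The same two steps can be wrapped in a helper. load_builtin() below is a hypothetical
# name, not part of R: it loads a data set from the datasets package into a private
# environment (via the list and envir arguments of data()) instead of the global one.
load_builtin <- function(name) {
  e <- new.env()
  # data() only warns when a name is unknown, so check explicitly afterwards
  suppressWarnings(data(list = name, package = "datasets", envir = e))
  if (!exists(name, envir = e, inherits = FALSE)) {
    stop("data set not found in package 'datasets': ", name)
  }
  get(name, envir = e, inherits = FALSE)
}
tg <- load_builtin("ToothGrowth")
print(head(tg))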