R datasets and the experiments they suit

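R ships with many built-in datasets for learning and experimentation. Below I pick one dataset for each algorithm I commonly use, purely for practicing that algorithm; of course, a single dataset may suit several algorithms. For more datasets, see: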

http://stat.ethz.ch/R-manual/R-patched/library/datasets/html/00Index.html

 

1. attitude

Used for linear regression.

Description

From a survey of the clerical employees of a large financial organization, the data are aggregated from the questionnaires of the approximately 35 employees for each of 30 (randomly selected) departments. The numbers give the percent proportion of favourable responses to seven questions in each department.

Format

A dataframe with 30 observations on 7 variables. The first column are the short names from the reference, the second one the variable names in the data frame:

Y     rating      numeric  Overall rating
X[1]  complaints  numeric  Handling of employee complaints
X[2]  privileges  numeric  Does not allow special privileges
X[3]  learning    numeric  Opportunity to learn
X[4]  raises      numeric  Raises based on performance
X[5]  critical    numeric  Too critical
X[6]  advance     numeric  Advancement
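
As a minimal sketch of how this dataset might be used for linear regression, the following regresses rating on the other six variables with lm(). The object name fit is only an illustrative choice.

```r
# attitude ships with base R (package datasets), so no import is needed.
# Regress the overall rating on all six remaining variables.
fit <- lm(rating ~ ., data = attitude)

# Coefficient table, R-squared and residual summary.
summary(fit)

# Intercept plus six slopes.
coef(fit)
```

The formula rating ~ . is shorthand for "all other columns as predictors"; a smaller model such as rating ~ complaints + learning can be compared against it with anova().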

 

2. infert

Generalized linear model, binomial family (logistic regression)

Description

This is a matched case-control study dating from before the availability of conditional logistic regression.

Format

1. Education              0 = 0-5 years
                          1 = 6-11 years
                          2 = 12+ years
2. age                    age in years of case
3. parity                 count
4. number of prior        0 = 0
   induced abortions      1 = 1
                          2 = 2 or more
5. case status            1 = case
                          0 = control
6. number of prior        0 = 0
   spontaneous abortions  1 = 1
                          2 = 2 or more
7. matched set number     1-83
8. stratum number         1-63

Example:

model1 <- glm(case ~ spontaneous + induced, data = infert, family = binomial())
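
To read a logistic fit like this one, it often helps to look at odds ratios and fitted probabilities. The sketch below refits the same model so that it runs on its own; the names odds_ratios and prob are illustrative.

```r
# Refit the logistic regression on the built-in infert data.
model1 <- glm(case ~ spontaneous + induced, data = infert, family = binomial())

# Coefficients are on the log-odds scale; exponentiate to get odds ratios.
odds_ratios <- exp(coef(model1))
odds_ratios

# Fitted probability of being a case for each observation.
prob <- predict(model1, type = "response")
head(prob)
```

Note that this ordinary logistic regression ignores the matching in the study design, so it is suitable for practice rather than for a proper analysis of these data.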

 

3. Generalized linear model, Poisson family

I could not find a plain Poisson regression test set among R's built-in datasets; the data below can be used for testing instead:

http://www.ats.ucla.edu/stat/data/poisson_sim.csv
p <- read.csv("http://www.ats.ucla.edu/stat/data/poisson_sim.csv")
m1 <- glm(num_awards ~ prog + math, family = "poisson", data = p)
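
If the file cannot be downloaded, the same kind of model can be practiced on simulated counts. The sketch below is not the UCLA data: the data frame sim, the coefficients -4 and 0.07, and the program labels are all assumptions made up for illustration; only the column names mirror the model above.

```r
set.seed(1)
n <- 200

# Hypothetical predictors: a three-level program type and a math score.
sim <- data.frame(
  prog = factor(sample(c("General", "Academic", "Vocational"), n, replace = TRUE)),
  math = round(rnorm(n, mean = 50, sd = 10))
)

# Assumed true model: log(mean count) = -4 + 0.07 * math, with no prog effect.
sim$num_awards <- rpois(n, lambda = exp(-4 + 0.07 * sim$math))

# Same model form as above, fitted to the simulated data.
m_sim <- glm(num_awards ~ prog + math, family = poisson(), data = sim)
summary(m_sim)
```

Here prog is stored as a factor, so glm() estimates one coefficient per non-reference level. If a program variable is coded as integers instead, it is treated as a single numeric predictor unless it is wrapped in factor().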
 
 

4. BOD, for (intrinsically) nonlinear regression

Description

The BOD data frame has 6 rows and 2 columns giving the biochemical oxygen demand versus time in an evaluation of water quality.

Format

This data frame contains the following columns:

Time

A numeric vector giving the time of the measurement (days).

demand

A numeric vector giving the biochemical oxygen demand (mg/l).

Examples

fm1 <- nls(demand ~ A*(1-exp(-exp(lrc)*Time)), data = BOD, start = c(A = 20, lrc = log(.35)))
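
The model above is parameterised with lrc, the log of the rate constant, so the rate itself has to be recovered by exponentiating. A minimal sketch of doing that and of predicting on new time points follows; A_hat, k_hat and new_times are illustrative names.

```r
# Refit the model from the example above.
fm1 <- nls(demand ~ A * (1 - exp(-exp(lrc) * Time)), data = BOD,
           start = c(A = 20, lrc = log(0.35)))

est <- coef(fm1)
A_hat <- est[["A"]]         # estimated asymptotic demand (mg/l)
k_hat <- exp(est[["lrc"]])  # rate constant, back-transformed from its log

# Predicted demand on a grid of time points (days).
new_times <- data.frame(Time = seq(0, 8, by = 0.5))
pred <- predict(fm1, newdata = new_times)
```

Estimating the log of the rate keeps the rate positive for any value the optimiser tries, which is why the help-page example is written this way.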

 

5. cats from MASS, for the CART algorithm; of course, practically any dataset whose target variable is categorical can be used with CART

> library(MASS)
> data(cats)
> names(cats)
[1] "Sex" "Bwt" "Hwt"
> head(cats)
  Sex Bwt Hwt
1   F 2.0 7.0
2   F 2.0 7.4
3   F 2.0 9.5
4   F 2.1 7.2
5   F 2.1 7.3
6   F 2.1 7.6

> library(rpart)
> cats_rpart_model <- rpart(Sex ~ ., data = cats)
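
Once the tree is fitted, a quick way to check it is to predict classes and cross-tabulate them against the true labels. The sketch below does this on the training data itself, so the accuracy it reports is optimistic; the names pred, confusion and accuracy are illustrative.

```r
library(MASS)   # provides the cats data
library(rpart)  # CART implementation

# Sex is a factor, so rpart would choose a classification tree anyway;
# method = "class" just makes that explicit.
cats_rpart_model <- rpart(Sex ~ ., data = cats, method = "class")

# Predicted class for every row of the training data.
pred <- predict(cats_rpart_model, newdata = cats, type = "class")

# Confusion matrix and (training) accuracy.
confusion <- table(actual = cats$Sex, predicted = pred)
confusion
accuracy <- sum(diag(confusion)) / sum(confusion)
accuracy
```

For an honest estimate, the data would be split into training and test rows before fitting.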

 

6. USArrests, for principal component analysis (princomp) and factor analysis (factanal)

Description

This data set contains statistics, in arrests per 100,000 residents for assault, murder, and rape in each of the 50 US states in 1973. Also given is the percent of the population living in urban areas.

Usage

USArrests

Format

A data frame with 50 observations on 4 variables.

 

[,1]  Murder    numeric  Murder arrests (per 100,000)
[,2]  Assault   numeric  Assault arrests (per 100,000)
[,3]  UrbanPop  numeric  Percent urban population
[,4]  Rape      numeric  Rape arrests (per 100,000)
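
A minimal sketch of principal component analysis on this data with princomp() follows. Setting cor = TRUE is a choice made here, not something stated above: the four variables are on very different scales (Assault is much larger than Murder), so the analysis is run on the correlation matrix.

```r
# USArrests ships with base R (package datasets).
# cor = TRUE standardises the variables before extracting components.
pca <- princomp(USArrests, cor = TRUE)

# Standard deviation and proportion of variance for each component.
summary(pca)

# How each original variable contributes to each component.
loadings(pca)

# Component scores for the first few states.
head(pca$scores)
```

screeplot(pca) or biplot(pca) can then be used to inspect the result graphically.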

 
