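1 Stata's built-in datasets
Stata is a highly commercialized, paid product, and the features it offers tend to be polished to a high standard. Loading a built-in dataset is a case in point: it takes a single command, sysuse auto.dta. Besides the datasets shipped with the installation, Stata also provides official online datasets. The online collection includes the datasets available through sysuse, and any of them can be loaded with webuse xxx.dta.
1.1 The sysuse command and local datasets
Typing sysuse dir in the Stata Command window returns the list of several dozen example datasets installed locally with Stata.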
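1.2 Online example datasets
The help dta_manuals command returns a list, organized by manual name, of several hundred example datasets. All of them can be loaded with webuse, and the list also covers every local dataset.
Next, click to open the last manual in the list, User's Guide [U] (help q_user also works), which contains several dozen example datasets. The first two are familiar: auto, and nlswork, one of the nlsw family of datasets.
Figure: the help q_user command lists the example .dta datasets
Click use next to a dataset to load it into memory (as shown in the figure below), or click describe to describe the data without loading it into memory.
Figure: clicking use to load the auto dataset online
You can also load the auto dataset online with the webuse command (for example, webuse auto).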
. export delimited using "D:\Spyder\auto.csv", replace // export as a CSV file
. export excel using "D:\Spyder\auto.xlsx", firstrow(variables) // export as an Excel file
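To pick the exported CSV up in Python (the language used in section 3), a minimal sketch with the standard-library csv module could look like the following. So that it runs anywhere, it reads a three-row in-memory stand-in rather than D:\Spyder\auto.csv itself; the stand-in rows and the three-column subset are illustrative, and the helper name read_rows is hypothetical. It assumes the file starts with a header row of variable names, which is what export delimited writes by default.

```python
import csv
import io

# Illustrative stand-in for the first rows of the exported auto.csv
# (only three of the columns are shown here).
SAMPLE_CSV = """make,price,mpg
AMC Concord,4099,22
AMC Pacer,4749,17
AMC Spirit,3799,22
"""


def read_rows(handle):
    """Read a CSV that has a header row into a list of dicts.

    The price and mpg fields are converted from text to integers;
    every other column is left as a string.
    """
    rows = []
    for record in csv.DictReader(handle):
        record["price"] = int(record["price"])
        record["mpg"] = int(record["mpg"])
        rows.append(record)
    return rows


rows = read_rows(io.StringIO(SAMPLE_CSV))
mean_mpg = sum(r["mpg"] for r in rows) / len(rows)
print(len(rows), "rows; mean mpg =", round(mean_mpg, 2))
```

For the real file, the same helper can be given an open file handle instead, for example one obtained with open(r"D:\Spyder\auto.csv", newline="").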
2 R's built-in datasets
2.1 Datasets in R's built-in datasets package
> ## View the mtcars dataset directly
> head(mtcars) ## show the first 6 rows
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
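> View(mtcars) ## open the data in the viewer, equivalent to Stata's browse command; the result is shown at the end
> ## Inspect the mtcars dataset and the information on each of its variables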
> str(mtcars)
'data.frame': 32 obs. of 11 variables:
$ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
$ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
$ disp: num 160 160 108 258 360 ...
$ hp : num 110 110 93 110 175 105 245 62 95 123 ...
$ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
$ wt : num 2.62 2.88 2.32 3.21 3.44 ...
$ qsec: num 16.5 17 18.6 19.4 17 ...
$ vs : num 0 0 1 1 0 1 0 1 1 1 ...
$ am : num 1 1 1 0 0 0 0 0 0 0 ...
$ gear: num 4 4 4 3 3 3 3 4 4 4 ...
$ carb: num 4 4 1 1 2 1 4 2 2 4 ...
> summary(mtcars)
mpg cyl disp hp drat
Min. :10.40 Min. :4.000 Min. : 71.1 Min. : 52.0 Min. :2.760
1st Qu.:15.43 1st Qu.:4.000 1st Qu.:120.8 1st Qu.: 96.5 1st Qu.:3.080
Median :19.20 Median :6.000 Median :196.3 Median :123.0 Median :3.695
Mean :20.09 Mean :6.188 Mean :230.7 Mean :146.7 Mean :3.597
3rd Qu.:22.80 3rd Qu.:8.000 3rd Qu.:326.0 3rd Qu.:180.0 3rd Qu.:3.920
Max. :33.90 Max. :8.000 Max. :472.0 Max. :335.0 Max. :4.930
wt qsec vs am
Min. :1.513 Min. :14.50 Min. :0.0000 Min. :0.0000
1st Qu.:2.581 1st Qu.:16.89 1st Qu.:0.0000 1st Qu.:0.0000
Median :3.325 Median :17.71 Median :0.0000 Median :0.0000
Mean :3.217 Mean :17.85 Mean :0.4375 Mean :0.4062
3rd Qu.:3.610 3rd Qu.:18.90 3rd Qu.:1.0000 3rd Qu.:1.0000
Max. :5.424 Max. :22.90 Max. :1.0000 Max. :1.0000
gear carb
Min. :3.000 Min. :1.000
1st Qu.:3.000 1st Qu.:2.000
Median :4.000 Median :2.000
Mean :3.688 Mean :2.812
3rd Qu.:4.000 3rd Qu.:4.000
Max. :5.000 Max. :8.000
> ## Fit a model using the mtcars data
> m1 <- lm(mtcars$mpg ~ mtcars$cyl + mtcars$disp + mtcars$hp)
> m1
Call:
lm(formula = mtcars$mpg ~ mtcars$cyl + mtcars$disp + mtcars$hp)
Coefficients:
(Intercept) mtcars$cyl mtcars$disp mtcars$hp
34.18492 -1.22742 -0.01884 -0.01468
> summary(m1)
Call:
lm(formula = mtcars$mpg ~ mtcars$cyl + mtcars$disp + mtcars$hp)
Residuals:
Min 1Q Median 3Q Max
-4.0889 -2.0845 -0.7745 1.3972 6.9183
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 34.18492 2.59078 13.195 1.54e-13 ***
mtcars$cyl -1.22742 0.79728 -1.540 0.1349
mtcars$disp -0.01884 0.01040 -1.811 0.0809 .
mtcars$hp -0.01468 0.01465 -1.002 0.3250
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.055 on 28 degrees of freedom
Multiple R-squared: 0.7679, Adjusted R-squared: 0.743
F-statistic: 30.88 on 3 and 28 DF, p-value: 5.054e-09
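As a reading aid for the summary(m1) output above, the short Python sketch below re-derives two of the printed quantities from the others: each t value is the estimate divided by its standard error, and the adjusted R-squared follows from the multiple R-squared, the 32 observations, and the 3 predictors. The numbers are copied from the printout, so small differences in the last digit are expected from rounding; the variable names are illustrative.

```python
# Values copied from the summary(m1) printout:
# name -> (estimate, standard error, reported t value)
coefficients = {
    "(Intercept)": (34.18492, 2.59078, 13.195),
    "mtcars$cyl": (-1.22742, 0.79728, -1.540),
    "mtcars$disp": (-0.01884, 0.01040, -1.811),
    "mtcars$hp": (-0.01468, 0.01465, -1.002),
}

# t value = estimate / standard error
t_values = {name: est / se for name, (est, se, _) in coefficients.items()}

# Adjusted R-squared = 1 - (1 - R^2) * (n - 1) / (n - p - 1),
# with n = 32 observations and p = 3 predictors,
# which leaves the 28 residual degrees of freedom shown in the printout.
n, p, r_squared = 32, 3, 0.7679
adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - p - 1)

for name, t in t_values.items():
    reported = coefficients[name][2]
    print(f"{name:12s} t = {t:7.3f} (reported {reported:7.3f})")
print(f"adjusted R-squared = {adjusted_r_squared:.4f}")
```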
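> ## So far we have been using mtcars directly
> ## Load mtcars into the current working environment (Environment)
> data("mtcars")
> View(iris) ## view the data; the result is shown in the figure below
> data(package = 'datasets') ## or simply type data()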
Data sets in package ‘datasets’:
AirPassengers Monthly Airline Passenger Numbers
BJsales Sales Data with Leading Indicator
BJsales.lead (BJsales)
Sales Data with Leading Indicator
BOD Biochemical Oxygen Demand
CO2 Carbon Dioxide Uptake in Grass Plants
ChickWeight Weight versus age of chicks on different diets
DNase Elisa assay of DNase
EuStockMarkets Daily Closing Prices of Major European
Stock Indices, 1991-1998
Formaldehyde Determination of Formaldehyde
HairEyeColor Hair and Eye Color of Statistics Students
Harman23.cor Harman Example 2.3
Harman74.cor Harman Example 7.4
Indometh Pharmacokinetics of Indomethacin
InsectSprays Effectiveness of Insect Sprays
JohnsonJohnson Quarterly Earnings per Johnson & Johnson Share
LakeHuron Level of Lake Huron 1875-1972
LifeCycleSavings Intercountry Life-Cycle Savings Data
Loblolly Growth of Loblolly pine trees
Nile Flow of the River Nile
Orange Growth of Orange Trees
OrchardSprays Potency of Orchard Sprays
PlantGrowth Results from an Experiment on Plant Growth
Puromycin Reaction Velocity of an Enzymatic Reaction
Seatbelts Road Casualties in Great Britain 1969-84
Theoph Pharmacokinetics of Theophylline
Titanic Survival of passengers on the Titanic
ToothGrowth The Effect of Vitamin C on Tooth Growth in
Guinea Pigs
UCBAdmissions Student Admissions at UC Berkeley
UKDriverDeaths Road Casualties in Great Britain 1969-84
UKgas UK Quarterly Gas Consumption
USAccDeaths Accidental Deaths in the US 1973-1978
USArrests Violent Crime Rates by US State
USJudgeRatings Lawyers' Ratings of State Judges in the US
Superior Court
USPersonalExpenditure Personal Expenditure Data
UScitiesD Distances Between European Cities and
Between US Cities
VADeaths Death Rates in Virginia (1940)
WWWusage Internet Usage per Minute
WorldPhones The World's Telephones
ability.cov Ability and Intelligence Tests
airmiles Passenger Miles on Commercial US Airlines, 1937-1960
airquality New York Air Quality Measurements
anscombe Anscombe's Quartet of 'Identical' Simple
Linear Regressions
attenu The Joyner-Boore Attenuation Data
attitude The Chatterjee-Price Attitude Data
austres Quarterly Time Series of the Number of
Australian Residents
beaver1 (beavers) Body Temperature Series of Two Beavers
beaver2 (beavers) Body Temperature Series of Two Beavers
cars Speed and Stopping Distances of Cars
chickwts Chicken Weights by Feed Type
co2 Mauna Loa Atmospheric CO2 Concentration
crimtab Student's 3000 Criminals Data
discoveries Yearly Numbers of Important Discoveries
esoph Smoking, Alcohol and (O)esophageal Cancer
euro Conversion Rates of Euro Currencies
euro.cross (euro) Conversion Rates of Euro Currencies
eurodist Distances Between European Cities and
Between US Cities
faithful Old Faithful Geyser Data
fdeaths (UKLungDeaths)
Monthly Deaths from Lung Diseases in the UK
freeny Freeny's Revenue Data
freeny.x (freeny) Freeny's Revenue Data
freeny.y (freeny) Freeny's Revenue Data
infert Infertility after Spontaneous and Induced Abortion
iris Edgar Anderson's Iris Data
iris3 Edgar Anderson's Iris Data
islands Areas of the World's Major Landmasses
ldeaths (UKLungDeaths)
Monthly Deaths from Lung Diseases in the UK
lh Luteinizing Hormone in Blood Samples
longley Longley's Economic Regression Data
lynx Annual Canadian Lynx trappings 1821-1934
mdeaths (UKLungDeaths)
Monthly Deaths from Lung Diseases in the UK
morley Michelson Speed of Light Data
mtcars Motor Trend Car Road Tests
nhtemp Average Yearly Temperatures in New Haven
nottem Average Monthly Temperatures at
Nottingham, 1920-1939
npk Classical N, P, K Factorial Experiment
occupationalStatus Occupational Status of Fathers and their Sons
precip Annual Precipitation in US Cities
presidents Quarterly Approval Ratings of US Presidents
pressure Vapor Pressure of Mercury as a Function of Temperature
quakes Locations of Earthquakes off Fiji
randu Random Numbers from Congruential Generator
rivers Lengths of Major North American Rivers
rock Measurements on Petroleum Rock Samples
sleep Student's Sleep Data
stack.loss (stackloss)
Brownlee's Stack Loss Plant Data
stack.x (stackloss)
Brownlee's Stack Loss Plant Data
stackloss Brownlee's Stack Loss Plant Data
state.abb (state) US State Facts and Figures
state.area (state) US State Facts and Figures
state.center (state)
US State Facts and Figures
state.division (state)
US State Facts and Figures
state.name (state) US State Facts and Figures
state.region (state)
US State Facts and Figures
state.x77 (state) US State Facts and Figures
sunspot.month Monthly Sunspot Data, from 1749 to "Present"
sunspot.year Yearly Sunspot Data, 1700-1988
sunspots Monthly Sunspot Numbers, 1749-1983
swiss Swiss Fertility and Socioeconomic
Indicators (1888) Data
treering Yearly Treering Data, -6000-1979
trees Diameter, Height and Volume for Black
Cherry Trees
uspop Populations Recorded by the US Census
volcano Topographic Information on Auckland's
Maunga Whau Volcano
warpbreaks The Number of Breaks in Yarn during Weaving
women Average Heights and Weights for American Women
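3 Importing built-in datasets in Python
3.1 The datasets module of scikit-learn (machine learning)
from sklearn import datasets ## import the datasets module
iris = datasets.load_iris() ## load the iris dataset
print(iris) ## output is too long to show here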
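Printing the whole object is unwieldy because load_iris returns a Bunch, a dictionary-like container that holds the feature matrix, the labels, and descriptive metadata together. A minimal sketch of inspecting its parts one at a time, roughly the counterpart of str() and head() in R, might look like this (it assumes scikit-learn is installed):

```python
from sklearn import datasets

# load_iris reads a copy bundled with scikit-learn, so no download is needed
iris = datasets.load_iris()

print(iris.data.shape)          # (rows, columns) of the feature matrix
print(iris.feature_names)       # names of the columns in iris.data
print(list(iris.target_names))  # class names encoded as 0, 1, 2 in iris.target
print(iris.data[:5])            # first five rows, similar to head() in R
```

The same pattern applies to the other load_* functions in sklearn.datasets, which return objects with the same data and target attributes.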
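3.2 Datasets provided by seaborn, the high-level plotting package
import seaborn as sns
titanic = sns.load_dataset('titanic') ## load the titanic dataset
## You can also browse the other datasets that seaborn provides, e.g. with sns.get_dataset_names()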