Stata: capturing Amap (Gaode Maps)
Maps are a powerful visualization tool that immediately show the spatial distribution of data. In Stata, the ability to make maps is relatively recent: it came with the user-written commands shp2dta and spmap. Stata 15 (2017) formally integrated these two commands as spshape2dta and grmap, respectively. This also coincided with the spatial regression commands introduced in Stata 15.
In this guide, we will learn how to map COVID-19 policy stringency data to produce a map that looks like this:
For this guide, a basic knowledge of Stata is assumed, including familiarity with the Stata interface, do-files, and code syntax. The guide introduces shapefiles, map projections, merging datasets from different sources, the commands to generate customized maps, customized color schemes, and the use of locals to automate code.
Understanding shapefiles
Spatial data come in different formats. The most common format for map data is the shapefile, developed by ESRI, the maker of ArcGIS, which is also the industry-standard software for mapping and spatial analysis. ESRI also developed the now well-known dashboard used by Johns Hopkins University (JHU) and several countries to showcase COVID-19 maps and trends. Given the rise and popularity of maps, shapefiles are usually easy to obtain for most countries, including different layers of administrative boundaries. A shapefile is really a set of multiple files, of which the following three core files are required to generate a map:
.shp: contains the coordinates of shapes which can be polygons, lines, or points
.dbf: contains the attributes of the shapes
.shx: has the index of spatial objects
Several other optional files are usually bundled with the above three, but an important one here is the projections file:
.prj: contains the projection or spatial reference system of the shape
While we won’t go too deeply into projection systems, it is still essential to understand the core concept behind them. Maps are usually drawn on a flat, two-dimensional surface to represent the shape of the earth, a three-dimensional object. This 3-D to 2-D mapping can be done in multiple ways, and different projections are needed to calculate correct distances or areas. Since we are just plotting data on a map, we will use Google’s Web Mercator projection, which distorts both distance and area but makes countries easier to read. Some countries also release their own projection systems to make their maps as accurate as possible. An easy-to-relate-to example, given here, shows how a face can be drawn on a flat surface using different projections.
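For reference, here is a sketch of the standard spherical Mercator equations on which Web Mercator is based, with longitude λ and latitude φ in radians and R the sphere radius (Web Mercator uses R = 6,378,137 m). Tools such as geo2xy may apply their own scaling and offsets, so treat this as the underlying geometry rather than geo2xy's exact output:

```latex
x = R\,\lambda, \qquad
y = R \ln\!\left[\tan\!\left(\frac{\pi}{4} + \frac{\varphi}{2}\right)\right],
\qquad k(\varphi) = \sec\varphi
```

The local scale factor k(φ) = sec φ applies in both directions, so areas are inflated by sec²φ: at 60° latitude, k = 2 and areas are drawn four times too large. This is the distortion of distance and area referred to above.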
Task 1: Get the folders in order
In line with the previous guides, we use the following folder structure to keep track of the files:
Readers who have followed the earlier guides will notice that we add one more folder, called GIS, to keep all the spatial files. If you are reading this guide for the first time, just create a root folder and, within it, the six subfolders shown above. The description of each folder is given in the figure.
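If you prefer to create the folders from Stata, a minimal sketch is below. It assumes Stata's working directory is already the root folder, and it only creates the three subfolders that the code in this guide refers to (GIS, master, and graphs); the remaining subfolders from the figure can be added to the list by name.

```stata
* Run from the project's root folder.
* -capture- keeps mkdir from stopping the do-file if a folder already exists.
foreach folder in GIS master graphs {
    capture mkdir "`folder'"
}
```

Because of capture, the loop can safely sit at the top of a do-file that is rerun many times.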
Task 2: Get the map files
The generic map files for the world can be downloaded from different places on the internet. For this guide, we will use the ArcGIS 2020 world country boundaries file given at the link here: https://hub.arcgis.com/datasets/UIA::uia-world-countries-boundaries
The page looks like the image below. Since there are different versions of the files, make sure that the date of the file is somewhere in 2020. If not, then one can scroll down the page and select more recent versions of the same dataset.
On this page, click on the Download drop-down menu and then on Shapefile, which will download a zipped file. Extract the contents of the zipped file into the GIS folder:
Here we get five files from the zipped file. In general, shapefiles can be viewed directly in software such as ArcGIS (requires a license), QGIS (free), and GeoDa (free) just by opening the .shp file.
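Since the import step needs the .shp, .shx and .dbf files together, it can help to check that the extraction worked before going further. The sketch below defines a small helper, check_shapefile, which is hypothetical (written for this guide, not part of any package). It takes the shapefile name without its extension, reports any missing core file in r(missing), and sets r(has_prj) to 1 if the optional .prj file is present.

```stata
* Hypothetical helper: check the core shapefile components for a base name.
capture program drop check_shapefile
program define check_shapefile, rclass
    args base
    local missing ""
    foreach ext in shp shx dbf {
        capture confirm file "`base'.`ext'"
        if _rc {
            local missing "`missing' .`ext'"
        }
    }
    * .prj is optional but records the projection of the coordinates
    capture confirm file "`base'.prj"
    return scalar has_prj = (_rc == 0)
    return local missing "`missing'"
    if "`missing'" != "" {
        display as error "missing core file(s):`missing'"
    }
    else {
        display as text "core files found for `base'"
    }
end
```

Run from the GIS folder, check_shapefile "World_Countries__Generalized_" should report that all core files were found.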
Task 3: Prepare the data
For the maps, we will use the Oxford COVID-19 Government Response Tracker (OxCGRT) data, which we also used in Guide 3. Guide 3 also includes a detailed description of how to extract data from a GitHub repository. For the sake of completeness, the code required to set up the dataset is repeated below:
clear
cd "D:/Programs/Dropbox/Dropbox/PROJECT COVID - MEDIUM"
*** import the data from GitHub
insheet using "https://raw.githubusercontent.com/OxCGRT/covid-policy-tracker/master/data/OxCGRT_latest.csv", clear
*** new data has been added for USA and UK for regions so we drop the regions:
drop if regionname!=""
*** keep only the variables we need
keep date countryname containmenthealthindexfordisplay stringencyindexfordisplay
ren stringencyindexfordisplay Index
ren countryname country
*** fix the date
tostring date, gen(date2)        // string the date variable
gen year  = substr(date2,1,4)    // extract the date components
gen month = substr(date2,5,2)
gen day   = substr(date2,7,2)
destring year month day, replace
drop date date2
gen date = mdy(month,day,year)
format date %tdDD-Mon-CCYY
drop month day year
order country date
sort country date
drop if date < td(01jan2020)     // td(01jan2020) = 21915
*** save the files
compress
save ./master/COVID_policies.dta, replace
In the code above, we keep the overall policy stringency index (renamed Index), the containment and health index, the country name, and the date variable. The date, which is stored as a YYYYMMDD number, is converted into a Stata date in the middle part of the code, and, in the last step, we save the cleaned data in the master subfolder as COVID_policies.dta.
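The date fix can be checked on a single made-up value. The sketch below rebuilds 15 March 2020 from the number 20200315 with the substr()/mdy() approach used above, compares it with Stata's date() function, which can parse the YYYYMMDD string in one step, and displays the 1 January 2020 cut-off as a day count since 1 January 1960.

```stata
* made-up example: one OxCGRT-style date stored as the number 20200315
clear
set obs 1
gen long date = 20200315
tostring date, gen(date2)
* the guide's approach: split the string and rebuild the date with mdy()
gen ddate = mdy(real(substr(date2,5,2)), real(substr(date2,7,2)), real(substr(date2,1,4)))
* a one-step alternative: parse the YYYYMMDD string directly
gen ddate2 = date(date2, "YMD")
format ddate ddate2 %tdDD-Mon-CCYY
list date2 ddate ddate2
display td(01jan2020)   // displays 21915, the cut-off used above
```

Either version works; date() simply saves the intermediate year, month and day variables.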
Task 4: Using Shapefiles in Stata
In order to use the shapefiles, we need to install the following Stata packages:
*** install the core packages
ssc install spmap // for the maps package
ssc install shp2dta // shapefiles to dta. For versions < v15.
ssc install geo2xy // for fixing the coordinate system
For Stata 15 or higher, the user-written spmap can also be replaced with Stata's official grmap command. Since grmap is a wrapper for spmap, I will stick to the original command; both can be used interchangeably after minor tweaks to the syntax. An official Stata guide on importing the JHU COVID-19 data and mapping it with grmap is given here.
shp2dta is only necessary if you are using Stata 14 or earlier versions. It should work with most shapefiles, but it might not read every shapefile correctly and can return an error. For Stata 15 or higher, the built-in command spshape2dta can be used instead. geo2xy is used for changing the map projection in Stata.
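In a do-file that is rerun often, reinstalling the packages every time is unnecessary. A minimal sketch that installs each package only if Stata cannot find it, relying on which (which returns an error when no ado-file of that name is found) and on the c(stata_version) setting to install shp2dta only for versions before 15:

```stata
* install a package from SSC only if its ado-file is not found
foreach pkg in spmap geo2xy {
    capture which `pkg'
    if _rc {
        ssc install `pkg'
    }
}
* shp2dta is only needed for Stata 14 or earlier
if c(stata_version) < 15 {
    capture which shp2dta
    if _rc {
        ssc install shp2dta
    }
}
```

The trade-off is that already installed packages are not updated; running ssc install with the replace option, as done later for the color packages, refreshes them.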
In the first step, we read the shapefile into Stata and convert it to Stata format:
*** get the shapefiles in order
cd GIS // switch to the GIS folder
spshape2dta "World_Countries__Generalized_.shp", replace saving(world)
This will show the following screen output:
Here we can see that two datasets are created: world.dta, which contains the attributes of the shapes (or countries in our case), and world_shp.dta which contains the outlines of the shapes.
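To make the layout of the outline file concrete, here is a made-up miniature version: two unit squares stored in the same long format, one row per vertex, with _ID linking each vertex to its shape and the first vertex repeated at the end to close each outline. The real world_shp.dta contains many more rows and some additional variables.

```stata
* made-up outline data in the same long layout as world_shp.dta
clear
input _ID _X _Y
1 0 0
1 1 0
1 1 1
1 0 1
1 0 0
2 2 0
2 3 0
2 3 1
2 2 1
2 2 0
end
* number of vertices stored for each shape
bysort _ID: gen npoints = _N
tab _ID
* plotting the points traces the outlines, as with the world file
scatter _Y _X
```

The _ID variable is also what links these vertices to the attribute rows in world.dta.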
For Stata 14 and earlier versions, the following shp2dta command can be used:
*** For Stata 14 or earlier versions ONLY:
shp2dta using World_Countries__Generalized_.shp, database(world) coordinates(world_shp) genid(_ID) gencentroids(c) replace
We can also explore the world.dta and the world_shp.dta files:
*** explore the outline file
use world_shp, clear
scatter _Y _X, msize(tiny) msymbol(point)
In this file, each country is identified by the variable _ID and stored as a series of points. This is the standard structure of map files (spatial JSON files use a similar concept of connecting dots to form a shape). The command above produces the following scatter:
In the map above, Antarctica sticks out prominently at the bottom but has no reported cases. We can therefore drop it to make more space for the other countries. To drop regions or countries, we need their names. Since the names exist in the attribute file world.dta, we can merge and drop as follows:
*** merge with the attributes file to get the names
merge m:1 _ID using world
drop if COUNTRY=="Antarctica" // drop the polar region while COUNTRY still exists
drop rec_header-_merge        // then drop all additional variables
scatter _Y _X, msize(tiny) msymbol(point)
This makes the map a bit more readable. In order to improve the map further, we change the default map projection to Google’s Web Mercator projection:
geo2xy _Y _X, proj(web_mercator) replace
scatter _Y _X, msize(tiny) msymbol(point)
Which gives us:
Here you can see that the scale of the axes has changed: the countries are squeezed in the middle and stretched more toward the top and the bottom. This (somewhat misleading) projection makes European countries more prominent by drawing them larger, which also makes their data easier to read. We can now save the modified world_shp.dta boundary file with the updated information:
sort _ID
save, replace
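The stretching toward the top and bottom of the map can be quantified with the standard spherical Mercator formula, y = ln(tan(π/4 + φ/2)) on a unit sphere. geo2xy's web_mercator output is scaled and shifted differently, so this sketch only illustrates the shape of the distortion: equal 30° steps in latitude map to ever larger steps in y, and the local stretch factor 1/cos(φ) reaches 2 at 60°.

```stata
* spherical Mercator on a unit sphere for three latitudes (in degrees)
clear
input lat
0
30
60
end
gen y = ln(tan(_pi/4 + lat*_pi/360))   // lat*_pi/360 is lat/2 in radians
gen scale = 1/cos(lat*_pi/180)          // local stretch factor, sec(lat)
list
```

The gap in y between 30° and 60° is larger than the gap between 0° and 30°, which is exactly the vertical stretching visible in the projected scatter.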
As an additional step, we also generate a file that contains the country names for labeling. This step is not strictly necessary for this guide, but the file is useful in general. Since the names and center points of the country shapes exist in the attribute file world.dta, we use the following code to generate a labels file:
*** generate a file for labels
use world, clear
ren COUNTRY country
drop if country=="Antarctica"
keep _CX _CY country
geo2xy _CY _CX, proj(web_mercator) replace // fix the projection
compress
save world_label.dta, replace
Here we just keep the X and Y coordinates of the center points and the country names. We also need to change the projection of the coordinates so they can be correctly overlaid with the boundaries.
Note: for labels we could also just use the original world.dta file, but the advantage of a separate file is that it lets us add custom labels for a selected group of countries and regions. This will be used and discussed in later guides.
We are now ready to make some proper maps. We start by loading the attribute file and mapping it with the spmap command:
use world, clear
ren COUNTRY country
drop if country=="Antarctica"
*** the first map
spmap using world_shp, id(_ID)
graph export ../graphs/map1.png, replace wid(1000)
which gives us the following figure:
Note the use of relative paths here. Since we are in the GIS directory (we typed cd GIS earlier to process the shapefiles), the two dots .. mean that we first go up one directory and then go down into the graphs folder, where we save the map1.png file.
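The effect of .. can be seen by moving around the folder tree directly. A sketch, assuming Stata starts in the root folder; the two mkdir lines create the GIS and graphs subfolders if they do not exist yet.

```stata
* remember the root folder so we can return to it
local root "`c(pwd)'"
capture mkdir "GIS"
capture mkdir "graphs"
cd GIS              // the same step as in the guide
display "`c(pwd)'"
cd ../graphs        // up one level, then down into graphs
display "`c(pwd)'"
cd "`root'"         // back to the root folder
```

Storing the root folder in a local like this is also a simple way to build full paths instead of relying on where the previous cd left you.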
We can also generate the same map above with country labels:
** second map with labels
spmap using world_shp, ///
id(_ID) ocolor(gs4 ..) osize(vvthin ..) ///
label(data("world_label.dta") x(_CX) y(_CY) label(country) size(*0.45 ..) length(25))
graph export ../graphs/map2.png, replace wid(1000)
The map above is a bit messy because of the clusters of country names, but it is only meant to show how labels can be added.
A random note: the northernmost label shows the Svalbard region, which belongs to Norway (also known as Spitzbergen in German). Svalbard lies well north of the Arctic Circle and is one of the coldest and northernmost inhabited regions in the world. It also has no known COVID-19 cases.
Task 5: Putting it all together
We can now merge the world.dta attribute dataset with the COVID19 policy database we saved earlier. The two datasets can be combined on the country name variable.
Note: merging on names is generally sub-optimal, since names and spellings can vary. Ideally, the datasets should contain a unique identifier for each region. For example, countries like France have territories all over the world. Sometimes these territories are labeled by their actual names, and sometimes simply as France. Similar issues exist with small island territories, regions with contested boundaries, and so on. While such regions tend to get dropped (or even overlooked), accurate data management requires that they be correctly identified across multiple datasets.
Since we are merging datasets from different databases, and names vary across datasets, not all names will merge perfectly. Here, I will just show some examples of basic (and manual) name cleaning we can do before the merge (there are other countries that need to be cleaned up as well):
*** clean some country names
replace country="Cote d'Ivoire" if country=="Côte d'Ivoire"
replace country="Kyrgyz Republic" if country=="Kyrgyzstan"
replace country="Slovak Republic" if country=="Slovakia"
replace country="Democratic Republic of Congo" if country=="Congo DRC"
replace country="Russia" if country=="Russian Federation"
replace country="Palestine" if country=="Palestinian Territory"
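As the list of fixes grows, one alternative is to keep the name pairs in a small crosswalk dataset and merge it in, so all fixes live in one place. A sketch: the pairs are taken from the replace statements above, while the two-country dataset is a made-up stand-in for world.dta.

```stata
* crosswalk: spelling in world.dta -> spelling in the OxCGRT data
clear
input str40 country str40 oxcgrt
"Côte d'Ivoire"         "Cote d'Ivoire"
"Kyrgyzstan"            "Kyrgyz Republic"
"Slovakia"              "Slovak Republic"
"Congo DRC"             "Democratic Republic of Congo"
"Russian Federation"    "Russia"
"Palestinian Territory" "Palestine"
end
tempfile xwalk
save `xwalk'
* made-up stand-in for the country names in world.dta
clear
input str40 country
"Slovakia"
"Austria"
end
* countries without a crosswalk entry keep their original name
merge 1:1 country using `xwalk', keep(master match) nogenerate
replace country = oxcgrt if oxcgrt != ""
drop oxcgrt
list
```

In practice the crosswalk would be saved once as a .dta file and merged in right before the policy data.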
We can now do a one-to-many (1:m) merge, since we are combining the countries in world.dta with the country-date observations in COVID_policies.dta:
merge 1:m country using ../master/COVID_policies
tab country if _m==1 // show the unmerged countries in world.dta
tab country if _m==2 // show the unmerged countries in the COVID data
*** see exactly how many countries merged
egen tag = tag(country) // tag=1 for each country observation
tab _m if tag==1 // 178 countries merge perfectly.
drop if _m==2 // drop the unmerged countries from the policy data
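Since each country appears once per date after a 1:m merge, tabulating _merge directly would count country-days rather than countries; egen tag() fixes this by flagging one row per country. A toy version with made-up names and values shows the idea:

```stata
* made-up policy data: several days per country
clear
input str12 country day Index
"Austria" 1 50
"Austria" 2 60
"Narnia"  1 10
end
tempfile pol
save `pol'
* made-up attribute data: one row per country
clear
input _ID str12 country
1 "Austria"
2 "Atlantis"
end
merge 1:m country using `pol'
egen tag = tag(country)   // 1 on exactly one row per country
tab _merge if tag == 1    // merge status counted once per country
drop if _merge == 2       // policy-only countries have no shape to draw
```

Here one country matches, one exists only in the attribute file, and one only in the policy file.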
We can now make the first map of policy stringency:
***
summ date
spmap Index using world_shp if date==`r(max)', ///
id(_ID) cln(11) fcolor(Heat)
graph export ../graphs/map3.png, replace wid(1000)
This is the default Stata map layout. It uses eleven endogenously generated classes, cln(11), and the Heat color scheme, fcolor(Heat). Type help spmap to see the other available color schemes.
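The summ date line and the `r(max)' inside the if condition work as a pair: summarize stores the largest date in r(max), and macro expansion substitutes that number into the spmap call, so the map always shows the most recent day in the data. A made-up panel shows the same idiom with keep; storing r(max) in a local first is a little safer, because the next r-class command overwrites r().

```stata
* made-up panel: two countries observed on two days
clear
input str8 country date Index
"A" 21988 40
"A" 21989 55
"B" 21988 30
"B" 21989 35
end
format date %tdDD-Mon-CCYY
summarize date
local latest = r(max)
keep if date == `latest'   // keep only the most recent day
list
```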
We can refine the above map further:
summ date
spmap Index using world_shp if date==`r(max)', ///
id(_ID) ocolor(black ..) osize(vvthin ..) ///
clmethod(custom) clbreaks(0(10)100) fcolor(Heat) ///
legend(pos(7) size(*1)) legstyle(2) ///
title("COVID-19 Policy Stringency Index", size(small)) ///
note("Data source: Oxford COVID-19 Government Response Tracker. Mercator projection used. Antarctica dropped from maps.", size(tiny))
graph export ../graphs/map4.png, replace wid(1000)
Here we make the outlines very, very thin with osize(vvthin ..) and define custom legend cut-offs with clmethod(custom) clbreaks(0(10)100). We also modify the legend position and size, change the legend label style with legstyle(2), and add a title and a note.
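The clbreaks() list is an ordinary Stata numlist, so it can be expanded with the numlist command to check how many classes it creates: 0(10)100 gives 11 break points and therefore 10 classes, which is why a ten-color palette, n(10), fits these breaks in the final map.

```stata
* expand the break list used in clbreaks() and count the classes
numlist "0(10)100"
local breaks `r(numlist)'
local nbreaks : word count `breaks'
display "`nbreaks' break points define " `nbreaks' - 1 " classes"
```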
This already looks neater, but we can add some more useful information and tone down the colors a bit.
In the last step, we add a custom color scheme using the colorpalette package (discussed at length in Guide 2 and Guide 3), together with the date when the map was created:
*** get the color scheme packages if not installed
ssc install palettes, replace // for color palettes
ssc install colrspace, replace // for expanding the color base
*** generate a local for today's date
local date: display %tdd_m_yy date(c(current_date), "DMY")
*** generate a local for the w3 color scheme
colorpalette w3 amber, n(10) nograph
local colors `r(p)'
summ date
spmap Index using world_shp if date==`r(max)', ///
id(_ID) ocolor(black ..) osize(vvthin ..) ///
clmethod(custom) clbreaks(0(10)100) fcolor("`colors'") ///
legend(pos(7) size(*0.8)) legstyle(2) ///
title("COVID-19 Policy Stringency Index (`date')", size(small)) ///
note("Data source: Oxford COVID-19 Government Response Tracker. Mercator projection used. Antarctica dropped from maps.", size(tiny))
graph export ../graphs/map5.png, replace wid(1000)
From the code above, we get the following final map:
The code above relies on locals such as `date' (today's date) and `colors' (the color palette), plus the `r(max)' result from summ date (the latest date in the data). These locals help automate the code. Automation is essential to speed up tasks and avoid human errors; these points are discussed at length in Guide 1 and Guide 2 and are used throughout the Stata guides. Here we use the w3 CSS amber color scheme, predefined in the colorpalette and colrspace packages, which has very soft gradients.
Exercise
Try making a map using different indicators and other color schemes available in the colorpalette guide.
Other guides
Part 1: An introduction to data setup and customized graphs
Part 2: Customizing color schemes
Part 3: Heat plots
If you enjoy these guides and find them useful, then please like and follow the Medium Stata blog: The Stata Guide
About the author
I am an economist by profession and I have been using Stata for almost 18 years. I have worked and lived across three different continents and I am currently based in Vienna, Austria. You can find my research work on ResearchGate. You can follow my COVID-19 related Stata visualizations on my Twitter. I am also featured on the COVID-19 Stata webpage in the visualization and graphics section. You can find my code repository on GitHub.
I can be reached via Medium, Twitter, LinkedIn, or email.
My Medium blog for Stata-related posts can be found here: The Stata Guide
Translated from: https://medium.com/the-stata-guide/covid-19-visualizations-with-stata-part-4-maps-fbd4fe2642f6