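Analyzing National People's Congress Deputy Data with Stata

News outlets have been covering the Two Sessions wall to wall lately, so we will ride the wave too. We could not find data on this year's NPC deputies, but data on the deputies at the 2019 Two Sessions is available from this site: https://news.cgtn.com/event/2019/whorunschina/index.html. We have already scraped the data and attached it to this post.

Below we use Stata to analyze and chart this dataset.

Reading the data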

clear all
import delimited using "NPC.csv", clear encoding(utf8)

An overview of the data:

des

*> Contains data
*>   obs:         2,975                          
*>  vars:            25                          
*>  size:     1,383,375                          
*> -----------------------------------------------------------------------------------
*>               storage   display    value
*> variable name   type    format     label      variable label
*> -----------------------------------------------------------------------------------
*> delegation      str59   %59s                  Delegation
*> partisan        str53   %53s                  Partisan
*> 党派            str21   %21s                  
*> name            str41   %41s                  Name
*> 姓名            str43   %43s                  
*> gender          str6    %9s                   Gender
*> 性别            str3    %9s                   
*> birthyear       int     %8.0g                 Birth year
*> age             byte    %8.0g                 Age
*> generation      str5    %9s                   Generation
*> 年代            str5    %9s                   
*> ethnicity       str9    %9s                   Ethnicity
*> 民族            str15   %15s                  
*> birthplace      str14   %14s                  Birthplace
*> 籍贯            str9    %9s                   
*> region          str44   %44s                  Region
*> 区域            str9    %9s                   
*> subjectdepart~t str30   %30s                  Subject Department
*> 专业分类        str12   %12s                  
*> major           str18   %18s                  Major
*> 人文社科拆后~业 str9    %9s                   
*> educationalba~d str29   %29s                  Educational background
*> 学历            str15   %15s                  
*> everstudiedab~d str7    %9s                   Ever studied abroad
*> 海外留学经验    str6    %9s                   
*> -----------------------------------------------------------------------------------
*> Sorted by: 
*>      Note: Dataset has changed since last saved.

Number of deputies per delegation

The contract command quickly tallies the number of deputies in each delegation:

contract delegation
list in 1/5

*>     +-------------------+
*>     | delegat~n   _freq |
*>     |-------------------|
*>  1. |     Anhui     111 |
*>  2. |   Beijing      54 |
*>  3. | Chongqing      60 |
*>  4. |    Fujian      69 |
*>  5. |     Gansu      52 |
*>     +-------------------+
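To make explicit what contract does, here is a minimal sketch on a made-up five-row dataset (the rows below are illustrative and not taken from NPC.csv). contract reduces the data to one row per distinct value and stores the count in _freq; the bysort line builds the same count by hand for comparison.

```stata
* Toy data: hypothetical rows, not from NPC.csv
clear
input str10 delegation
"Anhui"
"Anhui"
"Beijing"
"Anhui"
"Beijing"
end

* Manual count: number of rows within each delegation
bysort delegation: gen n = _N
list, noobs sepby(delegation)

* contract keeps one row per delegation and adds _freq
contract delegation
list, noobs
```

Note that contract replaces the data in memory, which is why the later sections re-import NPC.csv before each new tabulation.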

We can draw a bar chart of the nine largest delegations:

Figure: The nine largest delegations of the National People's Congress

The plotting code:

gsort -_freq
keep in 1/9
/* Translate the delegation names into Chinese for the axis labels */
replace delegation = `""中国人民" "解放军" "武警部队""' if delegation == "The Chinese People's Liberation Army and Armed Police Force"
replace delegation = "山东" if delegation == "Shandong"
replace delegation = "河南" if delegation == "Henan"
replace delegation = "广东" if delegation == "Guangdong"
replace delegation = "江苏" if delegation == "Jiangsu"
replace delegation = "四川" if delegation == "Sichuan"
replace delegation = "河北" if delegation == "Hebei"
replace delegation = "湖北" if delegation == "Hubei"
replace delegation = "湖南" if delegation == "Hunan"
/* Convert delegation into a labeled numeric variable: */
encode delegation, gen(delegation2)
tw bar _freq delegation2, barw(0.8) ///
    xla(1(1)9, val) yla(100(50)300) ///
    xti(" " "代表团") yti("人数") ///
    ti("全国人民代表大会人数最多的九个代表团") ///
    subti("这九个代表团的总人数为 1454") ///
    note("数据来源：Who runs China?" "", pos(5))
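The subtitle above hard-codes the total. As a minimal sketch of computing it instead, the example below uses made-up counts (the three rows and the variable name freq are placeholders, not the real delegation sizes): summarize leaves the sum in r(sum), which can be stored in a local macro and substituted into the subtitle text.

```stata
* Toy data: hypothetical counts, not the real delegation sizes
clear
input str10 delegation freq
"A" 120
"B" 95
"C" 80
end

* Store the column total in a local macro
quietly summarize freq
local total = r(sum)
display "Total: `total'"

* Use the macro in the subtitle instead of typing the number by hand
encode delegation, gen(delegation2)
tw bar freq delegation2, barw(0.8) ///
    xla(1(1)3, val) ///
    subti("Total across these delegations: `total'")
```

With the real data, the same two lines after `keep in 1/9` would work on _freq, and the subtitle would stay correct if the dataset changed.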

To order the bars from tallest to shortest, we can create the numeric variable with the sencode command instead of encode:

* Install sencode
* ssc install sencode
sencode delegation, gen(delegation3) gsort(-_freq)
tw bar _freq delegation3, barw(0.8) ///
    xla(1(1)9, val) yla(100(50)300) ///
    xti(" " "代表团") yti("人数") ///
    ti("全国人民代表大会人数最多的九个代表团") ///
    subti("这九个代表团的总人数为 1454") ///
    note("数据来源：Who runs China?" "", pos(5))

Figure: The nine largest delegations of the National People's Congress

What if we want a different color for each bar? The approach is to draw the bars one at a time:

Figure: A different color for each bar

tw ///
bar _freq delegation2 if delegation2 == 9, ///
    fc("192 55 40") lc("192 55 40") barw(0.8) || ///
bar _freq delegation2 if delegation2 == 8, ///
    fc("145 156 76") lc("145 156 76") barw(0.8) || ///
bar _freq delegation2 if delegation2 == 7, ///
    fc("253 143 36") lc("253 143 36") barw(0.8) || ///
bar _freq delegation2 if delegation2 == 6, ///
    fc("245 192 74") lc("245 192 74") barw(0.8) || ///
bar _freq delegation2 if delegation2 == 5, ///
    fc("230 140 124") lc("230 140 124") barw(0.8) || ///
bar _freq delegation2 if delegation2 == 4, ///
    fc("130 133 133") lc("130 133 133") barw(0.8) || ///
bar _freq delegation2 if delegation2 == 3, ///
    fc("195 195 119") lc("195 195 119") barw(0.8) || ///
bar _freq delegation2 if delegation2 == 2, ///
    fc("79 81 87") lc("79 81 87") barw(0.8) || ///
bar _freq delegation2 if delegation2 == 1, ///
    fc("111 84 56") lc("111 84 56") barw(0.8) ||, ///
xla(1(1)9, val nogrid) yla(100(50)300, nogrid) ///
    xti(" " "代表团") yti("人数") ///
    ti("全国人民代表大会人数最多的九个代表团") ///
    subti("这九个代表团的总人数为 1454") ///
    note("数据来源：Who runs China?" "", pos(5)) ///
    plotr(fc("249 248 229") lc("249 248 229")) graphr(fc("249 248 229")) leg(off)
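Writing out nine nearly identical bar layers is repetitive. One possible alternative, sketched here on a three-row toy dataset (the values, and the names id, freq, colors and layers, are placeholders), is to build the layers in a loop and pass the assembled macro to twoway. The three RGB triples are the first three from the palette above.

```stata
* Toy data standing in for the nine-delegation dataset (values are made up)
clear
input byte id freq
1 120
2 95
3 80
end

* One RGB triple per bar
local colors `""192 55 40" "145 156 76" "253 143 36""'

* Append one bar layer per color; i tracks which category the layer draws
local layers
local i = 0
foreach c of local colors {
    local ++i
    local layers `layers' (bar freq id if id == `i', fc("`c'") lc("`c'") barw(0.8))
}

tw `layers', xla(1(1)3) leg(off)
```

The remaining options (titles, note, background colors) can be appended after the comma exactly as in the hand-written version.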

Gender and age distribution

We can look at gender and age together:

Figure: Gender and age distribution

import delimited using "NPC.csv", clear encoding(utf8)
contract 性别 年代
encode 性别, gen(gender)
encode 年代, gen(age)
gr bar _freq, over(gender) over(age, axis(noline)) stack asy ///
    legend(on pos(11) ring(0) rows(2) ///
        ti(性别, size(*0.6) pos(11)) ///
        region(lc("249 248 229") fc("249 248 229"))) ///
    ti("全国人大代表的年龄分布") ///
    subti("平均年龄为 53.77 岁") ///
    yti("人数") yla(, nogrid) ///
    note("数据来源：Who runs China?" ///
         "", pos(5)) ///
    plotr(fc("249 248 229") lc("249 248 229")) graphr(fc("249 248 229")) ///
    bar(1, color(190 57 47)) ///
    bar(2, color(139 149 78))

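As the chart shows, deputies born in the 1960s form the core of the NPC, and among those born in the 1990s women outnumber men. The average age of the deputies is 53.77. Of them, 1,672 were born in the 1960s, more than half of the total. We can also see that the younger the cohort, the more balanced the gender ratio.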
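The summary figures quoted above can be recomputed from the raw file rather than typed by hand. This is a minimal sketch that assumes age and birthyear are the numeric variables shown in the describe output; it prints the values without asserting what they should be.

```stata
import delimited using "NPC.csv", clear encoding(utf8)

* Mean age, to compare with the figure quoted in the subtitle
quietly summarize age
display "Mean age: " %5.2f r(mean)

* Deputies born in the 1960s, as a count and as a share of all deputies
quietly count if inrange(birthyear, 1960, 1969)
display "Born in the 1960s: " r(N) " (" %4.1f (100 * r(N) / _N) "%)"
```

Rows with a missing birthyear are not counted by inrange(), but they do remain in the denominator _N.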

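Does the share of women look small? In fact, the proportion of female deputies has been rising steadily over recent congresses:

Figure: Share of female deputies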

clear all
input str10 year    woman
            "10th"  0.202
            "11th"  0.213
            "12th"  0.234
            "13th"  0.249
end
gen man = 1 - woman
gr pie man woman, by(year, leg(off) rows(1) ///
        plotr(fc(249 248 229) lc(249 248 229)) graphr(fc(249 248 229)) ///
        ti("历届全国人大女性代表的比例变化") ///
        subti("第 13 届全国人大女性代表的比例为 24.9%，比第 10 届提高了 4.7 个百分点" ///
            " " " ") ///
        note("数据来源：Who runs China?" ///
         "", pos(5))) ///
    plabel(1 "男性") plabel(2 "女性") noclockwise ///
    pie(2, explode color(139 149 78)) pie(1, color(190 57 47)) ///
    plotr(fc(249 248 229) lc(249 248 229)) graphr(fc(249 248 229)) ///
    subti(, fc(249 248 229) lc(249 248 229) size(5))
gr_edit .plotregion1.subtitle[1].style.editstyle horizontal(center) editcopy
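The percentage-point changes between congresses follow directly from the four shares entered above. A minimal sketch of that arithmetic (the variable names change_prev and change_10th are made up for this example):

```stata
clear
input str10 year woman
"10th" 0.202
"11th" 0.213
"12th" 0.234
"13th" 0.249
end

* Change in percentage points versus the previous congress and versus the 10th
gen change_prev = 100 * (woman - woman[_n-1])
gen change_10th = 100 * (woman - woman[1])
format change_* %4.1f
list, noobs
```

For the 13th congress this gives 24.9 - 20.2 = 4.7 points relative to the 10th, and 24.9 - 21.3 = 3.6 points relative to the 11th.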

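Ethnic distribution

China is a multi-ethnic country with 56 ethnic groups: the Han plus 55 ethnic minorities. So how many ethnic groups are represented among the deputies?

Figure: Ethnic distribution

Han deputies number 2,538, about 85% of the total.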

import delimited using "NPC.csv", clear encoding(utf8)
contract 民族
gsort -_freq
gr pie _freq, over(民族) sort des ///
    ti("全国人大代表的民族分布") ///
    subti("我国是一个多民族国家" ///
        " " " ") ///
    note("数据来源：Who runs China?" ///
         "", pos(5)) ///
    plotr(fc(249 248 229) lc(249 248 229)) graphr(fc(249 248 229) margin(medium)) clockwise ///
    plabel(1 "汉族: 85.31%", size(*1.6)) plabel(2 "回族", size(*1)) ///
    plabel(3 "壮族", size(*1)) ///
    leg(off)
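The question of how many ethnic groups appear among the deputies can be answered directly from the contracted data. A minimal sketch, assuming every distinct value of 民族 is one ethnic group (pct is a name chosen for this example):

```stata
import delimited using "NPC.csv", clear encoding(utf8)

* One row per ethnic group, with its count and percentage share
contract 民族, percent(pct)

* After contract, the number of observations is the number of distinct values
display "Distinct values of 民族: " _N

* The three largest groups, matching the three labeled pie slices
gsort -_freq
list 民族 _freq pct in 1/3, noobs
```

If some deputies have a blank 民族, contract counts the blank as its own row, so the displayed number would be one higher than the number of groups; the nomiss option drops such rows.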

Educational background

It goes without saying that the deputies are, on the whole, highly educated:

Figure: Educational background

import delimited using "NPC.csv", clear encoding(utf8)
contract 性别 学历
gr bar (sum) _freq, over(性别) over(学历, axis(noline)) stack asy percentages ///
    plotr(fc(249 248 229) lc(249 248 229)) graphr(fc(249 248 229)) ///
    ti("全国人大代表的学历分布") ///
    subti("每 10 名人大代表中就有 9 名持有学士及以上的学位。88.5％的代表拥" ///
        "有学士学位或以上学历。拥有硕士学位的人占最大比例（836 人），" ///
        "博士学位排名第二（584 人）。") ///
    note(" " "数据来源：Who runs China?" ///
         "", pos(5)) ///
    yti("比例") yla(, nogrid) ///
    bar(1, color(190 57 47)) ///
    bar(2, color(139 149 78)) ///
    legend(on pos(3) ring(1) rows(2) ///
        ti(性别, size(*0.6) pos(12)) ///
        region(lc("249 248 229") fc("249 248 229")))

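Fields of study

What are the deputies' academic backgrounds? Under the Ministry of Education's classification of disciplines, management, philosophy, literature, history, education, art, economics, law, and military science count as humanities and social sciences, while science, engineering, agriculture, and medicine count as natural sciences.

Figure: Fields of study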

import delimited using "NPC.csv", clear encoding(utf8)
contract 人文社科拆后专业
gsort -_freq
gr pie _freq, over(人文社科拆后专业)  sort des ///
    plabel(1 "未知：40.27%") plabel(2 "管理学：15.09%") ///
    plabel(3 "工学：9.95%") plabel(4 "经济学：8.605%") ///
    leg(on pos(3) ring(0) cols(1) region(lc(249 248 229) fc(249 248 229))) ///
    ti("全国人大代表的专业背景怎么样？") ///
    subti("根据中国教育部的专业分类，管理科学，哲学，文学，历史，教育，" ///
        "艺术，经济，法律和军事科学属于人文社会科学 ; 而科学，工程，农" ///
        "业和医学是自然科学。") ///
    note(" " "数据来源：Who runs China?" ///
         "", pos(5)) ///
    plotr(fc(249 248 229) lc(249 248 229)) graphr(fc(249 248 229)) ///
    graphr(margin(medium))
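The slice labels above are typed in by hand. One way to generate them from the data instead is sketched below; it assumes the row order after gsort matches the slice order that sort des produces (ties could break this), and the names pct and lab1 to lab4 are made up for the example.

```stata
import delimited using "NPC.csv", clear encoding(utf8)
contract 人文社科拆后专业, percent(pct)
gsort -_freq

* Build "name：xx.xx%" for the four largest categories
forvalues i = 1/4 {
    local lab`i' = 人文社科拆后专业[`i'] + "：" + string(pct[`i'], "%4.2f") + "%"
}

gr pie _freq, over(人文社科拆后专业) sort des ///
    plabel(1 "`lab1'") plabel(2 "`lab2'") ///
    plabel(3 "`lab3'") plabel(4 "`lab4'") ///
    leg(off)
```

If the largest category is stored as an empty string rather than as a word such as 未知, its generated label will start with the colon, and a hand-written label is the simpler choice for that slice.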

Birthplaces of the deputies

Shandong is the most common birthplace, with 320 deputies:

Figure: Birthplaces of the deputies

import delimited using "NPC.csv", clear encoding(utf8)
contract 籍贯
gen prov = substr(籍贯, 1, 6)
save 籍贯, replace 

use china_prov_db.dta, clear
gen prov = substr(省, 1, 6)
merge 1:1 prov using 籍贯
drop if missing(ID)
replace _freq = 0 if missing(_freq)

spmap _freq using china_prov_coord.dta, id(ID) ///
    plotr(fc(249 248 229) lc(249 248 229)) graphr(fc(249 248 229) margin(medium)) ///
    clmethod(custom) clbreaks(0 25 75 150 320) ///
    fcolor("224 242 241" "178 223 219" "128 203 196" "77 182 172") ///
    line(data(china_city_line_coord.dta) size(*0.5 ...) color(black) select(keep if _ID <= 6)) ///
    polygon(data(polygon) by(value) fcolor(black) ///
            osize(vvthin)) ///
    label(data(china_prov_label.dta) x(X) y(Y) label(name) size(*0.8)) ///
    ocolor("gray" ...) osize(vthin ...) ///
    ti("全国人大代表都来自哪里？") ///
    subti("来自山东的人大代表最多，为 320 人") ///
    note(" " "数据来源：Who runs China?" ///
         "", pos(5) size(*0.8)) ///
    leg(order(2 "0～25人" 3 "25～75人" 4 "75～150人" 5 ">150人"))
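The merge key prov above is built with substr(籍贯, 1, 6). substr() counts bytes, and a Chinese character takes 3 bytes in UTF-8, so 6 bytes is the first two characters. A short sketch of the difference on an illustrative string (Stata 14 or later; the string is an example, not read from the data):

```stata
* substr() counts bytes; usubstr() counts Unicode characters
display strlen("黑龙江省")
display ustrlen("黑龙江省")
display substr("黑龙江省", 1, 6)
display usubstr("黑龙江省", 1, 2)
```

The first two lines print 12 and 4, and the last two both print 黑龙. usubstr(籍贯, 1, 2) would therefore build the same key while stating the intent (two characters) more directly.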
