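Happiness Prediction with an XGBoost Model: Alibaba Tianchi Learning Competition
Loading the data#
This post loads the complete version of the dataset, happiness_train_complete.csv.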

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set_style('whitegrid')

Use the id column as the DataFrame index and parse survey_time as datetimes.

data_origin = pd.read_csv('./data/happiness_train_complete.csv', index_col='id', parse_dates=['survey_time'], encoding='gbk')
Exploring basic information about the dataset#
Print the first 5 rows for a quick look.

data_origin.head()
happiness survey_type province city county survey_time gender birth nationality religion … neighbor_familiarity public_service_1 public_service_2 public_service_3 public_service_4 public_service_5 public_service_6 public_service_7 public_service_8 public_service_9
id
1 4 1 12 32 59 2015-08-04 14:18:00 1 1959 1 1 … 4 50 60 50 50 30.0 30 50 50 50
2 4 2 18 52 85 2015-07-21 15:04:00 1 1992 1 1 … 3 90 70 70 80 85.0 70 90 60 60
3 4 2 29 83 126 2015-07-21 13:24:00 2 1967 1 0 … 4 90 80 75 79 80.0 90 90 90 75
4 5 2 10 28 51 2015-07-25 17:33:00 2 1943 1 1 … 3 100 90 70 80 80.0 90 90 80 80
5 4 1 7 18 36 2015-08-10 09:50:00 2 1994 1 1 … 2 50 50 50 50 50.0 50 50 50 50
5 rows × 139 columns

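Inspect the dataset in detail: it has 8000 records and 139 columns (the happiness target plus 138 features).

In the listing, the second column is the feature name, the third the number of non-null records, and the fourth the feature's dtype.

# null_counts was deprecated in pandas 1.2 and removed in 2.0; on newer versions use show_counts=True.
data_origin.info(verbose=True, null_counts=True)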

Int64Index: 8000 entries, 1 to 8000
Data columns (total 139 columns):

Column Non-Null Count Dtype

0 happiness 8000 non-null int64
1 survey_type 8000 non-null int64
2 province 8000 non-null int64
3 city 8000 non-null int64
4 county 8000 non-null int64
5 survey_time 8000 non-null datetime64[ns]
6 gender 8000 non-null int64
7 birth 8000 non-null int64
8 nationality 8000 non-null int64
9 religion 8000 non-null int64
10 religion_freq 8000 non-null int64
11 edu 8000 non-null int64
12 edu_other 3 non-null object
13 edu_status 6880 non-null float64
14 edu_yr 6028 non-null float64
15 income 8000 non-null int64
16 political 8000 non-null int64
17 join_party 824 non-null float64
18 floor_area 8000 non-null float64
19 property_0 8000 non-null int64
20 property_1 8000 non-null int64
21 property_2 8000 non-null int64
22 property_3 8000 non-null int64
23 property_4 8000 non-null int64
24 property_5 8000 non-null int64
25 property_6 8000 non-null int64
26 property_7 8000 non-null int64
27 property_8 8000 non-null int64
28 property_other 66 non-null object
29 height_cm 8000 non-null int64
30 weight_jin 8000 non-null int64
31 health 8000 non-null int64
32 health_problem 8000 non-null int64
33 depression 8000 non-null int64
34 hukou 8000 non-null int64
35 hukou_loc 7996 non-null float64
36 media_1 8000 non-null int64
37 media_2 8000 non-null int64
38 media_3 8000 non-null int64
39 media_4 8000 non-null int64
40 media_5 8000 non-null int64
41 media_6 8000 non-null int64
42 leisure_1 8000 non-null int64
43 leisure_2 8000 non-null int64
44 leisure_3 8000 non-null int64
45 leisure_4 8000 non-null int64
46 leisure_5 8000 non-null int64
47 leisure_6 8000 non-null int64
48 leisure_7 8000 non-null int64
49 leisure_8 8000 non-null int64
50 leisure_9 8000 non-null int64
51 leisure_10 8000 non-null int64
52 leisure_11 8000 non-null int64
53 leisure_12 8000 non-null int64
54 socialize 8000 non-null int64
55 relax 8000 non-null int64
56 learn 8000 non-null int64
57 social_neighbor 7204 non-null float64
58 social_friend 7204 non-null float64
59 socia_outing 8000 non-null int64
60 equity 8000 non-null int64
61 class 8000 non-null int64
62 class_10_before 8000 non-null int64
63 class_10_after 8000 non-null int64
64 class_14 8000 non-null int64
65 work_exper 8000 non-null int64
66 work_status 2951 non-null float64
67 work_yr 2951 non-null float64
68 work_type 2951 non-null float64
69 work_manage 2951 non-null float64
70 insur_1 8000 non-null int64
71 insur_2 8000 non-null int64
72 insur_3 8000 non-null int64
73 insur_4 8000 non-null int64
74 family_income 7999 non-null float64
75 family_m 8000 non-null int64
76 family_status 8000 non-null int64
77 house 8000 non-null int64
78 car 8000 non-null int64
79 invest_0 8000 non-null int64
80 invest_1 8000 non-null int64
81 invest_2 8000 non-null int64
82 invest_3 8000 non-null int64
83 invest_4 8000 non-null int64
84 invest_5 8000 non-null int64
85 invest_6 8000 non-null int64
86 invest_7 8000 non-null int64
87 invest_8 8000 non-null int64
88 invest_other 29 non-null object
89 son 8000 non-null int64
90 daughter 8000 non-null int64
91 minor_child 6934 non-null float64
92 marital 8000 non-null int64
93 marital_1st 7172 non-null float64
94 s_birth 6282 non-null float64
95 marital_now 6230 non-null float64
96 s_edu 6282 non-null float64
97 s_political 6282 non-null float64
98 s_hukou 6282 non-null float64
99 s_income 6282 non-null float64
100 s_work_exper 6282 non-null float64
101 s_work_status 2565 non-null float64
102 s_work_type 2565 non-null float64
103 f_birth 8000 non-null int64
104 f_edu 8000 non-null int64
105 f_political 8000 non-null int64
106 f_work_14 8000 non-null int64
107 m_birth 8000 non-null int64
108 m_edu 8000 non-null int64
109 m_political 8000 non-null int64
110 m_work_14 8000 non-null int64
111 status_peer 8000 non-null int64
112 status_3_before 8000 non-null int64
113 view 8000 non-null int64
114 inc_ability 8000 non-null int64
115 inc_exp 8000 non-null float64
116 trust_1 8000 non-null int64
117 trust_2 8000 non-null int64
118 trust_3 8000 non-null int64
119 trust_4 8000 non-null int64
120 trust_5 8000 non-null int64
121 trust_6 8000 non-null int64
122 trust_7 8000 non-null int64
123 trust_8 8000 non-null int64
124 trust_9 8000 non-null int64
125 trust_10 8000 non-null int64
126 trust_11 8000 non-null int64
127 trust_12 8000 non-null int64
128 trust_13 8000 non-null int64
129 neighbor_familiarity 8000 non-null int64
130 public_service_1 8000 non-null int64
131 public_service_2 8000 non-null int64
132 public_service_3 8000 non-null int64
133 public_service_4 8000 non-null int64
134 public_service_5 8000 non-null float64
135 public_service_6 8000 non-null int64
136 public_service_7 8000 non-null int64
137 public_service_8 8000 non-null int64
138 public_service_9 8000 non-null int64
dtypes: datetime64[ns](1), float64(25), int64(110), object(3)
memory usage: 8.5+ MB
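With 139 columns, the info() listing is long to scan. A minimal sketch of a more compact view, assuming a hypothetical helper named missing_summary (it is not part of the original notebook) and a made-up toy frame, lists only the columns that actually contain missing values:

```python
import numpy as np
import pandas as pd


def missing_summary(df):
    """Return missing counts and ratios for columns with at least one NaN."""
    counts = df.isna().sum()
    counts = counts[counts > 0].sort_values(ascending=False)
    return pd.DataFrame({"n_missing": counts, "ratio": counts / len(df)})


# Toy frame standing in for data_origin (values are made up for illustration).
toy = pd.DataFrame({
    "happiness": [4, 5, 3, 4],
    "edu_yr": [1990.0, np.nan, np.nan, 2001.0],
    "join_party": [np.nan, np.nan, np.nan, 1985.0],
})
summary = missing_summary(toy)
print(summary)
```

Calling missing_summary(data_origin) on the real data should give the same counts as the "Non-Null Count" column above, just restricted to the incomplete columns.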
Look at the summary statistics of the data.

data_origin.describe()
happiness survey_type province city county gender birth nationality religion religion_freq … neighbor_familiarity public_service_1 public_service_2 public_service_3 public_service_4 public_service_5 public_service_6 public_service_7 public_service_8 public_service_9
count 8000.000000 8000.000000 8000.000000 8000.000000 8000.000000 8000.00000 8000.000000 8000.00000 8000.000000 8000.000000 … 8000.000000 8000.000000 8000.000000 8000.000000 8000.000000 8000.000000 8000.000000 8000.00000 8000.000000 8000.000000
mean 3.850125 1.405500 15.155375 42.564750 70.619000 1.53000 1964.707625 1.37350 0.772250 1.427250 … 3.722250 70.809500 68.170000 62.737625 66.320125 62.794187 67.064000 66.09625 65.626750 67.153750
std 0.938228 0.491019 8.917100 27.187404 38.747503 0.49913 16.842865 1.52882 1.071459 1.408441 … 1.143358 21.184742 20.549943 24.771319 22.049437 23.463162 21.586817 23.08568 23.827493 22.502203
min -8.000000 1.000000 1.000000 1.000000 1.000000 1.00000 1921.000000 -8.00000 -8.000000 -8.000000 … -8.000000 -3.000000 -3.000000 -3.000000 -3.000000 -3.000000 -3.000000 -3.00000 -3.000000 -3.000000
25% 4.000000 1.000000 7.000000 18.000000 37.000000 1.00000 1952.000000 1.00000 1.000000 1.000000 … 3.000000 60.000000 60.000000 50.000000 60.000000 55.000000 60.000000 60.00000 60.000000 60.000000
50% 4.000000 1.000000 15.000000 42.000000 73.000000 2.00000 1965.000000 1.00000 1.000000 1.000000 … 4.000000 79.000000 70.000000 70.000000 70.000000 70.000000 70.000000 70.00000 70.000000 70.000000
75% 4.000000 2.000000 22.000000 65.000000 104.000000 2.00000 1977.000000 1.00000 1.000000 1.000000 … 5.000000 80.000000 80.000000 80.000000 80.000000 80.000000 80.000000 80.00000 80.000000 80.000000
max 5.000000 2.000000 31.000000 89.000000 134.000000 2.00000 1997.000000 8.00000 1.000000 9.000000 … 5.000000 100.000000 100.000000 100.000000 100.000000 100.000000 100.000000 100.00000 100.000000 100.000000
8 rows × 135 columns

Data preprocessing#
Handling missing values#
To examine missing values group by group, the features are split into three lists:

required_list holds the features that are mandatory questionnaire items
continuous_list holds the features that are continuous variables
categorical_list holds the nominal categorical variables
All remaining features are ordinal categorical variables.

required_list = ['survey_type', 'province', 'city', 'county', 'survey_time', 'gender', 'birth', 'nationality', 'religion',
                 'religion_freq', 'edu', 'income', 'political', 'floor_area', 'height_cm', 'weight_jin', 'health', 'health_problem',
                 'depression', 'hukou', 'socialize', 'relax', 'learn', 'equity', 'class', 'work_exper', 'work_status', 'work_yr', 'work_type',
                 'work_manage', 'family_income', 'family_m', 'family_status', 'house', 'car', 'marital', 'status_peer', 'status_3_before',
                 'view', 'inc_ability']
continuous_list = ['birth', 'edu_yr', 'income', 'floor_area', 'height_cm', 'weight_jin', 'work_yr', 'family_income', 'family_m', 'house', 'son',
                   'daughter', 'minor_child', 'marital_1st', 's_birth', 'marital_now', 's_income', 'f_birth', 'm_birth', 'inc_exp',
                   'public_service_1', 'public_service_2', 'public_service_3', 'public_service_4', 'public_service_5', 'public_service_6',
                   'public_service_7', 'public_service_8', 'public_service_9']
categorical_list = ['survey_type', 'province', 'gender', 'nationality']
Missing values in required items#
Check which required items have missing values.

data_origin[required_list].isna().sum()[data_origin[required_list].isna().sum() > 0].to_frame().T
work_status work_yr work_type work_manage family_income
0 5049 5049 5049 5049 1
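where

work_status is the current employment situation
work_yr is the total number of years worked
work_type is the nature of the current job
work_manage is the respondent's management responsibilities in the current job
family_income is the household's total income for the previous year
Start with the four features prefixed with work_. Their missing counts are identical (5049 each), which points to the questionnaire's skip logic: these questions were probably skipped for some respondents.

Checking the questionnaire for the corresponding questions confirms this: depending on the answer to work_exper (work experience and status), the four questions above are skipped.

The questionnaire item for work_exper:

[Image]

Only option 1 of work_exper leads to these questions; every other option skips them. So print work_exper for the records where the four columns are missing and check whether they all have a value other than 1.

The output below shows that, where these four features are missing, work_exper is almost always something other than 1.

missing_required = data_origin[required_list].isna().sum(axis=1) > 0
data_origin.loc[missing_required, 'work_exper'].to_frame().plot.hist()
pd.value_counts(data_origin.loc[missing_required, 'work_exper'])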
5 1968
3 1242
4 1065
2 387
6 380
1 7
Name: work_exper, dtype: int64
[Figure: histogram of work_exper for the records with missing required items]
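The skip-logic check above can also be expressed as a cross-tabulation. The following is a minimal sketch on made-up toy data, assuming (as the text describes) that work_status should be answered only when work_exper is 1; the column names follow the dataset, everything else is illustrative:

```python
import numpy as np
import pandas as pd

# Toy data (made up): work_status should be answered only when work_exper == 1.
toy = pd.DataFrame({
    "work_exper": [1, 1, 2, 3, 5, 1],
    "work_status": [3.0, np.nan, np.nan, np.nan, np.nan, 2.0],
})

# Rows: value of the gating question; columns: whether the follow-up is missing.
table = pd.crosstab(toy["work_exper"], toy["work_status"].isna())
table.columns = ["answered", "missing"]
print(table)

# Records that break the skip rule: asked (work_exper == 1) but not answered.
violations = toy[(toy["work_exper"] == 1) & toy["work_status"].isna()]
print(violations.index.tolist())
```

A table like this shows at a glance which gate values account for the missing answers, and the violations frame isolates the rows that would be candidates for removal.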

Look more closely at the records whose value is 1.

(data_origin[data_origin[required_list].isna().sum(axis=1) > 0])[(data_origin[data_origin[required_list].isna().sum(axis=1) > 0].work_exper == 1)]
happiness survey_type province city county survey_time gender birth nationality religion … neighbor_familiarity public_service_1 public_service_2 public_service_3 public_service_4 public_service_5 public_service_6 public_service_7 public_service_8 public_service_9
id
692 4 2 21 64 101 2015-07-20 11:12:00 2 1975 1 1 … 5 80 70 80 80 80.0 80 80 80 80
841 4 2 31 88 133 2015-08-17 13:49:00 2 1971 1 0 … 4 50 30 -2 -2 -2.0 50 50 50 70
1411 4 2 2 2 9 2015-07-23 09:25:00 1 1967 8 1 … 4 90 85 80 90 90.0 92 93 94 90
3117 4 1 4 7 18 2015-10-03 16:02:00 1 1980 1 1 … 2 30 35 30 40 60.0 40 30 70 70
4783 5 2 22 65 103 2015-07-08 18:45:00 1 1955 1 1 … 5 90 90 90 90 80.0 90 80 90 90
5589 5 2 16 46 78 2015-07-29 11:34:00 2 1964 1 1 … 3 89 63 67 75 74.0 67 65 78 79
7368 4 2 21 64 101 2015-07-19 08:32:00 2 1963 1 1 … 5 70 70 70 60 70.0 70 60 60 60
7 rows × 139 columns

There are 7 records with work_exper equal to 1 whose required items are nonetheless missing, so they are dropped.

data_origin.drop(data_origin[(data_origin[required_list].isna().sum(axis=1) > 0) & (data_origin['work_exper'] == 1)].index, inplace=True)
Because family_income is missing in only 1 record, dropping it does not materially change the size of the dataset, so that record is dropped as well.

data_origin.drop(data_origin[data_origin['family_income'].isna()].index, inplace=True)
Missing values in continuous features#
Check the missing values in the continuous features.

data_origin[continuous_list].isna().sum()[data_origin[continuous_list].isna().sum() > 0].to_frame().T
edu_yr work_yr minor_child marital_1st s_birth marital_now s_income
0 1970 5041 1066 828 1718 1770 1718
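where

edu_yr is the year the highest completed level of education was obtained
work_yr is the total number of years worked from the first non-farm job to the current job
minor_child is the number of children under 18
marital_1st is the year of the first marriage
s_birth is the birth year of the current spouse or cohabiting partner
marital_now is the year the respondent married the current spouse
s_income is the total income of the spouse or cohabiting partner for the previous year
For edu_yr, look at the distribution of edu_status among the records where edu_yr is missing.

data_origin[data_origin['edu_yr'].isna()]['edu_status'].plot.hist()
pd.value_counts(data_origin[data_origin['edu_yr'].isna()]['edu_status'])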
2.0 746
3.0 103
4.0 1
1.0 1
Name: edu_status, dtype: int64
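[Figure: histogram of edu_status for the records with edu_yr missing]

Only records with edu_status equal to 4 (graduated) should have a graduation year in edu_yr, so the one record with edu_status equal to 4 and edu_yr missing is inconsistent and is dropped.

data_origin.drop(data_origin[(data_origin['edu_status'] == 4) & (data_origin['edu_yr'].isna())].index, inplace=True)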
data_origin.shape
(7991, 139)
For minor_child, check the son and daughter features (the number of sons and daughters) for the records where minor_child is missing; if both are 0, minor_child can be filled with 0 as well.

print(data_origin.loc[data_origin['minor_child'].isna(), 'son'].sum())
print(data_origin.loc[data_origin['minor_child'].isna(), 'daughter'].sum())
data_origin.loc[data_origin['minor_child'].isna(), 'son':'daughter']
0
0
son daughter
id
2 0 0
5 0 0
9 0 0
29 0 0
31 0 0
… … …
7967 0 0
7972 0 0
7991 0 0
7999 0 0
8000 0 0
1066 rows × 2 columns

For the records with minor_child missing, the numbers of sons and daughters are both 0, so the missing minor_child values are filled with 0.

data_origin['minor_child'] = data_origin['minor_child'].fillna(0)
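In this dataset every record with minor_child missing happens to have no sons and no daughters, so an unconditional fill is enough. A minimal sketch of a more defensive variant, on made-up toy data, fills only the rows where that condition actually holds and leaves any other missing value untouched:

```python
import numpy as np
import pandas as pd

# Toy data (made up); the last row has children but no minor_child answer.
toy = pd.DataFrame({
    "son": [0, 1, 0, 2],
    "daughter": [0, 0, 0, 1],
    "minor_child": [np.nan, 1.0, np.nan, np.nan],
})

# Fill only where the value is missing AND the respondent has no children.
childless = (toy["son"] == 0) & (toy["daughter"] == 0)
fill_mask = toy["minor_child"].isna() & childless
toy.loc[fill_mask, "minor_child"] = 0
print(toy)
```

The design choice here is to encode the reasoning ("no children implies no minor children") in the mask itself, so the fill stays correct if the data changes.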
For marital_1st, check whether the records with a missing value all have marital equal to 1, which means never married.

marital_of_missing = data_origin.loc[data_origin['marital_1st'].isna(), 'marital']
# If every value is 1, the sum equals the number of rows
print(marital_of_missing.sum() == marital_of_missing.shape[0])
marital_of_missing.plot.hist()
pd.value_counts(marital_of_missing)
True

1 828
Name: marital, dtype: int64
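[Figure: histogram of marital for the records with marital_1st missing]

The output shows that every record with marital_1st missing belongs to a never-married respondent, so these missing values are expected.

Next is s_birth, the birth year of the current spouse or cohabiting partner. First look at marital for the records with missing values to check whether those respondents indeed have no spouse or partner.

data_origin[data_origin['s_birth'].isna()]['marital'].plot.hist()
pd.value_counts(data_origin[data_origin['s_birth'].isna()]['marital'])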
1 828
7 718
6 171
2 1
Name: marital, dtype: int64
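[Figure: histogram of marital for the records with s_birth missing]

The marital values 1, 6 and 7 mean never married, divorced and widowed respectively, so a missing s_birth is expected for them. Only one record with value 2 (cohabiting) is missing s_birth, so it is simply dropped.

data_origin.drop(data_origin[data_origin['s_birth'].isna() & (data_origin['marital'] == 2)].index, inplace=True)
For marital_now, the year the respondent married the current spouse, first print marital to check whether the records with missing values are indeed not currently married.

data_origin[data_origin['marital_now'].isna()]['marital'].plot.hist()
pd.value_counts(data_origin[data_origin['marital_now'].isna()]['marital'])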
1 828
7 718
6 171
2 51
3 1
Name: marital, dtype: int64
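[Figure: histogram of marital for the records with marital_now missing]

Values 1 (never married) and 2 (cohabiting) describe respondents who are not married, so missing values are expected for them.

Values 3, 6 and 7 mean first marriage with spouse, divorced and widowed. Of these, only 3 describes someone who currently has a spouse, so the single record with marital equal to 3 and marital_now missing is dropped.

data_origin.drop(data_origin[data_origin['marital_now'].isna() & (data_origin['marital'] == 3)].index, inplace=True)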
data_origin.shape
(7989, 139)
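For s_income, the total income of the spouse or cohabiting partner for the previous year, check marital to see whether the records with missing values have no spouse or partner.

data_origin[data_origin['s_income'].isna()]['marital'].plot.hist()
pd.value_counts(data_origin[data_origin['s_income'].isna()]['marital'])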
1 828
7 718
6 171
Name: marital, dtype: int64
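[Figure: histogram of marital for the records with s_income missing]

For the records with s_income missing, the marital status is always never married, divorced or widowed, so the missing s_income values are expected.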
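The same pattern repeats for every feature in this section: take the rows where one column is missing and count the values of the question that gates it. A minimal sketch of that pattern as a reusable function follows; the helper name gate_counts and the toy data are assumptions for illustration, not part of the original notebook:

```python
import numpy as np
import pandas as pd


def gate_counts(df, target, gate):
    """Count the values of `gate` among the rows where `target` is missing."""
    return df.loc[df[target].isna(), gate].value_counts()


# Toy data (made up) using the marital codes described in the text.
toy = pd.DataFrame({
    "marital": [1, 3, 6, 7, 1, 3],
    "s_income": [np.nan, 20000.0, np.nan, np.nan, np.nan, 35000.0],
})
counts = gate_counts(toy, target="s_income", gate="marital")
print(counts)

# The missing values are expected if every gate value is in the allowed set
# (1, 6, 7 = never married, divorced, widowed).
expected = set(counts.index) <= {1, 6, 7}
print(expected)
```

With such a helper, each check in this section reduces to one call plus a set of gate values that legitimately skip the question.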

Missing values in categorical variables#
Check the nominal categorical variables for missing values; every count is 0, so nothing is missing.

data_origin[categorical_list].isna().sum().to_frame().T
survey_type province gender nationality
0 0 0 0 0
Missing values across all features#
Check the missing values across all features.

data_origin.isna().sum()[data_origin.isna().sum() > 0].to_frame().T
edu_other edu_status edu_yr join_party property_other hukou_loc social_neighbor social_friend work_status work_yr … marital_1st s_birth marital_now s_edu s_political s_hukou s_income s_work_exper s_work_status s_work_type
0 7986 1119 1969 7167 7923 4 795 795 5038 5038 … 828 1717 1768 1717 1717 1717 1717 1717 5427 5427
1 rows × 23 columns

Start with edu_other, which is filled in only when edu is 14. Check whether any record with edu_other missing has edu equal to 14; if so, edu_other should not be missing and the record should be dropped.

data_origin[data_origin['edu_other'].isna() & (data_origin['edu'] == 14)]
happiness survey_type province city county survey_time gender birth nationality religion … neighbor_familiarity public_service_1 public_service_2 public_service_3 public_service_4 public_service_5 public_service_6 public_service_7 public_service_8 public_service_9
id
1242 4 2 3 6 13 2015-09-24 17:58:00 1 1971 1 1 … 5 100 90 60 80 70.0 80 70 60 50
3651 3 2 3 6 13 2015-09-24 20:25:00 1 1953 1 1 … 5 100 100 60 50 70.0 50 30 70 40
5330 2 2 3 6 13 2015-09-25 07:57:00 1 1953 1 1 … 5 100 100 100 100 100.0 100 30 100 50
3 rows × 139 columns

Among the records with edu equal to 14, there are 3 whose edu_other is missing, so these 3 records are dropped.

data_origin.drop(data_origin[data_origin['edu_other'].isna() & (data_origin['edu'] == 14)].index, inplace=True)
For the records with edu_status missing, first check which values edu takes.

data_origin[data_origin['edu_status'].isna()]['edu'].plot.hist()
pd.value_counts(data_origin[data_origin['edu_status'].isna()]['edu'])
1 1052
2 65
3 2
Name: edu, dtype: int64
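[Figure: histogram of edu for the records with edu_status missing]

For the records with edu_status missing, the edu values are 1 (no education at all), 2 (old-style private school or literacy class) and 3 (primary school). Values 1 and 2 skip the edu_status question, so a missing edu_status is expected for them; the records with edu equal to 3 are dropped.

data_origin.drop(data_origin[data_origin['edu_status'].isna() & (data_origin['edu'] == 3)].index, inplace=True)
join_party is the year in which a current party member joined the party, so a missing value is correct only when the respondent's political status is not party member. Check the distribution.

data_origin[data_origin['join_party'].isna()]['political'].plot.hist()
pd.value_counts(data_origin[data_origin['join_party'].isna()]['political'])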
1 6703
2 402
-8 41
3 11
4 5
Name: political, dtype: int64
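[Figure: histogram of political for the records with join_party missing]

The counts show 5 records where political is 4 but the year of joining the party was not filled in, so these 5 records are dropped.

data_origin.drop(data_origin[data_origin['join_party'].isna() & (data_origin['political'] == 4)].index, inplace=True)
For hukou_loc, the place where the hukou is currently registered, look at hukou for the records with missing values: all of them are 7, meaning no hukou, so the missing values are expected.

data_origin[data_origin['hukou_loc'].isna()]['hukou'].to_frame()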
hukou
id
589 7
3657 7
3799 7
7811 7
For social_neighbor and social_friend (how often the respondent socialises with neighbours and with other friends, respectively), first look at the distribution of socialize among the records with missing values.

data_origin[data_origin['social_neighbor'].isna()]['socialize'].plot.hist()
pd.value_counts(data_origin[data_origin['social_neighbor'].isna()]['socialize'])
1 793
Name: socialize, dtype: int64
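[Figure: histogram of socialize for the records with social_neighbor missing]

Every record with social_neighbor and social_friend missing has socialize (how often free time is spent socialising) equal to 1, meaning never, so the missing values of both features can be filled with 1.

data_origin['social_neighbor'] = data_origin['social_neighbor'].fillna(1)
data_origin['social_friend'] = data_origin['social_friend'].fillna(1)
The features from s_edu through s_work_exper all have the same number of missing values, so the missing records probably come from the same group of respondents.

First look at the distribution of marital among the records with s_edu missing.

data_origin[data_origin['s_edu'].isna()]['marital'].plot.hist()
pd.value_counts(data_origin[data_origin['s_edu'].isna()]['marital'])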
1 827
7 717
6 171
Name: marital, dtype: int64
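[Figure: histogram of marital for the records with s_edu missing]

Every record with s_edu missing is never married, divorced or widowed, that is, has no spouse or cohabiting partner, so these missing values are expected.

The same holds for s_political through s_work_exper.

For s_work_status, the current employment situation of the spouse or cohabiting partner, first consult the questionnaire.

[Image]

s_work_status and s_work_type are meant to be answered only when s_work_exper is 1; every other option skips them, so those missing values are expected.

Now look at the distribution of s_work_exper among the records with s_work_status missing.

data_origin[data_origin['s_work_status'].isna()]['s_work_exper'].plot.hist()
pd.value_counts(data_origin[data_origin['s_work_status'].isna()]['s_work_exper'])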
5.0 1424
3.0 1017
4.0 823
6.0 221
2.0 217
1.0 1
Name: s_work_exper, dtype: int64
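[Figure: histogram of s_work_exper for the records with s_work_status missing]

Only 1 record has s_work_exper equal to 1, so it is simply dropped.

data_origin.drop(data_origin[data_origin['s_work_status'].isna() & (data_origin['s_work_exper'] == 1)].index, inplace=True)
In the questionnaire, negative codes share a common meaning across all items: -1 means not applicable, -2 means don't know, -3 means refused to answer, and -8 means unable to answer.

Here every negative value is replaced with a median for its feature; in the code below that median is taken over the distinct values found in rows that contain no negative codes.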

data_origin.shape
(7978, 139)
numeric_part = data_origin.drop(['survey_time', 'edu_other', 'property_other', 'invest_other'], axis=1)
no_ne_rows_index = numeric_part[(numeric_part < 0).sum(axis=1) == 0].index
for column, content in data_origin.items():
    if pd.api.types.is_numeric_dtype(content):
        # Median of the distinct values this column takes in rows without negative codes
        fill_value = pd.Series(data_origin.loc[no_ne_rows_index, column].unique()).median()
        # NaN < 0 is False, so missing values pass through unchanged
        data_origin[column] = content.apply(lambda x: fill_value if x < 0 else x)
Once all negative values are replaced, fill every remaining NaN with a single sentinel value, -1.

data_origin.fillna(-1, inplace=True)
At this point the missing values of all features have been handled.
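For comparison, here is a minimal vectorised sketch of replacing negative codes. Note that it swaps in a simpler statistic than the loop above: the plain median of each column's non-negative answers, rather than the median of distinct values in fully clean rows. The toy data is made up.

```python
import numpy as np
import pandas as pd

# Toy data (made up): -2 and -8 are "don't know" / "unable to answer" codes.
toy = pd.DataFrame({
    "income": [1000.0, -2.0, 3000.0, 5000.0, np.nan],
    "health": [3, 4, -8, 5, 4],
})

cleaned = toy.copy()
for column in cleaned.select_dtypes(include="number").columns:
    col = cleaned[column]
    # Median over the valid (non-negative, non-missing) answers of this column.
    median = col[col >= 0].median()
    # mask() replaces entries where the condition is True; NaN < 0 is False,
    # so genuinely missing values are left for a later fillna step.
    cleaned[column] = col.mask(col < 0, median)
print(cleaned)
```

Which statistic is preferable depends on the feature; the point of the sketch is only that Series.mask avoids a Python-level apply over every cell.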

Text data processing#
Three features, edu_other, property_other and invest_other, hold string data and need to be converted to an ordinal encoding.

First look at what was entered for edu_other.

data_origin[data_origin['edu_other'] != -1]['edu_other'].to_frame()
edu_other
id
1170 夜校
2513 夜校
4926 夜校
Every edu_other entry is 夜校 (night school), so convert the strings to ordinal codes.

data_origin['edu_other'] = data_origin['edu_other'].astype('category').values.codes + 1
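To make the encoding step concrete, here is a minimal sketch of how category codes behave on a small made-up series. It assumes the encoding is applied while missing values are still NaN, which differs from the notebook, where NaN has already been replaced by -1 and that sentinel therefore becomes a category of its own:

```python
import pandas as pd

# Toy series (made up) with one missing answer.
s = pd.Series(["b", None, "a", "b"])

# Categories are sorted ("a", "b") and numbered from 0; missing values get -1.
# Adding 1 shifts the codes so that 0 means "missing" and real labels start at 1.
codes = s.astype("category").cat.codes + 1
print(codes.tolist())
```

Under this assumption the code 0 is reserved for missing answers and each distinct string gets a positive integer.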
Next is property_other, a free-text answer about who holds the property rights to the home. First look at what was entered.

data_origin[data_origin['property_other'] != -1]['property_other'].to_frame()
property_other
id
76 无产权
92 已购买，但未过户
99 家庭共同所有
132 待办
455 没有产权
… …
7376 家人共有
7746 全家人共有
7776 兄弟共有
7821 未分家，全家所有
7917 家人共有
66 rows × 1 columns

Many of the entries mean the same thing; for example, 家庭共同所有 (jointly owned by the family) and 全家所有 (owned by the whole family) are equivalent. Here they are merged by hand, record by record, into a few canonical labels.

#data_origin.loc[[8009, 9212, 9759, 10517], 'property_other'] = '多人拥有'
#data_origin.loc[[8014, 8056, 10264], 'property_other'] = '未过户'
#data_origin.loc[[8471, 8825, 9597, 9810, 9842, 9967, 10069, 10166, 10203, 10469], 'property_other'] = '全家拥有'
#data_origin.loc[[8553, 8596, 9605, 10421, 10814], 'property_other'] = '无产权'
# 无产权: no property rights
data_origin.loc[[76, 132, 455, 495, 1415, 2511, 2792, 2956, 3647, 4147, 4193, 4589, 5023, 5382, 5492, 6102, 6272, 6339,
                 6507, 7184, 7239], 'property_other'] = '无产权'
# 未过户: title not yet transferred
data_origin.loc[[92, 1888, 2703, 3381, 5654], 'property_other'] = '未过户'
# 全家拥有: owned by the whole family
data_origin.loc[[99, 619, 2728, 3062, 3222, 3251, 3696, 5283, 6191, 7295, 7376, 7746, 7821, 7917], 'property_other'] = '全家拥有'
# 多人拥有: owned by several people
data_origin.loc[[1597, 4993, 5398, 5899, 7240, 7776], 'property_other'] = '多人拥有'
# 小产权: limited property rights
data_origin.loc[[6469, 6891], 'property_other'] = '小产权'
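Listing record ids by hand works for 66 answers but is tied to this exact file. A minimal sketch of an alternative, assuming a hypothetical dictionary named canonical that maps raw answers to labels (the raw strings come from the sample output above; the dictionary itself is not part of the original notebook):

```python
import pandas as pd

# Hypothetical mapping from raw answers to canonical labels.
canonical = {
    "无产权": "无产权",
    "没有产权": "无产权",
    "家庭共同所有": "全家拥有",
    "家人共有": "全家拥有",
    "全家人共有": "全家拥有",
    "已购买，但未过户": "未过户",
}

# Toy series (made up from answers shown above); the last one is unmapped.
toy = pd.Series(["没有产权", "家人共有", "已购买，但未过户", "兄弟共有"])

# map() returns NaN for unmapped answers; fillna(toy) keeps the raw text there.
normalised = toy.map(canonical).fillna(toy)
print(normalised.tolist())
```

Keying on the answer text rather than on ids means the same mapping could be reused on another split of the data, at the cost of having to extend the dictionary whenever a new spelling shows up.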
Encode the strings as integer ordinal codes.

data_origin['property_other'] = data_origin['property_other'].astype('category').values.codes + 1
Look at what was entered for invest_other, the other investment activities the respondent is engaged in.

pd.DataFrame(data_origin[data_origin['invest_other'] != -1]['invest_other'].unique())
0
0 理财产品
1 民间借贷
2 银行理财
3 储蓄存款
4 理财
5 银行存款利息
6 活期储蓄
7 投资服务业、家具业
8 银行存款
9 个人融资
10 租房
11 老人家不清楚
12 家中有部分土地承包出去
13 没有
14 高利贷
15 彩票
16 自己没有，儿女不清楚
17 网上理财
18 统筹
19 福利车票
20 其他理财产品
21 商业万能保险
22 投资开发区
23 字画、茶壶
Likewise, convert it to integer ordinal codes.

data_origin['invest_other'] = data_origin['invest_other'].astype('category').values.codes + 1
Outlier handling#
data_nona = data_origin.copy()
Draw box plots to inspect the features for outliers,

and drop the outlying records.

sns.boxplot(x=data_nona['house'])
<AxesSubplot:xlabel='house'>
[Figure: box plot of house]

data_nona.drop(data_nona[data_nona['house'] > 25].index, inplace=True)
sns.boxplot(x=data_nona['family_m'])
<AxesSubplot:xlabel='family_m'>
[Figure: box plot of family_m]

data_nona.drop(data_nona[data_nona['family_m'] > 40].index, inplace=True)
sns.boxplot(x=data_nona['inc_exp'])
<AxesSubplot:xlabel='inc_exp'>
[Figure: box plot of inc_exp]

data_nona.drop(data_nona[data_nona['inc_exp'] > 0.6e8].index, inplace=True)
Look at the distribution of survey months. All questionnaires were completed in 2015, so only the month needs to be checked for outliers.

[Image]

According to the image, the survey started in June, so the questionnaires dated February are anomalous and should be removed.

sns.boxplot(x=data_nona['survey_time'].dt.month)
<AxesSubplot:xlabel='survey_time'>
[Figure: box plot of the survey month]

data_nona.drop(data_nona[data_nona['survey_time'].dt.month < 6].index, inplace=True)
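The thresholds used above (25, 40 and 0.6e8) were read off the box plots by eye. For reference, a minimal sketch of the rule a standard box plot uses to draw its whiskers, 1.5 times the interquartile range beyond the quartiles, is shown below on made-up data. This is a different, stricter criterion than the hand-picked cut-offs, and the helper name iqr_bounds is an assumption:

```python
import pandas as pd


def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) limits of the k * IQR rule."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Toy series (made up) with one extreme value.
toy = pd.Series([1, 1, 2, 2, 3, 3, 4, 30], name="house")
lower, upper = iqr_bounds(toy)
outliers = toy[(toy < lower) | (toy > upper)]
print(lower, upper, outliers.tolist())
```

On skewed survey variables such as income, the IQR rule would likely flag far more records than the manual thresholds do, so it is better treated as a diagnostic than as an automatic filter.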
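Feature construction#
Feature construction is also referred to as feature crossing, feature combination, or data transformation.

Discretising continuous variables#
Besides some computational benefits, discretisation can introduce non-linearity and makes it easy to build cross features. Discrete features are also easy to add and remove, which helps when iterating on a model quickly. In addition, in very noisy settings discretisation can reduce the noise carried by a feature and improve its expressive power.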

pd.DataFrame(continuous_list)
0
0 birth
1 edu_yr
2 income
3 floor_area
4 height_cm
5 weight_jin
6 work_yr
7 family_income
8 family_m
9 house
10 son
11 daughter
12 minor_child
13 marital_1st
14 s_birth
15 marital_now
16 s_income
17 f_birth
18 m_birth
19 inc_exp
20 public_service_1
21 public_service_2
22 public_service_3
23 public_service_4
24 public_service_5
25 public_service_6
26 public_service_7
27 public_service_8
28 public_service_9
Bin every continuous variable into quantile-based intervals, then encode each interval to produce a new discrete feature.

for column in continuous_list:
    # Up to 5 quantile bins; duplicate bin edges are merged
    cut = pd.qcut(data_nona[column], q=5, duplicates='drop')
    cat = cut.values
    codes = cat.codes
    data_nona[column + '_discrete'] = codes

for column, content in data_nona.items():
    if pd.api.types.is_numeric_dtype(content):
        data_nona[column] = content.astype('int')
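To see what the binning loop does to a single column, here is a minimal sketch on two made-up series. It illustrates why some of the _discrete features end up with fewer than five levels: when many values are tied, several quantile edges coincide and duplicates="drop" merges them.

```python
import pandas as pd

# Evenly spread values: five quantile bins with two values each.
uniform = pd.Series(range(10))
codes_uniform = pd.qcut(uniform, q=5, duplicates="drop").cat.codes
print(codes_uniform.tolist())

# Heavily tied data: several quantile edges coincide, so duplicates="drop"
# merges them and fewer than 5 bins come out.
skewed = pd.Series([0] * 8 + [1, 2])
codes_skewed = pd.qcut(skewed, q=5, duplicates="drop").cat.codes
print(codes_skewed.nunique())
```

One thing worth keeping in mind when reading the resulting codes: in the notebook the binning runs after missing values were filled with -1, so that sentinel takes part in the quantiles and lands in the lowest bin.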
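Feature selection#
Discretising the continuous variables produced new features with the suffix _discrete, so the original continuous features are removed. The full table is saved first for the analysis section; the reduced table is saved separately.

data_nona.to_csv('./data/happiness_train_complete_analysis.csv')
data_nona.drop(continuous_list, axis=1, inplace=True)
data_nona.to_csv('./data/happiness_train_complete_nona.csv')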
Feature analysis#
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
data = pd.read_csv('./data/happiness_train_complete_analysis.csv', index_col='id', parse_dates=['survey_time'])
data.head()
happiness survey_type province city county survey_time gender birth nationality religion … inc_exp_discrete public_service_1_discrete public_service_2_discrete public_service_3_discrete public_service_4_discrete public_service_5_discrete public_service_6_discrete public_service_7_discrete public_service_8_discrete public_service_9_discrete
id
1 4 1 12 32 59 2015-08-04 14:18:00 1 1959 1 1 … 2 0 0 0 0 0 0 0 0 0
2 4 2 18 52 85 2015-07-21 15:04:00 1 1992 1 1 … 2 4 1 2 3 4 1 4 0 0
3 4 2 29 83 126 2015-07-21 13:24:00 2 1967 1 0 … 3 4 2 3 3 3 4 4 4 2
4 5 2 10 28 51 2015-07-25 17:33:00 2 1943 1 1 … 0 4 3 2 3 3 4 4 3 2
5 4 1 7 18 36 2015-08-10 09:50:00 2 1994 1 1 … 4 0 0 0 0 0 0 0 0 0
5 rows × 168 columns

data.describe()
happiness survey_type province city county gender birth nationality religion religion_freq … inc_exp_discrete public_service_1_discrete public_service_2_discrete public_service_3_discrete public_service_4_discrete public_service_5_discrete public_service_6_discrete public_service_7_discrete public_service_8_discrete public_service_9_discrete
count 7968.000000 7968.000000 7968.000000 7968.000000 7968.000000 7968.000000 7968.000000 7968.000000 7968.000000 7968.000000 … 7968.000000 7968.000000 7968.000000 7968.000000 7968.000000 7968.000000 7968.000000 7968.000000 7968.000000 7968.000000
mean 3.866466 1.405120 15.158258 42.572164 70.631903 1.530748 1964.710216 1.399724 0.880271 1.452560 … 1.725653 1.665537 1.272214 1.841365 1.613328 1.848519 1.643449 1.651732 1.654869 1.302962
std 0.818844 0.490946 8.915876 27.183764 38.736751 0.499085 16.845155 1.466409 0.324665 1.358444 … 1.338535 1.420309 1.108440 1.342524 1.499494 1.297290 1.533445 1.544477 1.511468 1.078601
min 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1921.000000 1.000000 0.000000 1.000000 … 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
25% 4.000000 1.000000 7.000000 18.000000 37.000000 1.000000 1952.000000 1.000000 1.000000 1.000000 … 1.000000 0.000000 0.000000 1.000000 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000
50% 4.000000 1.000000 15.000000 42.000000 73.000000 2.000000 1965.000000 1.000000 1.000000 1.000000 … 1.000000 2.000000 1.000000 2.000000 1.000000 2.000000 1.000000 1.000000 1.000000 1.000000
75% 4.000000 2.000000 22.000000 65.000000 104.000000 2.000000 1977.000000 1.000000 1.000000 1.000000 … 3.000000 2.000000 2.000000 3.000000 3.000000 3.000000 3.000000 3.000000 3.000000 2.000000
max 5.000000 2.000000 31.000000 89.000000 134.000000 2.000000 1997.000000 8.000000 1.000000 9.000000 … 4.000000 4.000000 3.000000 4.000000 4.000000 4.000000 4.000000 4.000000 4.000000 3.000000
8 rows × 167 columns

data.info(verbose=True, null_counts=True)

Int64Index: 7968 entries, 1 to 8000
Data columns (total 168 columns):

Column Non-Null Count Dtype

0 happiness 7968 non-null int64
1 survey_type 7968 non-null int64
2 province 7968 non-null int64
3 city 7968 non-null int64
4 county 7968 non-null int64
5 survey_time 7968 non-null datetime64[ns]
6 gender 7968 non-null int64
7 birth 7968 non-null int64
8 nationality 7968 non-null int64
9 religion 7968 non-null int64
10 religion_freq 7968 non-null int64
11 edu 7968 non-null int64
12 edu_other 7968 non-null int64
13 edu_status 7968 non-null int64
14 edu_yr 7968 non-null int64
15 income 7968 non-null int64
16 political 7968 non-null int64
17 join_party 7968 non-null int64
18 floor_area 7968 non-null int64
19 property_0 7968 non-null int64
20 property_1 7968 non-null int64
21 property_2 7968 non-null int64
22 property_3 7968 non-null int64
23 property_4 7968 non-null int64
24 property_5 7968 non-null int64
25 property_6 7968 non-null int64
26 property_7 7968 non-null int64
27 property_8 7968 non-null int64
28 property_other 7968 non-null int64
29 height_cm 7968 non-null int64
30 weight_jin 7968 non-null int64
31 health 7968 non-null int64
32 health_problem 7968 non-null int64
33 depression 7968 non-null int64
34 hukou 7968 non-null int64
35 hukou_loc 7968 non-null int64
36 media_1 7968 non-null int64
37 media_2 7968 non-null int64
38 media_3 7968 non-null int64
39 media_4 7968 non-null int64
40 media_5 7968 non-null int64
41 media_6 7968 non-null int64
42 leisure_1 7968 non-null int64
43 leisure_2 7968 non-null int64
44 leisure_3 7968 non-null int64
45 leisure_4 7968 non-null int64
46 leisure_5 7968 non-null int64
47 leisure_6 7968 non-null int64
48 leisure_7 7968 non-null int64
49 leisure_8 7968 non-null int64
50 leisure_9 7968 non-null int64
51 leisure_10 7968 non-null int64
52 leisure_11 7968 non-null int64
53 leisure_12 7968 non-null int64
54 socialize 7968 non-null int64
55 relax 7968 non-null int64
56 learn 7968 non-null int64
57 social_neighbor 7968 non-null int64
58 social_friend 7968 non-null int64
59 socia_outing 7968 non-null int64
60 equity 7968 non-null int64
61 class 7968 non-null int64
62 class_10_before 7968 non-null int64
63 class_10_after 7968 non-null int64
64 class_14 7968 non-null int64
65 work_exper 7968 non-null int64
66 work_status 7968 non-null int64
67 work_yr 7968 non-null int64
68 work_type 7968 non-null int64
69 work_manage 7968 non-null int64
70 insur_1 7968 non-null int64
71 insur_2 7968 non-null int64
72 insur_3 7968 non-null int64
73 insur_4 7968 non-null int64
74 family_income 7968 non-null int64
75 family_m 7968 non-null int64
76 family_status 7968 non-null int64
77 house 7968 non-null int64
78 car 7968 non-null int64
79 invest_0 7968 non-null int64
80 invest_1 7968 non-null int64
81 invest_2 7968 non-null int64
82 invest_3 7968 non-null int64
83 invest_4 7968 non-null int64
84 invest_5 7968 non-null int64
85 invest_6 7968 non-null int64
86 invest_7 7968 non-null int64
87 invest_8 7968 non-null int64
88 invest_other 7968 non-null int64
89 son 7968 non-null int64
90 daughter 7968 non-null int64
91 minor_child 7968 non-null int64
92 marital 7968 non-null int64
93 marital_1st 7968 non-null int64
94 s_birth 7968 non-null int64
95 marital_now 7968 non-null int64
96 s_edu 7968 non-null int64
97 s_political 7968 non-null int64
98 s_hukou 7968 non-null int64
99 s_income 7968 non-null int64
100 s_work_exper 7968 non-null int64
101 s_work_status 7968 non-null int64
102 s_work_type 7968 non-null int64
103 f_birth 7968 non-null int64
104 f_edu 7968 non-null int64
105 f_political 7968 non-null int64
106 f_work_14 7968 non-null int64
107 m_birth 7968 non-null int64
108 m_edu 7968 non-null int64
109 m_political 7968 non-null int64
110 m_work_14 7968 non-null int64
111 status_peer 7968 non-null int64
112 status_3_before 7968 non-null int64
113 view 7968 non-null int64
114 inc_ability 7968 non-null int64
115 inc_exp 7968 non-null int64
116 trust_1 7968 non-null int64
117 trust_2 7968 non-null int64
118 trust_3 7968 non-null int64
119 trust_4 7968 non-null int64
120 trust_5 7968 non-null int64
121 trust_6 7968 non-null int64
122 trust_7 7968 non-null int64
123 trust_8 7968 non-null int64
124 trust_9 7968 non-null int64
125 trust_10 7968 non-null int64
126 trust_11 7968 non-null int64
127 trust_12 7968 non-null int64
128 trust_13 7968 non-null int64
129 neighbor_familiarity 7968 non-null int64
130 public_service_1 7968 non-null int64
131 public_service_2 7968 non-null int64
132 public_service_3 7968 non-null int64
133 public_service_4 7968 non-null int64
134 public_service_5 7968 non-null int64
135 public_service_6 7968 non-null int64
136 public_service_7 7968 non-null int64
137 public_service_8 7968 non-null int64
138 public_service_9 7968 non-null int64
139 birth_discrete 7968 non-null int64
140 edu_yr_discrete 7968 non-null int64
141 income_discrete 7968 non-null int64
142 floor_area_discrete 7968 non-null int64
143 height_cm_discrete 7968 non-null int64
144 weight_jin_discrete 7968 non-null int64
145 work_yr_discrete 7968 non-null int64
146 family_income_discrete 7968 non-null int64
147 family_m_discrete 7968 non-null int64
148 house_discrete 7968 non-null int64
149 son_discrete 7968 non-null int64
150 daughter_discrete 7968 non-null int64
151 minor_child_discrete 7968 non-null int64
152 marital_1st_discrete 7968 non-null int64
153 s_birth_discrete 7968 non-null int64
154 marital_now_discrete 7968 non-null int64
155 s_income_discrete 7968 non-null int64
156 f_birth_discrete 7968 non-null int64
157 m_birth_discrete 7968 non-null int64
158 inc_exp_discrete 7968 non-null int64
159 public_service_1_discrete 7968 non-null int64
160 public_service_2_discrete 7968 non-null int64
161 public_service_3_discrete 7968 non-null int64
162 public_service_4_discrete 7968 non-null int64
163 public_service_5_discrete 7968 non-null int64
164 public_service_6_discrete 7968 non-null int64
165 public_service_7_discrete 7968 non-null int64
166 public_service_8_discrete 7968 non-null int64
167 public_service_9_discrete 7968 non-null int64
dtypes: datetime64[ns](1), int64(167)
memory usage: 10.3 MB
First, look at the distribution of the happiness score. Most respondents fall into the relatively happy range.

sns.set_theme(style="darkgrid")
sns.displot(data, x="happiness", facet_kws=dict(margin_titles=True))

[Figure: distribution of happiness]

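Next, a scatter plot of each respondent's income against happiness. As income rises, most points land at the higher happiness levels; even so, some respondents with very high incomes sit at a level that is neither happy nor unhappy.

sns.set_theme(style="whitegrid")
f, ax = plt.subplots()
sns.despine(f, left=True, bottom=True)
sns.scatterplot(x="happiness", y="income",
                size="income",
                palette="ch:r=-.2,d=.3_r",
                data=data, ax=ax)

[Figure: scatter plot of income against happiness]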

Histograms of happiness split by gender show no strong class imbalance across the gender feature.

sns.set_theme(style="darkgrid")
sns.displot(
    data, x="happiness", col="gender",
    facet_kws=dict(margin_titles=True)
)

[Figure: distribution of happiness by gender]

The line plot shows that happiness rises with edu, the level of education received.

sns.set_theme(style="ticks")
palette = sns.color_palette("rocket_r")
sns.relplot(
    data=data,
    x="edu", y="happiness",
    kind="line", size_order=["T1", "T2"], palette=palette,
    facet_kws=dict(sharex=False)
)

[Figure: line plot of happiness against edu]

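A box plot of birth year at each happiness level shows that the birth-year distributions are much the same across levels.

sns.set_theme(style="ticks", palette="pastel")
sns.boxplot(x="happiness", y="birth",
            data=data)
sns.despine(offset=10, trim=True)

[Figure: box plot of birth year by happiness level]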

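Splitting the records by whether the respondent holds a religious belief, a split violin plot of happiness against health shows a trend: most people with high happiness also report better health. It also suggests that the number of religious respondents grows gradually as health and happiness improve.

sns.set_theme(style="whitegrid")
sns.violinplot(data=data, x="happiness", y="health", hue="religion",
               split=True, inner="quart", linewidth=1)
sns.despine(left=True)

[Figure: split violin plot of health by happiness, split on religion]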

A bivariate histogram shows that most of the happier respondents do not own markedly more properties.

import seaborn as sns
sns.set_theme(style="ticks")
g = sns.JointGrid(data=data, x="happiness", y="house", marginal_ticks=True)

# Keep a linear scale on the y axis
g.ax_joint.set(yscale="linear")

# Create an inset axes for the histogram colorbar
cax = g.fig.add_axes([.15, .55, .02, .2])

# Add the joint and marginal histogram plots
g.plot_joint(
    sns.histplot, discrete=(True, False),
    cmap="light:#03012d", pmax=.8, cbar=True, cbar_ax=cax
)
g.plot_marginals(sns.histplot, element="step", color="#03012d")

[Figure: joint histogram of happiness and house]

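A kernel density estimate of happiness against floor area shows the same pattern: for most of the happier respondents, floor area is not concentrated at a very high level, although happiness does tend to increase with floor area.

sns.set_theme(style="ticks")
g = sns.jointplot(
    data=data[data['floor_area'] < 600],
    x="happiness", y="floor_area",
    kind="kde",
)

[Figure: kernel density estimate of happiness against floor_area]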

A heatmap over the features shows how strongly each pair of features is correlated, as indicated by color depth.

sns.set_theme(style="whitegrid")
corr_list = ['survey_type', 'province', 'city', 'county', 'survey_time', 'gender', 'birth', 'nationality', 'religion',
             'religion_freq', 'edu', 'income', 'political', 'floor_area', 'height_cm', 'weight_jin', 'health', 'health_problem',
             'depression', 'hukou', 'socialize', 'relax', 'learn', 'equity', 'class', 'work_exper', 'work_status', 'work_yr', 'work_type',
             'work_manage', 'family_income', 'family_m', 'family_status', 'house', 'car', 'marital', 'status_peer', 'status_3_before',
             'view', 'inc_ability']
df = data
# Pairwise correlations in long format: one row per (feature, feature) pair
corr_mat = data[corr_list].corr().stack().reset_index(name="correlation")
g = sns.relplot(
    data=corr_mat,
    x="level_0", y="level_1", hue="correlation", size="correlation",
    palette="vlag", hue_norm=(-1, 1), edgecolor=".7",
    height=10, sizes=(50, 250), size_norm=(-.2, .8),
)
g.set(xlabel="", ylabel="", aspect="equal")
g.despine(left=True, bottom=True)
g.ax.margins(.02)
for label in g.ax.get_xticklabels():
    label.set_rotation(90)
# On Matplotlib 3.7 and later this attribute is named legend_handles
for artist in g.legend.legendHandles:
    artist.set_edgecolor(".7")

[Figure: correlation heatmap of the selected features]

A bar chart of the number of happy respondents in each province, drawn over the total number surveyed, shows that Hubei has the most respondents but not an especially high number of happy ones; Henan and Shandong have a very high share of happy respondents; and although Inner Mongolia has the fewest respondents, its share of happy respondents is also very high.

sns.set_theme(style="whitegrid")

# Number of respondents per province, largest first
province_total = data['province'].groupby(data['province']).count().sort_values(ascending=False).to_frame()
province_total.columns = ['total']
# Number of respondents per province whose happiness is above 3
happiness_involved = []
for index in province_total.index:
    happiness_involved.append(((data['province'] == index) & (data['happiness'] > 3)).sum())
happiness_involved = pd.DataFrame(happiness_involved, index=province_total.index)
happiness_involved.columns = ['involved']
province_total['province'] = province_total.index.map({
    1 : 'Shanghai', 2 : 'Yunnan', 3 : 'Neimeng', 4 : 'Beijing', 5 : 'Jilin', 6 : 'Sichuan', 7 : 'Tianjin', 8 : 'Ningxia',
    9 : 'Anhui', 10 : 'Shandong', 11 : 'Shanxi', 12 : 'Guangdong', 13 : 'Guangxi', 14 : 'Xinjiang', 15 : 'Jiangsu',
    16 : 'Jiangxi', 17 : 'Hebei', 18 : 'Henan', 19 : 'Zhejiang', 20 : 'Hainan', 21 : 'Hubei', 22 : 'Hunan', 23 : 'Gansu',
    24 : 'Fujian', 25 : 'Xizang', 26 : 'Guizhou', 27 : 'Liaoning', 28 : 'Chongqing', 29 : 'Shaanxi', 30 : 'Qinghai', 31 : 'Heilongjiang'})
happiness_involved['province'] = province_total['province']

f, ax = plt.subplots(figsize=(6, 15))

# Total respondents per province (light bars)
sns.set_color_codes("pastel")
sns.barplot(x="total", y="province", data=province_total,
            label="Total", color="b")

# Happy respondents per province (darker bars drawn on top)
sns.set_color_codes("muted")
sns.barplot(x="involved", y="province", data=happiness_involved,
            label="Happy (happiness > 3)", color="b")

ax.legend(ncol=2, loc="lower right", frameon=True)
ax.set(ylabel="", xlabel="Happiness of every province")
sns.despine(left=True, bottom=True)

[Figure: total and happy respondents per province]

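A stacked histogram of perceived social fairness (equity), broken down by happiness, shows that most respondents consider today's society fairly equitable, yet nearly half still think it is not very fair.

sns.set_theme(style="ticks")
f, ax = plt.subplots(figsize=(7, 5))
sns.despine(f)
sns.histplot(
    data, hue='happiness',
    x="equity",
    multiple="stack",
    palette="light:m_r",
    edgecolor=".3",
    linewidth=.5
)

[Figure: stacked histogram of equity by happiness]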

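In the multivariate scatter plot, happy respondents are spread evenly across heights and weights; body shape has little effect on happiness.

sns.set_theme(style="white")
sns.relplot(x="height_cm", y="weight_jin", hue="happiness", size="health",
            alpha=.5, palette="muted", data=data)

[Figure: scatter plot of height_cm against weight_jin, colored by happiness]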

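A line plot with an error band puts happiness on the horizontal axis and expected annual income (inc_exp) on the vertical axis. Respondents with low happiness usually expect a high annual income, with a very wide error band; as happiness rises, expected income does not increase, and the error band narrows.

sns.set_theme(style="ticks")
palette = sns.color_palette("rocket_r")
sns.relplot(
    data=data,
    x="happiness", y="inc_exp",
    kind="line", palette=palette,
    aspect=.75, facet_kws=dict(sharex=False)
)

[Figure: line plot of inc_exp against happiness with error band]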

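Building the Model#
Introduction to the XGBoost Model#
XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible, and portable. It implements machine learning algorithms under the gradient boosting framework. XGBoost provides parallel tree boosting (also known as GBDT or GBM) that solves many data science problems quickly and accurately. The same code runs on major distributed environments (such as Hadoop, SGE, and MPI) and can handle problems with billions of examples.

XGBoost stands for Extreme Gradient Boosting.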

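Decision Tree Ensembles#
Start with the model that XGBoost uses: decision tree ensembles. A tree ensemble model consists of a set of classification and regression trees (CART). The figure below shows a simple CART that classifies whether a person likes computer games.

[Figure: a CART that classifies whether a person likes computer games]

Each family member is assigned to a leaf and receives the score attached to that leaf. A CART differs slightly from a decision tree, in which each leaf holds only a decision value. In a CART, a real-valued score is associated with each leaf, which allows richer interpretations than classification alone. It also allows a principled, unified approach to optimization.

In practice, a single tree is usually not strong enough. What is actually used is an ensemble model, which sums the predictions of multiple trees.

[Figure: an ensemble of two trees]

The figure above shows an ensemble of two trees. The prediction scores of the individual trees are added together to get the final score. An important point is that the two trees try to complement each other. The model can be written as:

$$
\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in \mathcal{F}
$$

where $K$ is the number of trees, $f_k$ is a function in the function space $\mathcal{F}$, and $\mathcal{F}$ is the set of all possible CARTs. The objective function to be optimized is:

$$
\text{obj}(\theta) = \sum_{i}^{n} \ell(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k)
$$

Random forests and boosted trees are in fact the same model; the difference lies in how they are trained. So if you need a prediction routine for tree ensembles, you only have to write one, and it works for both random forests and boosted trees.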
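The additive form of the model can be made concrete with a few lines of Python. This is only a sketch: the two trees are hand-written functions, and the feature names (`age`, `is_male`, `uses_computer_daily`) and leaf scores are made up for illustration rather than learned from data.

```python
# Each "tree" is modelled as a plain function that maps a record to a leaf score.
# Feature names and scores are hypothetical, chosen only to illustrate the sum.

def tree_1(person):
    """First CART: splits on age, then on gender."""
    if person["age"] < 15:
        return 2.0 if person["is_male"] else 0.1
    return -1.0


def tree_2(person):
    """Second CART: splits on daily computer use."""
    return 0.9 if person["uses_computer_daily"] else -0.9


def ensemble_predict(trees, x):
    """y_hat = sum over k of f_k(x): add up the leaf score from every tree."""
    return sum(f(x) for f in trees)


trees = [tree_1, tree_2]
boy = {"age": 10, "is_male": True, "uses_computer_daily": True}
grandpa = {"age": 70, "is_male": True, "uses_computer_daily": False}

print(ensemble_predict(trees, boy))      # 2.0 + 0.9
print(ensemble_predict(trees, grandpa))  # -1.0 + (-0.9)
```

The same `ensemble_predict` would serve a random forest or a boosted model; only the way the trees are produced differs.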

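Tree Boosting#
As with all supervised learning, training trees starts with defining an objective function and then optimizing it.

An objective function should always contain a training loss and a regularization term.

$$
\text{obj} = \sum_{i}^{n} \ell(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k)
$$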
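Additive Training#
The parameters to learn are the functions $f_i$, each of which contains the structure of a tree and its leaf scores. Learning a tree structure is much harder than a traditional optimization problem where you can simply follow the gradient. Learning all the trees at once is intractable. Instead, an additive strategy is used: keep what has already been learned fixed, and add one new tree at a time. Writing the prediction at step $t$ as $\hat{y}_i^{(t)}$:

$$
\begin{aligned}
\hat{y}_i^{(0)} &= 0 \\
\hat{y}_i^{(1)} &= f_1(x_i) = \hat{y}_i^{(0)} + f_1(x_i) \\
\hat{y}_i^{(2)} &= f_1(x_i) + f_2(x_i) = \hat{y}_i^{(1)} + f_2(x_i) \\
&\dots \\
\hat{y}_i^{(t)} &= \sum_{k=1}^{t} f_k(x_i) = \hat{y}_i^{(t-1)} + f_t(x_i)
\end{aligned}
$$

Which tree should be added at each step? The natural choice is the tree that optimizes the objective function.

$$
\begin{aligned}
\text{obj}^{(t)} &= \sum_{i=1}^{n} \ell(y_i, \hat{y}_i^{(t)}) + \sum_{i=1}^{t} \Omega(f_i) \\
&= \sum_{i=1}^{n} \ell(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)) + \Omega(f_t) + C
\end{aligned}
$$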
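The additive strategy can be sketched in plain Python. The following is a simplified illustration, not XGBoost's actual procedure: it assumes a squared-error loss, uses one-split stumps on a single feature as the weak learners, ignores the regularization term, and scales each new tree by a learning rate. The helper names (`fit_stump`, `boost`, `predict`) and the toy data are hypothetical.

```python
def fit_stump(xs, residuals):
    """Return the one-split stump (threshold, left_value, right_value) that best
    fits the residuals in the least-squares sense. Leaf values are mean residuals."""
    best = None
    for threshold in sorted(set(xs))[1:]:  # skip the smallest value so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x < threshold]
        right = [r for x, r in zip(xs, residuals) if x >= threshold]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lv, rv)
    return best[1:]


def boost(xs, ys, rounds=20, learning_rate=0.5):
    """Additive training: y_hat^(t) = y_hat^(t-1) + learning_rate * f_t(x)."""
    preds = [0.0] * len(xs)  # y_hat^(0) = 0
    stumps, history = [], []
    for _ in range(rounds):
        # Earlier trees stay fixed; the new tree is fit to what they still miss.
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, lv, rv = fit_stump(xs, residuals)
        stumps.append((threshold, lv, rv))
        preds = [p + learning_rate * (lv if x < threshold else rv)
                 for x, p in zip(xs, preds)]
        history.append(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))
    return stumps, history


def predict(stumps, x, learning_rate=0.5):
    """Sum the scaled contributions of all stumps for a single input."""
    return sum(learning_rate * (lv if x < t else rv) for t, lv, rv in stumps)


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 3.1, 3.0, 2.9, 5.2, 5.0]
stumps, history = boost(xs, ys)
print(history[0], history[-1])  # training MSE after the first and the last round
```

Each round leaves the existing stumps untouched and only appends a new one, which is the point of the additive formulation.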
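If mean squared error (MSE) is used as the loss function, the objective becomes:

$$
\begin{aligned}
\text{obj}^{(t)} &= \sum_{i=1}^{n} \ell(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)) + \Omega(f_t) + C \\
&= \sum_{i=1}^{n} \big(y_i - (\hat{y}_i^{(t-1)} + f_t(x_i))\big)^2 + \Omega(f_t) + C \\
&= \sum_{i=1}^{n} \big((y_i - \hat{y}_i^{(t-1)}) - f_t(x_i)\big)^2 + \Omega(f_t) + C \\
&= \sum_{i=1}^{n} \big((y_i - \hat{y}_i^{(t-1)})^2 - 2(y_i - \hat{y}_i^{(t-1)}) f_t(x_i) + f_t(x_i)^2\big) + \Omega(f_t) + C \\
&= \sum_{i=1}^{n} \big(-2(y_i - \hat{y}_i^{(t-1)}) f_t(x_i) + f_t(x_i)^2\big) + \Omega(f_t) + C
\end{aligned}
$$

In the last step, the term $(y_i - \hat{y}_i^{(t-1)})^2$ does not depend on $f_t$, so it is absorbed into the constant $C$.

The MSE form is very convenient: it has a first-order term (usually called the residual) and a second-order term. For other loss functions (for example, the logistic loss), such a clean form is not so easy to obtain. In the general case, therefore, the loss function is expanded with Taylor's formula up to the second-order term: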

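Taylor's formula: if the function $f(x)$ has derivatives up to order $(n+1)$ on the open interval $(a, b)$, then for a fixed $x_0 \in (a, b)$ and any $x \in (a, b)$,

$$
f(x) = \frac{f(x_0)}{0!} + \frac{f'(x_0)}{1!}(x - x_0) + \frac{f''(x_0)}{2!}(x - x_0)^2 + \cdots + \frac{f^{(n)}(x_0)}{n!}(x - x_0)^n + R_n(x)
$$

Applying this to the loss, expanded around $\hat{y}_i^{(t-1)}$ with $\hat{y}_i^{(t)} - \hat{y}_i^{(t-1)} = f_t(x_i)$:

$$
\begin{aligned}
\text{obj}^{(t)} &= \sum_{i=1}^{n} \ell(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)) + \Omega(f_t) + C \\
&\approx \sum_{i=1}^{n} \Big[\frac{\ell(y_i, \hat{y}_i^{(t-1)})}{0!} + \frac{\ell'(y_i, \hat{y}_i^{(t-1)})}{1!}\big(\hat{y}_i^{(t)} - \hat{y}_i^{(t-1)}\big) + \frac{\ell''(y_i, \hat{y}_i^{(t-1)})}{2!}\big(\hat{y}_i^{(t)} - \hat{y}_i^{(t-1)}\big)^2\Big] + \Omega(f_t) + C \\
&= \sum_{i=1}^{n} \Big[\ell(y_i, \hat{y}_i^{(t-1)}) + \ell'(y_i, \hat{y}_i^{(t-1)}) f_t(x_i) + \frac{1}{2} \ell''(y_i, \hat{y}_i^{(t-1)}) f_t(x_i)^2\Big] + \Omega(f_t) + C \\
&= \sum_{i=1}^{n} \Big[\ell(y_i, \hat{y}_i^{(t-1)}) + g_i f_t(x_i) + \frac{1}{2} h_i f_t(x_i)^2\Big] + \Omega(f_t) + C
\end{aligned}
$$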
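where $g_i$ and $h_i$ are defined as:

$$
\begin{aligned}
g_i &= \partial_{\hat{y}_i^{(t-1)}} \ell(y_i, \hat{y}_i^{(t-1)}) \\
h_i &= \partial^2_{\hat{y}_i^{(t-1)}} \ell(y_i, \hat{y}_i^{(t-1)})
\end{aligned}
$$

After removing all the constants, the objective at step $t$ becomes:

$$
\sum_{i=1}^{n} \Big[g_i f_t(x_i) + \frac{1}{2} h_i f_t(x_i)^2\Big] + \Omega(f_t)
$$

This is the optimization goal for the new tree. One important advantage of this definition is that the value of the objective depends only on $g_i$ and $h_i$. This is how XGBoost supports custom loss functions: any loss, including logistic regression and pairwise ranking, can be optimized by exactly the same solver, which takes $g_i$ and $h_i$ as input.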
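To see what $g_i$ and $h_i$ look like for concrete losses, the sketch below computes them for two common cases. It assumes the squared error is written as $(y - \hat{y})^2$ without a factor of one half, and that the logistic loss is the binary cross-entropy on a raw score (logit) with labels in {0, 1}. The function names are hypothetical helpers, not part of any library.

```python
import math


def squared_error_grad_hess(y, y_hat):
    """l = (y - y_hat)^2  ->  g = 2 * (y_hat - y), h = 2."""
    return 2.0 * (y_hat - y), 2.0


def logistic_grad_hess(y, y_hat):
    """l = -[y log p + (1 - y) log(1 - p)] with p = sigmoid(y_hat), y in {0, 1}
    ->  g = p - y, h = p * (1 - p)."""
    p = 1.0 / (1.0 + math.exp(-y_hat))
    return p - y, p * (1.0 - p)


def second_order_loss(loss_prev, g, h, f):
    """Second-order Taylor approximation of l(y, y_hat + f)."""
    return loss_prev + g * f + 0.5 * h * f * f


# For squared error the expansion is exact, because the loss is quadratic.
y, y_hat, f = 3.0, 2.5, 0.4
g, h = squared_error_grad_hess(y, y_hat)
exact = (y - (y_hat + f)) ** 2
approx = second_order_loss((y - y_hat) ** 2, g, h, f)
print(exact, approx)

print(logistic_grad_hess(1, 0.0))  # p = 0.5, so g = -0.5 and h = 0.25
```

With these definitions, $g_i f_t(x_i) + \frac{1}{2} h_i f_t(x_i)^2$ for the squared error reduces to $-2(y_i - \hat{y}_i^{(t-1)}) f_t(x_i) + f_t(x_i)^2$, which is the MSE form.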

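Model Complexity#
Next, define the complexity of a tree, $\Omega(f)$. To do so, first refine the definition of the tree $f(x)$ as:

$$
f_t(x) = w_{q(x)}, \quad w \in \mathbb{R}^T, \quad q: \mathbb{R}^d \to \{1, 2, \dots, T\}.
$$

Here $w$ is the vector of scores on the leaves, $q$ is a function that assigns each data point to its leaf, and $T$ is the number of leaves. In XGBoost, the complexity is defined as:

$$
\Omega(f) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^2
$$

There is more than one way to define complexity, but this one works well in practice. Regularization is a part that most tree packages ignore. This is because the traditional treatment of tree learning emphasized only improving impurity and left complexity control to heuristics. Defining it formally gives a better understanding of the model and leads to models that generalize better.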

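The Structure Score#
With the tree model reformulated this way, the objective value of the tree at step $t$ can be written as:

$$
\begin{aligned}
\text{obj}^{(t)} &\approx \sum_{i=1}^{n} \Big[g_i w_{q(x_i)} + \frac{1}{2} h_i w_{q(x_i)}^2\Big] + \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^2 \\
&= \Big[g_1 w_{q(x_1)} + \frac{1}{2} h_1 w_{q(x_1)}^2 + g_2 w_{q(x_2)} + \frac{1}{2} h_2 w_{q(x_2)}^2 + \cdots + g_n w_{q(x_n)} + \frac{1}{2} h_n w_{q(x_n)}^2\Big] + \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^2 \\
&= \sum_{j=1}^{T} \Big[\Big(\sum_{i \in I_j} g_i\Big) w_j + \frac{1}{2} \Big(\sum_{i \in I_j} h_i\Big) w_j^2\Big] + \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^2 \\
&= \sum_{j=1}^{T} \Big[\Big(\sum_{i \in I_j} g_i\Big) w_j + \frac{1}{2} \Big(\sum_{i \in I_j} h_i\Big) w_j^2 + \frac{1}{2} \lambda w_j^2\Big] + \gamma T \\
&= \sum_{j=1}^{T} \Big[\Big(\sum_{i \in I_j} g_i\Big) w_j + \frac{1}{2} \Big(\sum_{i \in I_j} h_i + \lambda\Big) w_j^2\Big] + \gamma T
\end{aligned}
$$

Here $I_j = \{i \mid q(x_i) = j\}$ is the set of indices of the data points assigned to the $j$-th leaf. The index of summation changes in the third line because all data points on the same leaf receive the same score. The expression can be compressed further by defining $G_j = \sum_{i \in I_j} g_i$ and $H_j = \sum_{i \in I_j} h_i$:

$$
\text{obj}^{(t)} = \sum_{j=1}^{T} \Big[G_j w_j + \frac{1}{2} (H_j + \lambda) w_j^2\Big] + \gamma T
$$

In this equation the $w_j$ are independent of each other, and the form $G_j w_j + \frac{1}{2} (H_j + \lambda) w_j^2$ is quadratic. For a given structure $q(x)$, the best $w_j$ and the best objective value that can be reached are:

$$
\begin{aligned}
w_j^\ast &= -\frac{G_j}{H_j + \lambda} \\
\text{obj}^\ast &= -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T
\end{aligned}
$$

The last equation measures how good a tree structure $q(x)$ is.

[Figure: computing the structure score of a tree]

Basically, for a given tree structure, the statistics $g_i$ and $h_i$ are pushed to the leaves they belong to and summed there, and the formula is then used to measure how good the tree is. This score is similar to the impurity measure in a decision tree, except that it also takes model complexity into account.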
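The procedure just described (push $g_i$ and $h_i$ to the leaves, sum them, apply the formula) can be sketched as follows. The function names and the toy data are hypothetical; leaves are indexed from 0 here rather than from 1 as in the text, and the gradients assume a squared-error loss $(y - \hat{y})^2$ with a current prediction of 0.

```python
def leaf_stats(grads, hessians, leaf_of, num_leaves):
    """G_j and H_j: sum g_i and h_i over the points assigned to leaf j."""
    G = [0.0] * num_leaves
    H = [0.0] * num_leaves
    for g, h, j in zip(grads, hessians, leaf_of):
        G[j] += g
        H[j] += h
    return G, H


def optimal_weights(G, H, lam):
    """w_j* = -G_j / (H_j + lambda)."""
    return [-g / (h + lam) for g, h in zip(G, H)]


def structure_score(G, H, lam, gamma):
    """obj* = -1/2 * sum_j G_j^2 / (H_j + lambda) + gamma * T."""
    return -0.5 * sum(g * g / (h + lam) for g, h in zip(G, H)) + gamma * len(G)


# Toy data: targets y with current prediction 0, so g = 2 * (0 - y) and h = 2.
ys = [1.0, 1.0, 3.0, 3.0, 3.0]
grads = [2.0 * (0.0 - y) for y in ys]
hessians = [2.0] * len(ys)
leaf_of = [0, 0, 1, 1, 1]  # q(x_i): the leaf each point falls into

G, H = leaf_stats(grads, hessians, leaf_of, num_leaves=2)
weights = optimal_weights(G, H, lam=1.0)
score = structure_score(G, H, lam=1.0, gamma=0.0)
print(G, H, weights, score)
```

A lower (more negative) score means a better structure. Because of $\lambda$, the optimal leaf weights are shrunk slightly toward zero compared with the plain leaf means of the targets.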

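Learning the Tree Structure#
Now that there is a way to measure how good a tree is, an obvious idea is to enumerate all possible trees and pick the best one. In practice this is intractable, so the tree is optimized one level at a time. Specifically, a leaf is split into two leaves, and the gain in score is:

$$
\text{Gain} = \frac{1}{2} \left[\frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda}\right] - \gamma
$$

This formula can be decomposed into four parts: the score on the new left leaf, the score on the new right leaf, the score on the original leaf, and the regularization on the additional leaf. An important consequence is that if the score improvement from the split (the bracketed term with its factor of one half) is smaller than $\gamma$, the gain is negative and it is better not to add the branch. This is exactly the pruning technique used in tree-based models.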
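A direct transcription of the gain formula makes the pruning rule easy to try out. The function name `split_gain` and the numbers are illustrative assumptions.

```python
def split_gain(G_L, H_L, G_R, H_R, lam, gamma):
    """Gain of splitting one leaf into a left and a right leaf."""
    def score(G, H):
        return G * G / (H + lam)

    improvement = 0.5 * (score(G_L, H_L) + score(G_R, H_R)
                         - score(G_L + G_R, H_L + H_R))
    return improvement - gamma


# The same candidate split under two values of gamma.
gain_cheap = split_gain(-4.0, 4.0, -18.0, 6.0, lam=1.0, gamma=0.0)
gain_costly = split_gain(-4.0, 4.0, -18.0, 6.0, lam=1.0, gamma=3.0)
print(gain_cheap)   # positive: the split is worth making
print(gain_costly)  # negative: the improvement is below gamma, so do not split
```

Raising `gamma` makes every split more expensive, which is why it acts as a pruning threshold.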

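For real-valued data, the goal is usually to search for an optimal split point. An efficient way to do this is to put all the instances (records) in sorted order, as in the figure below.

[Figure: instances in sorted order for split finding]

A single left-to-right scan is enough to compute the structure score of every possible split, so the best split point can be found quickly.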
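The left-to-right scan can be sketched for a single feature as below. This is a minimal illustration rather than XGBoost's implementation: `best_split` is a hypothetical helper, candidate thresholds are placed at midpoints between neighboring distinct values (an assumption made for this sketch), and the gradients again assume squared error with a current prediction of 0.

```python
def best_split(xs, grads, hessians, lam=1.0, gamma=0.0):
    """Scan the instances in sorted order, keeping running sums G_L and H_L.
    Returns (gain, threshold) for the best split, or None if no split has positive gain."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    G_total, H_total = sum(grads), sum(hessians)

    def score(G, H):
        return G * G / (H + lam)

    G_L = H_L = 0.0
    best = None
    for pos in range(len(order) - 1):
        i, nxt = order[pos], order[pos + 1]
        # Move instance i from the right side to the left side.
        G_L += grads[i]
        H_L += hessians[i]
        if xs[i] == xs[nxt]:
            continue  # cannot split between two equal feature values
        G_R, H_R = G_total - G_L, H_total - H_L
        gain = 0.5 * (score(G_L, H_L) + score(G_R, H_R)
                      - score(G_total, H_total)) - gamma
        if gain > 0 and (best is None or gain > best[0]):
            best = (gain, (xs[i] + xs[nxt]) / 2.0)
    return best


xs = [5, 1, 4, 2, 3]
ys = [3.0, 1.0, 3.0, 1.0, 3.0]          # y is 1 for x <= 2 and 3 otherwise
grads = [2.0 * (0.0 - y) for y in ys]   # g = 2 * (y_hat - y) with y_hat = 0
hessians = [2.0] * len(xs)

print(best_split(xs, grads, hessians))             # best threshold lies between 2 and 3
print(best_split(xs, grads, hessians, gamma=3.0))  # None: no split is worth its cost
```

Because $G_R$ and $H_R$ follow from the totals, each candidate costs constant time after the initial sort.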

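Limitation of additive tree learning

Because enumerating all possible tree structures is intractable, one split is added at a time. This approach works well most of the time, but it fails in some edge cases: since only one feature dimension is considered at a time, training can produce a degenerate model. See "Can Gradient Boosting Learn Simple Arithmetic?" for an example.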

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Load the preprocessed training set, test set, and the submission template
train = pd.read_csv('./data/happiness_train_complete_nona.csv', index_col='id', parse_dates=['survey_time'])
test = pd.read_csv('./data/happiness_test_complete_nona.csv', index_col='id', parse_dates=['survey_time'])
submit = pd.read_csv('./data/happiness_submit.csv', index_col='id')
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Separate the features from the target; the datetime column survey_time is dropped
X = train.drop(['happiness', 'survey_time'], axis=1)
y = train['happiness']

# Hold out part of the training data for validation (default: 25%)
X_train, X_test, y_train, y_test = train_test_split(X, y)
from xgboost import XGBRegressor
from xgboost import plot_importance

# gamma is the minimum loss reduction required to make a split
model = XGBRegressor(gamma=0.1, learning_rate=0.1)
model.fit(X_train, y_train)
# Mean squared error on the held-out split
mean_squared_error(y_test, model.predict(X_test))
0.4596381608913307
# Predict on the competition test set and write the submission file
predict = pd.DataFrame({'happiness' : model.predict(test.drop('survey_time', axis=1))}, index=test.index)
submit.loc[predict.index, 'happiness'] = predict['happiness']
submit.to_csv('./data/predict.csv')
