Reproducing paper C155, an officially showcased paper for Problem C of the 2022 national mathematical modeling contest


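  • 1. Content comparison
  • 2. Reproduction code for Question 1, Part 2
    • 2.1 Merging the sheets
    • 2.2 Normality test of the data
      • 2.2.1 Normality check plots
    • 2.3 Normality is not satisfied: centered log-ratio transform
      • 2.3.1 Key step: replacing -inf with 0
      • 2.3.2 Centered log-ratio transform plots
    • 2.4 Descriptive statistics
    • 2.5 Box plots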

See GitHub for the full reproduction process.

1. Content comparison

Box plot comparison
National contest paper C155:
[Figure 1: box plot from paper C155]
Reproduction:
[Figure 2: reproduced box plot]

2. Reproduction code for Question 1, Part 2

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

# Use a font with CJK glyphs so that the Chinese labels render in plots
plt.rcParams['font.sans-serif'] = ['SimHei']

# Open the Excel workbook (adjust the path to your local copy of the attachment)
xl_file = pd.ExcelFile("E:\\数学建模国赛\\2022数学建模赛题\\C题\\附件.xlsx")

# Parse the three sheets by name
sheet1 = xl_file.parse('表单1')  # basic information on the glass artifacts
sheet2 = xl_file.parse('表单2')  # chemical composition of the classified glass artifacts
sheet3 = xl_file.parse('表单3')  # chemical composition of the unclassified glass artifacts

# Show the first few rows of each sheet
sheet1.head(), sheet2.head(), sheet3.head()


(   文物编号 纹饰  类型  颜色 表面风化
 0     1  C  高钾  蓝绿  无风化
 1     2  A  铅钡  浅蓝   风化
 2     3  A  高钾  蓝绿  无风化
 3     4  A  高钾  蓝绿  无风化
 4     5  A  高钾  蓝绿  无风化,
    文物采样点  二氧化硅(SiO2)  氧化钠(Na2O)  氧化钾(K2O)  氧化钙(CaO)  氧化镁(MgO)  氧化铝(Al2O3)  \
 0     01       69.33        NaN      9.99      6.32      0.87        3.93   
 1     02       36.28        NaN      1.05      2.34      1.18        5.73   
 2  03部位1       87.05        NaN      5.19      2.01       NaN        4.06   
 3  03部位2       61.71        NaN     12.37      5.87      1.11        5.50   
 4     04       65.88        NaN      9.67      7.12      1.56        6.44   
 
    氧化铁(Fe2O3)  氧化铜(CuO)  氧化铅(PbO)  氧化钡(BaO)  五氧化二磷(P2O5)  氧化锶(SrO)  氧化锡(SnO2)  \
 0        1.74      3.87       NaN       NaN         1.17       NaN        NaN   
 1        1.86      0.26     47.43       NaN         3.57      0.19        NaN   
 2         NaN      0.78      0.25       NaN         0.66       NaN        NaN   
 3        2.16      5.09      1.41      2.86         0.70      0.10        NaN   
 4        2.06      2.18       NaN       NaN         0.79       NaN        NaN   
 
    二氧化硫(SO2)  
 0       0.39  
 1        NaN  
 2        NaN  
 3        NaN  
 4       0.36  ,
   文物编号 表面风化  二氧化硅(SiO2)  氧化钠(Na2O)  氧化钾(K2O)  氧化钙(CaO)  氧化镁(MgO)  氧化铝(Al2O3)  \
 0   A1  无风化       78.45        NaN       NaN      6.08      1.86        7.23   
 1   A2   风化       37.75        NaN       NaN      7.63       NaN        2.33   
 2   A3  无风化       31.95        NaN      1.36      7.19      0.81        2.93   
 3   A4  无风化       35.47        NaN      0.79      2.89      1.05        7.07   
 4   A5   风化       64.29        1.2      0.37      1.64      2.34       12.75   
 
    氧化铁(Fe2O3)  氧化铜(CuO)  氧化铅(PbO)  氧化钡(BaO)  五氧化二磷(P2O5)  氧化锶(SrO)  氧化锡(SnO2)  \
 0        2.15      2.11       NaN       NaN         1.06      0.03        NaN   
 1         NaN       NaN     34.30       NaN        14.27       NaN        NaN   
 2        7.06      0.21     39.58      4.69         2.68      0.52        NaN   
 3        6.45      0.96     24.28      8.31         8.45      0.28        NaN   
 4        0.81      0.94     12.23      2.16         0.19      0.21       0.49   
 
    二氧化硫(SO2)  
 0       0.51  
 1        NaN  
 2        NaN  
 3        NaN  
 4        NaN  )
sheet2
文物采样点 二氧化硅(SiO2) 氧化钠(Na2O) 氧化钾(K2O) 氧化钙(CaO) 氧化镁(MgO) 氧化铝(Al2O3) 氧化铁(Fe2O3) 氧化铜(CuO) 氧化铅(PbO) 氧化钡(BaO) 五氧化二磷(P2O5) 氧化锶(SrO) 氧化锡(SnO2) 二氧化硫(SO2)
0 01 69.33 NaN 9.99 6.32 0.87 3.93 1.74 3.87 NaN NaN 1.17 NaN NaN 0.39
1 02 36.28 NaN 1.05 2.34 1.18 5.73 1.86 0.26 47.43 NaN 3.57 0.19 NaN NaN
2 03部位1 87.05 NaN 5.19 2.01 NaN 4.06 NaN 0.78 0.25 NaN 0.66 NaN NaN NaN
3 03部位2 61.71 NaN 12.37 5.87 1.11 5.50 2.16 5.09 1.41 2.86 0.70 0.10 NaN NaN
4 04 65.88 NaN 9.67 7.12 1.56 6.44 2.06 2.18 NaN NaN 0.79 NaN NaN 0.36
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
64 54严重风化点 17.11 NaN NaN NaN 1.11 3.65 NaN 1.34 58.46 NaN 14.13 1.12 NaN NaN
65 55 49.01 2.71 NaN 1.13 NaN 1.45 NaN 0.86 32.92 7.95 0.35 NaN NaN NaN
66 56 29.15 NaN NaN 1.21 NaN 1.85 NaN 0.79 41.25 15.45 2.54 NaN NaN NaN
67 57 25.42 NaN NaN 1.31 NaN 2.18 NaN 1.16 45.10 17.30 NaN NaN NaN NaN
68 58 30.39 NaN 0.34 3.49 0.79 3.52 0.86 3.13 39.35 7.66 8.99 0.24 NaN NaN

69 rows × 15 columns

component_cols = ['二氧化硅(SiO2)', '氧化钠(Na2O)', '氧化钾(K2O)', '氧化钙(CaO)', '氧化镁(MgO)', 
                  '氧化铝(Al2O3)', '氧化铁(Fe2O3)', '氧化铜(CuO)', '氧化铅(PbO)', '氧化钡(BaO)', 
                  '五氧化二磷(P2O5)', '氧化锶(SrO)', '氧化锡(SnO2)', '二氧化硫(SO2)']

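# Row-wise total of all components (sum skips NaN entries)
sheet2['成分总和'] = sheet2[component_cols].sum(axis=1)
sheet2['成分总和']

# Keep only valid samples: the contest statement treats totals between 85% and 105% as valid
sheet2 = sheet2[(sheet2['成分总和'] >= 85) & (sheet2['成分总和'] <= 105)]
sheet2
# Treat undetected components (NaN) as 0
sheet2 = sheet2.fillna(0)
# Normalize the chemical components so that each row sums to 100%
sheet2[component_cols] = sheet2[component_cols].div(sheet2[component_cols].sum(axis=1), axis=0) * 100

# Recompute the total to confirm that every row now sums to 100
sheet2['成分总和'] = sheet2[component_cols].sum(axis=1)
sheet2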
文物采样点 二氧化硅(SiO2) 氧化钠(Na2O) 氧化钾(K2O) 氧化钙(CaO) 氧化镁(MgO) 氧化铝(Al2O3) 氧化铁(Fe2O3) 氧化铜(CuO) 氧化铅(PbO) 氧化钡(BaO) 五氧化二磷(P2O5) 氧化锶(SrO) 氧化锡(SnO2) 二氧化硫(SO2) 成分总和
0 01 71.027559 0.000000 10.234607 6.474746 0.891302 4.026227 1.782604 3.964758 0.000000 0.000000 1.198648 0.000000 0.0 0.399549 100.0
1 02 36.319952 0.000000 1.051156 2.342577 1.181299 5.736310 1.862048 0.260286 47.482230 0.000000 3.573931 0.190209 0.0 0.000000 100.0
2 03部位1 87.050000 0.000000 5.190000 2.010000 0.000000 4.060000 0.000000 0.780000 0.250000 0.000000 0.660000 0.000000 0.0 0.000000 100.0
3 03部位2 62.408981 0.000000 12.510113 5.936489 1.122573 5.562298 2.184466 5.147654 1.425971 2.892395 0.707929 0.101133 0.0 0.000000 100.0
4 04 68.582136 0.000000 10.066625 7.412034 1.623985 6.704143 2.144493 2.269415 0.000000 0.000000 0.822403 0.000000 0.0 0.374766 100.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
64 54严重风化点 17.653735 0.000000 0.000000 0.000000 1.145274 3.765993 0.000000 1.382584 60.317788 0.000000 14.579034 1.155592 0.0 0.000000 100.0
65 55 50.850799 2.811787 0.000000 1.172442 0.000000 1.504462 0.000000 0.892301 34.156464 8.248599 0.363146 0.000000 0.0 0.000000 100.0
66 56 31.602342 0.000000 0.000000 1.311795 0.000000 2.005637 0.000000 0.856461 44.720295 16.749783 2.753686 0.000000 0.0 0.000000 100.0
67 57 27.489997 0.000000 0.000000 1.416676 0.000000 2.357521 0.000000 1.254461 48.772575 18.708770 0.000000 0.000000 0.0 0.000000 100.0
68 58 30.771567 0.000000 0.344269 3.533819 0.799919 3.564196 0.870798 3.169299 39.844066 7.756177 9.102876 0.243013 0.0 0.000000 100.0

67 rows × 16 columns
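
The filtering and rescaling above can be checked on a small example. The sketch below uses a toy table with made-up values (not taken from the attachment) and hypothetical names `toy`, `cols`, and `valid`. It assumes the same rule as the code above: keep rows whose total lies in [85, 105], fill missing components with 0, then rescale each row to 100.

```python
import numpy as np
import pandas as pd

# Toy data with made-up values; NaN marks a component that was not detected.
toy = pd.DataFrame({
    "sample": ["a", "b", "c"],
    "SiO2": [60.0, 30.0, 50.0],
    "K2O": [30.0, np.nan, 20.0],
    "PbO": [np.nan, 40.0, 28.0],
})
cols = ["SiO2", "K2O", "PbO"]

# Row totals; DataFrame.sum skips NaN by default, so the totals are 90, 70, 98.
total = toy[cols].sum(axis=1)

# Keep rows whose total lies in the closed interval [85, 105].
valid = toy.loc[total.between(85, 105)].copy()

# Treat undetected components as 0, then rescale each row so it sums to 100.
valid[cols] = valid[cols].fillna(0)
valid[cols] = valid[cols].div(valid[cols].sum(axis=1), axis=0) * 100

print(valid)
```

Taking `.copy()` right after the filter keeps the later column assignments from acting on a filtered view of the original frame.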

# Take an explicit copy so that the in-place rename below acts on an independent DataFrame
sheet2_copy = sheet2.copy()
sheet2 = sheet2_copy

# Define the new column names
new_component_cols = ['SiO2', 'Na2O', 'K2O', 'CaO', 'MgO', 
                      'Al2O3', 'Fe2O3', 'CuO', 'PbO', 'BaO', 
                      'P2O5', 'SrO', 'SnO2', 'SO2']

# Create a mapping from old column names to new column names
rename_dict = dict(zip(component_cols, new_component_cols))

# Rename the columns
sheet2.rename(columns=rename_dict, inplace=True)

# Check the updated column names
sheet2.columns
Index(['文物采样点', 'SiO2', 'Na2O', 'K2O', 'CaO', 'MgO', 'Al2O3', 'Fe2O3', 'CuO',
       'PbO', 'BaO', 'P2O5', 'SrO', 'SnO2', 'SO2', '成分总和'],
      dtype='object')

2.1 Merging the sheets

# Merge sheet1 and sheet2 on 文物编号 (artifact number).
# sheet2 only has 文物采样点 (sampling point), so the artifact number is extracted from it first.
# Assumption: the artifact number is the run of digits at the start of the sampling-point label.

import re

def extract_number(s):
    """Return the leading digits of a sampling-point label as an int, or None if there are none."""
    match = re.match(r"(\d+)", str(s))
    return int(match.group()) if match else None

# Apply the function to the 文物采样点 column
sheet2['文物编号'] = sheet2['文物采样点'].apply(extract_number)

# Inner merge: each sampling point picks up the attributes of its artifact
data = pd.merge(sheet1, sheet2, on='文物编号')

data
 
文物编号 纹饰 类型 颜色 表面风化 文物采样点 SiO2 Na2O K2O CaO ... Al2O3 Fe2O3 CuO PbO BaO P2O5 SrO SnO2 SO2 成分总和
0 1 C 高钾 蓝绿 无风化 01 71.027559 0.000000 10.234607 6.474746 ... 4.026227 1.782604 3.964758 0.000000 0.000000 1.198648 0.000000 0.0 0.399549 100.0
1 2 A 铅钡 浅蓝 风化 02 36.319952 0.000000 1.051156 2.342577 ... 5.736310 1.862048 0.260286 47.482230 0.000000 3.573931 0.190209 0.0 0.000000 100.0
2 3 A 高钾 蓝绿 无风化 03部位1 87.050000 0.000000 5.190000 2.010000 ... 4.060000 0.000000 0.780000 0.250000 0.000000 0.660000 0.000000 0.0 0.000000 100.0
3 3 A 高钾 蓝绿 无风化 03部位2 62.408981 0.000000 12.510113 5.936489 ... 5.562298 2.184466 5.147654 1.425971 2.892395 0.707929 0.101133 0.0 0.000000 100.0
4 4 A 高钾 蓝绿 无风化 04 68.582136 0.000000 10.066625 7.412034 ... 6.704143 2.144493 2.269415 0.000000 0.000000 0.822403 0.000000 0.0 0.374766 100.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
62 54 C 铅钡 浅蓝 风化 54严重风化点 17.653735 0.000000 0.000000 0.000000 ... 3.765993 0.000000 1.382584 60.317788 0.000000 14.579034 1.155592 0.0 0.000000 100.0
63 55 C 铅钡 绿 无风化 55 50.850799 2.811787 0.000000 1.172442 ... 1.504462 0.000000 0.892301 34.156464 8.248599 0.363146 0.000000 0.0 0.000000 100.0
64 56 C 铅钡 蓝绿 风化 56 31.602342 0.000000 0.000000 1.311795 ... 2.005637 0.000000 0.856461 44.720295 16.749783 2.753686 0.000000 0.0 0.000000 100.0
65 57 C 铅钡 蓝绿 风化 57 27.489997 0.000000 0.000000 1.416676 ... 2.357521 0.000000 1.254461 48.772575 18.708770 0.000000 0.000000 0.0 0.000000 100.0
66 58 C 铅钡 NaN 风化 58 30.771567 0.000000 0.344269 3.533819 ... 3.564196 0.870798 3.169299 39.844066 7.756177 9.102876 0.243013 0.0 0.000000 100.0

67 rows × 21 columns
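
As a hedged sketch of the same key extraction and merge, the example below uses three sampling-point labels and SiO2 values that appear in the tables above. The frames `info` and `samples` are small stand-ins, not the real sheets. The `validate` and `indicator` arguments of `DataFrame.merge` are optional safeguards added here as an assumption about what is worth checking; the article's own merge does not use them.

```python
import pandas as pd

# Small stand-ins for sheet1 (artifact info) and sheet2 (sampling points).
info = pd.DataFrame({"文物编号": [3, 54], "类型": ["高钾", "铅钡"]})
samples = pd.DataFrame({
    "文物采样点": ["03部位1", "03部位2", "54严重风化点"],
    "SiO2": [87.05, 61.71, 17.11],
})

# The leading digits of the label are the artifact number ("03部位1" -> 3).
samples["文物编号"] = (
    samples["文物采样点"].str.extract(r"^(\d+)", expand=False).astype(int)
)

# validate="many_to_one" raises if an artifact number is duplicated in `info`;
# indicator=True adds a _merge column that flags rows without a match.
merged = samples.merge(info, on="文物编号", how="left",
                       validate="many_to_one", indicator=True)
unmatched = merged[merged["_merge"] != "both"]

print(merged)
print("unmatched sampling points:", len(unmatched))
```

A left merge with an indicator makes silently dropped sampling points visible, which an inner merge would hide.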

data.drop(['颜色','纹饰','文物编号','成分总和'],axis=1,inplace=True)
data
类型 表面风化 文物采样点 SiO2 Na2O K2O CaO MgO Al2O3 Fe2O3 CuO PbO BaO P2O5 SrO SnO2 SO2
0 高钾 无风化 01 71.027559 0.000000 10.234607 6.474746 0.891302 4.026227 1.782604 3.964758 0.000000 0.000000 1.198648 0.000000 0.0 0.399549
1 铅钡 风化 02 36.319952 0.000000 1.051156 2.342577 1.181299 5.736310 1.862048 0.260286 47.482230 0.000000 3.573931 0.190209 0.0 0.000000
2 高钾 无风化 03部位1 87.050000 0.000000 5.190000 2.010000 0.000000 4.060000 0.000000 0.780000 0.250000 0.000000 0.660000 0.000000 0.0 0.000000
3 高钾 无风化 03部位2 62.408981 0.000000 12.510113 5.936489 1.122573 5.562298 2.184466 5.147654 1.425971 2.892395 0.707929 0.101133 0.0 0.000000
4 高钾 无风化 04 68.582136 0.000000 10.066625 7.412034 1.623985 6.704143 2.144493 2.269415 0.000000 0.000000 0.822403 0.000000 0.0 0.374766
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
62 铅钡 风化 54严重风化点 17.653735 0.000000 0.000000 0.000000 1.145274 3.765993 0.000000 1.382584 60.317788 0.000000 14.579034 1.155592 0.0 0.000000
63 铅钡 无风化 55 50.850799 2.811787 0.000000 1.172442 0.000000 1.504462 0.000000 0.892301 34.156464 8.248599 0.363146 0.000000 0.0 0.000000
64 铅钡 风化 56 31.602342 0.000000 0.000000 1.311795 0.000000 2.005637 0.000000 0.856461 44.720295 16.749783 2.753686 0.000000 0.0 0.000000
65 铅钡 风化 57 27.489997 0.000000 0.000000 1.416676 0.000000 2.357521 0.000000 1.254461 48.772575 18.708770 0.000000 0.000000 0.0 0.000000
66 铅钡 风化 58 30.771567 0.000000 0.344269 3.533819 0.799919 3.564196 0.870798 3.169299 39.844066 7.756177 9.102876 0.243013 0.0 0.000000

67 rows × 17 columns

data.shape
(67, 17)
#data.to_excel('E:\\数学建模国赛\\2022数学建模赛题\\C题\\一二表单合并数据.xlsx', index=True)

2.2 Normality test of the data

"""
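Some statistical analyses, such as regression, assume that the data are normally distributed.
Whether a transform is needed depends on the characteristics of the data and on the goal of the analysis.
Let us first look at the data.
These are chemical composition (compositional) data and their distributions are not normal,
so the analysis is carried out after a centered log-ratio (CLR) transform. This brings the data closer
to the assumptions of the statistical methods and respects the structure of compositional data.
"""
# Normality check: look at the distribution of each chemical component.

# Select only the numeric columns (the categorical ones are excluded)
numeric_cols = data.select_dtypes(include='number').columns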

2.2.1 Normality check plots

# Plot histograms for each numeric column
fig, axs = plt.subplots(len(numeric_cols), figsize=(10, len(numeric_cols)*3))

for i, col in enumerate(numeric_cols):
    axs[i].hist(data[col].dropna(), bins=30, color='skyblue', edgecolor='black', alpha=0.7)
    axs[i].set_title(f'Histogram of {col}')

plt.tight_layout()
plt.show()

[Figure 3: histograms of each chemical component]

data_raw=data.copy()

"""
Normality test: we use the Shapiro-Wilk test to check the normality of each chemical component.
It is a widely used normality test whose null hypothesis is that the data come from a normal distribution.
If the p-value is below 0.05 we reject the null hypothesis, i.e. we conclude that the data are not normally distributed.
"""
from scipy.stats import shapiro, levene

# Collect one result record per variable
records = []

for col in numeric_cols:
    col_results = {'Variable': col}

    # Normality test (drop NA values before performing the test)
    _, p_normal = shapiro(data[col].dropna())
    col_results['Normality p-value'] = p_normal
    col_results['Normal'] = p_normal > 0.05

    # Equal-variance test between unweathered and weathered samples (only if the data look normal)
    if col_results['Normal']:
        _, p_equal_var = levene(data.loc[data['表面风化'] == '无风化', col].dropna(),
                                data.loc[data['表面风化'] == '风化', col].dropna())
        col_results['Equal var p-value'] = p_equal_var
        col_results['Equal var'] = p_equal_var > 0.05

    records.append(col_results)

# DataFrame.append is deprecated (and removed in pandas 2.0), so build the frame in one step.
# Keys missing from a record (variables that failed the normality test) become NaN.
test_results = pd.DataFrame(records)

# test_results now holds the normality and equal-variance p-values
# for each numeric variable, computed on the untransformed data.

test_results
    Variable  Normality p-value  Normal  Equal var p-value  Equal var
0       SiO2       5.434923e-02    True           0.009129      False
1       Na2O       5.631047e-13   False                NaN        NaN
2        K2O       2.218287e-13   False                NaN        NaN
3        CaO       8.905178e-06   False                NaN        NaN
4        MgO       1.066307e-05   False                NaN        NaN
5      Al2O3       1.085733e-06   False                NaN        NaN
6      Fe2O3       1.809425e-09   False                NaN        NaN
7        CuO       3.633815e-09   False                NaN        NaN
8        PbO       7.531955e-04   False                NaN        NaN
9        BaO       7.773099e-08   False                NaN        NaN
10      P2O5       4.346846e-09   False                NaN        NaN
11       SrO       6.648307e-06   False                NaN        NaN
12      SnO2       8.658932e-17   False                NaN        NaN
13       SO2       5.878219e-17   False                NaN        NaN
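
To see how the Shapiro-Wilk decision rule behaves, here is a minimal sketch on synthetic data (the columns `roughly_normal` and `right_skewed` are invented for illustration and have nothing to do with the glass data). It collects one record per variable in a list and builds the result table with a single `pd.DataFrame` call.

```python
import numpy as np
import pandas as pd
from scipy.stats import shapiro

# Synthetic columns: one drawn from a normal distribution, one strongly right-skewed.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "roughly_normal": rng.normal(loc=50, scale=5, size=200),
    "right_skewed": rng.exponential(scale=2.0, size=200),
})

records = []
for col in toy.columns:
    stat, p = shapiro(toy[col])
    # Reject normality at the 5% level when p < 0.05.
    records.append({"Variable": col, "W": stat, "p-value": p, "Normal": p > 0.05})

results = pd.DataFrame(records)
print(results)
```

Many of the oxide columns above contain a large share of exact zeros, so they resemble the skewed column far more than the normal one, which is consistent with the very small p-values in the table.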

2.3 Normality is not satisfied: centered log-ratio transform

from scipy.stats.mstats import gmean

data_centralized = data.copy()

# Select the numeric columns
numeric_data = data_centralized.select_dtypes(include='number')

# Geometric mean of the non-zero entries of each row
geo_means = []
for index, row in numeric_data.iterrows():
    non_zero_values = row[row > 0]
    geo_mean = gmean(non_zero_values) if len(non_zero_values) > 0 else 1e-6
    geo_means.append(geo_mean)

# Divide each value by the geometric mean of the non-zero entries in its row and take the log.
# Zero entries give log(0) = -inf; they are dealt with in section 2.3.1.
for col in numeric_data.columns:
    data_centralized[col] = np.log(numeric_data[col] / geo_means)

data_centralized.head()

D:\py1.1\envs\pytorch\lib\site-packages\pandas\core\arraylike.py:402: RuntimeWarning: divide by zero encountered in log
  result = getattr(ufunc, method)(*inputs, **kwargs)
类型 表面风化 文物采样点 SiO2 Na2O K2O CaO MgO Al2O3 Fe2O3 CuO PbO BaO P2O5 SrO SnO2 SO2
0 高钾 无风化 01 3.045978 -inf 1.108685 0.650820 -1.332161 0.175740 -0.639014 0.160355 -inf -inf -1.035896 -inf -inf -2.134508
1 铅钡 风化 02 2.676664 -inf -0.865813 -0.064452 -0.749089 0.831113 -0.294026 -2.261677 2.944652 -inf 0.357963 -2.575334 -inf -inf
2 高钾 无风化 03部位1 3.586159 -inf 0.766410 -0.182189 -inf 0.520860 -inf -1.128785 -2.266618 -inf -1.295839 -inf -inf -inf
3 高钾 无风化 03部位2 3.090699 -inf 1.483527 0.738107 -0.927387 0.673001 -0.261639 0.595531 -0.688158 0.019074 -1.388422 -3.334332 -inf -inf
4 高钾 无风化 04 2.968764 -inf 1.049957 0.743836 -0.774386 0.643457 -0.496365 -0.439747 -inf -inf -1.454794 -inf -inf -2.240723
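
For reference, the transform used here is clr(x)_i = ln(x_i / g(x)), where g(x) is the geometric mean of the row, and this code takes g(x) over the non-zero parts only. The sketch below is a vectorized version of that variant under the assumption that every row has at least one positive part. The function name `clr_nonzero` and the toy values are hypothetical.

```python
import numpy as np
import pandas as pd

def clr_nonzero(df):
    """Return log(x / g), with g the geometric mean of the strictly positive
    parts of each row. Zero parts come out as -inf.
    Assumes every row has at least one positive part."""
    values = df.to_numpy(dtype=float)
    positive = values > 0
    # Mean of the logs over the positive entries only (masked entries contribute log(1) = 0).
    log_values = np.log(np.where(positive, values, 1.0))
    log_gmean = log_values.sum(axis=1) / positive.sum(axis=1)
    with np.errstate(divide="ignore"):
        out = np.log(values) - log_gmean[:, None]
    return pd.DataFrame(out, index=df.index, columns=df.columns)

# Toy rows: the first is strictly positive, the second contains a zero.
toy = pd.DataFrame({"A": [1.0, 4.0], "B": [np.e, 0.0], "C": [np.e ** 2, 1.0]})
clr = clr_nonzero(toy)
print(clr)
```

For a strictly positive row the CLR values sum to zero (the first row gives -1, 0, 1). A row with zeros produces -inf at those positions, which is exactly what the warning and the output above show.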

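2.3.1 Key step: replacing -inf with 0

# Replace the -inf values (from log(0)) with 0 so that the columns can be plotted and summarized.
# Note: 0 on the CLR scale means "equal to the row's geometric mean", not "absent".
#plt.rcParams['font.family'] = 'DejaVu Sans'
selected_cols = new_component_cols
data_centralized.replace(-np.inf, 0, inplace=True)
data_centralized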

类型 表面风化 文物采样点 SiO2 Na2O K2O CaO MgO Al2O3 Fe2O3 CuO PbO BaO P2O5 SrO SnO2 SO2
0 高钾 无风化 01 3.045978 0.000000 1.108685 0.650820 -1.332161 0.175740 -0.639014 0.160355 0.000000 0.000000 -1.035896 0.000000 0.0 -2.134508
1 铅钡 风化 02 2.676664 0.000000 -0.865813 -0.064452 -0.749089 0.831113 -0.294026 -2.261677 2.944652 0.000000 0.357963 -2.575334 0.0 0.000000
2 高钾 无风化 03部位1 3.586159 0.000000 0.766410 -0.182189 0.000000 0.520860 0.000000 -1.128785 -2.266618 0.000000 -1.295839 0.000000 0.0 0.000000
3 高钾 无风化 03部位2 3.090699 0.000000 1.483527 0.738107 -0.927387 0.673001 -0.261639 0.595531 -0.688158 0.019074 -1.388422 -3.334332 0.0 0.000000
4 高钾 无风化 04 2.968764 0.000000 1.049957 0.743836 -0.774386 0.643457 -0.496365 -0.439747 0.000000 0.000000 -1.454794 0.000000 0.0 -2.240723
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
62 铅钡 风化 54严重风化点 1.216607 0.000000 0.000000 0.000000 -1.518696 -0.328329 0.000000 -1.330386 2.445287 0.000000 1.025244 -1.509727 0.0 0.000000
63 铅钡 无风化 55 2.673354 -0.221722 0.000000 -1.096453 0.000000 -0.847107 0.000000 -1.369493 2.275410 0.854502 -2.268492 0.000000 0.0 0.000000
64 铅钡 风化 56 1.753603 0.000000 0.000000 -1.428231 0.000000 -1.003666 0.000000 -1.854574 2.100799 1.118757 -0.686688 0.000000 0.0 0.000000
65 铅钡 风化 57 1.386720 0.000000 0.000000 -1.578789 0.000000 -1.069491 0.000000 -1.700396 1.960066 1.001890 0.000000 0.000000 0.0 0.000000
66 铅钡 风化 58 2.316326 0.000000 -2.176597 0.152115 -1.333510 0.160674 -1.248610 0.043246 2.574709 0.938225 1.098326 -2.524904 0.0 0.000000

67 rows × 17 columns
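
Replacing -inf with 0 is the choice made in this reproduction. A different approach sometimes used for compositional data is to replace the zeros with a small positive value before the transform (multiplicative replacement), so that no -inf appears at all. The sketch below illustrates that alternative and is not what the code above does. The value `delta=0.01` is an arbitrary assumption, the function names are hypothetical, and the input rows are assumed to be closed to `total`.

```python
import numpy as np

def multiplicative_replacement(values, delta=0.01, total=100.0):
    """Replace zeros by `delta` and shrink the non-zero parts so that each
    row still sums to `total`. Assumes each input row already sums to `total`."""
    values = np.asarray(values, dtype=float)
    zeros = values == 0
    n_zeros = zeros.sum(axis=1, keepdims=True)
    shrink = 1.0 - n_zeros * delta / total
    return np.where(zeros, delta, values * shrink)

def clr(values):
    """Standard centered log-ratio transform for strictly positive rows."""
    logs = np.log(values)
    return logs - logs.mean(axis=1, keepdims=True)

# Toy composition (made-up values) closed to 100, with one zero part.
toy = np.array([[60.0, 40.0, 0.0],
                [50.0, 30.0, 20.0]])
replaced = multiplicative_replacement(toy)
z = clr(replaced)
print(replaced)
print(z)
```

With this variant every transformed value is finite and each row of `z` sums to zero, at the cost of having to justify the choice of `delta`.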


2.3.2 Centered log-ratio transform plots

# Visual comparison between raw data and centralized log ratio transformed data for selected columns
plt.rcParams['font.family'] = 'DejaVu Sans'
fig, axs = plt.subplots(len(selected_cols), 2, figsize=(15, len(selected_cols)*3))

for i, col in enumerate(selected_cols):
    # Plot raw data
    axs[i, 0].hist(data_raw[col].dropna(), bins=30, color='skyblue', edgecolor='black', alpha=0.7)
    axs[i, 0].set_title(f'Raw data: {col}')
    
    # Plot centralized log ratio transformed data
    axs[i, 1].hist(data_centralized[col].dropna(), bins=30, color='salmon', edgecolor='black', alpha=0.7)
    axs[i, 1].set_title(f'Centralized Log Ratio: {col}')

plt.tight_layout()
plt.show()

[Figure 4: histograms of the raw data (left) and the centered log-ratio transformed data (right) for each component]

2.4 Descriptive statistics

#data_centralized.to_excel('E:\\数学建模国赛\\2022数学建模赛题\\C题\\一二表单合并对数中心化转换数据.xlsx', index=True)
# From here on, work with the CLR-transformed data
data = data_centralized
# List the unique values in the '类型' (glass type) and '表面风化' (surface weathering) columns
glass_types = data['类型'].unique()
weathering_states = data['表面风化'].unique()

glass_types, weathering_states

(array(['高钾', '铅钡'], dtype=object), array(['无风化', '风化'], dtype=object))
# Initialize an empty DataFrame to store the results
grouped_stats = pd.DataFrame()

component_cols = ['SiO2', 'Na2O', 'K2O', 'CaO', 'MgO',
                  'Al2O3', 'Fe2O3', 'CuO', 'PbO', 'BaO',
                  'P2O5', 'SrO', 'SnO2', 'SO2']
# Descriptive statistics of each chemical component, grouped by glass type and weathering state
for component in component_cols:
    component_data = data.groupby(['类型', '表面风化'])[component]
    stats = component_data.agg(['mean', 'max', 'min', 'std', 'var', 'skew'])
    stats['kurt'] = component_data.apply(pd.Series.kurt)  # each group is a Series
    stats['cv'] = stats['std'] / stats['mean']  # coefficient of variation
    # Add the component name as the top level of the column index
    stats.columns = pd.MultiIndex.from_product([[component], stats.columns])
    grouped_stats = pd.concat([grouped_stats, stats], axis=1)

grouped_stats
SiO2 Na2O ... SnO2 SO2
mean max min std var skew kurt cv mean max ... kurt cv mean max min std var skew kurt cv
类型 表面风化
铅钡 无风化 3.013743 3.871521 1.859524 0.646195 0.417567 -0.301305 -0.956815 0.214416 0.071131 0.876318 ... 3.253187 -2.441987 0.020569 0.267396 0.000000 0.074162 0.005500 3.605551 13.000000 3.605551
风化 2.242329 3.937307 -0.131353 0.923780 0.853370 -0.584811 0.650707 0.411973 0.013371 1.043858 ... 13.632917 -3.664983 0.028021 1.369229 -0.796562 0.336451 0.113199 2.108909 9.857280 12.007019
高钾 无风化 3.165687 3.712288 2.266609 0.363205 0.131918 -1.093726 3.036563 0.114732 -0.013585 0.320182 ... 12.000000 -3.464102 -0.507620 0.000000 -2.240723 0.925901 0.857292 -1.388056 -0.011455 -1.824002
风化 4.187045 4.372977 3.830498 0.187388 0.035114 -1.731995 3.641136 0.044754 0.000000 0.000000 ... 0.000000 NaN 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 NaN

4 rows × 112 columns
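
The grouped statistics can also be written with named aggregation. This is a minimal sketch on toy values (the numbers are made up and only one grouping column is used), assuming the same set of statistics as above.

```python
import pandas as pd

# Toy data with made-up values; one grouping column for brevity.
toy = pd.DataFrame({
    "类型": ["高钾"] * 4 + ["铅钡"] * 4,
    "SiO2": [1.0, 2.0, 3.0, 4.0, 2.0, 2.0, 4.0, 8.0],
})

# Named aggregation: keyword = output column, value = aggregation function.
summary = toy.groupby("类型")["SiO2"].agg(
    mean="mean",
    std="std",
    skew="skew",
    kurt=pd.Series.kurt,  # sample excess kurtosis of each group
)
# Coefficient of variation: only meaningful when the mean is well away from zero.
summary["cv"] = summary["std"] / summary["mean"]

print(summary)
```

After the CLR transform the group means can be close to zero or negative, so the `cv` values in the table above (for example -19.29 for Na2O in the unweathered high-potassium group) should be read with caution.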

# Split grouped_stats into one table per (glass type, weathering state) group.
# The group is recorded in the dictionary key, so no columns need to be renamed.
tables_dict = {}

for glass_type in glass_types:
    for weathering_state in weathering_states:
        subset = grouped_stats.loc[glass_type, weathering_state].unstack().T
        table_name = f"{glass_type}_{weathering_state}"
        tables_dict[table_name] = pd.DataFrame(subset)  # explicit conversion to pd.DataFrame

# Display all tables
tables_dict

{'高钾_无风化':          Al2O3       BaO       CaO       CuO     Fe2O3       K2O       MgO  \
 cv    0.664393 -1.972230  0.893838 -2.321136 -1.626433  0.473990 -0.700958   
 kurt -1.409964  3.016385 -0.156702  1.577446  0.472540  1.635379 -1.292382   
 max   1.508084  0.019074  1.647769  0.595531  0.747950  2.210662  0.000000   
 mean  0.776104 -0.179823  0.599071 -0.262942 -0.390464  1.145963 -0.674968   
 min   0.006978 -1.080913 -0.182189 -1.652716 -1.590841  0.000000 -1.332161   
 skew -0.031480 -1.906416  0.378894 -1.180633 -0.394538 -0.184857  0.061519   
 std   0.515638  0.354653  0.535473  0.610324  0.635064  0.543175  0.473124   
 var   0.265882  0.125778  0.286731  0.372495  0.403306  0.295039  0.223846   
 
            Na2O      P2O5       PbO       SO2      SiO2       SnO2       SrO  
 cv   -19.285768 -0.979906 -1.116780 -1.824002  0.114732  -3.464102 -1.050200  
 kurt   7.015733  0.317255 -1.629147 -0.011455  3.036563  12.000000 -2.376521  
 max    0.320182  0.526955  0.000000  0.000000  3.712288   0.000000  0.000000  
 mean  -0.013585 -0.938500 -0.987338 -0.507620  3.165687  -0.007795 -1.723790  
 min   -0.760277 -2.730275 -2.672140 -2.240723  2.266609  -0.093536 -3.774602  
 skew  -2.150622  0.057567 -0.552251 -1.388056 -1.093726  -3.464102 -0.037176  
 std    0.262001  0.919641  1.102639  0.925901  0.363205   0.027002  1.810324  
 var    0.068645  0.845740  1.215812  0.857292  0.131918   0.000729  3.277274  ,
 '高钾_风化':          Al2O3  BaO       CaO       CuO     Fe2O3       K2O       MgO  Na2O  \
 cv    2.498627  NaN -0.962261 -8.191497 -0.250545 -0.997049 -1.572791   NaN   
 kurt  0.025390  0.0  2.287842  0.619598  1.095297 -0.867476 -1.112631   0.0   
 max   0.961580  0.0  0.215634  0.477459 -1.341006  0.000000  0.000000   0.0   
 mean  0.194529  0.0 -0.664817 -0.060020 -1.714985 -0.328478 -0.286859   0.0   
 min  -0.410081  0.0 -1.760008 -0.889020 -2.470072 -0.824068 -0.983686   0.0   
 skew  0.669913  0.0 -0.709483 -1.043688 -1.369695 -0.588570 -1.095736   0.0   
 std   0.486056  0.0  0.639727  0.491651  0.429681  0.327508  0.451170   0.0   
 var   0.236251  0.0  0.409251  0.241720  0.184626  0.107262  0.203554   0.0   
 
           P2O5  PbO  SO2      SiO2  SnO2  SrO  
 cv   -0.562597  NaN  NaN  0.044754   NaN  NaN  
 kurt  2.101884  0.0  0.0  3.641136   0.0  0.0  
 max   0.000000  0.0  0.0  4.372977   0.0  0.0  
 mean -1.326415  0.0  0.0  4.187045   0.0  0.0  
 min  -2.178840  0.0  0.0  3.830498   0.0  0.0  
 skew  1.134407  0.0  0.0 -1.731995   0.0  0.0  
 std   0.746238  0.0  0.0  0.187388   0.0  0.0  
 var   0.556871  0.0  0.0  0.035114   0.0  0.0  ,
 '铅钡_无风化':          Al2O3       BaO       CaO       CuO     Fe2O3       K2O       MgO  \
 cv    3.716292  0.352188 -0.987216 -1.103642 -2.376125 -0.899079 -1.163923   
 kurt  0.214284  1.405046 -0.671685 -0.661301  4.165086 -1.951127 -0.717171   
 max   0.901223  2.031090  0.340114  0.899535  0.554504  0.000000  0.000000   
 mean  0.138882  1.245669 -0.714861 -0.925721 -0.306467 -1.288085 -0.541147   
 min  -0.847107  0.260264 -1.990837 -2.580097 -2.264904 -2.915489 -1.822866   
 skew -0.716711 -0.562582  0.062455  0.086620 -1.989760  0.104047 -0.750761   
 std   0.516125  0.438710  0.705723  1.021664  0.728205  1.158091  0.629853   
 var   0.266385  0.192466  0.498044  1.043798  0.530282  1.341175  0.396715   
 
           Na2O      P2O5       PbO        SO2      SiO2      SnO2       SrO  
 cv    3.684555 -0.818040  0.266446   3.605551  0.214416 -2.441987 -0.893422  
 kurt  8.623783 -1.684970  6.556376  13.000000 -0.956815  3.253187 -2.023534  
 max   0.876318  0.000000  2.610837   0.267396  3.871521  0.000000  0.000000  
 mean  0.071131 -1.449052  2.160856   0.020569  3.013743 -0.311426 -1.114090  
 min  -0.221722 -3.201927  0.468937   0.000000  1.859524 -2.078030 -2.211561  
 skew  2.741762 -0.069394 -2.363412   3.605551 -0.301305 -2.182647  0.129023  
 std   0.262087  1.185383  0.575751   0.074162  0.646195  0.760497  0.995352  
 var   0.068690  1.405133  0.331490   0.005500  0.417567  0.578356  0.990726  ,
 '铅钡_风化':           Al2O3       BaO       CaO       CuO     Fe2O3       K2O       MgO  \
 cv   -11.231984  0.609170 -1.725044 -1.063677 -1.111626 -1.185000 -0.978333   
 kurt  -0.288489 -0.601793 -0.712685 -0.599524 -0.418132 -1.660810 -1.486424   
 max    2.042802  2.167893  0.497358  0.888513  0.000000  0.000000  0.000000   
 mean  -0.087576  1.035546 -0.375654 -0.824426 -0.723172 -0.967980 -0.693444   
 min   -1.826182 -0.181275 -1.877738 -2.764779 -2.575747 -2.970023 -1.841063   
 skew   0.155720 -0.126683 -0.661062  0.043889 -0.837263 -0.456970 -0.340389   
 std    0.983655  0.630823  0.648019  0.876923  0.803897  1.147056  0.678420   
 var    0.967578  0.397938  0.419929  0.768995  0.646251  1.315737  0.460253   
 
            Na2O       P2O5       PbO        SO2      SiO2       SnO2       SrO  
 cv    27.248350 -11.063796  0.221513  12.007019  0.411973  -3.664983 -0.415568  
 kurt   3.996993   1.120939 -0.753508   9.857280  0.650707  13.632917  1.634188  
 max    1.043858   1.188784  3.510396   1.369229  3.937307   0.000000  0.000000  
 mean   0.013371  -0.102296  2.402080   0.028021  2.242329  -0.119384 -1.827413  
 min   -1.093837  -3.229330  1.389649  -0.796562 -0.131353  -1.944122 -2.930869  
 skew  -0.038016  -1.253158  0.239303   2.108909 -0.584811  -3.788951  1.424906  
 std    0.364329   1.131785  0.532092   0.336451  0.923780   0.437542  0.759414  
 var    0.132736   1.280938  0.283122   0.113199  0.853370   0.191443  0.576710  }
# Optional: save each summary table to its own sheet of a single workbook.
# with pd.ExcelWriter('E:\\数学建模国赛\\2022数学建模赛题\\C题\\一二表单合并数据统计性分析.xlsx') as writer:
#     for sheet_name, df in tables_dict.items():
#         df.to_excel(writer, sheet_name=sheet_name, index=True)

2.4 Descriptive statistics

tables_dict['高钾_无风化']
stat      Al2O3        BaO        CaO        CuO      Fe2O3        K2O        MgO       Na2O       P2O5        PbO        SO2       SiO2       SnO2        SrO
cv     0.664393  -1.972230   0.893838  -2.321136  -1.626433   0.473990  -0.700958 -19.285768  -0.979906  -1.116780  -1.824002   0.114732  -3.464102  -1.050200
kurt  -1.409964   3.016385  -0.156702   1.577446   0.472540   1.635379  -1.292382   7.015733   0.317255  -1.629147  -0.011455   3.036563  12.000000  -2.376521
max    1.508084   0.019074   1.647769   0.595531   0.747950   2.210662   0.000000   0.320182   0.526955   0.000000   0.000000   3.712288   0.000000   0.000000
mean   0.776104  -0.179823   0.599071  -0.262942  -0.390464   1.145963  -0.674968  -0.013585  -0.938500  -0.987338  -0.507620   3.165687  -0.007795  -1.723790
min    0.006978  -1.080913  -0.182189  -1.652716  -1.590841   0.000000  -1.332161  -0.760277  -2.730275  -2.672140  -2.240723   2.266609  -0.093536  -3.774602
skew  -0.031480  -1.906416   0.378894  -1.180633  -0.394538  -0.184857   0.061519  -2.150622   0.057567  -0.552251  -1.388056  -1.093726  -3.464102  -0.037176
std    0.515638   0.354653   0.535473   0.610324   0.635064   0.543175   0.473124   0.262001   0.919641   1.102639   0.925901   0.363205   0.027002   1.810324
var    0.265882   0.125778   0.286731   0.372495   0.403306   0.295039   0.223846   0.068645   0.845740   1.215812   0.857292   0.131918   0.000729   3.277274
tables_dict['高钾_风化']
stat      Al2O3        BaO        CaO        CuO      Fe2O3        K2O        MgO       Na2O       P2O5        PbO        SO2       SiO2       SnO2        SrO
cv     2.498627        NaN  -0.962261  -8.191497  -0.250545  -0.997049  -1.572791        NaN  -0.562597        NaN        NaN   0.044754        NaN        NaN
kurt   0.025390        0.0   2.287842   0.619598   1.095297  -0.867476  -1.112631        0.0   2.101884        0.0        0.0   3.641136        0.0        0.0
max    0.961580        0.0   0.215634   0.477459  -1.341006   0.000000   0.000000        0.0   0.000000        0.0        0.0   4.372977        0.0        0.0
mean   0.194529        0.0  -0.664817  -0.060020  -1.714985  -0.328478  -0.286859        0.0  -1.326415        0.0        0.0   4.187045        0.0        0.0
min   -0.410081        0.0  -1.760008  -0.889020  -2.470072  -0.824068  -0.983686        0.0  -2.178840        0.0        0.0   3.830498        0.0        0.0
skew   0.669913        0.0  -0.709483  -1.043688  -1.369695  -0.588570  -1.095736        0.0   1.134407        0.0        0.0  -1.731995        0.0        0.0
std    0.486056        0.0   0.639727   0.491651   0.429681   0.327508   0.451170        0.0   0.746238        0.0        0.0   0.187388        0.0        0.0
var    0.236251        0.0   0.409251   0.241720   0.184626   0.107262   0.203554        0.0   0.556871        0.0        0.0   0.035114        0.0        0.0
tables_dict['铅钡_无风化']
stat      Al2O3        BaO        CaO        CuO      Fe2O3        K2O        MgO       Na2O       P2O5        PbO        SO2       SiO2       SnO2        SrO
cv     3.716292   0.352188  -0.987216  -1.103642  -2.376125  -0.899079  -1.163923   3.684555  -0.818040   0.266446   3.605551   0.214416  -2.441987  -0.893422
kurt   0.214284   1.405046  -0.671685  -0.661301   4.165086  -1.951127  -0.717171   8.623783  -1.684970   6.556376  13.000000  -0.956815   3.253187  -2.023534
max    0.901223   2.031090   0.340114   0.899535   0.554504   0.000000   0.000000   0.876318   0.000000   2.610837   0.267396   3.871521   0.000000   0.000000
mean   0.138882   1.245669  -0.714861  -0.925721  -0.306467  -1.288085  -0.541147   0.071131  -1.449052   2.160856   0.020569   3.013743  -0.311426  -1.114090
min   -0.847107   0.260264  -1.990837  -2.580097  -2.264904  -2.915489  -1.822866  -0.221722  -3.201927   0.468937   0.000000   1.859524  -2.078030  -2.211561
skew  -0.716711  -0.562582   0.062455   0.086620  -1.989760   0.104047  -0.750761   2.741762  -0.069394  -2.363412   3.605551  -0.301305  -2.182647   0.129023
std    0.516125   0.438710   0.705723   1.021664   0.728205   1.158091   0.629853   0.262087   1.185383   0.575751   0.074162   0.646195   0.760497   0.995352
var    0.266385   0.192466   0.498044   1.043798   0.530282   1.341175   0.396715   0.068690   1.405133   0.331490   0.005500   0.417567   0.578356   0.990726
tables_dict['铅钡_风化']
stat      Al2O3        BaO        CaO        CuO      Fe2O3        K2O        MgO       Na2O       P2O5        PbO        SO2       SiO2       SnO2        SrO
cv   -11.231984   0.609170  -1.725044  -1.063677  -1.111626  -1.185000  -0.978333  27.248350 -11.063796   0.221513  12.007019   0.411973  -3.664983  -0.415568
kurt  -0.288489  -0.601793  -0.712685  -0.599524  -0.418132  -1.660810  -1.486424   3.996993   1.120939  -0.753508   9.857280   0.650707  13.632917   1.634188
max    2.042802   2.167893   0.497358   0.888513   0.000000   0.000000   0.000000   1.043858   1.188784   3.510396   1.369229   3.937307   0.000000   0.000000
mean  -0.087576   1.035546  -0.375654  -0.824426  -0.723172  -0.967980  -0.693444   0.013371  -0.102296   2.402080   0.028021   2.242329  -0.119384  -1.827413
min   -1.826182  -0.181275  -1.877738  -2.764779  -2.575747  -2.970023  -1.841063  -1.093837  -3.229330   1.389649  -0.796562  -0.131353  -1.944122  -2.930869
skew   0.155720  -0.126683  -0.661062   0.043889  -0.837263  -0.456970  -0.340389  -0.038016  -1.253158   0.239303   2.108909  -0.584811  -3.788951   1.424906
std    0.983655   0.630823   0.648019   0.876923   0.803897   1.147056   0.678420   0.364329   1.131785   0.532092   0.336451   0.923780   0.437542   0.759414
var    0.967578   0.397938   0.419929   0.768995   0.646251   1.315737   0.460253   0.132736   1.280938   0.283122   0.113199   0.853370   0.191443   0.576710
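'''
All values quoted below are read from the tables above, i.e. they are on the transformed scale of those tables rather than raw percentages.
Mean:
SiO2 (silicon dioxide): In unweathered glass, the mean SiO2 value of high-potassium glass (3.17) is slightly higher than that of lead-barium glass (3.01).
After weathering the gap widens (4.19 vs. 2.24), which suggests that weathering affects the SiO2 content of the two glass types differently.
Al2O3 (aluminum oxide): In unweathered glass, the mean Al2O3 value of high-potassium glass (0.78) is higher than that of lead-barium glass (0.14).
After weathering both means fall (0.19 and -0.09), which may reflect a notable effect of weathering on Al2O3.
Standard deviation (Std) and coefficient of variation (CV):
Na2O (sodium oxide): In unweathered glass, the mean Na2O value of lead-barium glass (0.07) is higher than that of high-potassium glass (-0.01). In weathered high-potassium glass every Na2O statistic is 0, so its CV is NaN.
This may indicate that weathering changes the distribution of Na2O.
CaO (calcium oxide): In unweathered glass, the mean CaO value of high-potassium glass (0.60) is higher than that of lead-barium glass (-0.71); after weathering the order reverses (-0.66 vs. -0.38).
Skewness (Skew) and kurtosis (Kurt):
PbO (lead oxide) and BaO (barium oxide): The skewness and kurtosis of these components differ markedly between high-potassium and lead-barium glass.
This may reflect structural differences between the glass types and different effects of weathering.
Observations on specific components:
Silicon dioxide (SiO2): The unweathered means of the two glass types are close, but after weathering the high-potassium mean rises while the lead-barium mean falls.
This may reflect the effect of weathering on SiO2 content.
Aluminum oxide (Al2O3): Weathering may have a notable effect on Al2O3, particularly in lead-barium glass, where the std rises from 0.52 to 0.98.
'''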
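The summary tables above list eight statistics per component. The sketch below shows one way such a table could be built with pandas; it is an illustration under assumptions, not the author's code. The helper name `summarize` and the toy values are hypothetical. It also illustrates why cv (std divided by mean) becomes very large when the mean is close to zero, as with the Na2O values of -19.29 and 27.25 above.

```python
import pandas as pd


def summarize(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per statistic and one column per component (hypothetical helper)."""
    stats = pd.DataFrame({
        'mean': df.mean(),
        'std': df.std(),
        'var': df.var(),
        'min': df.min(),
        'max': df.max(),
        'skew': df.skew(),
        'kurt': df.kurt(),
    }).T
    # Coefficient of variation: std / mean. It is unstable when the mean is
    # near zero and is NaN when both std and mean are zero.
    stats.loc['cv'] = stats.loc['std'] / stats.loc['mean']
    # Alphabetical row order, matching the layout of the tables above
    return stats.sort_index()


# Toy values (assumed, not taken from the data set)
toy = pd.DataFrame({
    'SiO2': [3.0, 3.2, 3.4, 3.1],
    'Na2O': [-0.30, 0.28, 0.01, 0.02],
})
summary = summarize(toy)
print(summary.round(3))
```

In this toy example the Na2O mean is close to zero, so its cv is far larger in magnitude than the SiO2 cv even though the two standard deviations are of similar size. For that reason it is usually safer to compare std rather than cv on data whose mean can be near zero.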

2.5 Drawing the box plots

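import matplotlib.pyplot as plt
import seaborn as sns
# The plot labels below are in English, so DejaVu Sans is sufficient;
# switch to a CJK-capable font (e.g. SimHei) if you use Chinese labels
plt.rcParams['font.family'] = 'DejaVu Sans'
# Split the merged data into the four type/weathering groups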
data_high_potassium_erosion = data[(data['类型'] == '高钾') & (data['表面风化'] == '风化')]
data_high_potassium_no_erosion = data[(data['类型'] == '高钾') & (data['表面风化'] == '无风化')]
data_lead_barium_erosion = data[(data['类型'] == '铅钡') & (data['表面风化'] == '风化')]
data_lead_barium_no_erosion = data[(data['类型'] == '铅钡') & (data['表面风化'] == '无风化')]

# Reshape each group to long format (one row per sample/component pair) for seaborn
boxplot_data_high_potassium_erosion = data_high_potassium_erosion.melt(id_vars=['类型', '表面风化'], value_vars=component_cols)
boxplot_data_high_potassium_no_erosion = data_high_potassium_no_erosion.melt(id_vars=['类型', '表面风化'], value_vars=component_cols)
boxplot_data_lead_barium_erosion = data_lead_barium_erosion.melt(id_vars=['类型', '表面风化'], value_vars=component_cols)
boxplot_data_lead_barium_no_erosion = data_lead_barium_no_erosion.melt(id_vars=['类型', '表面风化'], value_vars=component_cols)
# Create a 2x2 grid of subplots (the figure size is set here; a separate
# plt.figure() call would only open an extra empty figure)
fig, axs = plt.subplots(2, 2, figsize=(12, 8))

# Reorder the data and titles to switch the positions of the plots
data_list = [boxplot_data_lead_barium_erosion, boxplot_data_lead_barium_no_erosion, boxplot_data_high_potassium_erosion, boxplot_data_high_potassium_no_erosion]
titles = ['Lead Barium Glass with Erosion', 'Lead Barium Glass without Erosion', 'High Potassium Glass with Erosion', 'High Potassium Glass without Erosion']

# Draw one horizontal box plot per group. The loop variable is named plot_data
# so that it does not overwrite the merged DataFrame `data`.
for ax, plot_data, title in zip(axs.flatten(), data_list, titles):
    sns.boxplot(y='variable', x='value', data=plot_data, ax=ax, orient="h")
    ax.set_ylabel('Chemical Component')
    ax.set_xlabel('Content (%)')
    ax.set_title(title)
    ax.invert_yaxis()  # Reverse the order of the components on the y-axis

# Adjust layout
plt.tight_layout()
plt.show()
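The `melt` calls above turn each wide table (one column per component) into the long format that `sns.boxplot` expects. A minimal sketch of that reshaping on a two-row toy frame follows; the numbers are made up for illustration, and only the column names '类型', '表面风化', 'variable' and 'value' come from the code above.

```python
import pandas as pd

# Wide layout: one row per sample, one column per chemical component (toy values)
wide = pd.DataFrame({
    '类型': ['高钾', '高钾'],
    '表面风化': ['风化', '风化'],
    'SiO2': [4.1, 4.3],
    'K2O': [-0.3, -0.8],
})

# Long layout: one row per (sample, component) pair.
# melt() names the new columns 'variable' and 'value' by default.
long = wide.melt(id_vars=['类型', '表面风化'], value_vars=['SiO2', 'K2O'])
print(long)
```

Each component column becomes a block of rows, so a frame with n samples and k components yields n * k rows, and 'variable' can then be used directly as the categorical axis.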


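[Figure 5: 2x2 grid of horizontal box plots of the chemical components for lead-barium and high-potassium glass, with and without weathering]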

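'''
Lead-barium glass:
Changes before and after weathering:

Lower medians: The medians of most chemical components decrease with weathering, especially Al2O3, K2O, SiO2, CaO, MgO and Na2O.
This may reflect the loss of these components during weathering.
Lower dispersion: The spread of these components also shrinks with weathering, suggesting that weathering may make their contents more uniform.
Observations on specific components:

Aluminum oxide (Al2O3): Weathering lowers the median and tightens the distribution.
Silicon dioxide (SiO2): Weathering lowers the median, and the distribution also becomes tighter.
Potassium oxide (K2O) and sodium oxide (Na2O): The distributions become tighter and the medians fall.
High-potassium glass:
Changes before and after weathering:

Lower medians: The medians of most chemical components also fall with weathering, especially K2O and Na2O, similar to lead-barium glass.
Change in dispersion: Unlike lead-barium glass, some components become more widely spread after weathering, for example silicon dioxide (SiO2) and potassium oxide (K2O).
Observations on specific components:

Aluminum oxide (Al2O3): The distribution of aluminum oxide in high-potassium glass becomes wider after weathering.
Silicon dioxide (SiO2): Weathering does not appear to change the median noticeably, but the distribution becomes wider.
Potassium oxide (K2O) and sodium oxide (Na2O): The medians fall sharply and the distributions become wider.
Summary:
These box plots show how weathering affects the composition of the glass.
For both lead-barium and high-potassium glass, weathering may cause the loss of some components, but the size of the effect depends on the glass type and on the component.
These observations help explain how weathering affects the chemical composition of different glass types, and can in turn inform the conservation and restoration of artifacts.
'''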
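The observations above are read off the box plots by eye. One way to check them numerically is to compute the median and interquartile range (IQR) per group and component from the long-format data. The sketch below assumes a frame with the columns '类型', '表面风化', 'variable' and 'value' as produced by `melt`; the helper name `median_iqr` and the toy values are hypothetical.

```python
import pandas as pd


def median_iqr(long_df: pd.DataFrame) -> pd.DataFrame:
    """Median and IQR of 'value' per glass type, weathering state and component."""
    grouped = long_df.groupby(['类型', '表面风化', 'variable'])['value']
    q1 = grouped.quantile(0.25)
    q3 = grouped.quantile(0.75)
    # The IQR is the height of the box in a box plot, a robust measure of spread
    return pd.DataFrame({'median': grouped.median(), 'iqr': q3 - q1})


# Toy long-format data (assumed values, for illustration only)
toy = pd.DataFrame({
    '类型': ['铅钡'] * 8,
    '表面风化': ['无风化'] * 4 + ['风化'] * 4,
    'variable': ['K2O'] * 8,
    'value': [1.0, 2.0, 3.0, 4.0, 0.1, 0.2, 0.3, 0.4],
})
result = median_iqr(toy)
print(result)
```

A lower median together with a smaller IQR in the weathered rows would correspond to the "lower medians" and "lower dispersion" readings described above.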
