Contents
- 6 Heatmap plot
  - 1. Basic Heatmap plot
  - 2. Customize seaborn heatmap
  - 3. Use normalization on heatmap
  - 4. Dendrogram with heatmap
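6 Heatmap plot
(Code download)
A heatmap is a graphical representation in which each value of a matrix is encoded as a color. Heatmaps are useful for getting a general overview of numeric data: they are simple to make and do not require picking out specific data points. In seaborn, heatmaps are drawn with the heatmap function; we also use the clustermap function to draw a dendrogram together with a heatmap. This chapter covers:
- Basic Heatmap plot
- Customize seaborn heatmap
- Use normalization on heatmap
- Dendrogram with heatmap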
import seaborn as sns
import pandas as pd
import numpy as np
# In a notebook, display the value of every expression in a cell, not only the last one
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'
1. Basic Heatmap plot
- Basic Heatmap
- Correlation matrix
- A half heatmap of the correlation matrix
- Basic Heatmap of long format data
df = pd.DataFrame(np.random.random((5,5)), columns=["a","b","c","d","e"])
df
p1 = sns.heatmap(df)
|   | a | b | c | d | e |
|---|---|---|---|---|---|
| 0 | 0.260319 | 0.749665 | 0.534837 | 0.077599 | 0.645868 |
| 1 | 0.455260 | 0.088954 | 0.876201 | 0.468024 | 0.679460 |
| 2 | 0.422090 | 0.029897 | 0.652491 | 0.492516 | 0.112680 |
| 3 | 0.016669 | 0.979161 | 0.274547 | 0.093439 | 0.965549 |
| 4 | 0.039159 | 0.851814 | 0.794167 | 0.796855 | 0.109723 |
![[seaborn] seaborn学习笔记6-热图HEATMAPPLOT_第1张图片](http://img.e-com-net.com/image/info8/ea1f69fe28ef42fda75bdbddc358bd35.png)
df = pd.DataFrame(np.random.random((100,5)), columns=["a","b","c","d","e"])
df.head()
corr_matrix=df.corr()
corr_matrix
sns.heatmap(corr_matrix, cmap='PuOr')
|   | a | b | c | d | e |
|---|---|---|---|---|---|
| 0 | 0.447492 | 0.083233 | 0.054378 | 0.528246 | 0.839064 |
| 1 | 0.966619 | 0.718003 | 0.584444 | 0.454353 | 0.319515 |
| 2 | 0.165938 | 0.500661 | 0.221050 | 0.304151 | 0.470321 |
| 3 | 0.012819 | 0.206002 | 0.317296 | 0.998902 | 0.546637 |
| 4 | 0.168106 | 0.935917 | 0.081234 | 0.652118 | 0.988459 |
|   | a | b | c | d | e |
|---|---|---|---|---|---|
| a | 1.000000 | 0.062998 | 0.219805 | 0.095833 | 0.160799 |
| b | 0.062998 | 1.000000 | 0.173022 | 0.040480 | -0.101984 |
| c | 0.219805 | 0.173022 | 1.000000 | -0.049702 | -0.066863 |
| d | 0.095833 | 0.040480 | -0.049702 | 1.000000 | 0.179716 |
| e | 0.160799 | -0.101984 | -0.066863 | 0.179716 | 1.000000 |
![[seaborn] seaborn学习笔记6-热图HEATMAPPLOT_第2张图片](http://img.e-com-net.com/image/info8/aa33b0aceedf4f7ab324addc53498941.png)
df = pd.DataFrame(np.random.random((100,5)), columns=["a","b","c","d","e"])
corr_matrix=df.corr()
# Boolean mask: True marks the cells that heatmap should hide
mask = np.zeros_like(corr_matrix, dtype=bool)
# Indices of the upper triangle, diagonal included
indices=np.triu_indices_from(mask)
indices
mask[indices] = True
with sns.axes_style("white"):
    p2 = sns.heatmap(corr_matrix, mask=mask, square=True)
(array([0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4], dtype=int64),
array([0, 1, 2, 3, 4, 1, 2, 3, 4, 2, 3, 4, 3, 4, 4], dtype=int64))
![[seaborn] seaborn学习笔记6-热图HEATMAPPLOT_第3张图片](http://img.e-com-net.com/image/info8/7a2038ad5f93485eb2c3bea052f4714f.png)
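The mask used above hides the diagonal as well as the upper triangle. As a minimal sketch (the names `df_demo`, `mask_with_diag` and `mask_keep_diag` are illustrative and not from the original notes), `np.triu` with `k=1` builds a mask that leaves the diagonal visible:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
df_demo = pd.DataFrame(rng.random((100, 5)), columns=list("abcde"))
corr = df_demo.corr()

# True marks a cell to hide.
all_true = np.ones(corr.shape, dtype=bool)
mask_with_diag = np.triu(all_true)        # upper triangle plus diagonal
mask_keep_diag = np.triu(all_true, k=1)   # strictly above the diagonal

# Number of hidden cells in each 5x5 mask
print(mask_with_diag.sum(), mask_keep_diag.sum())  # 15 10
```

Either array can be passed as `mask=` to `sns.heatmap(corr, ...)` in place of the mask built above.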
people=np.repeat(("A","B","C","D","E"),5)
feature=list(range(1,6))*5
value=np.random.random(25)
df=pd.DataFrame({'feature': feature, 'people': people, 'value': value })
# Reshape from long to wide format: one row per person, one column per feature
df_wide=df.pivot_table( index='people', columns='feature', values='value' )
p2=sns.heatmap( df_wide, square=True)
![[seaborn] seaborn学习笔记6-热图HEATMAPPLOT_第4张图片](http://img.e-com-net.com/image/info8/e909775c2cf74fa08bb10d52e7a82301.png)
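To make the long-to-wide step easier to follow, here is a small sketch on a four-row table (`long_df` and `wide_df` are made-up names and values, used only for illustration). Note that `pivot_table` averages duplicate (people, feature) pairs by default, whereas `DataFrame.pivot` raises an error on duplicates.

```python
import pandas as pd

# Long format: one row per (person, feature) observation
long_df = pd.DataFrame({
    "people": ["A", "A", "B", "B"],
    "feature": [1, 2, 1, 2],
    "value": [0.1, 0.2, 0.3, 0.4],
})

# Wide format: people become the rows, features become the columns
wide_df = long_df.pivot_table(index="people", columns="feature", values="value")
print(wide_df)
```

The resulting 2x2 frame has exactly the matrix layout that `sns.heatmap` expects.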
2. Customize seaborn heatmap
- Annotate each cell with value
- Custom grid lines
- Remove X or Y labels
- Hide a few axis labels to avoid overlapping
- Set the value range of the color bar
df = pd.DataFrame(np.random.random((10,10)), columns=["a","b","c","d","e","f","g","h","i","j"])
sns.heatmap(df, annot=True, annot_kws={"size": 7});
![[seaborn] seaborn学习笔记6-热图HEATMAPPLOT_第5张图片](http://img.e-com-net.com/image/info8/e3406e16f08d4802b00dfd66d3b71d3d.jpg)
sns.heatmap(df, linewidths=2, linecolor='yellow');
![[seaborn] seaborn学习笔记6-热图HEATMAPPLOT_第6张图片](http://img.e-com-net.com/image/info8/c75c47746b614fe29791454c26a0e9b2.png)
sns.heatmap(df, yticklabels=False, cbar=False);
![[seaborn] seaborn学习笔记6-热图HEATMAPPLOT_第7张图片](http://img.e-com-net.com/image/info8/1b3ca355bd354a9f9b8adf90b79c851f.png)
sns.heatmap(df, xticklabels=3);
![[seaborn] seaborn学习笔记6-热图HEATMAPPLOT_第8张图片](http://img.e-com-net.com/image/info8/eaa822198c3641b3b282528d56ee2e2c.png)
sns.heatmap(df, vmin=0, vmax=0.5);
![[seaborn] seaborn学习笔记6-热图HEATMAPPLOT_第9张图片](http://img.e-com-net.com/image/info8/d51fe64f82664ec3b28e8361402df589.png)
3. Use normalization on heatmap
- Column normalization
- Row normalization
df = pd.DataFrame(np.random.randn(10,10) * 4 + 3)
df[1]=df[1]+40
sns.heatmap(df, cmap='viridis');
![[seaborn] seaborn学习笔记6-热图HEATMAPPLOT_第10张图片](http://img.e-com-net.com/image/info8/a7c29943fb49404aaa9996bab0c37c99.png)
df_norm_col=(df-df.mean())/df.std()
sns.heatmap(df_norm_col, cmap='viridis');
![[seaborn] seaborn学习笔记6-热图HEATMAPPLOT_第11张图片](http://img.e-com-net.com/image/info8/c4e409c8a0584399b6e90fad8bd28a22.png)
df = pd.DataFrame(np.random.randn(10,10) * 4 + 3)
df.iloc[2]=df.iloc[2]+40
sns.heatmap(df, cmap='viridis');
![[seaborn] seaborn学习笔记6-热图HEATMAPPLOT_第12张图片](http://img.e-com-net.com/image/info8/4f3c0664a3424a2f95fcbc1abce1f8b4.png)
df_norm_row=df.sub(df.mean(axis=1), axis=0)
df_norm_row=df_norm_row.div( df.std(axis=1), axis=0 )
sns.heatmap(df_norm_row, cmap='viridis');
![[seaborn] seaborn学习笔记6-热图HEATMAPPLOT_第13张图片](http://img.e-com-net.com/image/info8/bd6105226e6844399bdd9d3976cfeb2b.png)
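The two normalizations above can be checked numerically. The following is a sketch with hypothetical helper names (`zscore_columns`, `zscore_rows`); it repeats the arithmetic used above and confirms that each column, or each row, ends up with mean 0 and standard deviation 1.

```python
import numpy as np
import pandas as pd

def zscore_columns(frame):
    # Subtract each column's mean and divide by its standard deviation
    return (frame - frame.mean()) / frame.std()

def zscore_rows(frame):
    # Same idea per row: align the row statistics along axis=0
    centered = frame.sub(frame.mean(axis=1), axis=0)
    return centered.div(frame.std(axis=1), axis=0)

rng = np.random.default_rng(1)
demo = pd.DataFrame(rng.normal(3, 4, size=(10, 10)))
demo[1] = demo[1] + 40  # one column on a much larger scale

by_col = zscore_columns(demo)
by_row = zscore_rows(demo)

print(by_col.mean().abs().max(), by_col.std().min())
print(by_row.mean(axis=1).abs().max(), by_row.std(axis=1).min())
```

After column normalization the shifted column no longer dominates the color scale, which is why the second heatmap in each pair shows structure in all cells.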
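4. Dendrogram with heatmap
- Dendrogram with heat map and coloured leaves
- Normalization for a dendrogram with heatmap
- Distance metric for a dendrogram with heatmap
- Clustering method for a dendrogram with heatmap
- Change color palette
- Outlier handling
A dendrogram is the visual form of hierarchical clustering. Agglomerative hierarchical clustering computes the similarity between pairs of data points or clusters, merges the two most similar ones, and repeats this process. Put simply, similarity is determined by the distance between each cluster and all the others: the smaller the distance, the higher the similarity. The two closest points or clusters are merged at each step, which builds up the cluster tree. In the dendrogram, the connecting lines represent the distance between the two groups being joined.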
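The merging process described above can be sketched with SciPy, which is used here only for illustration. The four points are made up, and single linkage is an arbitrary choice of method for the sketch.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist

# Four 2-D points forming two obvious pairs
points = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.5]])

# Pairwise distances, then agglomerative merging of the closest groups
distances = pdist(points, metric="euclidean")
tree = linkage(distances, method="single")

# Each row is one merge: [group i, group j, distance, size of the new group]
print(tree)
```

The first two rows merge the two close pairs (at distances 1.0 and 1.5), and the last row joins the resulting clusters. A dendrogram draws these merges as branches whose height is the merge distance.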
from matplotlib import pyplot as plt
import pandas as pd
url = 'https://python-graph-gallery.com/wp-content/uploads/mtcars.csv'
df = pd.read_csv(url)
df = df.set_index('model')
df.head()
| model | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Mazda RX4 | 21.0 | 6 | 160.0 | 110 | 3.90 | 2.620 | 16.46 | 0 | 1 | 4 | 4 |
| Mazda RX4 Wag | 21.0 | 6 | 160.0 | 110 | 3.90 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
| Datsun 710 | 22.8 | 4 | 108.0 | 93 | 3.85 | 2.320 | 18.61 | 1 | 1 | 4 | 1 |
| Hornet 4 Drive | 21.4 | 6 | 258.0 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |
| Hornet Sportabout | 18.7 | 8 | 360.0 | 175 | 3.15 | 3.440 | 17.02 | 0 | 0 | 3 | 2 |
my_palette = dict(zip(df.cyl.unique(), ["orange","yellow","brown"]))
my_palette
row_colors = df.cyl.map(my_palette)
row_colors
sns.clustermap(df, metric="correlation", method="single", cmap="Blues", standard_scale=1, row_colors=row_colors)
{6: 'orange', 4: 'yellow', 8: 'brown'}
model
Mazda RX4 orange
Mazda RX4 Wag orange
Datsun 710 yellow
Hornet 4 Drive orange
Hornet Sportabout brown
Valiant orange
Duster 360 brown
Merc 240D yellow
Merc 230 yellow
Merc 280 orange
Merc 280C orange
Merc 450SE brown
Merc 450SL brown
Merc 450SLC brown
Cadillac Fleetwood brown
Lincoln Continental brown
Chrysler Imperial brown
Fiat 128 yellow
Honda Civic yellow
Toyota Corolla yellow
Toyota Corona yellow
Dodge Challenger brown
AMC Javelin brown
Camaro Z28 brown
Pontiac Firebird brown
Fiat X1-9 yellow
Porsche 914-2 yellow
Lotus Europa yellow
Ford Pantera L brown
Ferrari Dino orange
Maserati Bora brown
Volvo 142E yellow
Name: cyl, dtype: object
![[seaborn] seaborn学习笔记6-热图HEATMAPPLOT_第14张图片](http://img.e-com-net.com/image/info8/d70f675a05cf4ce5abb14671370d3d2f.jpg)
sns.clustermap(df, standard_scale=1)
sns.clustermap(df, z_score=1)
![[seaborn] seaborn学习笔记6-热图HEATMAPPLOT_第15张图片](http://img.e-com-net.com/image/info8/b9f28b9c1af5466fa98ae9d6ffecffac.jpg)
![[seaborn] seaborn学习笔记6-热图HEATMAPPLOT_第16张图片](http://img.e-com-net.com/image/info8/361a86c283684f93b5003d44827ba575.jpg)
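The difference between the two options can be reproduced by hand in pandas. This sketch assumes, based on the seaborn documentation, that `standard_scale=1` rescales each column to the range 0 to 1 and that `z_score=1` computes column-wise z-scores. The three rows are taken from the `mpg` and `hp` values shown earlier; the names `min_max` and `z` are illustrative.

```python
import pandas as pd

# mpg and hp for Mazda RX4, Datsun 710 and Hornet Sportabout
demo = pd.DataFrame(
    {"mpg": [21.0, 22.8, 18.7], "hp": [110.0, 93.0, 175.0]},
    index=["Mazda RX4", "Datsun 710", "Hornet Sportabout"],
)

# Assumed equivalent of standard_scale=1: min-max scaling per column
min_max = (demo - demo.min()) / (demo.max() - demo.min())

# Assumed equivalent of z_score=1: (x - mean) / std per column
z = (demo - demo.mean()) / demo.std()

print(min_max)
print(z)
```

Min-max scaling pins every column to the same range, while z-scores keep information about how far each value lies from the column mean, so the two clustermaps can look quite different.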
sns.clustermap(df, metric="correlation", standard_scale=1)
sns.clustermap(df, metric="euclidean", standard_scale=1)
![[seaborn] seaborn学习笔记6-热图HEATMAPPLOT_第17张图片](http://img.e-com-net.com/image/info8/9e1d3d1aa1e14bfcad588e99a8bb1b5e.jpg)
![[seaborn] seaborn学习笔记6-热图HEATMAPPLOT_第18张图片](http://img.e-com-net.com/image/info8/e605f0b1ec2844c0a518ae1446a2636f.jpg)
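To see why the metric changes the tree, the sketch below compares the two distances on three made-up rows using SciPy's `pdist`, which is the function the seaborn documentation points to for the available `metric` values. The variable names are hypothetical.

```python
import numpy as np
from scipy.spatial.distance import pdist

x = np.array([1.0, 2.0, 3.0, 4.0])
rows = np.vstack([
    x,           # row 0
    2 * x + 10,  # row 1: same shape as row 0, different scale and offset
    x[::-1],     # row 2: similar magnitude to row 0, opposite trend
])

# pdist returns distances for the pairs (0,1), (0,2), (1,2), in that order
corr_d = pdist(rows, metric="correlation")  # 1 - Pearson correlation
eucl_d = pdist(rows, metric="euclidean")

print(corr_d)
print(eucl_d)
```

With the correlation metric, rows 0 and 1 are as close as possible because they rise together. With the Euclidean metric, row 0 is closer to row 2 because their values are similar in size. Clustering therefore groups the rows differently depending on the metric.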
sns.clustermap(df, metric="euclidean", standard_scale=1, method="single")
sns.clustermap(df, metric="euclidean", standard_scale=1, method="ward")
![[seaborn] seaborn学习笔记6-热图HEATMAPPLOT_第19张图片](http://img.e-com-net.com/image/info8/9d893b047b3c463b9ddfb37755e52fa4.jpg)
![[seaborn] seaborn学习笔记6-热图HEATMAPPLOT_第20张图片](http://img.e-com-net.com/image/info8/5c4b22534b904abf937230bcf018dacd.jpg)
sns.clustermap(df, metric="euclidean", standard_scale=1, method="ward", cmap="mako")
sns.clustermap(df, metric="euclidean", standard_scale=1, method="ward", cmap="viridis")
![[seaborn] seaborn学习笔记6-热图HEATMAPPLOT_第21张图片](http://img.e-com-net.com/image/info8/030a492d0c4342afabc5941a6fd4f601.jpg)
![[seaborn] seaborn学习笔记6-热图HEATMAPPLOT_第22张图片](http://img.e-com-net.com/image/info8/f24ad2bdd3a944a2a200089f6a4e36d2.jpg)
df.iloc[15,5] = 1000
sns.clustermap(df, robust=True)
sns.clustermap(df, robust=False)
![[seaborn] seaborn学习笔记6-热图HEATMAPPLOT_第23张图片](http://img.e-com-net.com/image/info8/20b53d2a0eb34763a5a35d90ec3dcda3.jpg)
![[seaborn] seaborn学习笔记6-热图HEATMAPPLOT_第24张图片](http://img.e-com-net.com/image/info8/1878abb8b63442abbdecf2c6e91cedbe.jpg)
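The effect of `robust=True` can be approximated by hand. According to the seaborn documentation, when `vmin` and `vmax` are not given the color range is then computed from robust quantiles instead of the extreme values. The 2nd and 98th percentiles used below are an assumption about the exact quantiles, and the data is synthetic rather than the mtcars table.

```python
import numpy as np

rng = np.random.default_rng(2)
values = rng.normal(3, 4, size=(32, 11))  # synthetic stand-in for the data
values[15, 5] = 1000                      # inject a single outlier

# Color limits from the extremes: the outlier stretches the range
vmin_plain, vmax_plain = np.nanmin(values), np.nanmax(values)

# Color limits from percentiles (assumed 2% and 98%): the outlier is ignored
vmin_robust, vmax_robust = np.nanpercentile(values, [2, 98])

print(vmax_plain, vmax_robust)
```

With the plain limits nearly every cell falls in a thin slice at the bottom of the color scale, while the percentile-based limits keep the contrast among the ordinary values.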