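The goal of integration is to remove batch effects from the data, but while removing them it can also remove genuine biological differences between cells. scib therefore evaluates integration from two sides: how well batch effects are removed and how well biological variation is conserved.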
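The trade-off can be made concrete with a small from-scratch sketch of the two silhouette metrics, sil_labels and sil_batch, run on made-up 2-D embeddings. This is not scib's code, and all function names are hypothetical. The scalings follow my reading of the scib source: (s + 1) / 2 for cell-type labels, and 1 - |s| of the batch silhouette within each cell type.

```python
import numpy as np


def silhouette_samples(X, groups):
    """Per-cell silhouette width s(i) = (b - a) / max(a, b), Euclidean distance."""
    X, groups = np.asarray(X, dtype=float), np.asarray(groups)
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = groups == groups[i]
        same[i] = False  # exclude the cell itself
        if not same.any():
            continue  # singleton group: silhouette stays 0
        a = dist[i, same].mean()  # mean distance within own group
        b = min(dist[i, groups == g].mean() for g in np.unique(groups) if g != groups[i])
        s[i] = (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return s


def label_asw(X, labels):
    """Bio conservation: mean cell-type silhouette rescaled from [-1, 1] to [0, 1]."""
    return float((silhouette_samples(X, labels).mean() + 1) / 2)


def batch_asw(X, batches, labels):
    """Batch mixing: within each cell type, mean of 1 - |batch silhouette|, then mean over types."""
    X, batches, labels = np.asarray(X, dtype=float), np.asarray(batches), np.asarray(labels)
    per_label = []
    for lab in np.unique(labels):
        mask = labels == lab
        if len(np.unique(batches[mask])) < 2:
            continue  # cell type present in a single batch: nothing to mix
        s = silhouette_samples(X[mask], batches[mask])
        per_label.append(np.mean(1 - np.abs(s)))
    return float(np.mean(per_label))


# Made-up toy data: 2 cell types x 2 batches, 30 cells each, in a 2-D "embedding"
rng = np.random.default_rng(0)
n = 30
labels = np.repeat(["A", "A", "B", "B"], n)
batches = np.repeat(["b1", "b2", "b1", "b2"], n)


def make_embedding(type_shift, batch_shift):
    """Cell types are offset along x, batches along y."""
    centers = [(0, 0), (0, batch_shift), (type_shift, 0), (type_shift, batch_shift)]
    return np.vstack([rng.normal(c, 1.0, size=(n, 2)) for c in centers])


scenarios = {
    "integrated": make_embedding(type_shift=8, batch_shift=0),
    "uncorrected": make_embedding(type_shift=8, batch_shift=8),
    "overcorrected": make_embedding(type_shift=0, batch_shift=0),
}
for name, X in scenarios.items():
    print(f"{name:>13}: sil_labels={label_asw(X, labels):.2f}  "
          f"sil_batch={batch_asw(X, batches, labels):.2f}")
```

In the "overcorrected" case the cell types have been merged, yet the batch score is still high. Only the label score exposes the loss of biology, which is why scib looks at both sides.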
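The scib paper also benchmarks the mainstream integration methods. Besides ranking them, it gives brief recommendations on which method to use in different scenarios. It is a very good paper.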
Benchmarking atlas-level data integration in single-cell genomics
Below is the evaluation function I use most often.
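```python
import timeit

import numpy as np
import pandas as pd
from scib.metrics.isolated_labels import isolated_labels
from scib.metrics.kbet import kBET
from scib.metrics.lisi import lisi_graph
from scib.metrics.silhouette import silhouette, silhouette_batch


def compute_scib_metrics(adata_post, emb_key, label_key, batch_key, model_name):
    """
    Compute six scib metrics for one integrated dataset. This is slow (kBET especially).

    Run sc.pp.neighbors(adata_post, use_rep=emb_key) first: the graph LISI
    below (type_="knn") reuses that precomputed neighbour graph.
    :param adata_post: integrated AnnData
    :param emb_key: key in adata_post.obsm holding the integrated embedding
    :param label_key: column in adata_post.obs with cell-type labels
    :param batch_key: column in adata_post.obs with batch labels
    :param model_name: row label of the returned DataFrame
    :return: one-row DataFrame with the six metrics and overall_score
    """
    print('-' * 10 + 'start to compute scib metrics' + '-' * 10)
    start = timeit.default_timer()
    bio_cols = ['clisi', 'sil_labels', 'isolated_labels']  # biological conservation
    batch_cols = ['ilisi', 'sil_batch', 'kBET']             # batch-effect removal
    df = pd.DataFrame(index=[model_name], columns=bio_cols + batch_cols, dtype=float)
    # type_ is a required argument in recent scib releases; "knn" reuses the existing graph
    df["ilisi"], df["clisi"] = lisi_graph(adata_post, batch_key=batch_key,
                                          label_key=label_key, type_="knn")
    # Positional arguments: the keyword was renamed from group_key to label_key
    # between scib versions, but the argument positions stayed the same
    df["sil_labels"] = silhouette(adata_post, label_key, emb_key)
    # if "dpt_pseudotime" in adata_pre.obs.columns:
    #     df["trajectory_conservation"] = trajectory_conservation(adata_pre, adata_post, label_key=label_key)
    # else:
    #     df["trajectory_conservation"] = 'None'
    df["isolated_labels"] = isolated_labels(adata_post, label_key=label_key,
                                            batch_key=batch_key, embed=emb_key)
    df["sil_batch"] = silhouette_batch(adata_post, batch_key, label_key, emb_key)
    # kBET is an R package; type_="embed" evaluates the embedding in .obsm[emb_key]
    df["kBET"] = kBET(adata_post, batch_key=batch_key, label_key=label_key,
                      type_="embed", embed=emb_key)
    # Select columns by name instead of by position, then weight 60% bio / 40% batch
    scores = df.loc[model_name]
    df["overall_score"] = 0.6 * np.mean(scores[bio_cols]) + 0.4 * np.mean(scores[batch_cols])
    end = timeit.default_timer()
    print(str(end - start) + ' sec')
    return df
```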
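The function returns a one-row DataFrame per model. Results for several integration methods can therefore be stacked with pd.concat and sorted by overall_score, giving a ranking in the spirit of the paper. Below is a minimal sketch. The scores are placeholder numbers, not real results. Keeping the bio and batch averages as separate columns is my own choice, because the weighted sum on its own hides the trade-off.

```python
import pandas as pd

# Placeholder results shaped like compute_scib_metrics output (made-up numbers)
order = ["clisi", "sil_labels", "isolated_labels", "ilisi", "sil_batch", "kBET"]
results = [
    pd.DataFrame([[0.95, 0.60, 0.55, 0.10, 0.85, 0.20]], index=["model_a"], columns=order),
    pd.DataFrame([[0.90, 0.55, 0.50, 0.25, 0.90, 0.45]], index=["model_b"], columns=order),
]

table = pd.concat(results)
bio_cols, batch_cols = order[:3], order[3:]
table["bio_conservation"] = table[bio_cols].mean(axis=1)
table["batch_correction"] = table[batch_cols].mean(axis=1)
# Same 60/40 weighting as in compute_scib_metrics
table["overall_score"] = 0.6 * table["bio_conservation"] + 0.4 * table["batch_correction"]
ranking = table.sort_values("overall_score", ascending=False)
print(ranking[["bio_conservation", "batch_correction", "overall_score"]].round(3))
```

With these numbers, model_b ranks first (0.603 vs 0.573) despite its lower bio-conservation average, because its batch-correction average is much higher.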
The function uses six metrics: 'clisi', 'sil_labels', 'isolated_labels', 'ilisi', 'sil_batch' and 'kBET'.
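iLISI and cLISI are both built on the Local Inverse Simpson's Index. For each cell it is 1 / sum(p_c^2), where p_c is the share of category c in the cell's neighbourhood: batches for iLISI, cell types for cLISI. The index ranges from 1 (a single category) to the number of categories (a perfectly even mix). As far as I know, scib's lisi_graph works on the kNN graph with perplexity-based weights. The sketch below simplifies this to uniform weights over the k nearest neighbours in the embedding. The median-then-rescale step is my reading of scib's scaling, and all function names are hypothetical.

```python
import numpy as np


def simple_lisi(X, categories, k=15):
    """Per-cell inverse Simpson's index 1 / sum(p_c**2) over the k nearest neighbours.

    Simplification: uniform neighbour weights, not scib's perplexity-based weights.
    """
    X, categories = np.asarray(X, dtype=float), np.asarray(categories)
    dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(dist, np.inf)  # a cell is not its own neighbour
    knn = np.argsort(dist, axis=1)[:, :k]
    uniq = np.unique(categories)
    scores = np.empty(len(X))
    for i, neigh in enumerate(knn):
        p = np.array([(categories[neigh] == c).mean() for c in uniq])  # category shares
        scores[i] = 1.0 / np.sum(p ** 2)
    return scores


def scaled_ilisi(X, batches, k=15):
    """Median iLISI mapped from [1, n_batches] to [0, 1]; 1 = batches fully mixed."""
    n_batches = len(np.unique(batches))
    return float((np.median(simple_lisi(X, batches, k)) - 1) / (n_batches - 1))


def scaled_clisi(X, labels, k=15):
    """Median cLISI mapped from [1, n_labels] to [0, 1]; 1 = neighbourhoods pure in cell type."""
    n_labels = len(np.unique(labels))
    return float((n_labels - np.median(simple_lisi(X, labels, k))) / (n_labels - 1))


# Made-up toy data: 2 cell types x 2 batches, 30 cells each
rng = np.random.default_rng(0)
n = 30
labels = np.repeat(["A", "A", "B", "B"], n)
batches = np.repeat(["b1", "b2", "b1", "b2"], n)
mixed = np.vstack([rng.normal(c, 1.0, size=(n, 2)) for c in [(0, 0), (0, 0), (8, 0), (8, 0)]])
separated = np.vstack([rng.normal(c, 1.0, size=(n, 2)) for c in [(0, 0), (0, 8), (8, 0), (8, 8)]])

print("mixed:     ilisi", round(scaled_ilisi(mixed, batches), 2),
      "clisi", round(scaled_clisi(mixed, labels), 2))
print("separated: ilisi", round(scaled_ilisi(separated, batches), 2))
```

The cLISI rescaling is flipped (n_labels minus the score) so that, like iLISI, higher means better.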
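The first three measure biological conservation and the last three measure batch-effect removal.
All metrics are scaled to the range 0 to 1, and higher is better. overall_score combines them as 0.6 * (mean of the bio metrics) + 0.4 * (mean of the batch metrics).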
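Of the six metrics, kBET is the least transparent and the only one that needs R. As I understand the idea, for each cell a Pearson chi-square test checks whether the batch composition of its k nearest neighbours matches the overall batch composition. The rejection rate measures poor mixing, and scib reports 1 - rejection rate per cell-type label, averaged over labels. Below is a simplified pure-Python sketch of that idea, not the kBET R package. It uses a fixed k, with no neighbourhood-size heuristic and no subsampling, and all function names are hypothetical.

```python
import math

import numpy as np


def chi2_sf(x, df):
    """P(X > x) for a chi-square variable with integer df >= 1, via closed forms."""
    y = x / 2.0
    if df % 2 == 0:  # even df: finite Poisson-type sum
        return math.exp(-y) * sum(y ** j / math.factorial(j) for j in range(df // 2))
    m = (df - 1) // 2  # odd df: erfc term plus half-integer gamma terms
    return math.erfc(math.sqrt(y)) + math.exp(-y) * sum(
        y ** (j - 0.5) / math.gamma(j + 0.5) for j in range(1, m + 1)
    )


def simple_kbet(X, batches, k=20, alpha=0.05):
    """1 - rejection rate of a per-cell chi-square test on kNN batch composition."""
    X, batches = np.asarray(X, dtype=float), np.asarray(batches)
    k = min(k, len(X) - 1)
    uniq, counts = np.unique(batches, return_counts=True)
    expected = k * counts / counts.sum()  # neighbours expected per batch if well mixed
    dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(dist, np.inf)
    knn = np.argsort(dist, axis=1)[:, :k]
    rejected = 0
    for neigh in knn:
        observed = np.array([(batches[neigh] == b).sum() for b in uniq])
        stat = float(((observed - expected) ** 2 / expected).sum())
        rejected += chi2_sf(stat, len(uniq) - 1) < alpha
    return 1.0 - rejected / len(X)


def kbet_per_label(X, batches, labels, k=20, alpha=0.05):
    """Average simple_kbet over cell-type labels; labels seen in one batch are skipped."""
    X, batches, labels = np.asarray(X, dtype=float), np.asarray(batches), np.asarray(labels)
    scores = [
        simple_kbet(X[labels == lab], batches[labels == lab], k, alpha)
        for lab in np.unique(labels)
        if len(np.unique(batches[labels == lab])) > 1
    ]
    return float(np.mean(scores))


# Made-up toy data: 2 cell types x 2 batches, 30 cells each
rng = np.random.default_rng(0)
n = 30
labels = np.repeat(["A", "A", "B", "B"], n)
batches = np.repeat(["b1", "b2", "b1", "b2"], n)
mixed = np.vstack([rng.normal(c, 1.0, size=(n, 2)) for c in [(0, 0), (0, 0), (8, 0), (8, 0)]])
separated = np.vstack([rng.normal(c, 1.0, size=(n, 2)) for c in [(0, 0), (0, 8), (8, 0), (8, 8)]])

print("mixed:    ", round(kbet_per_label(mixed, batches, labels), 2))
print("separated:", round(kbet_per_label(separated, batches, labels), 2))
```

The closed-form chi2_sf only avoids a SciPy dependency in this sketch. In practice, scipy.stats.chi2.sf does the same job.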
Requirements:
Linux or UNIX system
Python >= 3.7
R >= 3.6 (needed for kBET, which is an R package)
Required package:
pip install scib
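Before running the slow metrics, it can help to check the requirements above. Below is a minimal sketch using only the standard library. The function names are hypothetical. The rpy2 check rests on my assumption that scib reaches R through rpy2 for kBET. The R version is parsed from the output of `R --version`, and None means it could not be determined.

```python
import importlib.util
import re
import shutil
import subprocess
import sys


def r_version():
    """Return R's (major, minor) version as a tuple, or None if R is not found."""
    if shutil.which("R") is None:
        return None
    out = subprocess.run(["R", "--version"], capture_output=True, text=True).stdout
    match = re.search(r"R version (\d+)\.(\d+)", out)
    return (int(match.group(1)), int(match.group(2))) if match else None


def check_scib_environment():
    """Check the listed requirements; None means 'could not determine'."""
    rv = r_version()
    return {
        "python>=3.7": sys.version_info >= (3, 7),
        "R on PATH": shutil.which("R") is not None,
        "R>=3.6": None if rv is None else rv >= (3, 6),
        "scib importable": importlib.util.find_spec("scib") is not None,
        # Assumption: scib's kBET wrapper calls R through rpy2
        "rpy2 importable": importlib.util.find_spec("rpy2") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_scib_environment().items():
        status = {True: "ok", False: "MISSING", None: "unknown"}[ok]
        print(f"{name:<16} {status}")
```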
More detailed usage and documentation can be found here:
scib