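Hello everyone! This time I want to look at whether the results from different cell-cell communication tools agree with one another. There are a great many tools for cell communication analysis by now, and I have covered quite a few of them, but sharing is not the goal; putting them to use is. Which tool should you use, which one is better, and what are the strengths and weaknesses of each? Let's take a look today.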
Comparison of Resources and Methods to infer Cell-Cell Communication from Single-cell RNA Data
Abstract
1. There are many tools for cell communication analysis. Each of them consists of a resource of intercellular interactions prior knowledge and a method to predict potential cell-cell communication events (every tool has its own ligand-receptor database and its own algorithm). Yet the impact of the choice of resource and method on the resulting predictions is largely unknown.
2. Comparing the tools: We found few unique interactions and a varying degree of overlap among the resources (differences between the ligand-receptor databases), and observed uneven coverage in terms of pathways and biological categories.
3. When testing on the same dataset: We found major differences among the highest ranked intercellular interactions inferred by each method even when using the same resources (the differences between methods are large as well).
4. The varying predictions lead to fundamentally different biological interpretations, highlighting the need to benchmark resources and methods (different tools give different results, so which one should we use?).
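Main conclusions
1. Different methods and resources provided notably different results (no surprise here; after enough projects you notice this problem yourself).
2. The observed disagreement among the methods could have a considerable impact on the interpretation of results (different results naturally mean different biological interpretations, so which one do we go with?).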
Introduction
1. Why cell communication matters: CCC commonly refers to interactions between secreted ligands and plasma membrane receptors. This picture can be broadened to include secreted enzymes, extracellular matrix proteins, transporters, and interactions that require the physical contact between cells, such as cell-cell adhesion proteins and gap junctions. CCC events are essential for homeostasis, development, and disease, and their estimation is becoming a routine approach in scRNA-seq data analysis (studying cell communication really is important).
2.1. How the tools predict cell communication: These CCC tools typically use gene expression information obtained by scRNA-Seq. In general, single cells are clustered by their gene expression profile and cell type identities are assigned to the clusters based on known gene markers (first, the single-cell data are clustered and the clusters annotated).
2.2. CCC tools can predict intercellular crosstalk between any pair of clusters, one cluster being the source and the other the target of a CCC event.
3. Every tool comes with a ligand-receptor database: The information about which transmitter binds to which receiver is extracted from diverse sources of prior knowledge (the ligand-receptor databases are accumulated prior knowledge).
4. Roughly, CCC tools then estimate the likelihood of crosstalk based on the expression level of the transmitter and the receiver in the source and target clusters, respectively (this is essentially what they all do).
5. Every tool has two main components: a resource of prior knowledge on CCC (interactions), and a method to estimate CCC from the known interactions and the dataset at hand.
6. Although every tool has its own ligand-receptor database and its own method, in principle any resource could be combined with any method.
7. Differences in method between the tools (six tools): In turn, these different approaches result in diverse scoring systems that are difficult to compare and evaluate (there are many methods, so which one to choose? A good standard is missing).
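For CellChat, see my posts "10X single-cell (10X spatial transcriptomics) communication analysis with CellChat" and "10X single-cell (10X spatial transcriptomics) communication analysis with CellChat: differential communication across multiple samples". For Squidpy, see "Distance analysis of cell types in spatial transcriptomics, part 2: code implementation" and "10X spatial transcriptomics communication analysis, chapter 3". For Connectome, see "10X single-cell cell communication series: Connectome". For iTALK, see "Cell communication: how to use iTALK". For NATMI, see "NATMI, a cell communication analysis tool for single-cell data".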
8. Differences in ligand-receptor content between the tools: The available prior knowledge resources are typically distinct but often show partial overlap. Some of these resources also provide additional details for the interactions such as information about protein complexes, subcellular localisation, and classification into signalling pathways and categories. CCC resources are often manually curated and/or built from other resources, with varying proportions of expert curation and literature support. Some databases gather and harmonize the information contained in the individual resources (the databases come from all sorts of sources, and the effect of the database is one of the main points of today's study).
Table note: We defined unique and shared interactions, receivers and transmitters between the CCC resources if they could be found in only one or at least two of the resources, respectively.
9.1. Comparing and testing the tools: First, we explored the degree of overlap among resources and whether certain resources are biased toward specific biological terms, such as pathways and functional cancer states.
9.2. We analysed how different combinations of resources and methods influence CCC inference, by decoupling the methods from their corresponding resources (databases and tools are taken apart and recombined; the strategy is shown in the figure below).
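To make the "decouple and recombine" idea concrete, here is a minimal sketch of how the grid of method-resource combinations could be enumerated. The resource list is only an illustrative subset of names mentioned in this post, and `run_method` is a hypothetical placeholder, not a function from any of the tools.

```python
from itertools import product

# The six methods discussed in this post.
methods = ["CellChat", "Connectome", "iTALK", "NATMI", "SingleCellSignalR", "Squidpy"]

# Illustrative subset of resources (assumption: the paper evaluates more than these).
resources = ["CellChatDB", "CellPhoneDB", "OmniPath"]


def run_method(method, resource):
    """Hypothetical placeholder: run one method with one resource.

    A real implementation would call the tool with the chosen
    ligand-receptor table; here we only return a label.
    """
    return f"{method} + {resource}"


# Every method is paired with every resource, not only with its own default.
combinations = [run_method(m, r) for m, r in product(methods, resources)]
print(len(combinations), "method-resource combinations")
```

With 6 methods and 3 resources the grid has 18 entries; the point is simply that the resource becomes an input to the method instead of being baked into it.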
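Now let's look at the results. Honestly, I was surprised: I knew each tool gives different results, but I did not expect the differences to be this large.
Result 1: Resource Uniqueness and Overlap (comparing the databases: no two are the same, and how similar they are varies a lot from pair to pair).
First, where each tool's database comes from: Many of these resources share the same original data sources, including general biological databases such as KEGG, Reactome, and STRING, along with many other databases, of course.
Now the differences in ligand-receptor pairs: As a consequence of their common origins, we noted limited uniqueness across the resources, with mean percentages of 4.6 unique receivers, 5.3 unique transmitters, and 16.8% unique interactions, for all resources (so most entries are shared with at least one other resource, although each database still has some ligand-receptor pairs of its own, and that share differs a lot between databases; see the figure below).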
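As a rough illustration of how such "percentage unique" numbers can be computed, the sketch below counts an interaction as unique when it appears in exactly one resource, following the definition quoted in the table note. The three toy resources and their pair names are made up, and the paper's exact averaging may differ.

```python
from collections import Counter

# Toy resources (hypothetical): each is a set of (transmitter, receiver) pairs.
resources = {
    "A": {("L1", "R1"), ("L2", "R2"), ("L3", "R3")},
    "B": {("L1", "R1"), ("L2", "R2")},
    "C": {("L1", "R1"), ("L4", "R4")},
}


def unique_percentages(resources):
    """Percentage of each resource's interactions found in no other resource."""
    # How many resources contain each interaction.
    counts = Counter(pair for pairs in resources.values() for pair in pairs)
    result = {}
    for name, pairs in resources.items():
        unique = sum(1 for pair in pairs if counts[pair] == 1)
        result[name] = 100.0 * unique / len(pairs)
    return result


percentages = unique_percentages(resources)
mean_unique = sum(percentages.values()) / len(percentages)
print(percentages, round(mean_unique, 1))
```

The same function works for receivers or transmitters alone if you pass sets of gene names instead of pairs.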
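Despite the sparse uniqueness among the resources, the pairwise overlap between them varied; some resources are very similar to each other.
On the Jaccard index: see the Baidu Baike entry on the Jaccard coefficient. Put simply, it is the size of the intersection of two sets divided by the size of their union.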
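The definition above fits in a few lines. This is a minimal sketch; returning 0.0 for two empty sets is a convention chosen here, not something the paper specifies.

```python
def jaccard_index(a, b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)


# Two toy sets sharing 2 of 4 distinct elements.
example = jaccard_index({"i1", "i2", "i3"}, {"i2", "i3", "i4"})
print(example)
```

A value of 1 means the two sets are identical and 0 means they share nothing, which is why the values reported later in this post (mostly well below 0.2) indicate very little agreement.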
Figure legend: Upset plots representing the shared Interactions, Receivers, and Transmitters between all resources (A-C) and all resources except OmniPath (D-F).
Each ligand-receptor database contained on average more than 65% of the interactions present in the other resources.
Figure legend: A) Interactions B) Receivers and C) Transmitters present in each resource when taken from the rest of the resources. Note these plots are asymmetric and represent the % of interactions from the resources on the X axis found in each resource on the Y axis.
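Unlike the Jaccard index, the percentages in this figure are asymmetric: a small resource can be fully contained in a large one while the reverse is not true. A minimal sketch of that measure, with made-up toy resources:

```python
def percent_contained(x_pairs, y_pairs):
    """Percentage of the interactions in X that are also present in Y."""
    x_pairs, y_pairs = set(x_pairs), set(y_pairs)
    if not x_pairs:
        return 0.0
    return 100.0 * len(x_pairs & y_pairs) / len(x_pairs)


# Toy resources (hypothetical names and pairs).
resources = {
    "big": {("L1", "R1"), ("L2", "R2"), ("L3", "R3"), ("L4", "R4")},
    "small": {("L1", "R1"), ("L2", "R2")},
}

# Key (x, y) reads: % of the interactions in x that are found in y.
containment = {
    (x, y): percent_contained(resources[x], resources[y])
    for x in resources
    for y in resources
    if x != y
}
print(containment)
```

Here all of "small" is found in "big", but only half of "big" is found in "small", which is exactly the asymmetry the legend warns about.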
Summary of the differences between ligand-receptor databases: In summary, our results indicate that many of the transmitters, receivers, and interactions are not unique to any single resource, due to their common origins. However, different resources include varying proportions of the collective CCC prior knowledge (they all differ; only the proportions vary).
Result 2: Resource Prior Knowledge Bias
First, Subcellular Localisation: On average 90% of transmitters and 79% of receivers were annotated as secreted and transmembrane proteins, respectively (so secreted-type ligand-receptor pairs appear to be the mainstream). The authors further used the localisations of transmitters and receivers to categorize the interactions as secreted or direct-contact signalling.
Figure legend: Numbers and Percentages of Subcellular locations annotations of Receivers (A-B) and Transmitters (C-D) for each CCC resource. S, P and T stand for Secreted, Peripheral plasma membrane, and Transmembrane plasma membrane proteins, respectively.
The authors observed that all resources were predominantly (74% on average) composed of interactions associated with secreted signalling, while direct-contact signalling constituted a substantially smaller (16% on average) proportion of interactions (secreted signalling dominates).
Figure legend: Interactions categorized as neither secreted nor direct-contact were labeled as 'Other' and made up the remainder of the interactions.
The share of secreted versus direct-contact signalling differs in every database: CellChatDB showed an overrepresentation of interactions matched to the category Other.
Conclusion on ligand-receptor localisation: Our results suggest that localisations of transmitters and receivers were largely uniformly distributed and that secreted signalling was predominant across all resources. Yet, differences were noted between the relative abundance of secreted and direct-contact signalling interactions (the proportion of secreted versus direct-contact ligand-receptor pairs differs in every database).
Functional Term Enrichment (differences in ligand-receptor pathways): the pathways covered, and how many, differ from database to database.
Interactions associated with innate immune pathways and T-cell receptor categories were under-represented in Guide to Pharmacology, Baccin2019, EMBRACE, Kirouac2010, ICELLNET, CellPhoneDB, and HMPR (immune-related pathways differ quite a lot, and some databases lack them entirely, though this also depends on the annotation database used).
Figure legend: Number of matches to A) Interactions, B) Receivers and C) Transmitters, Enrichment Scores for their Receivers and Transmitters (D-E), and the Percentages of Interactions, Receivers and Transmitters (F-H) matched to the NetPath database per resource.
These observations for the WNT pathway were further supported by the relative abundance of HGNC (the choice of annotation database also introduces large differences).
Figure legend: Number of matches to A) Merged Sets of Receivers and Transmitters, B) Receivers and C) Transmitters, their corresponding Enrichment Scores (D-F), and Percentages (G-I) per resource matched to the HGNC database.
Functional cancer cell states from CancerSEA were also unevenly represented in sets of receivers and transmitters across the resources (the differences are so large I hardly know what to make of them).
Figure legend: Number of matches to A) Merged Sets of Receivers and Transmitters, B) Receivers and C) Transmitters and their corresponding (D-F) Enrichment Scores, and Percentages (G-I) per resource matched to the CancerSEA database.
An annotated dataset is then used to judge how consistent the tools' results are. Here the focus is on the interactions between tumour cells subclassified by their resemblance of CRC consensus molecular subtypes (CMS) and immune cells from tumour samples, reasoning that this subset of cell types represents a complex example where CCC events are known to have an important role.
First result: Interaction overlap
We then used each method-resource combination to infer CCC interactions, assuming that different methods should generally agree on the most relevant CCC events for the same resource and expression data (quite an assumption...). To measure the agreement between method-resource combinations, we looked at the overlap between the 500 highest ranked interactions as predicted by each method. Whenever available, author recommendations were used to filter out the false-positive interactions.
Conclusion 1: Our analysis showed considerable differences in the interactions predicted by each of the methods regardless of the resource used, as the mean Jaccard index per resource ranged from 0.01 to 0.06 (mean = 0.024) when using different methods (really low). These large discrepancies in the results were further supported by the pairwise comparisons between methods using the same resource, with mean Jaccard indices ranging from 0.063 (CellChat-SingleCellSignalR) to 0.110 (Connectome-NATMI) (also very low). The overlap among the top predicted interactions was slightly higher when using the same method but with different resources, as Jaccard indices ranged from 0.113 to 0.203 per method (mean = 0.167) (with the same method and different databases the agreement improves a little, but in absolute terms it is still low; I am starting to wonder whether my earlier analyses were right at all).
Figure legend: Jaccard indices for the 500 highest ranked interactions obtained from each method-resource combination.
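The comparison behind these numbers can be sketched as: take the top-n interactions from each run, then compute the Jaccard index for every pair of runs. The sketch below assumes each run yields a score per interaction where higher means more relevant; real tools rank by different quantities (some by p-values, where lower is better), so the sorting direction is an assumption to adjust per method. The run names and scores are made up.

```python
from itertools import combinations


def top_n(scores, n=500):
    """Return the n highest-scoring interactions as a set."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:n])


def pairwise_jaccard(results, n=500):
    """Jaccard index of the top-n sets for every pair of runs."""
    tops = {name: top_n(scores, n) for name, scores in results.items()}
    out = {}
    for a, b in combinations(sorted(tops), 2):
        union = tops[a] | tops[b]
        out[(a, b)] = len(tops[a] & tops[b]) / len(union) if union else 0.0
    return out


# Toy runs (hypothetical): interaction id -> score, same resource, two methods.
results = {
    "method1": {"i1": 0.9, "i2": 0.8, "i3": 0.1},
    "method2": {"i1": 0.7, "i3": 0.6, "i2": 0.2},
}

overlap = pairwise_jaccard(results, n=2)
print(overlap)
```

With n=2 the two toy methods share one of three distinct top interactions, so the index is 1/3. The paper uses n=500 on real outputs.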
Conclusion 2: Consequently, the highest ranked interactions for each method-resource combination largely showed stronger clustering by method than resource, suggesting that the overlap between these combinations occurs predominantly when using the same method regardless of the resource (the method has the larger effect on the results).
Figure legend: Overlap in the 500 highest ranked CCC interactions between different combinations of methods and resources. Method-resource combinations were clustered according to binary (Jaccard index) distances. SCA refers to the SingleCellSignalR method.
On ligand-receptor complexes: This analysis showed that the proportion of complexes among the highest ranked hits was 2-23% for CellChat and 10-38% for Squidpy, largely reflecting the relative complex content in each resource (again a wide spread).
Summary 1 of conclusion 2: Our results suggest that the overlap between methods when using the same resource was low.
Figure legend: Upset plot showing the overlap between the 500 highest ranked interactions using the same method with all resources.
Summary 2 of conclusion 2: The overlap when using the same method with different resources, albeit higher than that between different methods, was also modest (with the same method and different databases, the differences are still fairly large). Hence, our results indicate that both the method and the resource had a considerable impact on the predicted interactions.
Figure legend: Upset plot showing overlap of most relevant interactions for each method with the same resource.
Conclusion 3: Next, we asked whether the discrepancies observed between the methods stem from the differences in the cell types inferred as most active in terms of CCC interactions. To this end, we used the 500 highest ranked interactions to examine the cell type activities, defined as the proportion of interactions per cell type, separately as a source and a target of CCC events. The conclusion is much the same as above: the choice of method matters a great deal, as each method largely clustered by itself, regardless of the resource used, including the reshuffled resource. As a consequence, the disagreement between the methods in which cell types are the most active is expected to have a major impact on the biological interpretation of CCC communication predictions.
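"Cell type activity" as defined above is just a proportion, so it is easy to sketch. The example below assumes each top-ranked interaction is recorded as a (source, target, transmitter, receiver) tuple; the cell type labels and gene names are made up for illustration.

```python
from collections import Counter


def cell_type_activity(interactions):
    """Proportion of interactions per cell type, as source and as target."""
    total = len(interactions)
    source_counts = Counter(source for source, _, _, _ in interactions)
    target_counts = Counter(target for _, target, _, _ in interactions)
    as_source = {cell: count / total for cell, count in source_counts.items()}
    as_target = {cell: count / total for cell, count in target_counts.items()}
    return as_source, as_target


# Toy top-ranked interactions (hypothetical): (source, target, transmitter, receiver).
top_interactions = [
    ("CMS2", "T cell", "L1", "R1"),
    ("CMS2", "B cell", "L2", "R2"),
    ("T cell", "CMS2", "L3", "R3"),
    ("CMS2", "T cell", "L4", "R4"),
]

as_source, as_target = cell_type_activity(top_interactions)
print(as_source)
print(as_target)
```

Computing these two vectors for every method-resource run and comparing them is what lets you ask whether two methods even agree on which cell types are doing most of the talking.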
Figure legend: PCA of normalized average interaction rank frequencies per cell pair.
Current limitations of inferring cell communication
1. CCC events are mainly predicted based on the average gene expression at the cluster or cell type/state level. Such an assumption inherently suggests that gene expression is informative of the activity of transmitters and receivers. However, gene expression provided by scRNA-Seq is typically limited to protein coding genes and the cells within the dataset, and hence does not capture secreted signalling events driven by non-protein molecules or long-distance endocrine signalling events (single-cell data alone cannot fix this limitation).
2. CCC inference from scRNA-Seq data assumes that the product of the gene expression of a transmitter and a receiver is a good proxy for their joint activity, and thus does not consider any of the processes preceding transmitter-receiver interactions, including protein translation and processing, secretion, and diffusion (even less fixable).
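To see how little information goes into that assumption, here is a minimal sketch of an expression-product score: the mean expression of the transmitter in the source cluster multiplied by the mean expression of the receiver in the target cluster. This is a generic illustration of the assumption described above, not the scoring formula of any specific tool; the gene names, cluster labels, and values are made up.

```python
from statistics import mean


def expression_product_score(expr, clusters, transmitter, receiver, source, target):
    """Mean transmitter expression in source cells times mean receiver expression in target cells.

    expr: gene name -> list of expression values, one per cell.
    clusters: list of cluster labels, one per cell, in the same order.
    """
    source_values = [v for v, c in zip(expr[transmitter], clusters) if c == source]
    target_values = [v for v, c in zip(expr[receiver], clusters) if c == target]
    return mean(source_values) * mean(target_values)


# Toy data (hypothetical): four cells, two clusters, one ligand and one receptor.
clusters = ["T", "T", "B", "B"]
expr = {
    "LIG": [2.0, 4.0, 0.0, 0.0],
    "REC": [0.0, 0.0, 1.0, 3.0],
}

score_t_to_b = expression_product_score(expr, clusters, "LIG", "REC", "T", "B")
score_b_to_t = expression_product_score(expr, clusters, "LIG", "REC", "B", "T")
print(score_t_to_b, score_b_to_t)
```

Nothing here knows about translation, secretion, diffusion, or distance between cells: two clusters get a high score simply because one expresses the ligand transcript and the other the receptor transcript.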
Conclusion
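The methods are not yet mature, and there is still work to do.
So which method should you actually use for cell communication analysis? I would love to hear what readers think.
Life is good, and it's even better with you~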