Accuracy of Gene Expression Prediction From Genotype Data With PrediXcan Varies Across and Within Continental Populations

Using genetic data to predict gene expression has garnered significant attention in recent years. PrediXcan has become one of the most widely used gene-based methods for testing associations between predicted gene expression values and a phenotype, which has facilitated novel insights into the relationship between complex traits and the component of gene expression that can be attributed to genetic variation. The gene expression prediction models for PrediXcan were developed using supervised machine learning methods and training data from the Depression Genes and Networks (DGN) study and the Genotype-Tissue Expression (GTEx) project, where the majority of subjects are of European descent. Many genetic studies, however, include samples from multi-ethnic populations, and in this paper we evaluate the accuracy of PrediXcan for predicting gene expression in diverse populations. Using transcriptomic data from the GEUVADIS (Genetic European Variation in Disease) RNA sequencing project and whole genome sequencing data from the 1000 Genomes project, we evaluate and compare the predictive performance of PrediXcan in an African population (Yoruban) and four European ancestry populations for thousands of genes. We evaluate a range of models from the PrediXcan weight databases and use Pearson's correlation coefficient to assess gene expression prediction accuracy with PrediXcan. From our evaluation, we find that the predictive performance of PrediXcan varies substantially among populations from different continents (F-test p-value < 2.2 × 10^-16), where prediction accuracy is lower in the Yoruban population from West Africa compared to the European-ancestry populations. Moreover, not only do we find differences in predictive performance between populations from different continents, we also find highly significant differences in prediction accuracy among the four European ancestry populations considered (F-test p-value < 2.2 × 10^-16). Finally, while there is variability in prediction accuracy across different PrediXcan weight databases, we also find consistency in the qualitative performance of PrediXcan for the five populations considered, with the African ancestry population having the lowest accuracy across databases.
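The evaluation described in the abstract can be illustrated with a toy sketch: compute a Pearson correlation between predicted and observed expression for each gene within each population, then compare the per-gene correlations across populations with a one-way F statistic. This is only one plausible reading of that procedure, not the authors' pipeline. The data are simulated, and the population labels (`POP_A`, `POP_B`), signal strengths, sample sizes, and helper names are all hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def one_way_f(groups):
    """One-way ANOVA F statistic for a list of groups of numbers."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((v - m) ** 2 for v in g) for g, m in zip(groups, means))
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def simulate_gene(rng, n_samples, signal):
    """Toy data for one gene: observed = signal * predicted + noise."""
    predicted = [rng.gauss(0, 1) for _ in range(n_samples)]
    observed = [signal * p + rng.gauss(0, 1) for p in predicted]
    return predicted, observed


rng = random.Random(0)

# Hypothetical signal strengths per population; not values from the paper.
signal_by_pop = {"POP_A": 0.8, "POP_B": 0.2}

# Per-gene prediction accuracy (Pearson r) within each population.
accuracy = {
    pop: [pearson_r(*simulate_gene(rng, 80, signal)) for _ in range(200)]
    for pop, signal in signal_by_pop.items()
}

for pop, rs in accuracy.items():
    print(pop, "mean r =", round(sum(rs) / len(rs), 3))
print("F statistic =", round(one_way_f(list(accuracy.values())), 1))
```

A large F statistic here means the mean per-gene correlation differs between the simulated populations by more than the gene-to-gene spread within each one. Turning it into a p-value would require the F distribution, which a statistics library such as SciPy provides.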

Published in Front Genet (IF 3.517), 2019 Apr 3
