Synteny and Collinearity in Plant Genomes (Science)
Plant genomes complement, particularly over modest (intrafamilial) evolutionary distances. Comparative data will also facilitate a broader understanding of the dynamics of gene duplication and TE accumulation.
Nevertheless, additional comparative and population-genetic data alone will not yield a complete understanding of selection on plant genomes or on the processes that govern genome size variation. There is first a pressing need for additional theoretical advances to provide a conceptual framework to interpret polymorphism data, especially in the context of demographic change in structured populations. Similarly, the theory of the population genetics of gene duplication is in its infancy, as is our understanding of whether standing genetic variation commonly contributes to adaptation. In addition, we need to better understand biological factors that affect the process of selection but are usually not included in molecular-evolutionary or population-genetic models; such factors include paramutation, methylation, epistasis, and gene conversion. Finally, there is always a need to complement inferences about selection with functional assays, particularly if the goal is to correctly identify the genetic variants that have been targeted by selection. With the need for additional data and theoretical models, we clearly are only beginning to understand the complex interplay among phenotypic diversity, genome size, and natural selection.
Abstract:
Correlated gene arrangements among taxa provide a valuable framework for inference of shared ancestry of genes and for the utilization of findings from model organisms to study less-well-understood systems. In angiosperms, comparisons of gene arrangements are complicated by recurring polyploidy and extensive genome rearrangement. New genome sequences and improved analytical approaches are clarifying angiosperm evolution and revealing patterns of differential gene loss after genome duplication and differential gene retention associated with evolution of some morphological complexity. Because of variability in DNA substitution rates among taxa and genes, deviation from collinearity might be a more reliable phylogenetic character.
Eukaryotic genomes differ in the degree to which genes remain on corresponding chromosomes (synteny) and in corresponding orders (collinearity) over time (1). For example, most eutherian (placental mammal) orders have incurred only moderate reshuffling of chromosomal segments since descent from common ancestors ~130 million years ago (2). Indeed, karyotype evolution along major vertebrate lineages appears to have been slow since an inferred whole-genome duplication occurred ~500 million years ago (3). Accordingly, accurate identification of orthologs across eutherian taxa is relatively routine, and deduction of synteny and collinearity is often straightforward with best-in-genome criteria (4), identifying one-to-one best matching chromosomal regions in pairwise genome comparisons.
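To make the two definitions concrete, here is a minimal Python sketch that is not taken from the paper. It groups ortholog pairs by chromosome (synteny) and then counts how many pairs in a group keep the same relative order in both genomes (collinearity). The gene names, the position dictionaries, and both helper functions are hypothetical. Collinearity is approximated by a longest increasing subsequence, which is one simple choice among many; it assumes one-to-one pairs and ignores inverted segments.

```python
from bisect import bisect_left
from collections import defaultdict

def syntenic_pairs(pairs, pos_a, pos_b):
    """Group ortholog pairs by (chromosome in genome A, chromosome in genome B)."""
    blocks = defaultdict(list)
    for gene_a, gene_b in pairs:
        chrom_a, idx_a = pos_a[gene_a]
        chrom_b, idx_b = pos_b[gene_b]
        blocks[(chrom_a, chrom_b)].append((idx_a, idx_b))
    return blocks

def collinear_length(points):
    """Size of the longest chain of pairs whose order increases in both genomes.

    Assumes one-to-one pairs (no repeated index in genome A) and counts
    only chains in the forward orientation.
    """
    tails = []  # tails[k] = smallest last index of an increasing chain of length k + 1
    for _, idx_b in sorted(points):
        k = bisect_left(tails, idx_b)
        if k == len(tails):
            tails.append(idx_b)
        else:
            tails[k] = idx_b
    return len(tails)

# Hypothetical toy data: gene -> (chromosome, rank along the chromosome)
pos_a = {"a1": ("A1", 0), "a2": ("A1", 1), "a3": ("A1", 2), "a4": ("A1", 3), "a5": ("A2", 0)}
pos_b = {"b1": ("B1", 0), "b2": ("B1", 1), "b3": ("B1", 3), "b4": ("B1", 2), "b5": ("B2", 0)}
pairs = [("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4"), ("a5", "b5")]

blocks = syntenic_pairs(pairs, pos_a, pos_b)
for (chrom_a, chrom_b), points in sorted(blocks.items()):
    print(chrom_a, chrom_b, "syntenic:", len(points), "collinear:", collinear_length(points))
```

In the toy data, four pairs are syntenic between A1 and B1, but only three of them are collinear because b3 and b4 have swapped places.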
Angiosperm (flowering plant) genomes fluctuate remarkably in size and arrangement even within close relatives, with recurring whole-genome duplications occurring over the past ~200 million years accompanied by wholesale gene loss that has fractionated ancestral gene linkages across multiple chromosomes (5). Angiosperm genome sizes span more than 1000-fold (6), with much of the difference between some well-studied genomes in heterochromatin (7). Additionally, the reshuffling of short DNA segments by mobile elements nearly eliminates large-scale collinearity in heterochromatic regions (7).
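The idea of fractionation can be illustrated with a toy simulation. This is a sketch under simplifying assumptions, not a model from the paper: a single ancestral chromosome is duplicated, and each gene then either stays on both copies or survives on only one copy chosen at random. The function name, the `loss_rate` parameter, and the gene labels are hypothetical.

```python
import random

def fractionate(ancestral_order, loss_rate, rng):
    """Duplicate a chromosome, then reduce some genes to a single copy.

    Each gene is kept on both duplicated copies with probability
    1 - loss_rate; otherwise it survives on only one randomly chosen copy.
    Gene order is never rearranged, so each copy stays collinear with
    the ancestor even though neither copy is complete.
    """
    copy_1, copy_2 = [], []
    for gene in ancestral_order:
        if rng.random() < loss_rate:
            # Keep the gene on only one of the two duplicated copies.
            (copy_1 if rng.random() < 0.5 else copy_2).append(gene)
        else:
            copy_1.append(gene)
            copy_2.append(gene)
    return copy_1, copy_2

ancestor = [f"g{i}" for i in range(1, 11)]
sub_a, sub_b = fractionate(ancestor, loss_rate=0.8, rng=random.Random(0))
print("copy 1:", sub_a)
print("copy 2:", sub_b)
```

Under these assumptions the ancestral gene set can only be recovered by merging the two copies, which is why comparisons across angiosperms need to consider several matching regions instead of a single best match.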
Despite recurring whole-genome duplications, angiosperm chromosome numbers are more static than genome size, mostly within a range of less than 50-fold (6). Condensation of two chromosomes into one is known in many lineages; a particularly striking case involved the demonstration that n = 10 (chromosome number) members of the Sorghum genus are ancestral to n = 5 members of the genus (8). Indeed, Sorghum bicolor (sorghum) and Zea mays (maize) have the same chromosome number (n = 10), although maize has been through a whole-genome duplication since their divergence (9), whereas the most recent duplication in sorghum is shared with all other cereals (10). The occurrence of several condensations may explain why single arms of several maize chromosomes (10 and 5) correspond to entire sorghum chromosomes (6 and 4) (11).
Fully sequenced genomes promise to improve deductions of correspondence, toward a unified framework for comparative evolutionary analysis. In angiosperms, analysis of synteny and paleopolyploidy are inextricably intertwined because comparative genomics in angiosperm sequences require strategies to mitigate the effects of genome duplication and fractionation. For example, Arabidopsis thaliana (thale cress) has undergone three paleo-polyploidies, including two doublings (5) and one tripling (12), resulting in ~12 copies of its ancestral chromosome set in a ~160-Mb genome. Further complicating the comparison of A. thaliana to other angiosperms are an additional 9 to 10 chromosomal rearrangements in the past few million years since its divergence from A. lyrata (rock cress) and Capsella rubella (pink shepherd's purse), including condensation of six chromosomes into three, bringing the chromosome number from n = 8 to n = 5 (13).
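The two numbers quoted for A. thaliana follow from simple arithmetic: two doublings and one tripling multiply to 2 × 2 × 3 = 12 copies, and fusing six chromosomes into three removes three chromosomes, so 8 - 3 = 5. The short Python sketch below only restates that arithmetic; the function names are hypothetical, and the copy number is a theoretical maximum because it ignores the gene loss (fractionation) that follows each event.

```python
from math import prod

def ancestral_copies(event_folds):
    """Maximum copies of the ancestral chromosome set after a series of
    polyploidy events, ignoring any later gene loss."""
    return prod(event_folds)

def chromosomes_after_fusions(n_before, fused_from, fused_into):
    """Haploid chromosome number after `fused_from` chromosomes
    condense into `fused_into` chromosomes."""
    return n_before - (fused_from - fused_into)

# Two doublings and one tripling, as described for A. thaliana.
copies = ancestral_copies([2, 2, 3])
# Six chromosomes condensed into three, starting from n = 8.
n_after = chromosomes_after_fusions(8, 6, 3)
print(copies, n_after)
```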
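...
The authors go on to describe chromosome polyploidization in plants, animals, insects, and yeast. The article reads like a top-tier review, and the algorithm itself is only mentioned in passing. Is that simply the style of a Science article? In fact, the paper that actually describes MCscan appears to be a different one, which I will translate in the next post. Anyone interested can download this article and read it in full; it is very helpful for learning about genome polyploidization.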