2021-11-19

https://www.cell.com/cell/fulltext/S0092-8674(17)31136-4

Summary

Cancer develops as a result of somatic mutation and clonal selection, but quantitative measures of selection in cancer evolution are lacking. We adapted methods from molecular evolution and applied them to 7,664 tumors across 29 cancer types. Unlike species evolution, positive selection outweighs negative selection during cancer development. On average, <1 coding base substitution/tumor is lost through negative selection, with purifying selection almost absent outside homozygous loss of essential genes. This allows exome-wide enumeration of all driver coding mutations, including outside known cancer genes. On average, tumors carry ∼4 coding substitutions under positive selection, ranging from <1/tumor in thyroid and testicular cancers to >10/tumor in endometrial and colorectal cancers. Half of driver substitutions occur in yet-to-be-discovered cancer genes. With increasing mutation burden, numbers of driver mutations increase, but not linearly. We systematically catalog cancer genes and show that genes vary extensively in what proportion of mutations are drivers versus passengers.
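key: Cancer is a process of somatic mutation and clonal selection, similar to evolution. Driver gene discovery; the dndscv tool.
how: Natural selection falls into three classes: positive selection (adaptive), negative selection (maladaptive), and neutral selection (no effect). Similarly, somatic mutations in tumors fall into three classes: positively selected (promoting division and proliferation), negatively selected (leading to cell death), and neutral (no effect).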
Data: 7,664 tumors across 29 cancer types (WES, TCGA)
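
To make the three selection classes concrete, here is a minimal sketch of a naive dN/dS ratio. It is not the dndscv model (dndscv is an R package with a more detailed substitution model); the function names, the tolerance, and the counts below are assumptions made up for illustration.

```python
def dnds_ratio(n_nonsyn, n_syn, nonsyn_sites, syn_sites):
    """Naive dN/dS: nonsynonymous rate per site divided by synonymous rate per site."""
    if n_syn == 0 or nonsyn_sites == 0 or syn_sites == 0:
        raise ValueError("need a nonzero synonymous count and nonzero site totals")
    dn = n_nonsyn / nonsyn_sites
    ds = n_syn / syn_sites
    return dn / ds


def interpret(ratio, tolerance=0.1):
    """Map a ratio to positive / negative / neutral selection (tolerance is arbitrary)."""
    if ratio > 1 + tolerance:
        return "positive"
    if ratio < 1 - tolerance:
        return "negative"
    return "neutral"


# Hypothetical counts for one gene (not taken from the paper).
ratio = dnds_ratio(n_nonsyn=60, n_syn=10, nonsyn_sites=3000, syn_sites=1000)
print(ratio, interpret(ratio))
```

A ratio above 1 suggests an excess of nonsynonymous mutations (positive selection), a ratio below 1 suggests a deficit (negative selection), and a ratio near 1 is consistent with neutrality.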

https://www.nature.com/articles/s41586-020-1969-6

Abstract

Cancer is driven by genetic change, and the advent of massively parallel sequencing has enabled systematic documentation of this variation at the whole-genome scale. Here we report the integrative analysis of 2,658 whole-cancer genomes and their matching normal tissues across 38 tumour types from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA). We describe the generation of the PCAWG resource, facilitated by international data sharing using compute clouds. On average, cancer genomes contained 4–5 driver mutations when combining coding and non-coding genomic elements; however, in around 5% of cases no drivers were identified, suggesting that cancer driver discovery is not yet complete. Chromothripsis, in which many clustered structural variants arise in a single catastrophic event, is frequently an early event in tumour evolution; in acral melanoma, for example, these events precede most somatic point mutations and affect several cancer-associated genes simultaneously. Cancers with abnormal telomere maintenance often originate from tissues with low replicative activity and show several mechanisms of preventing telomere attrition to critical levels. Common and rare germline variants affect patterns of somatic mutation, including point mutations, structural variants and somatic retrotransposition. A collection of papers from the PCAWG Consortium describes non-coding mutations that drive cancer beyond those in the TERT promoter; identifies new signatures of mutational processes that cause base substitutions, small insertions and deletions and structural variation; analyses timings and patterns of tumour evolution; describes the diverse transcriptional consequences of somatic mutation on splicing, expression levels, fusion genes and promoter activity and evaluates a range of more-specialized features of cancer genomes.

key: Whole-genome pan-cancer analysis
data: 2,658 tumor-normal WGS (ICGC and TCGA)
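results: In about 5% of cases no driver was detected, suggesting that cancer driver discovery is not yet complete; germline variants also affect patterns of somatic mutation.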

https://www.nature.com/articles/s41588-021-00928-6

Abstract

Esophageal squamous cell carcinoma (ESCC) shows remarkable variation in incidence that is not fully explained by known lifestyle and environmental risk factors. It has been speculated that an unknown exogenous exposure(s) could be responsible. Here we combine the fields of mutational signature analysis with cancer epidemiology to study 552 ESCC genomes from eight countries with varying incidence rates. Mutational profiles were similar across all countries studied. Associations between specific mutational signatures and ESCC risk factors were identified for tobacco, alcohol, opium and germline variants, with modest impacts on mutation burden. We find no evidence of a mutational signature indicative of an exogenous exposure capable of explaining differences in ESCC incidence. Apolipoprotein B mRNA-editing enzyme, catalytic polypeptide-like (APOBEC)-associated mutational signatures single-base substitution (SBS)2 and SBS13 were present in 88% and 91% of cases, respectively, and accounted for 25% of the mutation burden on average, indicating that APOBEC activation is a crucial step in ESCC tumor development.
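key: Mutational signatures of esophageal squamous cell carcinoma
what is a mutational signature:
Somatic mutations are present in all cells of the human body and occur throughout life. They result from multiple mutational processes, including the intrinsic slight infidelity of the DNA replication machinery, exposure to exogenous or endogenous mutagens, enzymatic modification of DNA, and defective DNA repair. Different mutational processes generate unique combinations of mutation types, termed "mutational signatures".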

This is illustrated in the figure below using a framework of 6 classes of single base substitutions, and three distinct mutational processes, whose respective strengths vary throughout a patient’s life. At the beginning, all mutations were due to the activity of the endogenous mutational process. As time progresses, the other processes get activated and the mutational spectrum of the cancer genome continues to change.
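
The six classes of single base substitutions can be sketched as follows. By the usual convention each substitution is reported with a pyrimidine (C or T) as the reference base, so G>A is counted as C>T. The function names and the toy mutation list are assumptions for illustration only.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
SIX_CLASSES = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]


def sbs_class(ref, alt):
    """Return the pyrimidine-centred substitution class, e.g. 'C>T'."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or ref not in COMPLEMENT or alt not in COMPLEMENT:
        raise ValueError(f"not a single base substitution: {ref}>{alt}")
    if ref in "AG":
        # Purine reference: report the change as seen on the opposite strand.
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def spectrum(mutations):
    """Fraction of mutations falling into each of the six classes."""
    counts = Counter(sbs_class(ref, alt) for ref, alt in mutations)
    total = sum(counts.values())
    return {cls: counts.get(cls, 0) / total for cls in SIX_CLASSES}


# Toy input: (reference base, alternate base) pairs, made up for illustration.
toy = [("C", "T"), ("G", "A"), ("T", "C"), ("A", "G")]
print(spectrum(toy))
```

Signature analyses commonly extend these six classes with the flanking bases on either side, which gives 96 trinucleotide contexts.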

Data: 552 ESCC genomes from eight countries with varying incidence rates

a, Tumor mutation burden (TMB) plot showing the frequency and mutations per Mb for each of the extracted de novo signatures. b, TMB plot showing the frequency and mutations per Mb for each COSMIC reference signature identified in the ESCC cohort. c, Twenty de novo signatures extracted from 552 ESCC cases.

https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0893-4

Abstract

Background
Analysis of somatic mutations provides insight into the mutational processes that have shaped the cancer genome, but such analysis currently requires large cohorts. We develop deconstructSigs, which allows the identification of mutational signatures within a single tumor sample.

Results
Application of deconstructSigs identifies samples with DNA repair deficiencies and reveals distinct and dynamic mutational processes molding the cancer genome in esophageal adenocarcinoma compared to squamous cell carcinomas.

Conclusions
deconstructSigs confers the ability to define mutational processes driven by environmental exposures, DNA repair abnormalities, and mutagenic processes in individual tumors with implications for precision cancer medicine.
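key: SigProfilerExtractor requires a large number of samples and cannot perform signature analysis on a single sample.
deconstructSigs uses existing reference signatures to decompose the mutational profile of a single sample.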

deconstructSigs workflow and output. a Given an input tumor profile and reference input signatures, deconstructSigs iteratively infers the weighted contributions of each reference signature until an empirically chosen error threshold is reached. b Example of the plot generated by the command ‘plotSignatures’. The top panel is the tumor mutational profile displaying the fraction of mutations found in each trinucleotide context, the middle panel is the reconstructed mutational profile created by multiplying the calculated weights by the signatures, and the bottom panel is the error between the tumor mutational profile and reconstructed mutational profile, with SSE annotated
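
The idea of explaining one tumor profile as a weighted sum of reference signatures can be sketched as below. This is not the deconstructSigs algorithm (an R package with its own iterative procedure); it swaps in a simple multiplicative-update non-negative least-squares fit. The two four-context "signatures" are toy values, whereas real profiles use 96 trinucleotide contexts.

```python
def reconstruct(weights, signatures):
    """Weighted sum of signatures, one value per mutation context."""
    n_contexts = len(signatures[0])
    return [sum(w * sig[c] for w, sig in zip(weights, signatures))
            for c in range(n_contexts)]


def sse(profile, recon):
    """Sum of squared errors between the observed and reconstructed profile."""
    return sum((p - r) ** 2 for p, r in zip(profile, recon))


def fit_weights(profile, signatures, iterations=2000):
    """Estimate non-negative weights by multiplicative updates (a stand-in method)."""
    k = len(signatures)
    weights = [1.0 / k] * k
    eps = 1e-12  # avoids division by zero
    for _ in range(iterations):
        recon = reconstruct(weights, signatures)
        for i, sig in enumerate(signatures):
            numer = sum(s * p for s, p in zip(sig, profile))
            denom = sum(s * r for s, r in zip(sig, recon))
            weights[i] *= numer / (denom + eps)
    return weights


# Toy reference signatures over four contexts (hypothetical values).
sig_a = [0.7, 0.1, 0.1, 0.1]
sig_b = [0.1, 0.1, 0.1, 0.7]
# A tumor profile built as 70% signature A plus 30% signature B.
profile = [0.7 * a + 0.3 * b for a, b in zip(sig_a, sig_b)]

weights = fit_weights(profile, [sig_a, sig_b])
error = sse(profile, reconstruct(weights, [sig_a, sig_b]))
print(weights, error)
```

The three quantities mirror the three panels described in the caption: the observed profile, the reconstructed profile, and the error between them summarized as SSE.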

https://www.nature.com/articles/s41588-018-0111-2

Abstract

Chromosome conformation capture (3C) technologies can be used to investigate 3D genomic structures. However, high background noise, high costs, and a lack of straightforward noise evaluation in current methods impede the advancement of 3D genomic research. Here we developed a simple digestion-ligation-only Hi-C (DLO Hi-C) technology to explore the 3D landscape of the genome. This method requires only two rounds of digestion and ligation, without the need for biotin labeling and pulldown. Non-ligated DNA was efficiently removed in a cost-effective step by purifying specific linker-ligated DNA fragments. Notably, random ligation could be quickly evaluated in an early quality-control step before sequencing. Moreover, an in situ version of DLO Hi-C using a four-cutter restriction enzyme has been developed. We applied DLO Hi-C to delineate the genomic architecture of THP-1 and K562 cells and uncovered chromosomal translocations. This technology may facilitate investigation of genomic organization, gene regulation, and (meta)genome assembly.

a, Cells were double cross-linked with EGS and formaldehyde (step 1) and then digested with a restriction enzyme (step 2). The digested chromatin fragments were divided into two tubes, ligated with different 20-bp half-linkers by simultaneous digestion and ligation (step 3), and subsequently mixed for in-gel proximity ligation (step 4). The ligated DNA was purified and digested by MmeI (step 5), which released specific 80-bp DLO Hi-C DNA fragments. These DLO Hi-C DNA fragments were ligated with Illumina adaptors and subjected to high-throughput sequencing (step 6). The two rounds of digestion and ligation are highlighted with boxes. b, Details of ligating the half-linkers to the digested chromatin by simultaneous digestion and ligation of dead-end reactions.

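what is Hi-C: a method for studying the 3D structure of chromosomes and the interactions between chromosomal regions
key: traditional Hi-C methods suffer from high background noise, high cost, and a complex preparation process; DLO Hi-C aims to reduce the noise and the cost
digestion-ligation-only Hi-C: enzymatic digestion and ligation only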
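
Hi-C interaction data are commonly summarized as a contact matrix, in which entry (i, j) counts the read pairs linking genomic bin i to bin j. A minimal single-chromosome sketch follows; the function name, the bin size, and the toy coordinates are assumptions, and positions are taken to be 0-based and smaller than the chromosome length.

```python
def contact_matrix(pairs, chrom_length, bin_size):
    """Count read pairs per pair of bins on one chromosome (symmetric matrix)."""
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos_a, pos_b in pairs:
        i, j = pos_a // bin_size, pos_b // bin_size
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # keep the matrix symmetric
    return matrix


# Toy read pairs (made-up coordinates): two long-range contacts, one local one.
pairs = [(100, 2500), (150, 2600), (900, 950)]
matrix = contact_matrix(pairs, chrom_length=3000, bin_size=1000)
for row in matrix:
    print(row)
```

Real pipelines also handle multiple chromosomes and normalize the raw counts, which this sketch leaves out.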

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7140825/

Abstract

It is becoming increasingly important to understand the mechanism of regulatory elements on target genes in long-range genomic distance. 3C (chromosome conformation capture) and its derived methods are now widely applied to investigate three-dimensional (3D) genome organizations and gene regulation. Digestion-ligation-only Hi-C (DLO Hi-C) is a new technology with high efficiency and cost-effectiveness for whole-genome chromosome conformation capture. Here, we introduce the DLO Hi-C tool, a flexible and versatile pipeline for processing DLO Hi-C data from raw sequencing reads to normalized contact maps and for providing quality controls for different steps. It includes more efficient iterative mapping and linker filtering. We applied the DLO Hi-C tool to different DLO Hi-C datasets and demonstrated its ability in processing large data with multithreading. The DLO Hi-C tool is suitable for processing DLO Hi-C and in situ DLO Hi-C datasets. It is convenient and efficient for DLO Hi-C data processing.

Keywords: 3D genomics, DLO Hi-C, linker detection, iteration mapping

The schematic pipeline for the digestion-ligation-only Hi-C (DLO Hi-C) tool. The entire DLO Hi-C tool pipeline consists of four main steps, as four separate modules to process DLO Hi-C data. Boxes in dark gray indicate the filtered-out reads.
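
The linker filtering step can be illustrated with a simplified sketch. It assumes a read laid out as tag, linker, tag, with the two 20-bp half-linkers joined into one 40-bp linker. The linker sequence below is a made-up placeholder, not the real DLO Hi-C linker, and the exact-match search is a simplification of whatever the tool actually does to detect linkers.

```python
# Hypothetical 40-nt linker (two 20-nt halves); NOT the real DLO Hi-C sequence.
LINKER = "GTCGGAGAACCAGTAGCTAG" + "CTAGCTACTGGTTCTCCGAC"


def split_read(read, linker=LINKER, min_tag=16):
    """Return (left_tag, right_tag) if the linker is found with long-enough tags, else None."""
    idx = read.find(linker)
    if idx == -1:
        return None  # no linker: the read is filtered out
    left = read[:idx]
    right = read[idx + len(linker):]
    if len(left) < min_tag or len(right) < min_tag:
        return None  # tags too short to map reliably (threshold is arbitrary)
    return left, right


# Toy reads built from placeholder tags.
good_read = "A" * 20 + LINKER + "T" * 20
no_linker = "A" * 80
short_tag = "A" * 5 + LINKER + "T" * 20

print(split_read(good_read))
print(split_read(no_linker))
print(split_read(short_tag))
```

The two tags from a kept read would then be mapped to the genome separately, and each mapped pair contributes one contact.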

Abstract

Quantifying the pathogenicity of protein variants in human disease-related genes would have a marked effect on clinical decisions, yet the overwhelming majority (over 98%) of these variants still have unknown consequences. In principle, computational methods could support the large-scale interpretation of genetic variants. However, state-of-the-art methods have relied on training machine learning models on known disease labels. As these labels are sparse, biased and of variable quality, the resulting models have been considered insufficiently reliable. Here we propose an approach that leverages deep generative models to predict variant pathogenicity without relying on labels. By modelling the distribution of sequence variation across organisms, we implicitly capture constraints on the protein sequences that maintain fitness. Our model EVE (evolutionary model of variant effect) not only outperforms computational approaches that rely on labelled data but also performs on par with, if not better than, predictions from high-throughput experiments, which are increasingly used as evidence for variant classification. We predict the pathogenicity of more than 36 million variants across 3,219 disease genes and provide evidence for the classification of more than 256,000 variants of unknown significance. Our work suggests that models of evolutionary information can provide valuable independent evidence for variant interpretation that will be widely useful in research and clinical settings.


For each protein, a Bayesian VAE (top) learns a distribution over amino acid sequences in a multiple sequence alignment (MSA) of evolutionary data. This enables the computation of the evolutionary index (bottom left) for each single-variant sequence, which approximates the negative log-likelihood ratio of variant (Xv) versus wild-type (XWT) sequences. A global-local mixture of a Gaussian mixture model (bottom right) separates variants into benign (blue dashed line) and pathogenic (red) clusters based on that index. The outcome of the model is both a continuous score that reflects pathogenicity propensity, and probabilistic assignment to benign and pathogenic classes (blue and red shaded areas, respectively) below a user-defined uncertainty threshold

key: evolutionary index
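
The evolutionary index is described above as approximating the negative log-likelihood ratio of the variant versus the wild-type sequence. The sketch below computes that ratio with a much simpler stand-in model: independent per-column amino acid frequencies from a toy alignment, not EVE's Bayesian VAE. The alignment, the pseudocount, and the function names are assumptions for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_log_probs(msa, pseudocount=1.0):
    """Per-position log-probabilities of each amino acid (site-independent model)."""
    table = []
    total = len(msa) + pseudocount * len(AMINO_ACIDS)
    for pos in range(len(msa[0])):
        counts = Counter(seq[pos] for seq in msa)
        table.append({aa: math.log((counts.get(aa, 0) + pseudocount) / total)
                      for aa in AMINO_ACIDS})
    return table


def log_likelihood(seq, table):
    """Log-likelihood of a sequence under the site-independent model."""
    return sum(table[pos][aa] for pos, aa in enumerate(seq))


def evolutionary_index(variant, wild_type, table):
    """Negative log-likelihood ratio of variant versus wild type."""
    return -(log_likelihood(variant, table) - log_likelihood(wild_type, table))


# Toy alignment (made up); the wild type is the first sequence.
msa = ["ACDE", "ACDE", "ACDF", "AGDE"]
table = column_log_probs(msa)
wild_type = "ACDE"
idx_seen = evolutionary_index("AGDE", wild_type, table)    # G is seen once at position 2
idx_unseen = evolutionary_index("AWDE", wild_type, table)  # W is never seen at position 2
print(idx_seen, idx_unseen)
```

A substitution never observed in the alignment gets a higher index than one that is observed, which is the intuition behind reading a high index as evidence of pathogenicity.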

https://academic.oup.com/nar/advance-article/doi/10.1093/nar/gkab857/6381136

Abstract

T-cell receptors (TCRs) and B-cell receptors (BCRs) are critical in recognizing antigens and activating the adaptive immune response. Stochastic V(D)J recombination generates massive TCR/BCR repertoire diversity. Single-cell immune profiling with transcriptome analysis allows the high-throughput study of individual TCR/BCR clonotypes and functions under both normal and pathological settings. However, a comprehensive database linking these data is not yet readily available. Here, we present the human Antigen Receptor database (huARdb), a large-scale human single-cell immune profiling database that contains 444 794 high confidence T or B cells (hcT/B cells) with full-length TCR/BCR sequence and transcriptomes from 215 datasets. All datasets were processed in a uniform workflow, including sequence alignment, cell subtype prediction, unsupervised cell clustering, and clonotype definition. We also developed a multi-functional and user-friendly web interface that provides interactive visualization modules for biologists to analyze the transcriptome and TCR/BCR features at the single-cell level. HuARdb is freely available at https://huarc.net/database with functions for data querying, browsing, downloading, and depositing. In conclusion, huARdb is a comprehensive and multi-perspective atlas for human antigen receptors.

Overview of huARdb workflow and modules. All single-cell immune profiling datasets were retrieved from public databases (SRA, GEO, and GSA). All datasets were uniformly processed with our workflow. All processed data were stored in PostgreSQL database and HDF5 files, using sample ID as primary keys. For the web user interface, various interactive visualization and data analysis modules were provided for analyzing transcriptome and clonotype features.

key: A single-cell-level database of T-cell and B-cell surface receptors
Data: 444 794 high confidence T or B cells (hcT/B cells) with full-length TCR/BCR sequence and transcriptomes from 215 datasets
Q: What is the difference between a clonotype and a cell subtype?
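
One way to see the distinction is a sketch that assumes a clonotype is defined by the receptor sequence (here, an identical pair of CDR3 sequences), while a cell subtype is a label predicted from the transcriptome. The abstract does not state huARdb's exact clonotype definition, so that rule, the field names, and all values below are made up for illustration.

```python
from collections import defaultdict

# Hypothetical cells: receptor sequences plus a transcriptome-derived subtype label.
cells = [
    {"barcode": "cell1", "tra_cdr3": "CAVSDF", "trb_cdr3": "CASSLG", "subtype": "CD8 effector"},
    {"barcode": "cell2", "tra_cdr3": "CAVSDF", "trb_cdr3": "CASSLG", "subtype": "CD8 exhausted"},
    {"barcode": "cell3", "tra_cdr3": "CAGNTG", "trb_cdr3": "CASSPR", "subtype": "CD8 effector"},
]


def group_by_clonotype(cells):
    """Group cells that share the same (alpha CDR3, beta CDR3) pair."""
    clones = defaultdict(list)
    for cell in cells:
        key = (cell["tra_cdr3"], cell["trb_cdr3"])
        clones[key].append(cell)
    return dict(clones)


clones = group_by_clonotype(cells)
for key, members in clones.items():
    subtypes = sorted({cell["subtype"] for cell in members})
    print(key, len(members), subtypes)
```

Under this assumption, cell1 and cell2 belong to one clonotype even though their subtypes differ, while cell1 and cell3 share a subtype but belong to different clonotypes.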


https://www.science.org/doi/10.1126/science.1132939

Abstract

To pursue a systematic approach to the discovery of functional connections among diseases, genetic perturbation, and drug action, we have created the first installment of a reference collection of gene-expression profiles from cultured human cells treated with bioactive small molecules, together with pattern-matching software to mine these data. We demonstrate that this “Connectivity Map” resource can be used to find connections among small molecules sharing a mechanism of action, chemicals and physiological processes, and diseases and drugs. These results indicate the feasibility of the approach and suggest the value of a large-scale community Connectivity Map project.
key: a reference collection of gene-expression profiles from cultured human cells treated with bioactive small molecules, together with pattern-matching software to mine these data. Covers only 164 drug perturbations in only 3 cancer cell lines.

The Connectivity Map Concept. Gene-expression profiles derived from the treatment of cultured human cells with a large number of perturbagens populate a reference database. Gene-expression signatures represent any induced or organic cell state of interest (left). Pattern-matching algorithms score each reference profile for the direction and strength of enrichment with the query signature (center). Perturbagens are ranked by this “connectivity score”; those at the top (“positive”) and bottom (“negative”) are functionally connected with the query state (right) through the transitory feature of common gene-expression changes

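key: Gene-expression profiles of different cell lines treated with drugs
Data: 164 drugs, 3 cancer cell lines
how: reference database: treat different cell lines with drugs to obtain different expression profiles.
connection: drug and disease are strongly positively connected; not connected; or negatively connected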
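
The pattern-matching step can be sketched with a Kolmogorov-Smirnov-style enrichment score, in the spirit of the connectivity score in the caption. The exact form below is an assumption rather than a verified reimplementation of the Connectivity Map software, and the gene names are placeholders.

```python
def ks_enrichment(ranked_genes, tag_genes):
    """Signed KS-style statistic: positive if tags sit near the top of the ranking."""
    n = len(ranked_genes)
    rank = {gene: i + 1 for i, gene in enumerate(ranked_genes)}
    positions = sorted(rank[g] for g in tag_genes if g in rank)
    t = len(positions)
    if t == 0:
        return 0.0
    a = max((j + 1) / t - v / n for j, v in enumerate(positions))
    b = max(v / n - j / t for j, v in enumerate(positions))
    return a if a > b else -b


def connectivity_score(ranked_genes, up_genes, down_genes):
    """Combine up and down enrichment; zero when both point the same way."""
    ks_up = ks_enrichment(ranked_genes, up_genes)
    ks_down = ks_enrichment(ranked_genes, down_genes)
    if ks_up * ks_down > 0:
        return 0.0
    return ks_up - ks_down


# Reference profile: genes ranked from most up-regulated to most down-regulated
# after a hypothetical drug treatment.
reference = [f"g{i}" for i in range(1, 11)]

similar = connectivity_score(reference, up_genes={"g1", "g2"}, down_genes={"g9", "g10"})
opposite = connectivity_score(reference, up_genes={"g9", "g10"}, down_genes={"g1", "g2"})
unclear = connectivity_score(reference, up_genes={"g1"}, down_genes={"g2"})
print(similar, opposite, unclear)
```

A positive score means the drug induces changes similar to the query signature, a negative score means it reverses them, and a score of zero means no clear connection.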

Summary

We previously piloted the concept of a Connectivity Map (CMap), whereby genes, drugs, and disease states are connected by virtue of common gene-expression signatures. Here, we report more than a 1,000-fold scale-up of the CMap as part of the NIH LINCS Consortium, made possible by a new, low-cost, high-throughput reduced representation expression profiling method that we term L1000. We show that L1000 is highly reproducible, comparable to RNA sequencing, and suitable for computational inference of the expression levels of 81% of non-measured transcripts. We further show that the expanded CMap can be used to discover mechanism of action of small molecules, functionally annotate genetic variants of disease genes, and inform clinical trials. The 1.3 million L1000 profiles described here, as well as tools for their analysis, are available at https://clue.io.

key: low-cost, high-throughput; 1.3 million L1000 profiles. Enables better exploration of the connections among molecules, drugs, and diseases.

Summary

Genomic medicine, which uses DNA variation to individualise and improve human health, is the subject of this Series of papers. The idea that genetic variation can be used to individualise drug therapy—the topic addressed here—is often viewed as within reach for genomic medicine. We have reviewed general mechanisms underlying variability in drug action, the role of genetic variation in mediating beneficial and adverse effects through variable drug concentrations (pharmacokinetics) and drug actions (pharmacodynamics), available data from clinical trials, and ongoing efforts to implement pharmacogenetics in clinical practice.

key: In the 1990s, a large survey suggested that adverse drug reactions (ADRs) occurring in hospitals were the fourth to sixth leading cause of in-hospital mortality in the USA, and a follow-up survey in 2010 showed no improvement.
Fewer data are available on the consequences of the lack of efficacy, beyond recognising that only a proportion of a given patient population derives benefit from a given medication.
what is pharmacogenomics?
The effect of genes on drug response.

Profile of drug responses as influenced by a single pharmacogene variant (A) or multiple gene variants (B)


https://onlinelibrary.wiley.com/doi/10.1111/cas.13505

Abstract

Explosive advances in next-generation sequencer (NGS) and computational analyses have enabled exploration of somatic protein-altered mutations in most cancer types, with coding mutation data intensively accumulated. However, there is limited information on somatic mutations in non-coding regions, including introns, regulatory elements and non-coding RNA. Structural variants and pathogen in cancer genomes remain widely unexplored. Whole genome sequencing (WGS) approaches can be used to comprehensively explore all types of genomic alterations in cancer and help us to better understand the whole landscape of driver mutations and mutational signatures in cancer genomes and elucidate the functional or clinical implications of these unexplored genomic regions and mutational signatures. This review describes recently developed technical approaches for cancer WGS and the future direction of cancer WGS, and discusses its utility and limitations as an analysis platform and for mutation interpretation for cancer genomics and cancer precision medicine. Taking into account the diversity of cancer genomes and phenotypes, interpretation of abundant mutation information from WGS, especially non-coding and structure variants, requires the analysis of large-scale WGS data integrated with RNA-Seq, epigenomics, immuno-genomic and clinic-pathological information.

A, Whole genome sequencing (WGS) by next-generation sequencer (NGS) can detect non-coding mutations, structural variants (SV), including copy number alterations (CNA), mitochondria mutations and pathogen detection, as well as protein-coding mutations; B, A representative Circos plot of cancer genome structure from WGS analysis, which indicates SV and CNA in all human chromosomes (1-22+XY). Chromothripsis was observed in chromosomes 1 and 14. SNV, single nucleotide variants

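key: Non-coding regions. WGS. Mutation interpretation, precision medicine.
WGS can be used to detect non-coding mutations, structural variants (SV, including copy number alterations), mitochondrial mutations, pathogens, and protein-coding mutations.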


Abstract

The outbreak of Coronavirus disease 2019 (COVID-19) has evolved into an emergent global pandemic. Many drugs without established efficacy are being used to treat COVID-19 patients either as an off-label/compassionate use or as a clinical trial. Although drug repurposing is an attractive approach with reduced time and cost, there is a need to make predictions on success before the start of therapy. For the optimum use of these repurposed drugs, many factors should be considered such as drug–gene or drug–drug interactions, drug toxicity, and patient co-morbidity. There is limited data on the pharmacogenomics of these agents and this may constitute an obstacle for successful COVID-19 therapy. This article reviewed the available human genome interactions with some promising repurposed drugs for COVID-19 management. These drugs include chloroquine (CQ), hydroxychloroquine (HCQ), azithromycin, lopinavir/ritonavir (LPV/r), atazanavir (ATV), favipiravir (FVP), nevirapine (NVP), efavirenz (EFV), oseltamivir, remdesivir, anakinra, tocilizumab (TCZ), eculizumab, heme oxygenase 1 (HO-1) regulators, renin–angiotensin–aldosterone system (RAAS) inhibitors, ivermectin, and nitazoxanide. Drug-gene variant pairs that may alter the therapeutic outcomes in COVID-19 patients are presented. The major drug variant pairs that are associated with variations in clinical efficacy include CQ/HCQ (CYP2C8, CYP2D6, ACE2, and HO-1); azithromycin (ABCB1); LPV/r (SLCO1B1, ABCB1, ABCC2 and CYP3A); NVP (ABCC10); oseltamivir (CES1 and ABCB1); remdesivir (CYP2C8, CYP2D6, CYP3A4, and OATP1B1); anakinra (IL-1a); and TCZ (IL6R and FCGR3A). The major drug variant pairs that are associated with variations in adverse effects include CQ/HCQ (G6PD; hemolysis and ABCA4; retinopathy), ATV (MDR1 and UGT1A1*28; hyperbilirubinemia; and APOA5; dyslipidemia), NVP (HLA-DRB1*01, HLA-B*3505 and CYP2B6; skin rash and MDR1; hepatotoxicity), and EFV (CYP2B6; depression and suicidal tendencies).

key: How do genes influence drug metabolism? PGx is the genome-wide analysis of genetic determinants of drug metabolizing enzymes, receptors, transporters, and targets that influence therapeutic efficacy and safety.
Available human genome interactions with potential repurposed drugs for COVID-19.


Abstract:

Breast cancer is the fifth cause of cancer death among women worldwide and represents a global health concern due to the lack of effective therapeutic regimens that could be applied to all disease groups. Nowadays, strategies based on pharmacogenomics constitute novel approaches that minimize toxicity while maximizing drug efficacy; this being of high importance in the oncology setting. Besides, genetic profiling of malignant tumors can lead to the development of targeted therapies to be included in effective drug regimens. Advances in molecular diagnostics have revealed that breast cancer is a multifaceted disease, characterized by inter-tumoral and intra-tumoral heterogeneity and, unlike the past, molecular classifications based on the expression of individual biomarkers have led to devising novel therapeutic strategies that improve patient survival. In this review, we report and discuss the molecular classification of breast cancer subtypes, the heterogeneity resource, and the advantages and disadvantages of current drug regimens with consideration of pharmacogenomics in response and resistance to treatment.
https://www.dovepress.com/personalized-medicine-in-breast-cancer-pharmacogenomics-approaches-peer-reviewed-fulltext-article-PGPM

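key: Precision medicine in breast cancer
Breast cancer is heterogeneous. This review introduces five molecular subtypes of breast cancer, together with the corresponding treatments and drug resistance.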

Breast cancer classification based on molecular profiling


https://www.nature.com/articles/s41586-021-03836-1

Abstract

Somatic mutations that accumulate in normal tissues are associated with ageing and disease^1,2. Here we performed a comprehensive genomic analysis of 1,737 morphologically normal tissue biopsies of 9 organs from 5 donors. We found that somatic mutation accumulations and clonal expansions were widespread, although to variable extents, in morphologically normal human tissues. Somatic copy number alterations were rarely detected, except for in tissues from the oesophagus and cardia. Endogenous mutational processes with the SBS1 and SBS5 mutational signatures are ubiquitous among normal tissues, although they exhibit different relative activities. Exogenous mutational processes operate in multiple tissues from the same donor. We reconstructed the spatial somatic clonal architecture with sub-millimetre resolution. In the oesophagus and cardia, macroscopic somatic clones that expanded to hundreds of micrometres were frequently seen, whereas in tissues such as the colon, rectum and duodenum, somatic clones were microscopic in size and evolved independently, possibly restricted by local tissue microstructures. Our study depicts a body map of somatic mutations and clonal expansions from the same individual.

key: Somatic mutations arise during cell division. Samples from different tissues of the same individual. WES. Laser-capture microdissection.
Data: 9 organs from 5 donors


SUMMARY

Small cell lung cancer (SCLC) is an aggressive malignancy that includes subtypes defined by differential expression of ASCL1, NEUROD1, and POU2F3 (SCLC-A, -N, and -P, respectively). To define the heterogeneity of tumors and their associated microenvironments across subtypes, we sequenced 155,098 transcriptomes from 21 human biospecimens, including 54,523 SCLC transcriptomes. We observe greater tumor diversity in SCLC than lung adenocarcinoma, driven by canonical, intermediate, and admixed subtypes. We discover a PLCG2-high SCLC phenotype with stem-like, pro-metastatic features that recurs across subtypes and predicts worse overall survival. SCLC exhibits greater immune sequestration and less immune infiltration than lung adenocarcinoma, and SCLC-N shows less immune infiltrate and greater T cell dysfunction than SCLC-A. We identify a profibrotic, immunosuppressive monocyte/macrophage population in SCLC tumors that is particularly associated with the recurrent, PLCG2-high subpopulation.

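key: Small cell lung cancer: poor prognosis, prone to metastasis
Cell subtypes: a PLCG2-high phenotype with pro-metastatic features recurs across subtypes and predicts poor prognosis
Immune sequestration and low immune infiltration. Compared with SCLC-A, SCLC-N shows greater T cell dysfunction (T cell exhaustion, elevated Tregs)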

Data: 155,098 transcriptomes from 21 human biospecimens


https://www.science.org/doi/10.1126/science.abf3067


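key: rare mutations. Classification: mutations are grouped by the cellular component or biological process they dysregulate. Biological processes span a wide range of scales, from a single residue to a protein functional domain, and focusing on one scale overlooks the others, so the proteins encoded by genes are integrated into a hierarchical network that is used to interpret mutations.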
