MIT 6.047 | Machine Learning in Genomics (2020, complete) 2022-11-28-1-intro

What is a motif?

Previous research in Computational Biology (projects students had worked on before taking this class)

Topic
Analyzed RNA-seq data for Pseudomonas diversity in cystic fibrosis sputum vs. lung samples
Functional clustering of single-cell transcriptomes
Image analysis of time-lapse fluorescence microscopy movies of C. elegans germline stem cell mitosis
Codon optimization algorithm
Established a bioinformatic pipeline to identify novel interdomain horizontal gene transfers
Previously worked on improving algorithms that take aligned genotypes as input to do things like identify segmental duplications or resolve haplotypes. But I was very well guided!!
My computational biology project in the Bartel lab at MIT involved building a pipeline to predict potential triggers of target-directed microRNA degradation based on their ability to pair to the microRNA 3' end (sequence complementarity) as well as conservation of that pairing.
Gene interaction inference and spatial region clustering from spatial transcriptomics data
Systems biology field
Genomic sequencing analysis, phylogenetics, functional genomics analysis
Microbial 16S metagenomics and functional prediction
When I UROPed with the Garg lab, I worked with a gene matrix gathered from another paper studying patients with malignant melanoma. My tasks included clustering the cells and finding genes upregulated and downregulated in melanoma, using R.
Population dynamics
Built a polymer model of the E. coli chromosome in Python to better understand the forces that control its dynamics.
Park Lab at Harvard Medical School - CNV calling and gene expression analysis in liver cancer TCGA data
Pu Lab at Boston Children's Hospital - identifying regulatory networks involved in heart morphogenesis with ChIP-seq and RNA-seq to differentiate atrial and ventricular development
Regev Lab - compared integration methods for scRNA-seq that correct for batch effects, and developed (still developing) a method to call CNVs from Slide-seq data using a breast cancer dataset
Single cell transcriptomics of cancer cells
I did a linear regression on a binary matrix of oncogenic mutations and MHC genotypes to find the predictive power of MHC genotype on oncogenic mutations
Predicting PAM profiles of generated Cas9 proteins with ML with my group at the Media Lab (random forests and gradient descent). Generating new potential drug molecules with Transformer models during an internship at Pfizer.
I have not delved much into pure computational biology (most of my aforementioned projects are in the realm of medicine) but I did work on a project this summer to utilize AlphaFold2 and its generated embeddings to develop more comprehensive binding affinity prediction and protein characteristic prediction models.
Worked on creating a whole-cell model generation pipeline (whole-cell modeling is the building of a computational model of an individual cell)
Identified CD55 genetic variants enriched in malaria-endemic regions by analyzing population genomic data
I researched ribosomal mutations in lymphoma. I worked on RNA-seq and Ribo-seq analyses.
ML techniques applied to EHR records and MRI scans (more healthcare than compbio research)
Development of sequencing analysis software

Q: I still don't really understand the overall process of gene regulation?



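An A-T base pair is held together by two hydrogen bonds, and a C-G base pair by three. Each human cell contains about 2 meters of DNA.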
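The two numbers in this note can be checked with a little arithmetic. The sketch below is only an illustration, not something from the lecture: the function names are made up, and it assumes the textbook values of about 0.34 nm of length per base pair (B-form DNA) and about 6.4 billion base pairs in a diploid human cell.

```python
# Hydrogen bonds per Watson-Crick base pair: A-T has 2, C-G has 3.
H_BONDS = {"A": 2, "T": 2, "C": 3, "G": 3}

# Assumed textbook values (not stated in the notes):
RISE_PER_BP_NM = 0.34      # length added per base pair in B-form DNA, in nm
DIPLOID_HUMAN_BP = 6.4e9   # approximate base pairs in one diploid human cell


def hydrogen_bonds(seq: str) -> int:
    """Total hydrogen bonds pairing `seq` with its complementary strand."""
    return sum(H_BONDS[base] for base in seq.upper())


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C (the three-bond pairs)."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def dna_length_m(n_bp: float, rise_nm: float = RISE_PER_BP_NM) -> float:
    """Stretched-out length in meters of a duplex with `n_bp` base pairs."""
    return n_bp * rise_nm * 1e-9


if __name__ == "__main__":
    print(hydrogen_bonds("ATGC"))          # 2 + 2 + 3 + 3
    print(gc_content("ATGC"))
    print(dna_length_m(DIPLOID_HUMAN_BP))  # roughly 2 meters
```

Under these assumptions the length comes out at a little over 2 m, consistent with the figure above. A GC-rich sequence has more hydrogen bonds per base pair than an AT-rich one of the same length.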
