May little 武艺晴 be happy every single day.
7.1 Reading in the MPAL samples
> ################## 1. Read the fragment files ###################
> # Input file paths; ArchR only needs each sample's atac_fragments.tsv.gz file
> input.file.list <- c("./Fragments/GSM4138898_scATAC_MPAL1_T1.fragments.tsv.gz",
+                      "./Fragments/GSM4138899_scATAC_MPAL1_T2.fragments.tsv.gz",
+                      "./Fragments/GSM4138900_scATAC_MPAL2_T1.fragments.tsv.gz",
+                      "./Fragments/GSM4138901_scATAC_MPAL2_T2.fragments.tsv.gz",
+                      "./Fragments/GSM4138902_scATAC_MPAL3_T1.fragments.tsv.gz",
+                      "./Fragments/GSM4138903_scATAC_MPAL3_T2.fragments.tsv.gz",
+                      "./Fragments/GSM4138904_scATAC_MPAL4_T1.fragments.tsv.gz",
+                      "./Fragments/GSM4138905_scATAC_MPAL4_T2.fragments.tsv.gz",
+                      "./Fragments/GSM4138906_scATAC_MPAL5_T1.fragments.tsv.gz",
+                      "./Fragments/GSM4138907_scATAC_MPAL5_T2.fragments.tsv.gz",
+                      "./Fragments/GSM4138908_scATAC_MPAL5R_T1.fragments.tsv.gz",
+                      "./Fragments/GSM4138909_scATAC_MPAL5R_T2.fragments.tsv.gz")
> # Set the sample names (same order as the input files)
> sampleNames = c("MPAL1_T1", "MPAL1_T2", "MPAL2_T1", "MPAL2_T2", "MPAL3_T1", "MPAL3_T2",
+                 "MPAL4_T1", "MPAL4_T2", "MPAL5_T1", "MPAL5_T2", "MPAL5R_T1", "MPAL5R_T2")
> ##################### 2. Create the Arrow files ################
> ArrowFiles <- createArrowFiles(
+   inputFiles = input.file.list,
+   sampleNames = sampleNames,
+   minTSS = 9.3,        # tune this parameter to fit the data
+   minFrags = 1100,
+   minFragSize = 10,    # default; genomeAnnotation = genomeAnnotation,
+   maxFragSize = 2000,  # default
+   excludeChr = c("chrM", "chrY"), # exclude mitochondrial DNA contamination and Y-chromosome interference
+   addTileMat = TRUE,
+   force = TRUE,        # overwrite any existing Arrow files
+   addGeneScoreMat = TRUE) # creates a "QualityControl" directory in the current directory
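Because `sampleNames` must line up one-to-one with `input.file.list`, it can be safer to derive the names from the file paths than to type them by hand. The following is a minimal sketch in base R, not part of the original workflow; it assumes every file follows the `GSM<digits>_scATAC_<sample>.fragments.tsv.gz` naming seen above, and only three of the twelve paths are repeated for brevity.

```r
# Fragment files as listed above (three shown for brevity).
input.file.list <- c(
  "./Fragments/GSM4138898_scATAC_MPAL1_T1.fragments.tsv.gz",
  "./Fragments/GSM4138899_scATAC_MPAL1_T2.fragments.tsv.gz",
  "./Fragments/GSM4138908_scATAC_MPAL5R_T1.fragments.tsv.gz"
)

# Strip the GSM accession prefix and the file suffix from each basename.
# Assumption: names follow "GSM<digits>_scATAC_<sample>.fragments.tsv.gz".
sampleNames <- sub(
  "^GSM[0-9]+_scATAC_(.+)\\.fragments\\.tsv\\.gz$",
  "\\1",
  basename(input.file.list)
)

# sub() returns its input unchanged when the pattern does not match,
# so stop early if any file name was left untouched.
stopifnot(!any(sampleNames == basename(input.file.list)))

print(sampleNames)
```

Deriving the names this way keeps the order of `sampleNames` tied to the order of the paths, so adding or removing a sample cannot silently shift the labels.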
Note: the original authors used minTSS = 8 and minFrags = 1000, but with those two values my total cell count did not match theirs, so I adjusted both thresholds.
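To see why raising the two thresholds changes the cell count, the sketch below applies both settings to a small table of per-cell QC values. The numbers are invented for illustration only, and the pass rule (a cell is kept when both values meet or exceed their thresholds) is an assumption about how the filter behaves, not something taken from the ArchR source.

```r
# Hypothetical per-cell QC values; these numbers are invented for illustration.
qc <- data.frame(
  TSSEnrichment = c(7.5, 8.4, 9.1, 9.6, 12.0, 15.2),
  nFrags        = c(900, 1050, 1500, 1080, 3000, 5000)
)

# Count the cells that meet both thresholds (assumed rule: value >= threshold).
count_passing <- function(qc, minTSS, minFrags) {
  sum(qc$TSSEnrichment >= minTSS & qc$nFrags >= minFrags)
}

# Thresholds attributed to the original authors in the note above.
n_authors <- count_passing(qc, minTSS = 8, minFrags = 1000)

# Thresholds used in this section.
n_here <- count_passing(qc, minTSS = 9.3, minFrags = 1100)

print(c(authors = n_authors, here = n_here))
```

Cells that sit just above the looser thresholds are the ones that drop out, which is why small changes to minTSS and minFrags can shift the total noticeably.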
> # Compute a doublet score for every single cell, sample by sample
> doubScores <- addDoubletScores(input = ArrowFiles,
+                                k = 10,
+                                knnMethod = "UMAP",
+                                LSIMethod = 1)
ArchR logging to : ArchRLogs/ArchR-addDoubletScores-e989208fe2a3-Date-2024-04-26_Time-13-07-41.log
If there is an issue, please report to github with logFile!
2024-04-26 13:07:43 : Batch Execution w/ safelapply!, 0 mins elapsed.
2024-04-26 13:07:43 : MPAL2_T1 (1 of 12) : Computing Doublet Statistics, 0.002 mins elapsed.
Filtering 1 dims correlated > 0.75 to log10(depth + 1)
Found more than one class "dist" in cache; using the first, from namespace 'BiocGenerics'
Also defined by ‘spam’
Found more than one class "dist" in cache; using the first, from namespace 'BiocGenerics'
Also defined by ‘spam’
MPAL2_T1 (1 of 12) : UMAP Projection R^2 = 0.99534
************************************************************
2024-04-26 13:13:05 : ERROR Found in ggplot for MPAL2_T1 (1 of 12) :
LogFile = ArchRLogs/ArchR-addDoubletScores-e989208fe2a3-Date-2024-04-26_Time-13-07-41.log
************************************************************
# This error shows up every time. Awkward: ggplot2 has too many version dependencies.
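# Running this step on 12 samples takes a long time, so be patient. The R^2 value reflects how heterogeneous the cells in a sample are; it is normally above 0.9, and a lower value suggests too little heterogeneity for the doublet calls to be reliable.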
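The filterDoublets call that follows uses filterRatio = 2.16. According to the ArchR documentation, the maximum number of cells removed from a sample with n cells is filterRatio * n^2 / 100000. The sketch below reproduces that arithmetic for the per-sample cell counts shown in the filterDoublets output; rounding down with `floor()` is an assumption, chosen because it matches the reported counts.

```r
# Cells per sample before filtering, taken from the filterDoublets output in this section.
n_cells <- c(MPAL2_T1 = 4794, MPAL2_T2 = 3930, MPAL5_T2 = 2313,
             MPAL5_T1 = 2578, MPAL1_T1 = 4034, MPAL1_T2 = 4030)

filterRatio <- 2.16

# Maximum doublets removed per sample: filterRatio * n^2 / 100000,
# rounded down (assumption, see the note above).
n_removed <- floor(filterRatio * n_cells^2 / 100000)

# Percentage of each sample removed, for comparison with the console output.
pct_removed <- round(100 * n_removed / n_cells, 1)

print(data.frame(n_cells, n_removed, pct_removed))
```

Because the count scales with n^2, larger samples lose a larger fraction of their cells: about 10% of the 4794 cells in MPAL2_T1 but only about 5% of the 2313 cells in MPAL5_T2.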
> # Filter doublets using the ArchRProject
> projHeme2 <- filterDoublets(projHeme1, filterRatio = 2.16) # filterRatio = 1 is the default # filterRatio = 1.5 # set the Matrix package version to 1.6.1.1
Filtering 2983 cells from ArchRProject!
MPAL2_T1 : 496 of 4794 (10.3%)
MPAL2_T2 : 333 of 3930 (8.5%)
MPAL5_T2 : 115 of 2313 (5%)
MPAL5_T1 : 143 of 2578 (5.5%)
MPAL1_T1 : 351 of 4034 (8.7%)
MPAL1_T2 : 350 of 4030 (8.7%)
MPAL4_T1 : 410 of