7. ArchR Integration (1)

May little Wu Yiqing be happy every day.


> How do we map the MPAL samples onto the healthy samples? In other words, projection.

[Figure 1: corresponds to part of Fig. 2b in the original paper]

7.1 Reading in the MPAL samples

> ################## 1. Read the fragment files ###################
> # Input file paths: ArchR only needs each sample's fragments .tsv.gz file
> input.file.list <-c("./Fragments/GSM4138898_scATAC_MPAL1_T1.fragments.tsv.gz",
+                     "./Fragments/GSM4138899_scATAC_MPAL1_T2.fragments.tsv.gz",
+                     "./Fragments/GSM4138900_scATAC_MPAL2_T1.fragments.tsv.gz",
+                     "./Fragments/GSM4138901_scATAC_MPAL2_T2.fragments.tsv.gz", 
+                     "./Fragments/GSM4138902_scATAC_MPAL3_T1.fragments.tsv.gz",
+                     "./Fragments/GSM4138903_scATAC_MPAL3_T2.fragments.tsv.gz",
+                     "./Fragments/GSM4138904_scATAC_MPAL4_T1.fragments.tsv.gz",
+                     "./Fragments/GSM4138905_scATAC_MPAL4_T2.fragments.tsv.gz",
+                     "./Fragments/GSM4138906_scATAC_MPAL5_T1.fragments.tsv.gz",
+                     "./Fragments/GSM4138907_scATAC_MPAL5_T2.fragments.tsv.gz",
+                     "./Fragments/GSM4138908_scATAC_MPAL5R_T1.fragments.tsv.gz",
+                     "./Fragments/GSM4138909_scATAC_MPAL5R_T2.fragments.tsv.gz")
> # Set the sample names (same order as input.file.list)
> sampleNames=c("MPAL1_T1","MPAL1_T2","MPAL2_T1","MPAL2_T2","MPAL3_T1","MPAL3_T2",
+               "MPAL4_T1","MPAL4_T2","MPAL5_T1","MPAL5_T2","MPAL5R_T1","MPAL5R_T2")
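Typing twelve paths and twelve names by hand makes it easy for the two vectors to fall out of order. Below is a minimal sketch, assuming the files follow exactly the `GSMxxxxxxx_scATAC_<sample>.fragments.tsv.gz` naming shown above, that builds the paths from the GEO accession numbers and derives `sampleNames` from the file names. None of this is an ArchR function; it is plain base R.

```r
# Build the 12 fragment-file paths from the GEO accessions used above
gsm    <- sprintf("GSM%d", 4138898:4138909)
labels <- paste0(rep(c("MPAL1", "MPAL2", "MPAL3", "MPAL4", "MPAL5", "MPAL5R"), each = 2),
                 c("_T1", "_T2"))
input.file.list <- file.path("./Fragments",
                             paste0(gsm, "_scATAC_", labels, ".fragments.tsv.gz"))

# Derive sample names by stripping the "GSMxxxxxxx_scATAC_" prefix
# and the ".fragments.tsv.gz" suffix from each file name
sampleNames <- sub("^GSM[0-9]+_scATAC_(.+)\\.fragments\\.tsv\\.gz$", "\\1",
                   basename(input.file.list))

# Report any missing files before the slow createArrowFiles() step
absent <- input.file.list[!file.exists(input.file.list)]
if (length(absent) > 0) message("Missing files: ", paste(absent, collapse = ", "))
```

Because the names are parsed from the paths, reordering or dropping a file automatically keeps `sampleNames` in sync.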
> ##################### 2. Create Arrow files ################
> ArrowFiles <- createArrowFiles(
+   inputFiles = input.file.list,
+   sampleNames = sampleNames,
+   minTSS = 9.3,        # tune this threshold to fit your data
+   minFrags = 1100,
+   minFragSize = 10,    # default (genomeAnnotation is also left at its default)
+   maxFragSize = 2000,  # default
+   excludeChr = c("chrM", "chrY"),  # exclude mitochondrial DNA contamination and chrY interference
+   addTileMat = TRUE,
+   force = TRUE,        # overwrite any existing Arrow files
+   addGeneScoreMat = TRUE)  # this call also creates a "QualityControl" directory in the current directory

Note: The authors used minTSS = 8 and minFrags = 1000, but with those two values my total cell count did not match theirs, so I adjusted both.
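Why do these two thresholds change the cell count so much? createArrowFiles() keeps only cells whose TSS enrichment and fragment count both clear the cutoffs (and whose fragment count stays under maxFrags, default 1e5). The toy sketch below uses made-up values and a hypothetical helper `n_pass()`, assuming inclusive (`>=`) cutoffs, to show how moving from the paper's thresholds to the ones used above drops borderline cells. In a real project the per-cell values sit in the `TSSEnrichment` and `nFrags` columns returned by `getCellColData()`.

```r
# Made-up per-cell QC values, for illustration only
qc <- data.frame(
  cell          = paste0("cell", 1:6),
  TSSEnrichment = c(7.5, 8.5, 9.0, 9.5, 12.0, 15.0),
  nFrags        = c(5000, 1050, 3000, 1200, 900, 20000)
)

# Hypothetical helper: count cells passing both thresholds
# (maxFrags is ignored here because no toy cell comes near 1e5)
n_pass <- function(df, minTSS, minFrags) {
  sum(df$TSSEnrichment >= minTSS & df$nFrags >= minFrags)
}

n_pass(qc, minTSS = 8,   minFrags = 1000)  # paper's thresholds: 4 cells pass
n_pass(qc, minTSS = 9.3, minFrags = 1100)  # thresholds used above: 2 cells pass
```

Cells with a TSS enrichment between 8 and 9.3, or a fragment count between 1000 and 1100, are exactly the ones that flip, so small threshold changes can move the total noticeably.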

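> # Compute a doublet score for every single cell, separately within each sample (Arrow file)
> doubScores <- addDoubletScores(input = ArrowFiles,
+                                k = 10,              # neighbours of each simulated doublet counted as putative doublets
+                                knnMethod = "UMAP",  # embedding used for the nearest-neighbour search
+                                LSIMethod = 1)       # TF-IDF variant for LSI (1 = "tf-logidf")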
ArchR logging to : ArchRLogs/ArchR-addDoubletScores-e989208fe2a3-Date-2024-04-26_Time-13-07-41.log
If there is an issue, please report to github with logFile!
2024-04-26 13:07:43 : Batch Execution w/ safelapply!, 0 mins elapsed.
2024-04-26 13:07:43 : MPAL2_T1 (1 of 12) :  Computing Doublet Statistics, 0.002 mins elapsed.
Filtering 1 dims correlated > 0.75 to log10(depth + 1)
Found more than one class "dist" in cache; using the first, from namespace 'BiocGenerics'
Also defined by ‘spam’
Found more than one class "dist" in cache; using the first, from namespace 'BiocGenerics'
Also defined by ‘spam’
MPAL2_T1 (1 of 12) : UMAP Projection R^2 = 0.99534  
************************************************************
2024-04-26 13:13:05 : ERROR Found in ggplot for MPAL2_T1 (1 of 12) :  
LogFile = ArchRLogs/ArchR-addDoubletScores-e989208fe2a3-Date-2024-04-26_Time-13-07-41.log

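This error showed up every time. Awkward: it comes from the ggplot2 plotting step, and ggplot2 has far too many version dependencies.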
************************************************************
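The `k = 10` parameter makes more sense with the underlying idea in mind: addDoubletScores() creates synthetic doublets by combining pairs of cells, finds the real cells nearest to each synthetic doublet, and scores how often each real cell is hit relative to chance. The base-R toy below illustrates only that idea and is not ArchR's code. It uses made-up Poisson counts and Euclidean distance on depth-normalised counts instead of ArchR's LSI + UMAP embedding, and every name in it is hypothetical.

```r
set.seed(42)

# Toy counts: rows are cells, columns are 20 features
make_cells <- function(n, lambda) {
  matrix(rpois(n * length(lambda), lambda), nrow = n, byrow = TRUE)
}
prof_a <- rep(c(5, 1), each = 10)   # cell type A profile
prof_b <- rep(c(1, 5), each = 10)   # cell type B profile
cells <- rbind(make_cells(50, prof_a),
               make_cells(50, prof_b),
               make_cells(5, prof_a) + make_cells(5, prof_b))  # 5 planted A+B doublets
n_real <- nrow(cells)

# Simulate doublets by adding the counts of two randomly chosen cells
n_sim <- 100
pick1 <- sample(n_real, n_sim, replace = TRUE)
pick2 <- sample(n_real, n_sim, replace = TRUE)
sim_doublets <- cells[pick1, ] + cells[pick2, ]

# Depth-normalise so doublets are not simply "deeper" cells, then get distances
depth_norm <- function(m) m / rowSums(m)
emb <- rbind(depth_norm(cells), depth_norm(sim_doublets))
d <- as.matrix(dist(emb))

# For each simulated doublet, credit its k nearest real cells
k <- 10
real <- seq_len(n_real)
hits <- numeric(n_real)
for (s in n_real + seq_len(n_sim)) {
  nn <- real[order(d[s, real])[seq_len(k)]]
  hits[nn] <- hits[nn] + 1
}

# Enrichment = observed hits / hits expected if neighbours were chosen at random
expected   <- n_sim * k / n_real
enrichment <- hits / expected
head(order(enrichment, decreasing = TRUE), 10)  # indices of the most "doublet-like" cells
```

The planted doublets (rows 101 to 105) are expected to rank near the top, since mixed A+B simulations land next to them. A larger `k` spreads each simulated doublet's credit over more real cells.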

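# With 12 samples this step takes a long time, so be patient. The "UMAP Projection R^2" reported for each sample reflects how heterogeneous its cells are: it is generally above 0.9, and a much lower value suggests the cells are too homogeneous for reliable doublet calls.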
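To check the 0.9 rule of thumb across all 12 samples without scrolling back through the console, the R^2 lines can be parsed out of the captured console text. `flag_low_r2()` is a hypothetical helper written for this post, and its regex assumes the exact line format shown in the output above (`<sample> (i of n) : UMAP Projection R^2 = <value>`).

```r
# Hypothetical helper: extract per-sample UMAP projection R^2 values from
# console lines and flag samples below a cutoff (0.9 by default)
flag_low_r2 <- function(lines, cutoff = 0.9) {
  pattern <- "^(\\S+) \\(\\d+ of \\d+\\) : UMAP Projection R\\^2 = ([0-9.]+).*$"
  hits <- grep(pattern, lines, value = TRUE, perl = TRUE)
  r2 <- as.numeric(sub(pattern, "\\2", hits, perl = TRUE))
  names(r2) <- sub(pattern, "\\1", hits, perl = TRUE)
  list(r2 = r2, low = names(r2)[r2 < cutoff])
}

# Usage with lines copied from the output above
log_lines <- c(
  "Filtering 1 dims correlated > 0.75 to log10(depth + 1)",
  "MPAL2_T1 (1 of 12) : UMAP Projection R^2 = 0.99534  "
)
res <- flag_low_r2(log_lines)
res$r2   # named value for MPAL2_T1
res$low  # character(0): no sample below 0.9
```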

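> # Filter doublets from the ArchRProject
> # (projHeme1 is an ArchRProject built from the Arrow files above; that step is not shown here)
> projHeme2 <- filterDoublets(projHeme1, filterRatio = 2.16)  # default: filterRatio = 1; alternative tried: filterRatio = 1.5. Matrix package set to version 1.6.1.1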
Filtering 2983 cells from ArchRProject!
	MPAL2_T1 : 496 of 4794 (10.3%)
	MPAL2_T2 : 333 of 3930 (8.5%)
	MPAL5_T2 : 115 of 2313 (5%)
	MPAL5_T1 : 143 of 2578 (5.5%)
	MPAL1_T1 : 351 of 4034 (8.7%)
	MPAL1_T2 : 350 of 4030 (8.7%)
	MPAL4_T1 : 410 of
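filterRatio sets a per-sample cap on how many cells are removed. The ArchR documentation for filterDoublets() gives that cap as filterRatio * n^2 / 100000 for a sample with n cells; cells are only removed if they also pass cutEnrich (default 1), so the real count can be lower. The base-R sketch below (`max_doublets()` is a hypothetical helper, not an ArchR function) applies the formula to the six complete lines of the log above; rounded down, it reproduces the logged counts.

```r
# Hypothetical helper: upper bound on doublets removed from a sample of n cells
max_doublets <- function(n, filterRatio = 1) {
  floor(filterRatio * n^2 / 1e5)
}

n_cells <- c(MPAL2_T1 = 4794, MPAL2_T2 = 3930, MPAL5_T2 = 2313,
             MPAL5_T1 = 2578, MPAL1_T1 = 4034, MPAL1_T2 = 4030)

max_doublets(n_cells, filterRatio = 2.16)  # 496 333 115 143 351 350, as in the log
max_doublets(n_cells)                      # the same samples at the default filterRatio = 1
```

Because the cap grows with n^2, the removed fraction (filterRatio * n / 100000) grows with sample size: about 10.4% for MPAL2_T1 with 4794 cells versus about 5% for MPAL5_T2 with 2313 cells, matching the percentages printed above.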
