2022-09-16-10X-single cell upstream analysis workflow

step1_Download the .sra files from the website:

# Download with Aspera (ascp)
# id is a text file listing the batch download links, one per line
cat id | while read id; do (ascp -v -QT -l 400m -P33001 -k1 -i /path/to/aspera/etc/asperaweb_id_dsa.openssh $id ./); done
# Download with wget (speed varies)
cd /path/to/files
wget "download_link"
# Download with sratoolkit (IDs start with SRR or similar)
cat id | while read id; do (/path/to/sratoolkit/bin/prefetch $id); done
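
A loop like the ones above fetches every entry again on each run. Below is a minimal sketch of a resumable variant. It assumes a list file with one accession per line, and that prefetch stores each run as `<accession>/<accession>.sra` under the output directory (the layout used by recent sra-tools releases; older ones may differ). `fetch_runs` and the `PREFETCH` variable are hypothetical names used only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: download each accession in a list file, skipping
# accessions whose .sra file is already present.
# PREFETCH can be overridden, e.g. PREFETCH=/path/to/sratoolkit/bin/prefetch
PREFETCH="${PREFETCH:-prefetch}"

fetch_runs() {
    local list_file="$1" out_dir="$2" acc
    # "|| [ -n "$acc" ]" also handles a last line without a trailing newline.
    while IFS= read -r acc || [ -n "$acc" ]; do
        # Skip blank lines in the list file.
        [ -z "$acc" ] && continue
        # Assumed layout: <out_dir>/<accession>/<accession>.sra
        if [ -s "$out_dir/$acc/$acc.sra" ]; then
            echo "skip $acc"
            continue
        fi
        # stdin is redirected so the tool cannot consume the list file.
        "$PREFETCH" "$acc" -O "$out_dir" < /dev/null || echo "failed $acc" >&2
    done < "$list_file"
}

# Usage (commented out so that sourcing this file downloads nothing):
# fetch_runs id ./sra
```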

step2_Convert the .sra files to fastq files:

# Use sratoolkit to convert sra files to fastq files; single-cell data are split into 3 fastq files

ls /path/to/files | while read id; do (/path/to/sratoolkit/bin/fastq-dump --gzip --split-files $id -O raw); done
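
The split files carry only numeric suffixes, so it is not obvious which one holds which read. Read length is a quick indicator: in 10x 3' libraries the barcode/UMI read is short (typically 26 or 28 bp depending on chemistry), index reads are shorter still, and the cDNA read is the longest. A minimal sketch, where `first_read_length` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the length of the first sequence in a
# gzipped FASTQ file (line 2 of the first 4-line record).
first_read_length() {
    gzip -dc "$1" | awk 'NR == 2 { print length($0); exit }'
}

# Usage (commented out): report the first read length of every split file.
# for f in raw/*.fastq.gz; do
#     printf '%s\t%s\n' "$f" "$(first_read_length "$f")"
# done
```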


step3_Run fastqc for quality control; it writes HTML reports:


ls /path/to/files/raw/*_2.fastq.gz | while read id; do (fastqc $id -o /path/to/files/raw); done
# Merge the QC results with multiqc:
multiqc /path/to/files/raw/ -o /path/to/multiqc_output/


# Use trim_galore to remove low-quality reads and adapters
# (paired)
#trim_galore -q 25 --phred33 --length 55 -e 0.1 --stringency 3 --paired /path/to/files/raw/SRR*_1.fastq.gz /path/to/files/raw/SRR*_2.fastq.gz -o /path/to/files/clean
# (single file)
#trim_galore --quality 25 --phred33 --length 36 /path/to/files/raw/SRR*_1.fastq.gz -o /path/to/files/clean

# Batch run
#ls /path/to/files/raw/*_2.fastq.gz | while read id; do (trim_galore --quality 25 --phred33 --length 36 $id -o /path/to/files/clean); done


step4_Batch rename

rename [existing string] [desired string] *
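
cellranger count finds its input by file name, so the renaming usually targets the bcl2fastq-style pattern `<sample>_S1_L001_R1_001.fastq.gz`. Below is a minimal sketch. It assumes that `fastq-dump --split-files` produced exactly two files per run, with `_1` holding the barcode/UMI read (R1) and `_2` the cDNA read (R2). When three files are produced the mapping is often I1/R1/R2 instead and should be checked first. `to_cellranger_names` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rename <run>_1.fastq.gz / <run>_2.fastq.gz in a
# directory to <run>_S1_L001_R1_001.fastq.gz / <run>_S1_L001_R2_001.fastq.gz.
# Assumption: _1 is the barcode/UMI read (R1) and _2 is the cDNA read (R2).
to_cellranger_names() {
    local dir="$1" f base run n
    for f in "$dir"/*_[12].fastq.gz; do
        [ -e "$f" ] || continue            # the glob matched nothing
        base=$(basename "$f" .fastq.gz)    # e.g. SRR0000001_1
        run=${base%_*}                     # SRR0000001
        n=${base##*_}                      # 1 or 2
        # -n refuses to overwrite an existing target file.
        mv -n -- "$f" "$dir/${run}_S1_L001_R${n}_001.fastq.gz"
    done
}

# Usage (commented out):
# to_cellranger_names /path/to/files/raw
```

With names in this form, the run accession (the part before `_S1`) is the value to pass as `--sample`.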

step5_Standard workflow
Filter the gtf file

cellranger mkgtf genes.gtf Mus_musculus.GRCm39.filtered.gtf \
                 --attribute=gene_biotype:protein_coding \
                 --attribute=gene_biotype:lincRNA \
                 --attribute=gene_biotype:antisense \
                 --attribute=gene_biotype:miRNA \
                 --attribute=gene_biotype:IG_LV_gene \
                 --attribute=gene_biotype:IG_V_gene \
                 --attribute=gene_biotype:IG_V_pseudogene \
                 --attribute=gene_biotype:IG_D_gene \
                 --attribute=gene_biotype:IG_J_gene \
                 --attribute=gene_biotype:IG_J_pseudogene \
                 --attribute=gene_biotype:IG_C_gene \
                 --attribute=gene_biotype:IG_C_pseudogene \
                 --attribute=gene_biotype:TR_V_gene \
                 --attribute=gene_biotype:TR_V_pseudogene \
                 --attribute=gene_biotype:TR_D_gene \
                 --attribute=gene_biotype:TR_J_gene \
                 --attribute=gene_biotype:TR_J_pseudogene \
                 --attribute=gene_biotype:TR_C_gene
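
Biotype names differ between annotation releases, and an `--attribute` value that does not occur in the file simply matches nothing. It can therefore help to list the `gene_biotype` values actually present in the input GTF before filtering. A minimal sketch, assuming Ensembl-style attributes in column 9; `count_biotypes` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: tabulate gene_biotype values of "gene" records in a GTF.
# Assumes Ensembl-style attributes such as: gene_biotype "protein_coding";
count_biotypes() {
    awk -F'\t' '
        $0 !~ /^#/ && $3 == "gene" {
            if (match($9, /gene_biotype "[^"]+"/)) {
                # Skip the key, space and opening quote (14 characters)
                # and drop the closing quote.
                print substr($9, RSTART + 14, RLENGTH - 15)
            }
        }' "$1" | sort | uniq -c | sort -k1,1nr
}

# Usage (commented out):
# count_biotypes genes.gtf
```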

Build the reference index

# index
cellranger mkref --genome=GRCm39 \
--fasta=Mus_musculus.GRCm39.dna.primary_assembly.fa \
--genes=Mus_musculus.GRCm39.filtered.gtf

Output like the following indicates success:

Apr 15 14:36:45 ..... started STAR run
Apr 15 14:36:45 ... starting to generate Genome files
Apr 15 14:38:52 ... starting to sort Suffix Array. This may take a long time...
Apr 15 14:39:03 ... sorting Suffix Array chunks and saving them to disk...
Apr 15 16:40:45 ... loading chunks from disk, packing SA...
Apr 15 16:41:47 ... finished generating suffix array
Apr 15 16:41:47 ... generating Suffix Array index
Apr 15 16:46:07 ... completed Suffix Array index
Apr 15 16:46:07 ..... processing annotations GTF
Apr 15 16:46:19 ..... inserting junctions into the genome indices
Apr 15 16:55:08 ... writing Genome to disk ...
Apr 15 16:55:23 ... writing Suffix Array to disk ...
Apr 15 16:56:00 ... writing SAindex to disk
Apr 15 16:56:08 ..... finished successfully
Creating new reference folder at 

...done
 
Writing genome FASTA file into reference folder...
...done
 
Indexing genome FASTA file...
...done
 
Writing genes GTF file into reference folder...
...done
 
Generating STAR genome index (may take over 8 core hours for a 3Gb genome)...
...done.
 
Writing genome metadata JSON file into reference folder...
Computing hash of genome FASTA file...
...done
 
Computing hash of genes GTF file...
...done
 
...done
 
>>> Reference successfully created! <<<
 
You can now specify this reference on the command line:
cellranger --transcriptome=

Quantification

# count
cellranger count --id=output_folder_name \
--transcriptome=/path/to/index_output/GRCm39/ \
--fastqs=/path/to/sample_fastq_files \
--sample=sample_to_analyze \
--localmem=100 \
--localcores=10
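
With several samples, the same call can be generated once per sample. Below is a minimal sketch that only prints the commands (a dry run) so they can be reviewed before anything starts. `print_count_cmds`, the sample list file and the `run_` prefix for `--id` are hypothetical. `--localmem` is given in GB and `--localcores` caps the number of cores.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one cellranger count command per sample name.
# Nothing is executed; pipe the output to bash once it looks right.
print_count_cmds() {
    local samples="$1" ref="$2" fastqs="$3" s
    while IFS= read -r s || [ -n "$s" ]; do
        # Skip blank lines in the sample list.
        [ -z "$s" ] && continue
        printf 'cellranger count --id=%s --transcriptome=%s --fastqs=%s --sample=%s --localmem=100 --localcores=10\n' \
            "run_$s" "$ref" "$fastqs" "$s"
    done < "$samples"
}

# Usage (commented out):
# print_count_cmds samples.txt /path/to/index_output/GRCm39 /path/to/sample_fastq_files
```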

The end of a successful run looks like this:
2022-09-15 18:05:22 [runtime] (disabled)        ID.2022_9_14_mouse_brain.SC_RNA_COUNTER_CS.SC_RNA_COUNTER._CRISPR_ANALYZER.SUMMARIZE_CRISPR_ANALYSIS
2022-09-15 18:05:22 [runtime] (disabled)        ID.2022_9_14_mouse_brain.SC_RNA_COUNTER_CS.SC_RNA_COUNTER._CRISPR_ANALYZER
2022-09-15 18:05:26 [runtime] (chunks_complete) ID.2022_9_14_mouse_brain.SC_RNA_COUNTER_CS.SC_RNA_COUNTER.SUMMARIZE_REPORTS
2022-09-15 18:05:26 [runtime] (ready)           ID.2022_9_14_mouse_brain.SC_RNA_COUNTER_CS.CLOUPE_PREPROCESS
2022-09-15 18:05:26 [runtime] (run:local)       ID.2022_9_14_mouse_brain.SC_RNA_COUNTER_CS.CLOUPE_PREPROCESS.fork0.split
2022-09-15 18:05:27 [runtime] (split_complete)  ID.2022_9_14_mouse_brain.SC_RNA_COUNTER_CS.CLOUPE_PREPROCESS
2022-09-15 18:05:27 [runtime] (run:local)       ID.2022_9_14_mouse_brain.SC_RNA_COUNTER_CS.CLOUPE_PREPROCESS.fork0.chnk0.main
2022-09-15 18:05:40 [runtime] (chunks_complete) ID.2022_9_14_mouse_brain.SC_RNA_COUNTER_CS.CLOUPE_PREPROCESS
2022-09-15 18:05:40 [runtime] (run:local)       ID.2022_9_14_mouse_brain.SC_RNA_COUNTER_CS.CLOUPE_PREPROCESS.fork0.join
2022-09-15 18:05:41 [runtime] (join_complete)   ID.2022_9_14_mouse_brain.SC_RNA_COUNTER_CS.CLOUPE_PREPROCESS

Outputs:
- Run summary HTML:                         /path/to/output/outs/web_summary.html
- Run summary CSV:                          /path/to/output/outs/metrics_summary.csv
- BAM:                                      /path/to/output/outs/possorted_genome_bam.bam
- BAM index:                                /path/to/output/outs/possorted_genome_bam.bam.bai
- Filtered feature-barcode matrices MEX:    /path/to/output/outs/filtered_feature_bc_matrix
- Filtered feature-barcode matrices HDF5:   /path/to/output/outs/filtered_feature_bc_matrix.h5
- Unfiltered feature-barcode matrices MEX:  /path/to/output/outs/raw_feature_bc_matrix
- Unfiltered feature-barcode matrices HDF5: /path/to/output/outs/raw_feature_bc_matrix.h5
- Secondary analysis output CSV:            /path/to/output/outs/analysis
- Per-molecule read information:            /path/to/output/outs/molecule_info.h5
- CRISPR-specific analysis:                 null
- Loupe Cell Browser file:                  /path/to/output/outs/cloupe.cloupe
- Feature Reference:                        null

Waiting 6 seconds for UI to do final refresh.
Pipestance completed successfully!

2022-09-15 18:05:47 Shutting down.
Saving pipestance info to "2022_9_14_mouse_brain/2022_9_14_mouse_brain.mri.tgz"
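
Besides the `Pipestance completed successfully!` line, a finished run can be checked by confirming that the key files listed under `Outputs:` exist. A minimal sketch; `check_outs` is a hypothetical helper and the file list is a subset of the log above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which expected cellranger count outputs are
# missing from <run_dir>/outs. Returns non-zero if any file is missing.
check_outs() {
    local outs="$1/outs" missing=0 f
    for f in web_summary.html metrics_summary.csv \
             filtered_feature_bc_matrix.h5 raw_feature_bc_matrix.h5 \
             molecule_info.h5; do
        if [ ! -e "$outs/$f" ]; then
            echo "missing: $f"
            missing=1
        fi
    done
    return "$missing"
}

# Usage (commented out):
# check_outs 2022_9_14_mouse_brain && echo "all expected outputs present"
```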

