2022-09-16 10X single-cell upstream analysis workflow


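step1_Download the raw data:

# Download with Aspera (ascp); the file "id" holds the batch of download links, one per line
cat id | while read -r id; do ascp -v -QT -l 400m -P33001 -k1 -i /path/to/aspera/etc/asperaweb_id_dsa.openssh "$id" ./; done
# Or download a single file with wget
cd /path/to/data
wget "download link"
# Download with sratoolkit (here the file "id" holds accessions starting with SRR etc., one per line)
cat id | while read -r id; do /path/to/sratoolkit/bin/prefetch "$id"; done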
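
The loops above pass every line of the file "id" to the download tool, so a blank line or a stray comment becomes a failed download. Below is a minimal sketch of a filter for the sratoolkit case, assuming run accessions look like SRR/ERR/DRR followed by digits. The function name valid_accessions is hypothetical and not part of sratoolkit.

```shell
# Print only the lines of a file that look like run accessions (SRR/ERR/DRR + digits).
# ASSUMPTION: one accession per line; surrounding whitespace and Windows
# line endings are dropped, everything else is skipped.
valid_accessions() {
    tr -d '\r' < "$1" | awk '{ $1 = $1 } /^[SED]RR[0-9]+$/ { print }'
}

# Hypothetical usage with the prefetch loop from these notes:
# valid_accessions id | while read -r acc; do /path/to/sratoolkit/bin/prefetch "$acc"; done
```

Skipped lines are dropped silently, so compare the line counts of the input and the output if every entry must be downloaded.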

step2_Convert .sra files to fastq files:

# Use sratoolkit to convert the sra files to fastq; single-cell data are split into 3 fastq files

ls /path/to/data | while read -r id; do /path/to/sratoolkit/bin/fastq-dump --gzip --split-files "$id" -O raw; done
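
The notes say each run is split into 3 fastq files, but not which one holds the sample index, the barcode/UMI, or the cDNA. Below is a minimal sketch that prints the length of the first read in each file. In 10x 3' libraries the barcode/UMI read is typically short (26 or 28 bp) and the index read shorter still, though that should be confirmed for each dataset. The function name first_read_length is hypothetical.

```shell
# Print "<file><TAB><length of first read>" for each gzipped fastq given as an argument.
# Line 2 of a fastq record holds the sequence.
first_read_length() {
    for fq in "$@"; do
        len=$(gzip -dc "$fq" | awk 'NR == 2 { print length($0); exit }')
        printf '%s\t%s\n' "$fq" "$len"
    done
}

# Hypothetical usage:
# first_read_length raw/SRR*_[123].fastq.gz
```

Only the first record is inspected, so this is a quick sanity check rather than a length distribution.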

step3_Run quality control with fastqc, which writes HTML reports:

ls /path/to/data/raw/*_2.fastq.gz | while read -r id; do fastqc "$id" -o /path/to/data/raw; done
multiqc /path/to/data/raw/ -o /path/to/multiqc_output/
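
The fastqc loop above only covers the *_2.fastq.gz files. Below is a minimal sketch for finding fastq files that have no report yet, relying on FastQC naming its report <name>_fastqc.html for an input <name>.fastq.gz. The function name missing_fastqc and its two directory arguments are hypothetical.

```shell
# List *.fastq.gz files in directory $1 that have no FastQC report in directory $2.
missing_fastqc() {
    for fq in "$1"/*.fastq.gz; do
        [ -e "$fq" ] || continue                 # the glob matched nothing
        base=$(basename "$fq" .fastq.gz)
        [ -e "$2/${base}_fastqc.html" ] || printf '%s\n' "$fq"
    done
}

# Hypothetical usage:
# missing_fastqc /path/to/data/raw /path/to/data/raw | while read -r fq; do fastqc "$fq" -o /path/to/data/raw; done
```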

# Use trim_galore to remove low-quality reads and adapters
#trim_galore -q 25 --phred33 --length 55 -e 0.1 --stringency 3 --paired /path/to/data/raw/SRR*_1.fastq.gz /path/to/data/raw/SRR*_2.fastq.gz -o /path/to/data/clean
#trim_galore --quality 25 --phred33 --length 36 /path/to/data/raw/SRR*_1.fastq.gz -o /path/to/data/clean

#ls /path/to/data/raw/*_.fastq.gz | while read -r id; do trim_galore --quality 25 --phred33 --length 36 "$id" -o /path/to/data/clean; done


# Rename the fastq files in bulk (util-linux rename syntax)
rename [existing string] [desired string] *
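
cellranger looks for fastq files named like <sample>_S1_L001_<I1|R1|R2>_001.fastq.gz, which is presumably what the rename step is for. Below is a minimal sketch that uses mv instead of rename, because rename's syntax differs between distributions. The mapping of _1/_2/_3 to I1/R1/R2 is an assumption that varies between submissions and should be checked against the read lengths. The function name rename_for_cellranger is hypothetical.

```shell
# Rename <acc>_1/_2/_3.fastq.gz in directory $1 to bcl2fastq-style names:
#   <acc>_S1_L001_<I1|R1|R2>_001.fastq.gz
# ASSUMPTION: _1 = sample index (I1), _2 = barcode/UMI (R1), _3 = cDNA (R2).
# Adjust the pairs below if the dataset is laid out differently.
rename_for_cellranger() {
    dir=$1
    acc=$2
    for pair in 1:I1 2:R1 3:R2; do
        src="$dir/${acc}_${pair%%:*}.fastq.gz"
        dst="$dir/${acc}_S1_L001_${pair##*:}_001.fastq.gz"
        [ -e "$src" ] && mv "$src" "$dst"
    done
    return 0
}

# Hypothetical usage:
# rename_for_cellranger /path/to/data/raw SRR1234567
```

With these names, the value passed to --sample would be the accession itself.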


# Filter the GTF with cellranger mkgtf, keeping only genes with the listed biotypes
cellranger mkgtf genes.gtf Mus_musculus.GRCm39.filtered.gtf \
                 --attribute=gene_biotype:protein_coding \
                 --attribute=gene_biotype:lincRNA \
                 --attribute=gene_biotype:antisense \
                 --attribute=gene_biotype:miRNA \
                 --attribute=gene_biotype:IG_LV_gene \
                 --attribute=gene_biotype:IG_V_gene \
                 --attribute=gene_biotype:IG_V_pseudogene \
                 --attribute=gene_biotype:IG_D_gene \
                 --attribute=gene_biotype:IG_J_gene \
                 --attribute=gene_biotype:IG_J_pseudogene \
                 --attribute=gene_biotype:IG_C_gene \
                 --attribute=gene_biotype:IG_C_pseudogene \
                 --attribute=gene_biotype:TR_V_gene \
                 --attribute=gene_biotype:TR_V_pseudogene \
                 --attribute=gene_biotype:TR_D_gene \
                 --attribute=gene_biotype:TR_J_gene \
                 --attribute=gene_biotype:TR_J_pseudogene

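Biotype labels can differ between annotation releases (some use lncRNA where others use lincRNA, for example), so it may be worth checking which labels the GTF actually contains before and after filtering. Below is a minimal sketch, assuming an Ensembl-style attribute column with gene_biotype "..." entries. The function name count_biotypes is hypothetical.

```shell
# Count "gene" records per gene_biotype in a GTF file.
# Output: "<biotype><TAB><count>", sorted by biotype name.
# ASSUMPTION: column 9 carries Ensembl-style attributes such as gene_biotype "lncRNA";
count_biotypes() {
    awk -F '\t' '$3 == "gene" && match($9, /gene_biotype "[^"]+"/) {
        # skip the 14 characters of: gene_biotype "  and drop the closing quote
        n[substr($9, RSTART + 14, RLENGTH - 15)]++
    }
    END { for (b in n) printf "%s\t%d\n", b, n[b] }' "$1" | sort
}

# Hypothetical usage:
# count_biotypes genes.gtf
# count_biotypes Mus_musculus.GRCm39.filtered.gtf
```

A biotype named in an --attribute filter but absent from the first listing would contribute no genes to the filtered file.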

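# Build the reference index with cellranger mkref (--genes takes the filtered GTF written by mkgtf)
cellranger mkref --genome=GRCm39 \
--fasta=Mus_musculus.GRCm39.dna.primary_assembly.fa \
--genes=Mus_musculus.GRCm39.filtered.gtf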

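The index build takes hours (see the timestamps in the log below), so a quick check that the FASTA and GTF use the same sequence names can save a failed or silently incomplete run. Below is a minimal sketch; the function name gtf_contigs_missing_from_fasta is hypothetical.

```shell
# Print sequence names used in the GTF ($2) that have no record in the FASTA ($1).
# An empty result means every annotated contig is present in the genome file.
gtf_contigs_missing_from_fasta() {
    fa_names=$(mktemp)
    gtf_names=$(mktemp)
    # FASTA header looks like ">name description"; keep only the name.
    awk '/^>/ { sub(/^>/, "", $1); print $1 }' "$1" | sort -u > "$fa_names"
    # GTF column 1, skipping "#" comment lines and empty lines.
    awk -F '\t' '!/^#/ && NF { print $1 }' "$2" | sort -u > "$gtf_names"
    # comm -13 keeps lines that appear only in the second (GTF) list.
    comm -13 "$fa_names" "$gtf_names"
    rm -f "$fa_names" "$gtf_names"
}

# Hypothetical usage:
# gtf_contigs_missing_from_fasta Mus_musculus.GRCm39.dna.primary_assembly.fa Mus_musculus.GRCm39.filtered.gtf
```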

Apr 15 14:36:45 ..... started STAR run
Apr 15 14:36:45 ... starting to generate Genome files
Apr 15 14:38:52 ... starting to sort Suffix Array. This may take a long time...
Apr 15 14:39:03 ... sorting Suffix Array chunks and saving them to disk...
Apr 15 16:40:45 ... loading chunks from disk, packing SA...
Apr 15 16:41:47 ... finished generating suffix array
Apr 15 16:41:47 ... generating Suffix Array index
Apr 15 16:46:07 ... completed Suffix Array index
Apr 15 16:46:07 ..... processing annotations GTF
Apr 15 16:46:19 ..... inserting junctions into the genome indices
Apr 15 16:55:08 ... writing Genome to disk ...
Apr 15 16:55:23 ... writing Suffix Array to disk ...
Apr 15 16:56:00 ... writing SAindex to disk
Apr 15 16:56:08 ..... finished successfully
Creating new reference folder at 

Writing genome FASTA file into reference folder...
Indexing genome FASTA file...
Writing genes GTF file into reference folder...
Generating STAR genome index (may take over 8 core hours for a 3Gb genome)...
Writing genome metadata JSON file into reference folder...
Computing hash of genome FASTA file...
Computing hash of genes GTF file...
>>> Reference successfully created! <<<
You can now specify this reference on the command line:
cellranger --transcriptome=


# Run cellranger count
cellranger count --id=output_folder_name \
--transcriptome=/path/to/mkref_output/GRCm39/ \
--fastqs=/path/to/sample_fastq_files \
--sample=sample_to_analyze \
--localmem=100

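When there are several samples, the cellranger count call above can be generated once per sample. Below is a minimal dry-run sketch that only prints the commands so they can be reviewed before running. It assumes each sample name is also used as the --id, and the function name print_count_cmds is hypothetical.

```shell
# Read sample names from stdin and print one cellranger count command per sample.
#   $1 = reference folder produced by cellranger mkref
#   $2 = folder holding the fastq files
# ASSUMPTION: the sample name doubles as the --id (output folder name).
print_count_cmds() {
    while read -r sample; do
        [ -n "$sample" ] || continue             # skip blank lines
        printf 'cellranger count --id=%s --transcriptome=%s --fastqs=%s --sample=%s --localmem=100\n' \
            "$sample" "$1" "$2" "$sample"
    done
}

# Hypothetical usage (inspect the output, then pipe it to sh to run):
# printf 'brain_1\nbrain_2\n' | print_count_cmds /path/to/mkref_output/GRCm39 /path/to/sample_fastq_files
```
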
2022-09-15 18:05:22 [runtime] (disabled)        ID.2022_9_14_mouse_brain.SC_RNA_COUNTER_CS.SC_RNA_COUNTER._CRISPR_ANALYZER.SUMMARIZE_CRISPR_ANALYSIS
2022-09-15 18:05:22 [runtime] (disabled)        ID.2022_9_14_mouse_brain.SC_RNA_COUNTER_CS.SC_RNA_COUNTER._CRISPR_ANALYZER
2022-09-15 18:05:26 [runtime] (chunks_complete) ID.2022_9_14_mouse_brain.SC_RNA_COUNTER_CS.SC_RNA_COUNTER.SUMMARIZE_REPORTS
2022-09-15 18:05:26 [runtime] (ready)           ID.2022_9_14_mouse_brain.SC_RNA_COUNTER_CS.CLOUPE_PREPROCESS
2022-09-15 18:05:26 [runtime] (run:local)       ID.2022_9_14_mouse_brain.SC_RNA_COUNTER_CS.CLOUPE_PREPROCESS.fork0.split
2022-09-15 18:05:27 [runtime] (split_complete)  ID.2022_9_14_mouse_brain.SC_RNA_COUNTER_CS.CLOUPE_PREPROCESS
2022-09-15 18:05:27 [runtime] (run:local)       ID.2022_9_14_mouse_brain.SC_RNA_COUNTER_CS.CLOUPE_PREPROCESS.fork0.chnk0.main
2022-09-15 18:05:40 [runtime] (chunks_complete) ID.2022_9_14_mouse_brain.SC_RNA_COUNTER_CS.CLOUPE_PREPROCESS
2022-09-15 18:05:40 [runtime] (run:local)       ID.2022_9_14_mouse_brain.SC_RNA_COUNTER_CS.CLOUPE_PREPROCESS.fork0.join
2022-09-15 18:05:41 [runtime] (join_complete)   ID.2022_9_14_mouse_brain.SC_RNA_COUNTER_CS.CLOUPE_PREPROCESS

- Run summary HTML:                         /path/to/output/outs/web_summary.html
- Run summary CSV:                          /path/to/output/outs/metrics_summary.csv
- BAM:                                      /path/to/output/outs/possorted_genome_bam.bam
- BAM index:                                /path/to/output/outs/possorted_genome_bam.bam.bai
- Filtered feature-barcode matrices MEX:    /path/to/output/outs/filtered_feature_bc_matrix
- Filtered feature-barcode matrices HDF5:   /path/to/output/outs/filtered_feature_bc_matrix.h5
- Unfiltered feature-barcode matrices MEX:  /path/to/output/outs/raw_feature_bc_matrix
- Unfiltered feature-barcode matrices HDF5: /path/to/output/outs/raw_feature_bc_matrix.h5
- Secondary analysis output CSV:            /path/to/output/outs/analysis
- Per-molecule read information:            /path/to/output/outs/molecule_info.h5
- CRISPR-specific analysis:                 null
- Loupe Cell Browser file:                  /path/to/output/outs/cloupe.cloupe
- Feature Reference:                        null

Waiting 6 seconds for UI to do final refresh.
Pipestance completed successfully!

2022-09-15 18:05:47 Shutting down.
Saving pipestance info to "2022_9_14_mouse_brain/2022_9_14_mouse_brain.mri.tgz"
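
As a quick check on the finished run, the number of cell-associated barcodes can be read from the filtered matrix folder listed above. Below is a minimal sketch, assuming the folder contains a gzipped barcodes.tsv.gz, as Cell Ranger 3 and later write it. The function name count_filtered_barcodes is hypothetical.

```shell
# Print the number of barcodes in <run folder>/outs/filtered_feature_bc_matrix.
#   $1 = run folder (the name given to --id)
# ASSUMPTION: barcodes.tsv.gz exists and holds one barcode per line.
count_filtered_barcodes() {
    gzip -dc "$1/outs/filtered_feature_bc_matrix/barcodes.tsv.gz" | awk 'END { print NR }'
}

# Hypothetical usage:
# count_filtered_barcodes 2022_9_14_mouse_brain
```

The result can be compared with the estimated number of cells reported in web_summary.html.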

