Single-cell 10x Cell Ranger analysis

first step download SRR data

#这是批量下载
nohup prefetch -X 100GB --option-file SRR_Acc_List.txt & 
nohup fastq-dump --gzip --split-files -A ./SRR13633760  -O /home/scRNA/ &

next Build a custom reference using Cell Ranger mkref

首先,找到您物种的参考基因组 FASTA 和 GTF 文件。如果该物种可从 Ensembl 数据库中获得,我们建议使用那里的文件。来自 Ensembl 的 GTF 文件包含可选标签,使过滤变得容易。如果 Ensembl 无法获得您感兴趣的物种,也可以使用其他来源的 GTF 和 FASTA 文件。请注意,GTF 文件是必需的,而不支持 GFF 文件。(请参阅 GFF/GTF 文件格式 - 定义和支持的选项)

这个是 Ensembl 的链接,进行选取物种进行制作参考基因组文件

Single-cell 10x Cell Ranger analysis_第1张图片

这里以大鼠进行演示

分别打开图中的红色框中的内容
Single-cell 10x Cell Ranger analysis_第2张图片

点击红色框中的内容

Single-cell 10x Cell Ranger analysis_第3张图片

选取选择了顶级的FASTA文件下载

Single-cell 10x Cell Ranger analysis_第4张图片

到这里已经完成第一个文件的下载,下面展示第二个文件下载,需要打开下面图的框,选择Download GTF

Single-cell 10x Cell Ranger analysis_第5张图片

选择这个下载

Single-cell 10x Cell Ranger analysis_第6张图片

至此我们全部完成下载内容

接下来进行构建参考基因组

这一步不是在我理解不是非必要,在做一些分析时,可以根据这个排除以基因,例如 仅添加–attribute=gene_biotype:protein_coding 只做编码蛋白在所有细胞中的分析

看一下说明

(base) hwsw@shpc-2596-instance-GkVAxmvG:~$ $cellranger mkgtf
Usage:
    mkgtf <input_gtf> <output_gtf> [--attribute=KEY:VALUE...]
    mkgtf -h | --help | --version
(base) hwsw@shpc-2596-instance-GkVAxmvG:~$ $cellranger mkgtf -h
Genes GTF tool for 10x Genomics Cell Ranger.

Filter user-supplied GTF files for use as Cell Ranger-compatible
genes files for mkref tool.

The commands below should be preceded by 'cellranger':

Usage:
    mkgtf <input_gtf> <output_gtf> [--attribute=KEY:VALUE...]
    mkgtf -h | --help | --version

Arguments:
    input_gtf           Path to input genes GTF file.
    output_gtf          Path to filtered output genes GTF file.

Options:
    --attribute=<key:value>
                        Key-value pair in attributes field to be kept in the GTF
                            file.
    -h --help           Show this message.
    --version           Show version.

这个是选择想要的表型然后进行过滤

#Filter GTF
cellranger=/home/hwsw/cellranger-7.1.0/cellranger 
$cellranger mkgtf \
Rattus_norvegicus.mRatBN7.2.105.gtf Rattus_norvegicus.mRatBN7.2.105.filtered.gtf \
--attribute=gene_biotype:protein_coding \
--attribute=gene_biotype:lncRNA \
--attribute=gene_biotype:antisense \
--attribute=gene_biotype:IG_LV_gene \
--attribute=gene_biotype:IG_V_gene \
--attribute=gene_biotype:IG_V_pseudogene \
--attribute=gene_biotype:IG_D_gene \
--attribute=gene_biotype:IG_J_gene \
--attribute=gene_biotype:IG_J_pseudogene \
--attribute=gene_biotype:IG_C_gene \
--attribute=gene_biotype:IG_C_pseudogene \
--attribute=gene_biotype:TR_V_gene \
--attribute=gene_biotype:TR_V_pseudogene \
--attribute=gene_biotype:TR_D_gene \
--attribute=gene_biotype:TR_J_gene \
--attribute=gene_biotype:TR_J_pseudogene \
--attribute=gene_biotype:TR_C_gene

准备单细胞分析参考基因组

cellranger mkref --help 查看命令参数

Reference preparation tool for 10x Genomics Cell Ranger.
 
Build a Cell Ranger-compatible reference folder from user-supplied genome FASTA and gene GTF files. Creates a new folder named after the genome.
 
The commands below should be preceded by 'cellranger':
 
Usage:
    mkref
        --genome=NAME ...
        --fasta=PATH ...
        --genes=PATH ...
        [options]
    mkref -h | --help | --version
 
Arguments:
    genome  #输出文件夹      Unique genome name(s), used to name output folder
                            [a-zA-Z0-9_-]+. Specify multiple genomes by
                            specifying the --genome argument multiple times; the
                            output folder will be <name1>_and_<name2>.
    fasta   #FASTA参考基因组绝对路径 
                            Path(s) to FASTA file containing your genome reference.
                            Specify multiple genomes by specifying the --fasta
                            argument multiple times.
    genes   #.filtered.gtf注释文件绝对路径
                            Path(s) to genes GTF file(S) containing annotated genes
                            for your genome reference. Specify multiple genomes
                            by specifying the --genes argument multiple times.
 
Options:
    --nthreads=<num>    This option is currently ignored due to a bug, and will be re-enabled
                          in the next Cell Ranger release.
    --memgb=<num>       Maximum memory (GB) used when aligning reads with STAR.
                            Defaults to 16.
    --ref-version=<str> Optional reference version string to include with
                            reference.
    -h --help           Show this message.
    --version           Show version.

进行分析时间很长花费2-3小时

#Run mkref
cellranger mkref \
--genome=mRatBN7 \
--fasta=Rattus_norvegicus.mRatBN7.2.dna.toplevel.fa \
--genes=Rattus_norvegicus.mRatBN7.2.110.filtered.gtf \
--ref-version=1.0.0

我截取别人图运行截图

Apr 15 14:36:45 ..... started STAR run
Apr 15 14:36:45 ... starting to generate Genome files
Apr 15 14:38:52 ... starting to sort Suffix Array. This may take a long time...
Apr 15 14:39:03 ... sorting Suffix Array chunks and saving them to disk...
Apr 15 16:40:45 ... loading chunks from disk, packing SA...
Apr 15 16:41:47 ... finished generating suffix array
Apr 15 16:41:47 ... generating Suffix Array index
Apr 15 16:46:07 ... completed Suffix Array index
Apr 15 16:46:07 ..... processing annotations GTF
Apr 15 16:46:19 ..... inserting junctions into the genome indices
Apr 15 16:55:08 ... writing Genome to disk ...
Apr 15 16:55:23 ... writing Suffix Array to disk ...
Apr 15 16:56:00 ... writing SAindex to disk
Apr 15 16:56:08 ..... finished successfully
Creating new reference folder at /home/hanjiangang/single_Cell/example/ref/ovis_aries/ovis_aries
...done
 
Writing genome FASTA file into reference folder...
...done
 
Indexing genome FASTA file...
...done
 
Writing genes GTF file into reference folder...
...done
 
Generating STAR genome index (may take over 8 core hours for a 3Gb genome)...
...done.
 
Writing genome metadata JSON file into reference folder...
Computing hash of genome FASTA file...
...done
 
Computing hash of genes GTF file...
...done
 
...done
 
>>> Reference successfully created! <<<
You can now specify this reference on the command line:
cellranger --transcriptome=/home/hanjiangang/single_Cell/example/ref/ovis_aries/ovis_aries ..

这里面最后会生成–transcriptome=/home/hmsw/Rattus 这个文件夹

直接进行10x的标准流程

/home/hwsw/cellranger-7.1.0/cellranger count --id=SRR19145616 \
--transcriptome=/home/hwsw/Rattus \
--fastqs=/home/scRNA/SRR19145616 \
--sample=SRR19145616 \
--localcores=30 \
--localmem=300 \
--nosecondary 

你可能感兴趣的:(r语言)