inDrop data analysis

Software: https://github.com/indrops/indrops

First, download the software: git clone https://github.com/indrops/indrops.git
Then, following the README, install the requirements: Python, RSEM, Bowtie, samtools and Java.
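Before going further, it can help to confirm that these requirements are actually reachable. The sketch below is only an illustration and is not part of indrops. It assumes the executables are named python, rsem-prepare-reference, bowtie, samtools and java, and that they are on PATH. REQUIRED_TOOLS and find_missing_tools are hypothetical names, so adjust them to your installation.

```python
import shutil

# Hypothetical mapping: requirement name -> executable name assumed to be on PATH
REQUIRED_TOOLS = {
    "python": "python",
    "RSEM": "rsem-prepare-reference",
    "bowtie": "bowtie",
    "samtools": "samtools",
    "java": "java",
}


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the requirement names whose executable shutil.which cannot find."""
    missing = []
    for name, executable in tools.items():
        if shutil.which(executable) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing requirements:", ", ".join(missing))
    else:
        print("All requirements found on PATH.")
```

If some tools are installed outside PATH (as in the project.yaml below), point the mapping at full paths instead. shutil.which then checks that exact file.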


Next, build the index as described in the README:

mkdir -pv DOWNLOAD_DIR
cd DOWNLOAD_DIR

# Download the soft-masked, primary assembly Genome Fasta file
wget ftp://ftp.ensembl.org/pub/release-85/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa.gz

# Download the corresponding GTF file.
wget ftp://ftp.ensembl.org/pub/release-85/gtf/homo_sapiens/Homo_sapiens.GRCh38.85.gtf.gz

# Return to the parent directory so that the DOWNLOAD_DIR/... paths below resolve
cd ..

# This command will go through all the steps for creating the index
python indrops.py project.yaml build_index \
    --genome-fasta-gz DOWNLOAD_DIR/Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa.gz \
    --ensembl-gtf-gz DOWNLOAD_DIR/Homo_sapiens.GRCh38.85.gtf.gz
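If you drive this step from a script, it is safer to build the command as an argument list and to check that both downloads exist first. The following is a minimal sketch, not part of indrops. It assumes it is run from the directory that contains both indrops.py and DOWNLOAD_DIR, and that the interpreter is called python. build_index_command and run_build_index are hypothetical helper names.

```python
import os
import subprocess

DOWNLOAD_DIR = "DOWNLOAD_DIR"
GENOME_FASTA = "Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa.gz"
ENSEMBL_GTF = "Homo_sapiens.GRCh38.85.gtf.gz"


def build_index_command(project_yaml, download_dir=DOWNLOAD_DIR,
                        indrops_script="indrops.py", python="python"):
    """Assemble the build_index command shown above as an argument list."""
    return [
        python, indrops_script, project_yaml, "build_index",
        "--genome-fasta-gz", os.path.join(download_dir, GENOME_FASTA),
        "--ensembl-gtf-gz", os.path.join(download_dir, ENSEMBL_GTF),
    ]


def run_build_index(project_yaml, download_dir=DOWNLOAD_DIR):
    """Fail early if a downloaded input is missing, then run the index build."""
    fasta = os.path.join(download_dir, GENOME_FASTA)
    gtf = os.path.join(download_dir, ENSEMBL_GTF)
    for path in (fasta, gtf):
        if not os.path.isfile(path):
            raise FileNotFoundError(f"Input not found: {path}")
    # check=True raises CalledProcessError if indrops.py exits with an error
    subprocess.run(build_index_command(project_yaml, download_dir), check=True)
```

Usage: run_build_index("project.yaml"). Passing a list instead of a single shell string avoids quoting problems with paths that contain spaces.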

This step needs a project.yaml file.
Here is the one I configured:

project_name : "test"
project_dir : "/work/03.indrop_data"

paths : 
  bowtie_index : "/work/03.indrop_data/DOWNLOAD_DIR"  # where the bowtie index will be built; this must point to DOWNLOAD_DIR, otherwise you get an error that the ref cannot be found
  bowtie_dir : "/software/biosoftware/bowtie-1.2.2-linux-x86_64" # Bowtie install path; just download and unpack it
  python_dir : "/root/anaconda2/bin" # Python install path
  samtools_dir : "/software/biosoftware/samtools-1.3.1/bin" # samtools install path: the bin directory that contains the samtools executable
  rsem_dir : "/software/biosoftware/RSEM-1.3.1/" # RSEM install path
  java_dir : "/usr/bin/"  # Java install path

sequencing_runs : 
  - name : "Test_du"  # any name you like
    version : 'v1'
    dir : "/work/03.indrop_data/"  # directory containing the FASTQ data
    fastq_path : "{library_prefix}_{split_affix}_{read}_001.fastq.gz"  # {read} takes the two values R1 and R2
    split_affixes : ["L007"]
    libraries : 
      - {library_name: "L007", library_prefix: "WBJPE18020236_HMWMYCCXY_L7_WBJPE18020236_20180818_P_S1"}
# So the FASTQ file names should be WBJPE18020236_HMWMYCCXY_L7_WBJPE18020236_20180818_P_S1_L007_R1_001.fastq.gz (and likewise with R2)
parameters : # OPTIONAL PARAMETERS; these are all default values.
  umi_quantification_arguments:
    m : 10 #Ignore reads with more than M alignments, after filtering on distance from transcript end.
    u : 1 #Ignore counts from UMI that should be split among more than U genes.
    d : 600 #Maximal distance from transcript end, NOT INCLUDING THE POLYA TAIL
    split-ambigs: False #If umi is assigned to m genes, add 1/m to each gene's count (instead of 1)
    min_non_polyA: 15 #Require reads to align to this much non-polyA sequence. (Set to 0 to disable filtering on this parameter.)
  output_arguments:
    output_unaligned_reads_to_other_fastq: False
    filter_alignments_to_softmasked_regions: False
    # low_complexity_mask: False
  bowtie_arguments:
    m : 200
    n : 1
    l : 15
    e : 80
  trimmomatic_arguments:
    LEADING: "28"
    SLIDINGWINDOW: "4:20"
    MINLEN: "16"
    argument_order: ['LEADING','SLIDINGWINDOW','MINLEN']
  low_complexity_filter_arguments:
    max_low_complexity_fraction: 0.50
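To double-check the naming rule in the comment above, you can expand the fastq_path template yourself and see which files are missing. This is a minimal sketch and does not reproduce the indrops code. It hard-codes the values from the sequencing_runs section above so that PyYAML is not needed, and it assumes each file sits directly under dir. expected_fastqs and missing_fastqs are hypothetical helper names.

```python
import os

# Values copied from the sequencing_runs section of project.yaml above
RUN_DIR = "/work/03.indrop_data/"
FASTQ_PATH = "{library_prefix}_{split_affix}_{read}_001.fastq.gz"
SPLIT_AFFIXES = ["L007"]
LIBRARIES = [
    {"library_name": "L007",
     "library_prefix": "WBJPE18020236_HMWMYCCXY_L7_WBJPE18020236_20180818_P_S1"},
]
READS = ["R1", "R2"]  # {read} takes the values R1 and R2


def expected_fastqs(run_dir=RUN_DIR, template=FASTQ_PATH,
                    split_affixes=SPLIT_AFFIXES, libraries=LIBRARIES,
                    reads=READS):
    """Expand the fastq_path template for every library, split affix and read."""
    paths = []
    for lib in libraries:
        for affix in split_affixes:
            for read in reads:
                name = template.format(library_prefix=lib["library_prefix"],
                                       split_affix=affix, read=read)
                paths.append(os.path.join(run_dir, name))
    return paths


def missing_fastqs(**kwargs):
    """Return the expected FASTQ paths that do not exist on disk."""
    return [p for p in expected_fastqs(**kwargs) if not os.path.isfile(p)]
```

If missing_fastqs() returns anything, rename the files or adjust library_prefix and split_affixes before running the pipeline.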
