snakemake分析双端测序的RNA-seq数据

上一次讲述了如何利用snakemake进行RNA-seq分析流程的构建:
RNAseq分析流程:从测序Rawdata到Count输出,但是只针对单端测序数据。这次我又改进了一点,对双端测序结果也适用。

cat pairEndsnakemake
configfile: "pairedEnd.yaml"

rule all:
        input:
                expand("results/{sample}.count",sample=config["samples"])
rule bwa:
        input:
                fw=lambda wildcards: expand("samples/{sample}_R1.fq.gz", sample=wildcards.sample),
                re=lambda wildcards: expand("samples/{sample}_R2.fq.gz", sample=wildcards.sample)

        params:INDEX=config["annotation"]["reference"]

        output:
                "results/{sample}.sam"
        shell:
                "bowtie2 -x {params.INDEX} -1 {input.fw} -2 {input.re} -S {output}"

rule sam2bam:
        input:
                "results/{sample}.sam"
        output:
                "results/{sample}.bam"
        shell:
                "samtools view -Sb {input} > {output}"

rule samSort:
        input:
                "results/{sample}.bam"
        output:
                "sorted/{sample}_sort.bam"
        shell:
                "samtools sort {input} -o {output}"

rule bamIndex:
        input:
                "sorted/{sample}_sort.bam"
        output:
                "results/{sample}.bam.bai"
        shell:
                "samtools index {input} {output}"
rule bamCount:
        input:
                sample="sorted/{sample}_sort.bam",
                annotation=config["annotation"]["gtf"]
        output:
                "results/{sample}.count"
        shell:
                "htseq-count -f bam -r name -s no {input.sample} {input.annotation} > {output}"

有需要代码指导的可以联系我,我也会持续更新,先暂时讲到这里。

你可能感兴趣的:(snakemake分析双端测序的RNA-seq数据)