Running Trinity in multiple steps

Trinity (trinityrnaseq.sourceforge.net) is a software package that combines three independent software modules (Inchworm, Chrysalis, and Butterfly) to process large volumes of RNA-seq reads. Running Trinity from beginning to end on a large data set may exceed the walltime limit for a single job. To work around this limit, Trinity can run its workflow in four separate steps, each submitted as its own job. This page describes how to run Trinity this way under the SLURM scheduler and provides example submit scripts.

Generally, the same Trinity command is run for each step; the only difference is a single option that determines how far Trinity progresses before stopping. In the last step, the Trinity command is run as normal, with no stop option. For example:

# Step 1
Trinity.pl <options> --no_run_chrysalis
# Step 2
Trinity.pl <options> --no_run_quantifygraph
# Step 3
Trinity.pl <options> --no_run_butterfly
# Step 4
Trinity.pl <options>
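
Because the four commands differ only in their final option, the mapping from step number to stop option can be sketched as a small shell function. The function name stop_flag_for_step is hypothetical and not part of Trinity; the option names are the ones listed above.

```shell
# stop_flag_for_step STEP: print the Trinity option that stops the run
# at the end of the given step (hypothetical helper, not part of Trinity).
stop_flag_for_step() {
    case "$1" in
        1) echo "--no_run_chrysalis" ;;
        2) echo "--no_run_quantifygraph" ;;
        3) echo "--no_run_butterfly" ;;
        4) echo "" ;;   # last step: run Trinity as normal, no stop option
        *) echo "unknown step: $1" >&2; return 1 ;;
    esac
}
```

One possible use is a single submit script that runs Trinity.pl <options> $(stop_flag_for_step "$STEP") with the step number supplied at submission time, instead of maintaining four nearly identical scripts. This is a design choice, not something Trinity requires.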

 

The example SLURM submit scripts below request 16 CPUs and 200 GB of RAM for each step.

trinity_step1.submit
#!/bin/sh
#SBATCH --job-name=trinity_step1
#SBATCH --time=168:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=16
#SBATCH --mem=200gb
#SBATCH --output=trinity_step1.stdout
#SBATCH --error=trinity_step1.stderr
 
module load trinity/r2013-02-25 bowtie/1.0.0
Trinity.pl --output trinity_out  --seqType fq --JM 200G --left leftreads.fastq \
--right rightreads.fastq --CPU $SLURM_NTASKS_PER_NODE  --inchworm_cpu $SLURM_NTASKS_PER_NODE \
--bflyCPU $SLURM_NTASKS_PER_NODE --no_run_chrysalis

trinity_step2.submit
#!/bin/sh
#SBATCH --job-name=trinity_step2
#SBATCH --time=168:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=16
#SBATCH --mem=200gb
#SBATCH --output=trinity_step2.stdout
#SBATCH --error=trinity_step2.stderr
 
module load trinity/r2013-02-25 bowtie/1.0.0
Trinity.pl --output trinity_out  --seqType fq --JM 200G --left leftreads.fastq \
--right rightreads.fastq --CPU $SLURM_NTASKS_PER_NODE  --inchworm_cpu $SLURM_NTASKS_PER_NODE \
--bflyCPU $SLURM_NTASKS_PER_NODE --no_run_quantifygraph

trinity_step3.submit
#!/bin/sh
#SBATCH --job-name=trinity_step3
#SBATCH --time=168:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=16
#SBATCH --mem=200gb
#SBATCH --output=trinity_step3.stdout
#SBATCH --error=trinity_step3.stderr
 
module load trinity/r2013-02-25 bowtie/1.0.0
Trinity.pl --output trinity_out  --seqType fq --JM 200G --left leftreads.fastq \
--right rightreads.fastq --CPU $SLURM_NTASKS_PER_NODE  --inchworm_cpu $SLURM_NTASKS_PER_NODE \
--bflyCPU $SLURM_NTASKS_PER_NODE --no_run_butterfly

trinity_step4.submit
#!/bin/sh
#SBATCH --job-name=trinity_step4
#SBATCH --time=168:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=16
#SBATCH --mem=200gb
#SBATCH --output=trinity_step4.stdout
#SBATCH --error=trinity_step4.stderr
 
module load trinity/r2013-02-25 bowtie/1.0.0
Trinity.pl --output trinity_out  --seqType fq --JM 200G --left leftreads.fastq \
--right rightreads.fastq --CPU $SLURM_NTASKS_PER_NODE  --inchworm_cpu $SLURM_NTASKS_PER_NODE \
--bflyCPU $SLURM_NTASKS_PER_NODE

 

The job dependency feature of SLURM can be used to run the steps in sequence, each starting when the previous one completes. All four jobs can be submitted at once, and they will run in the proper order without further interaction from the user. To order the jobs, the job ID reported for each step is passed to the submit command for the next step. Assuming the four scripts above are saved in the working directory with the input dataset, they would be submitted as follows:

 

Example Trinity submission
$ sbatch trinity_step1.submit
Submitted batch job 366910
$ sbatch -d afterok:366910 trinity_step2.submit
Submitted batch job 366911
$ sbatch -d afterok:366911 trinity_step3.submit
Submitted batch job 366912
$ sbatch -d afterok:366912 trinity_step4.submit
Submitted batch job 366913

The -d afterok option instructs SLURM to run the submitted job only if the specified job completes successfully. If Trinity exits with a non-zero exit code in one step, SLURM will not run the next step.
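
Copying each job ID by hand can be avoided by letting the shell capture it. The following is a minimal sketch, assuming a SLURM version whose sbatch supports the --parsable option (which prints only the job ID, optionally followed by a semicolon and the cluster name). The function name submit_chain is hypothetical.

```shell
# submit_chain SCRIPT...: submit each script so that it starts only after
# the previous one has finished successfully (afterok dependency).
# Hypothetical helper; assumes sbatch supports --parsable.
submit_chain() {
    prev_id=""
    for script in "$@"; do
        if [ -z "$prev_id" ]; then
            # The first step has no dependency.
            job_id=$(sbatch --parsable "$script") || return 1
        else
            job_id=$(sbatch --parsable --dependency=afterok:"$prev_id" "$script") || return 1
        fi
        # --parsable prints "jobid" or "jobid;cluster"; keep only the job ID.
        job_id=${job_id%%;*}
        echo "Submitted $script as job $job_id"
        prev_id=$job_id
    done
}
```

Calling submit_chain trinity_step1.submit trinity_step2.submit trinity_step3.submit trinity_step4.submit from the working directory would queue all four steps in order. The function stops submitting as soon as one sbatch call fails, so a later step is never queued without its predecessor.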

 

Tips: Commands for Checking Jobs

1. Check the status of your jobs:

Example: Check Your Job Status
$ squeue -u <username>

Output:

[<username>@login.tusker ~]$ squeue -u <username>
             JOBID PARTITION     NAME     USER    ST       TIME    NODES  NODELIST(REASON)
            426290     batch trinity_ <username>  PD       0:00      1   (Dependency)
            426291     batch trinity_ <username>  PD       0:00      1   (Dependency)
            426289     batch trinity_ <username>   R    10:33:59     1   c2417
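
In the listing above the NAME column is cut off at eight characters, so the individual steps cannot be told apart. A wider name column can be requested with the -o format option of squeue. The sketch below wraps this in a hypothetical function, show_my_jobs; the format string is the usual squeue default with the job name field widened to 30 characters.

```shell
# show_my_jobs [USER]: list a user's jobs with a 30-character job name
# column so that trinity_step1 ... trinity_step4 are shown in full.
# Hypothetical helper; defaults to the current user.
show_my_jobs() {
    squeue -u "${1:-$USER}" -o "%.18i %.9P %.30j %.8u %.2t %.10M %.6D %R"
}
```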

 

2. Check a specific job, for example job ID 426289:

Example: Check Job 426289
$ scontrol show job 426289
[<username>@login.tusker ~]$ scontrol show job 426289
JobId=426289 Name=trinity_step2
   UserId=<username>(3557) GroupId=<groupname>(11156)
   Priority=30208 Account=<groupname> QOS=normal
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 ExitCode=0:0
   RunTime=10:38:38 TimeLimit=7-00:00:00 TimeMin=N/A
   SubmitTime=2013-08-19T15:12:44 EligibleTime=2013-08-21T00:36:51
   StartTime=2013-08-21T00:37:09 EndTime=2013-08-28T00:37:09
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   Partition=batch AllocNode:Sid=login:62036
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=c2417
   BatchHost=c2417
   NumNodes=1 NumCPUs=16 CPUs/Task=1 ReqS:C:T=*:*:*
   MinCPUsNode=16 MinMemoryNode=250G MinTmpDiskNode=0
   Features=(null) Gres=(null) Reservation=(null)
   Shared=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=/lustre/work/entomology/hwang4/WCR_RNAseq_2013/Fallarmyworm/trinity_step2.submit
   WorkDir=/lustre/work/entomology/hwang4/WCR_RNAseq_2013/Fallarmyworm

 

 

3. Check your job history after a specific date, for example all jobs run since 08-14-2013:

Example: Check Your Job History After A Specific Date
$ sacct -u <username> -S 081413 -o JobId,JobName%30,State,ExitCode,Start,End,Elapsed

Output:

JobID                        JobName      State ExitCode               Start                 End    Elapsed 
------------ ------------------------------ ---------- -------- ------------------- ------------------- ---------- 
382339                        trinity_step1  COMPLETED      0:0 2013-08-13T09:47:18 2013-08-13T22:03:39   12:16:21 
382339.batc+                          batch  COMPLETED      0:0 2013-08-13T09:47:18 2013-08-13T22:03:39   12:16:21 
382846                        trinity_step2 CANCELLED+      0:0 2013-08-13T22:03:39 2013-08-14T15:40:45   17:37:06 
426288                        trinity_step1    RUNNING      0:0 2013-08-20T15:24:23             Unknown   00:14:21 
426289                        trinity_step2    PENDING      0:0             Unknown             Unknown   00:00:00 
426290                        trinity_step3    PENDING      0:0             Unknown             Unknown   00:00:00 
426291                        trinity_step4    PENDING      0:0             Unknown             Unknown   00:00:00
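
When the history is long, it can help to list only the jobs that did not complete. The sketch below filters sacct output in parsable form; the function name report_unfinished is hypothetical, and it assumes input records of the form JobID|JobName|State, as produced by the sacct options -n (no header), -X (one line per job, no job steps) and -P (fields separated by "|").

```shell
# report_unfinished: read "JobID|JobName|State" records on stdin and print
# every job whose state is not COMPLETED (hypothetical helper).
report_unfinished() {
    awk -F'|' '$3 != "COMPLETED" { printf "%s %s %s\n", $1, $2, $3 }'
}
```

It could be used as follows:

$ sacct -u <username> -S 081413 -n -X -P -o JobID,JobName,State | report_unfinished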

