SLUPipe has been developed to be compatible with High Throughput Computing (HTC) environments that use SLURM job scheduling.
SLUPipe Execution:
Step 1: Construct a base JSON configuration file providing the same arguments as before, with the inclusion of two new key values:
- nodes : number of nodes used during the HPC workflow
- node_samples : samples processed per node during the HPC workflow
Please Note: SLUPipe in HTC mode will process ALL samples found within the input directory.
HPC Base Configuration File Example
[
{
"Pipeline_Mode":"-T",
"Variant_Callers":["Pindel","Platypus"],
"Input_Directory":"/student/foo/SLUPipe/src/input",
"Output_Directory":"student/foo/SLUPipe/src/output",
"Chromosome_Range": "chr1:16,000,000-215,000,000",
"vep_ScriptPath": "/student/foo/.conda/envs/SLUPipe/share/ensembl-vep-95.3-0",
"vep_CachePath": "/student/foo/.vep",
"reference_directory": "/student/foo/referenceFiles",
"nodes": "2",
"node_samples": [] <- Must always be empty list
}
]
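A quick sanity check of the base configuration before batching can be done along these lines (a minimal sketch; the file name base_config.json is an assumption):

import json

# Load the base configuration (the single object inside the JSON array).
# The file name base_config.json is an example; use your own path.
with open("base_config.json") as handle:
    config = json.load(handle)[0]

# node_samples must stay empty in the base file; gen_batches.py populates it
# in each auto-generated job configuration.
assert config["node_samples"] == [], "node_samples must be an empty list"
print("Base configuration OK for", config["nodes"], "nodes")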
Step 2: Execute the following script to adapt the workload for SLURM compatibility:
$ python3 gen_batches.py <base_configuration_file>
This script divides all the samples found in the input directory into smaller jobs by generating new JSON files, each representing a portion of the total workload (a sketch of this splitting logic follows the generated examples below):
Example:
Input Directory:
-> Demo1_T.bam
-> Demo1_N.bam
-> Demo2_T.bam
-> Demo2_N.bam
2 Samples / 2 Nodes = 1 Sample Per Job:
Auto Generated JSON 1:
[
{
"Pipeline_Mode":"-T",
"Variant_Callers":["Pindel","Platypus"],
"Input_Directory":"/student/foo/SLUPipe/src/input",
"Output_Directory":"student/foo/SLUPipe/src/output",
"Chromosome_Range": "chr1:16,000,000-215,000,000",
"vep_ScriptPath": "/student/foo/.conda/envs/SLUPipe/share/ensembl-vep-95.3-0",
"vep_CachePath": "/student/foo/.vep",
"reference_directory": "/student/foo/referenceFiles",
"nodes": "2",
"node_samples:["Demo1_T.bam","Demo1_N.bam"]
}
]
Auto Generated JSON 2:
[
{
"Pipeline_Mode":"-T",
"Variant_Callers":["Pindel","Platypus"],
"Input_Directory":"/student/foo/SLUPipe/src/input",
"Output_Directory":"student/foo/SLUPipe/src/output",
"Chromosome_Range": "chr1:16,000,000-215,000,000",
"vep_ScriptPath": "/student/foo/.conda/envs/SLUPipe/share/ensembl-vep-95.3-0",
"vep_CachePath": "/student/foo/.vep",
"reference_directory": "/student/foo/referenceFiles",
"nodes": "2",
"node_samples:["Demo2_T.bam","Demo2_N.bam"]
}
]
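The splitting performed by gen_batches.py can be pictured roughly as in the sketch below (a simplified illustration only, not the actual script; grouping tumor/normal BAM pairs by their _T/_N filename suffix is an assumption based on the example above):

import glob
import json
import sys

# Load the base configuration (the single object inside the JSON array).
with open(sys.argv[1]) as handle:
    base = json.load(handle)[0]

nodes = int(base["nodes"])

# Group tumor/normal BAM files by sample prefix (e.g. Demo1_T.bam + Demo1_N.bam).
samples = {}
for path in sorted(glob.glob(base["Input_Directory"] + "/*.bam")):
    name = path.rsplit("/", 1)[-1]
    prefix = name.rsplit("_", 1)[0]
    samples.setdefault(prefix, []).append(name)

# Spread the sample pairs across the requested number of nodes and write
# one configuration file per batch for submission with sbatch.
pairs = list(samples.values())
for index in range(nodes):
    batch = [bam for pair in pairs[index::nodes] for bam in pair]
    if not batch:
        continue
    job = dict(base, node_samples=batch)
    with open("batch_%d.json" % (index + 1), "w") as out:
        json.dump([job], out, indent=4)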
Step 3: Create a SLURM-compatible BASH script (saved here as run_slupipe_hpc.sh) to submit the jobs to the SLURM job scheduler:
#!/bin/bash
source activate SLUPipe

# Submit one SLURM job per auto-generated batch configuration file.
# -n: number of tasks, -t: time limit (1 day), --cpus-per-task: CPUs per job,
# --partition: target queue, --wrap: command to run inside the job.
for FILE in *.json; do
    echo ${FILE}
    sbatch -n 2 -t 1-00:00 --job-name=SLUPipe --cpus-per-task=10 --partition=medmem --wrap="python3 slupipe_apex.py ${FILE}"
    sleep 1
done
Step 4: Make the BASH script executable and run it:
$ chmod +x run_slupipe_hpc.sh
$ ./run_slupipe_hpc.sh
Each job’s results will be placed in the output directory specified in the base configuration JSON file.
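To check on the submitted jobs, standard SLURM commands can be used, for example (the username is a placeholder):
$ squeue -u <username>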