Expanding phylociraptor's functionality
Phylociraptor comes with a large number of tools already built in. However, it is by no means a closed system. It is entirely possible to add additional tools to expand the functionality of the phylociraptor framework. As an example, we will walk you through how you can add an additional fictional aligner called MYALIGNER to phylociraptor.
Prerequisites
Several things are necessary to be able to add your tool of choice:
A software container with the executable you would like to use. For example, you could search for a container on BioContainers or Docker Hub, or you could create your own container.
Basic knowledge of Snakemake.
Steps to add a new aligner
The required steps to add a new aligner include:
Creating a new rule file (or modifying an existing one) for the aligner and placing it in the rules/aligners/ directory.
Creating a new entry in the config.yaml.template file you would like to use.
Optional steps:
Adding the container for the software to the data/containers.yaml file.
Expanding the cluster config file to include the new tool.
Now let’s look at the steps in detail:
1. Creating/Modifying the rule file
The easiest way to do this is to look at the rule file of an aligner that is already present. We can use the rule for Clustal-O as an example.
First we need to make a copy of the file which we will later modify.
Warning
In this example we will use the term MYALIGNER as a placeholder. If you apply this to your own tool, you will have to substitute MYALIGNER with the actual name of the tool. This must be done for ALL occurrences that are mentioned here.
$ cp rules/aligners/clustalo.smk rules/aligners/MYALIGNER.smk
$ cat rules/aligners/MYALIGNER.smk
rule clustalo:
    input:
        "results/alignments/full/clustalo.{hash}/parameters.align.clustalo.{hash}.yaml",
        sequence_file = "results/orthology/single-copy-orthologs_deduplicated." + hashes["filter-orthology"]["global"] + "/{busco}_all.fas",
    output:
        alignment = "results/alignments/full/clustalo.{hash}/{busco}_aligned.fas",
    benchmark:
        "results/statistics/benchmarks/align/clustalo_align_{busco}.{hash}.txt"
    log:
        "log/align/clustalo/clustalo_align_{busco}.{hash}.log.txt"
    singularity:
        containers["clustalo"]
    threads:
        int(config["alignment"]["threads"])
    params:
        config["alignment"]["options"]["clustalo"]
    shell:
        """
        clustalo -i {input.sequence_file} --threads={threads} $(if [[ "{params}" != "None" ]]; then echo {params}; fi) 1> {output.alignment} 2> {log}
        """
You can see the code which is used to run the Clustal-O aligner. We have to change the occurrences of clustalo in this file to our new aligner MYALIGNER. This is how it should look:
rule MYALIGNER:
    input:
        "results/alignments/full/MYALIGNER.{hash}/parameters.align.MYALIGNER.{hash}.yaml",
        sequence_file = "results/orthology/single-copy-orthologs_deduplicated." + hashes["filter-orthology"]["global"] + "/{busco}_all.fas",
    output:
        alignment = "results/alignments/full/MYALIGNER.{hash}/{busco}_aligned.fas",
    benchmark:
        "results/statistics/benchmarks/align/MYALIGNER_align_{busco}.{hash}.txt"
    log:
        "log/align/MYALIGNER/MYALIGNER_align_{busco}.{hash}.log.txt"
    singularity:
        containers["clustalo"]
    threads:
        int(config["alignment"]["threads"])
    params:
        config["alignment"]["options"]["MYALIGNER"]
    shell:
        """
        clustalo -i {input.sequence_file} --threads={threads} $(if [[ "{params}" != "None" ]]; then echo {params}; fi) 1> {output.alignment} 2> {log}
        """
As you can see, all occurrences of clustalo have been replaced with MYALIGNER except in two lines: the containers["clustalo"] entry under singularity and the clustalo command under shell. The first of these looks up the container in the file which lists the containers to be used. Here we have to add the container for our new aligner. In principle it is also possible to use a locally installed executable and omit the container directive completely. However, we discourage this to maintain the reproducibility of analyses. Hence we will change the respective lines to:
    singularity:
        "docker://repository/MYALIGNER:TAG"
Note that this is just a placeholder. It will need to be modified depending on where the image of your container is located.
Next we will change the command line of the program to what MYALIGNER is using. This corresponds to the second of the two lines, the command under shell:
MYALIGNER -i {input.sequence_file} -t {threads} $(if [[ "{params}" != "None" ]]; then echo {params}; fi) -o {output.alignment}
That is all for the first part. Since the command above looks a bit different, let's quickly explain it:
The -i is the hypothetical input flag of MYALIGNER, followed by {input.sequence_file}. By convention this is a placeholder for the unaligned input file. When phylociraptor is run, it will be replaced with an actual file path. Make sure that you write this exactly as given (including the curly brackets), otherwise phylociraptor will report an error. After that, we specify how many threads should be used with the -t flag, followed by another placeholder, {threads}. This value is taken from the config.yaml file.
The next part is this: $(if [[ "{params}" != "None" ]]; then echo {params}; fi). This section adds additional parameters to the command in case something has been specified in the config.yaml file.
Finally, we specify where the alignment should be written with the -o flag of MYALIGNER and another placeholder, {output.alignment}.
This is all there is to it. Just make sure to spell all the placeholders correctly and don’t forget the curly brackets, and everything should work.
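To make the effect of the placeholders more concrete, the following minimal Python sketch mimics how the command line ends up looking once the placeholders are filled in. It is only an illustration and not how phylociraptor or Snakemake work internally: the function build_command and the file names are made up, and it assumes that an option left without a value in config.yaml is loaded as None and therefore shows up as the text None, which is what the shell test in the rule guards against.

```python
def build_command(sequence_file, threads, params, alignment):
    """Mimic the filled-in shell command of the hypothetical MYALIGNER rule."""
    parts = ["MYALIGNER", "-i", sequence_file, "-t", str(threads)]
    # Mirrors: $(if [[ "{params}" != "None" ]]; then echo {params}; fi)
    # Assumption: an unset option is rendered as the literal text "None".
    if str(params) != "None":
        # An empty string adds nothing; otherwise each word becomes an argument.
        parts.extend(str(params).split())
    parts.extend(["-o", alignment])
    return " ".join(parts)


# Hypothetical file names, used only for illustration.
print(build_command("gene1_all.fas", 2, "", "gene1_aligned.fas"))
print(build_command("gene1_all.fas", 2, "--quiet --auto", "gene1_aligned.fas"))
```

With an empty options string the command contains only the input, thread and output flags. Anything placed between the quotes in config.yaml is inserted between the -t and -o parts.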
Let us have another look at the final file:
rule MYALIGNER:
    input:
        "results/alignments/full/MYALIGNER.{hash}/parameters.align.MYALIGNER.{hash}.yaml",
        sequence_file = "results/orthology/single-copy-orthologs_deduplicated." + hashes["filter-orthology"]["global"] + "/{busco}_all.fas",
    output:
        alignment = "results/alignments/full/MYALIGNER.{hash}/{busco}_aligned.fas",
    benchmark:
        "results/statistics/benchmarks/align/MYALIGNER_align_{busco}.{hash}.txt"
    log:
        "log/align/MYALIGNER/MYALIGNER_align_{busco}.{hash}.log.txt"
    singularity:
        "docker://repository/MYALIGNER:TAG"
    threads:
        int(config["alignment"]["threads"])
    params:
        config["alignment"]["options"]["MYALIGNER"]
    shell:
        """
        MYALIGNER -i {input.sequence_file} -t {threads} $(if [[ "{params}" != "None" ]]; then echo {params}; fi) -o {output.alignment}
        """
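Because every occurrence of the old name has to be replaced, a quick check for leftovers can be handy. The following is a minimal sketch and not part of phylociraptor: the helper find_leftovers and the shortened rule excerpt are made up for illustration.

```python
def find_leftovers(rule_text, old_name="clustalo"):
    """Return (line_number, line) pairs that still mention the old aligner."""
    return [
        (number, line.strip())
        for number, line in enumerate(rule_text.splitlines(), start=1)
        if old_name in line
    ]


# Shortened, hypothetical excerpt of a rule file that was only partly edited.
partly_edited = '''rule MYALIGNER:
    singularity:
        containers["clustalo"]
    params:
        config["alignment"]["options"]["MYALIGNER"]
'''

print(find_leftovers(partly_edited))  # [(3, 'containers["clustalo"]')]
```

To check a real file, the contents of rules/aligners/MYALIGNER.smk could be read into a string and passed to the same function. An empty list means no mention of clustalo is left.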
2. Modifying the config.yaml file
The last small step is to modify the config.yaml file to include an entry for additional parameters for the new aligner. This is found in the alignment section. Let us have a look:
alignment:
    method: ["clustalo","mafft","muscle"] #, "clustalo", "muscle"]
    threads: 2
    options:
        mafft: "--quiet --auto"
        clustalo: ""
        muscle: ""
Under options we have to add a new entry for our new aligner:
alignment:
    method: ["clustalo","mafft","muscle", "MYALIGNER"] #, "clustalo", "muscle"]
    threads: 2
    options:
        mafft: "--quiet --auto"
        clustalo: ""
        muscle: ""
        MYALIGNER: ""
Note how we have added MYALIGNER to the list of aligners in the method section.
Between the quotes in the new option section we could add additional parameters which should be used by MYALIGNER. This information is transferred automatically to all alignment jobs for MYALIGNER when phylociraptor is run.
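The rule file looks up config["alignment"]["options"]["MYALIGNER"], so an aligner that is listed under method but has no entry under options would lead to a failed lookup. The following minimal sketch shows one way to check this. It is an illustration only: the helper missing_option_entries is made up, and the alignment section is written out as the dictionary a YAML parser would produce instead of being read from config.yaml.

```python
def missing_option_entries(config):
    """List aligners named under 'method' that have no entry under 'options'."""
    alignment = config["alignment"]
    return [name for name in alignment["method"] if name not in alignment["options"]]


# The alignment section from config.yaml, written as a Python dictionary.
config = {
    "alignment": {
        "method": ["clustalo", "mafft", "muscle", "MYALIGNER"],
        "threads": 2,
        "options": {
            "mafft": "--quiet --auto",
            "clustalo": "",
            "muscle": "",
            "MYALIGNER": "",
        },
    }
}

print(missing_option_entries(config))  # prints []
```

If the MYALIGNER line under options were missing, the function would return ['MYALIGNER'] instead of an empty list.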
That is all. You have now successfully added an additional aligner to phylociraptor.