Hands on course in bioinformatics
Every Thursday from 1-5 PM from September 7 – November 30, 2017 (except September 14 and October 19 = 11 times)
208 006
Course material:
Unix basics and usage of Computerome - pages 01..28
Extra material:
Unix exercise from Immunological Bioinformatics course
to try it change the path from /usr/opt/www/pub/CBS/courses/27685.imm/exercise_unix/immu00
to /home/projects/pr_hackinars/data/immu00
Course material:
Unix basics and usage of Computerome - pages 29..50
Show available modules
module avail
Find all samtools available
module avail samtools
To be able to see samtools (and many other tools) you first have to load tools
module load tools
Load module
module load name_of_tool_including_version
Find module (-i makes it case insensitive)
module apropos -i python
Display more details about module (for example dependencies)
module display velvet/1.2.10
Unload module
module unload anaconda/2.1.0
Modules may have dependencies i.e. velvet
module display velvet/1.2.10 ------------------------------------------------------------------- /services/tools/modulefiles/velvet/1.2.10: module-whatis Velvet 1.2.10 - sequence assembler for very short reads conflict velvet conflict perl/5.8.9 prereq perl prereq anaconda prepend-path PATH /services/tools/velvet/1.2.10/bin setenv VELVET_ROOT /services/tools/velvet/1.2.10 -------------------------------------------------------------------
So you may need to load more modules (in addition to tools) to be able to run velvet
module load perl/5.20.1 module load anaconda/2.1.0 module load velvet/1.2.10
Module information
module tab tab module whitespace tab tab
Remove all modules
module purge
Autoload modules at login
module initapp anaconda/2.1.0
Remove autoloading of module: module initrm
module initrm anaconda/2.1.0
See which modules are pre loaded
module initlist
Source bashrc to load modules from the login environment again
.bashrc
Load ngs modules (some are not in tools)
module load ngs
load own modules described in home directory (for developing)
module load use.own
Some public tools (like MGmpper) are maintained by tnp Admin make directory structure and make user the owner
List of what I have loaded
module list
Modules I (Ole) often load
module load tools module load moab torque module load perl/5.20.1 module load anaconda/2.2.0 module load velvet/1.2.10 module load ncbi-blast/2.2.31+ module load ngs module load sratoolkit/2.8.1
Load queue modules
module load tools module load moab torque
Queue submission commands (qsub recommended, but it is a matter of taste which to use)
qsub
or
msub
and the wrapped versions (which may be easier - but, because it is wrapped, also one step further away from the raw code)
xqsub
or
xmsub
Start interactive shell (1core, 1node, 4GB RAM, 86400 seconds (=1day))
qsub -l nodes=1:ppn=1,mem=4gb,walltime=86400 -q batch -I -X
Interactive way to get an interactive shell
iqsub
Submit job to queue with msub
msub -W group_list=pr_hackinars -Apr_hackinars -d my_working_directory -l nodes=1:ppn=1,mem=4gb,walltime=86400 -q batch my_command
or with qxsub (here using 1:00:00:00 to get one day format: d:hh:mm:ss)
xqsub -V -d my_working_directory -l nodes=1:ppn=1,mem=4gb,walltime=1:00:00:00 -q batch -re my_std_error_file -ro my_std_output_file -de my_command
Simple example of program which waits for 60 seconds and then prints the date (output go to STDIN.oJob_ID)
echo 'sleep 60;date'|msub -W group_list=pr_hackinars -Apr_hackinars -l nodes=1:ppn=1,mem=4gb,walltime=86400
See job status (for user myusername). Here you can also see the Job_ID (which is a unique number each job gets)
showq -umyusername
or
qstat -umyusername
Stop one job
canceljob Job_ID
Cancel all myusername's jobs in the queue (don't do this if you don't know what you are doing)
showq -umyusername| grep unning| gawk '{print $1}'|xargs canceljob
Estimate starting time for job
showstart Job_ID
Overview of system status
module load pestat pestat
Interact with queuing system graphically
firefox
If there is a security warning select:
Save the world, if not possible in one go, then proceed with one sequence at a time till you reach the goal.
All paths mentioned in the following are relative to the “/home/projects/pr_hackinars/” directory or absolute paths. If an “X” (capital x) occurs in the commands below, it means that you yourself has to change something there.
Choose some samples to work with from: data/art_seqs/SE and data/art_seqs/PE, and save them in your own directory. Both directories contain artificial sequence data from the ResFinder database, both single and paired end. Remember which files you choose, because you are going to use them and some of the results obtain today for the exercises the next three weeks. For all tasks below you should try with both the single and paired end files.
Choose one isolate from: data/ECs/, and save them in your own directory. These are Escherichia coli isolates from the German / French outbreak in 2011. Again, remember which one you choose, as this is going to be used again for the exercises in the coming weeks.
Lets load the needed modules to do the exercises for today:
module load tools module load perl/5.24.0 module load anaconda2/4.0.0 module load ncbi-blast/2.2.26 module load SPAdes/3.9.0 module load cgepipeline/20170113
In order to assemble our reads with Velvet we are going to find the optimal k, with a program that tries a series of different k’s and then chooses the best one. Try to assemble the artificial reads with the following commands.
Paired end:
wrapper_denovo_2.0.py velvet --shortPaired fastq chopDBX_1.fastq.gz chopDBX_2.fastq.gz \ --sample outX_velvet_pe --n 6 --m 16 --add_velvetg "-very_clean yes"
Single end:
wrapper_denovo_2.0.py velvet --short fastq chopDBX.fastq.gz \ --sample outX_velvet_se --n 6 --m 16 --add_velvetg "-very_clean yes"
There are several options available for velvet and the wrapper that calls it, try to check if something may have been done differently.
wrapper_denovo_2.0.py velvet -h
In order to check the quality of the assembly of the artificial reads, you may want to see if the assembly can be found in the ResFinder database. We do that with BLAST.
blastall -p blastn -i contig.fsa -d data/ResFinderFSA/ResFinder.fsa > blast.out
The best hits for each contig is under the line:
"Sequences producing significant alignments:"
Remember the output of the best template from BLAST, as we are going to use this the next weeks.
When you understand the program and the output then qeue your E. coli reads, it might take some time to complete so we will come back to that.
Like Velvet, SPAdes also has a wrapper program that executes the assembly program itself. Like for Velvet several different values of k are used, but in difference to Velvet, several different sizes k are used to construct the final assembly. Start by making an assembly of your artificial sequences like before:
Paired end:
spades.py -t 6 -m 16 -o outX_spades_pe --careful \ --pe1-1 chopDBX_1.fastq.gz --pe1-2 chopDBX_2.fastq.gz
Single end:
spades.py -t 6 -m 16 -o outX_spades_se --careful \ -s chopDBX.fastq.gz
Note that SPAdes has a tendency not to detect the phred-scale, espicially if the reads have been trimed and are Phred scale 33. All the artificial reads are phred-scale 33, therefore you can add the option “--phred-offset 33”, so that SPAdes does not crash here. Make a quality check with BLAST on the artificial reads, like we did with Velvet. Remember to save the best matching templates here as well.
Try to study the output and different options of SPAdes (spdes.py -h), and check if something may have been done smarter. When you understand the different options and the output of SPAdes, you may qeue your E. coli reads for assembly (without the "--careful" option).
You are almost done and deserve a break before you continue, that way the assemblies might be done when you come back.
First of all, check that your assemblies are done and ended successfully.
When assembling bacterial genomes, one of the best quality checks you can do is to do MLST. For standard MLST typing 7 genes are used, and if all of these genes can be found in your assembly, when it is probably not half-bad. To determine the MLST we are going to use MLSTfinder, which is based on BLAST. Note that using BLAST is drastically improved because of the assembly for most cases.
mlst.pl -d /home/databases/cgepipeline/mlst/ -i SRR3415X.fasta -o MLST -s ecoli
Try to see if it is possible to call MLST for both assemblies, if not which assembly method was best.
Finally, we are going to identify antibiotic resistance genes in the E. coli samples. Whole genomes may contain several resistance genes, or even several genes on the same contig, which are going to complicate the BLAST output. Therefore, we are going to use ResFinder for this. Where ResFinder has some post-processing of the BLAST output, so that only the best matches are reported for each region of each contig.
resfinder.pl -i SRR3415X.fasta -d data/ResFinderFSA/ -a ResFinder -o Xres_out -k 95.00 -l 0.60
Again, remember where you save the output of ResFinder, this too is going to be used in the coming weeks.
All paths mentioned in the following are relative to the “/home/projects/pr_hackinars/” directory or absolute paths.
If an “X” (capital x) occurs in the commands below, it means that you yourself have to change something there.
Remeber what isolates and sequences you analyzed last time, because we are going to compare the results of different appraoches.
Last was assembly + BLAST methods, today we go straight for mapping.
You may want to analyze some extra samples today, as mapping is faster than assembly.
Depending on your choice of artificial sequences last time, you might want to choose some extra ones, i can recommend sample "1450".
The artificial data is in: data/art_seqs/SE and data/art_seqs/PE.
And the Escherichia coli in: data/ECs.
Lets load the needed modules to do the exercises for today:
module load ngs tools perl/5.20.2 module load bwa/0.7.15 module load bowtie2/2.3.2 module load samtools
For modern mapping methods, such as Bowtie2 and BWA, we need to index the fasta files of our references first. Even though Bowtie2 and BWA are both using the FM-index, they use slightly different variations of them. Which case means that we are going to index the databases twice, one for use with Bowtie2 and one for BWA.
Bowtie2 indexing:
bowtie2-build X.fsa X.index
BWA indexing:
bwa index X.fsa -p X.index
Today we are going to map against resistance genes, therefore we need to index a fasta file containing the wanted genes.
For this purpose we are going to use the ResFinder database, as used last time with BLAST.
The database fasta file are located in:
"data/ResFinderFSA/ResFinder.fsa"
Make an indexing of this fasta file, for both Bowtie2 and BWA, and give the obtained indexing some appropiate name.
Mostly when mapping with Bowtie2 or BWA, you will want to use the standard options:
Bowtie2 single end:
bowtie2 -U X.fastq.gz -x X.index
Bowtie2 paired end:
bowtie2 -1 X_1.fastq.gz -2 X_2.fastq.gz -x X.index
BWA single end:
bwa mem X.index X.fastq.gz
BWA paired end, (might need the "-p" option):
bwa mem X.index X_1.fastq.gz X_2.fastq.gz
Both of them outputs sam-format to stdout (default), which is a fairly large format.
Therefore it is usually good practice to compress them to bam-format (binary sam), this is done easily with samtools through a pipe.
| samtools view -Sb - > X.bam
If you then want to inspect the bam file afterwards you can use this command.
samtools view X.bam | less
Try to map your reads against the ResFinder database, using the indexing you made before.
The E. coli reads might take a few minutes to complete.
Take a quick look at the mapped reads, is the results as expected, try to compare to the results obtained from last time.
If you are not familiar with the sam-format, then these links might be helpful:
SAM-format
sam flags
The sam-format, as you probably has seen, is not very human-readable.
It is however a nice format for a computer to parse, as seen above samtools is desinged for handling these files.
We have already used samtools to compress sam files, now we are going to analyze the reads a bit further.
The most important options to remeber here, is the: "-f", "-F" and "-q".
"-f" and "-F" is used to select alignment based on flags specified on them. For example "4" is unmapped and "8" is mate unmapped, meaning that "4+8=12" means that both pairs must not map.
By using the "-F" insted of "-f" you get the negation, meaning that "12" is both pairs map.
The "-q" option sets the threshold for mapping quality, here a low quality means a poor alignment quality and a high one a better alignment quality.
Depending on who you ask, there are many "standards". Usually a threshold below 10 is not used.
A quality of 10 usually means that there are about 10% chance that the read is wrongly mapped, and a quality of 0 means that there are close to 0% chance that the read is mapped correctly.
Even though the sam-format is standard, the columns in it is not. Meaning that the mapping quality is calculated differently with Bowti2 and BWA, which then goes on for different mapping methods.
Now for some selection of results obtained before.
Only accept mapped reads:
samtools view -b -F 4 -q 1 X.bam > X.mapped.bam
Only properly paired reads:
samtools view -b -F 12 -f 2 -q 1 X.bam > X.mapped.bam
Try some different combinations of flags, and try to vary the quality threshold.
An easy check to see how many reads map to what in your sample can be obtained by following command, make shure you undestand it before executing.
samtools view X.bam | cut -f 3 | sort | uniq -c > X.template_count.tab
You can add the flags directly to the samtools command above, and pipe the output to less in order to quickly test different settings of samtools.
Try to see if you can get the same results on the artificial dataset as you got last time.
Do you see something strange in the results, obtained by either mapping or assembly + BLAST.
Which do you trust more for this example, remember the assumptions of each method.
Try to compare the results with the E. coli from last time too. If the assembly did not finish last time, then the SPAdes assemblies is available here: "/home/projects/pr_hackinars/data/assembly/SPAdes/".
A very important use of mapping is within the area of bacterial outbreak detection and monitoring. We will conduct and exercise, where we employ the tool CSI Phylogeny that relies heavily on correct mapping results from BWA.
# Load general modules for queue etc. module load tools ngs pestat gcc/6.2.0 perl R torque moab anaconda/4.4.0 # Load SNP pipeline tools module load bwa/0.7.10 bedtools/2.26.0 samtools/0.1.18 FastTree/2.1.8 # Load perl environment export PERL5LIB=/home/projects/pr_hackinars/data/perl_modules/lib
The data is raw sequencing reads and a couple of assemblies. The data is located in the folder:
/home/projects/pr_hackinars/data/outbreaks/Haiti_Cholerae/
The reference sequence is found within the "ref" folder.
It is a good idea to run CSI Phylogeny within a screen session, as it runs for quite a while and it will wait for all jobs to finish and in the meantime it writes status messages to STDOUT.
Ks_SNP_tree_pipeline.pl --qbot_dir <dir> --out_dir <dir> --reference <fasta> <arg1> ..<argN>
You will encounter the following warning:
rm: cannot remove ‘/home/projects/pr_hackinars/people/user/test_execise/qbot_dir/scripts/*’: No such file or directory rm: cannot remove ‘/home/projects/pr_hackinars/people/user/test_execise/qbot_dir/output/*’: No such file or directory rm: cannot remove ‘/home/projects/pr_hackinars/people/user/test_execise/qbot_dir/error/*’: No such file or directory
Don't worry about it. CSI Phylogeny is just making sure that no temporary files from a previous run interferes with the current run.
-o <dir> All output files will be stored here. --temp_dir | -d <dir> Location of tmp files. Default is tmp folder inside the qbot_dir. --filter_depth | -fd <int> default: 10. --filter_relative_depth | -frd <int> default: 10. Meaning 10 pct of avg. --filter_min_depth | -fmd <int> default: 10. Only applicable if using rel dp option. --filter_prune | -fp <int> default: 10. Meaning 10 SNPs. --filter_map_quality | -fmq <int> default: 0. Recommended 25. --filter_snp_quality | -fsq <int> default: 0. Recommended 30. --filter_z | -fz <double> default: 0. Recommended 1.96. --as_prune | -ap <int> default: 0. Recommended 10 SNPs. Pruning for assembled data. --queue_filter_mem <str> default: 3900m. Increase only if your filter job gets killed due to memory.
perl /home/projects/pr_hackinars/data/Ks_SNP_tree_pipeline.pl -qb qbot_dir -o CIS_out_dir -r NC_000913.3.fsa -fmq 25 -fsq 30 / -fp 10 -ap 10 -fz 1.96 -frd 10 -fmd 10 /path/to/data/*.gz /path/to/more/data/*.fa
All paths mentioned in the following are relative to the “/home/projects/pr_hackinars/” directory or absolute paths.
If an “X” (capital x) occurs in the commands below, it means that you yourself have to change something there.
Remeber what isolates and sequences you analyzed last time, because we are going to compare the results of the different appraoches.
Today you might want to analyze all of the Escherichia coli, as KMA is extremely fast. Compared to other methods that is "rapid", KMA is operating at near WARP speed.
Today there is no dependencies to load, we are however going to add KMA to path for convience.
export PATH=$PATH:/home/projects/pr_hackinars/apps/
In order for KMA to map against a reference database, we need to index the databases liek last time. Today we are going to map resistance genes and species, the databases has however been indexed for you in: "/home/projects/pr_hackinars/data/kma_databases"
If you want to index your own database, then this is how you do it:
kma_index -i DB.fsa -o DB.index
When mapping the resistance we would like full alignment of our sequences, while we can do with the identification of the proper reference for the species mapping. Therefore there are two main mappings with KMA, one full alignemnt and one "sparse" where the alignments are skipped.
Full alignemnt with KMA, on resistance genes:
kma -i X_1.fastq.gz X_2.fastq.gz -o X -t_db /home/projects/pr_hackinars/data/kma_databases/ResFinder -SW -ID 80
"-i" gives the input, "-o" the output, "-t_db" is the indexing and "-SW" specifies that we want to use the Smith Waterman algorithm between the seeds.
Sparse mapping of species with KMA.
kma -i X_1.fastq.gz X_2.fastq.gz -o X -t_db /home/projects/pr_hackinars/data/kma_databases/bacterial_compl_genomes_hq99_k13_ATG -Sparse -shm -ID 50
Where "-shm" specifies that the database has been set up for sharing, which allows for several parallel calls of KMA where the database is only loaded in one place.
Two of your colleagues has been reported sick, and are infected with some pathogen.
Luckily they were able to sequence the pathogens before they were hospitalized.
It is now your job to save your colleages and analyze the sequencing they have performed.
In this case we would like to know the species of the samples, and whether they carry any resistance genes which might lead us in direction for a proper treatment.
The sequences are collected in: "/home/projects/pr_hackinars/data/unknownInfections"
The tutorial is on github:
Files making the basis of the tutorial: tutorial
The live version which changes during the tutorial: tutorial-livel
The commands given below assume that you are in your own directory. Most of the tools produce multiple output files, therefore it is practical to create the sub-folders in your directory, when running them.
We will investigate an outbreak today. It's 2015 and patients in a US hospital acquire listeriosis... apparently from drinking milkshakes!? We have 10 samples we want to test, to figure out what is the source of the outbreak. They were taken at different times and sequenced on an Illumina platform. Source
You can find all the data in
/home/projects/pr_hackinars/data/outbreaks/ice_cream_outbreak
The following modules are needed for today's exercise:
module load tools module load perl/5.24.0 module load anaconda2/4.0.0 module load mafft/7.310 module load phyml/3.1 module load newick-utils/1.6
Start an interactive session:
qsub -W group_list=pr_hackinars -A pr_hackinars -l nodes=1:ppn=28,walltime=4:00:00,mem=120Gb -I
The samples have been assembled using Spades, and 16S rRNA sequences were extracted from them with RNAmmer 1.2. This sequence is present in all bacterial species, thus can be used to infer phylogeny for even distantly related isolates. Let's concatenate the fasta files together and align them using MAFFT.
mkdir ssu_tree cd ssu_tree cat /home/projects/pr_hackinars/data/outbreaks/ice_cream_outbreak/ssu/SRR* > outbreak_ssu.fasta mafft --maxiterate 1000 --localpair outbreak_ssu.fasta > outbreak_ssu_aligned.fna
The --localpair option specifies that mafft produces a local (and not global) alignment.
Next, we have to convert the fasta format into interlaced phylip, as that's the input format of PhyML, the maximum likelihood based program we want to use for analyzing our (few and short) sequences.
/home/projects/pr_hackinars/src/fasta2phy.py outbreak_ssu_aligned.fna outbreak_ssu_aligned.phy
Check out the PhyML options:
PhyML -h
We specify that we want the Jukes-Cantor model and estimated base frequencies (and not counted), with 100 rounds of bootstrapping:
PhyML -i outbreak_ssu_aligned.phy -d nt -m JC69 -s BEST -f m -b 100
The output quickly reveals, how many different positions are in the alignment. Is it enough to make informed decisions about the outbreak? Let's look at the consensus tree and the clade support count values (the numbers at the base of the clades) with newick-utils.
nw_display outbreak_ssu_aligned.phy_phyml_tree.txt
The difference between the isolates is not great enough to rely just on one short sequence. A whole-genome approach is required.
Someone has already predicted subtypes for the isolates. The results can be seen in
/home/projects/pr_hackinars/data/outbreaks/ice_cream_outbreak/hackinars_samples.tmpl
As we already suspected, all of the samples were predicted to have the same closely matching reference genome. Thus we can use NDtree, which is a SNP based phylogeny pipeline, that evaluates each base-calling to have a minimum confidence (otherwise it's N). It calculates the genetic distance based on these "true" SNPs, then clusters the sequences with UPGMA and NJ.
cd .. mkdir ndtree_r cd ndtree_r /home/projects/pr_hackinars/src/NDtree/NDtree_p.py \ -f /home/projects/pr_hackinars/data/outbreaks/ice_cream_outbreak/ice-cream.list \ -t /home/projects/pr_hackinars/data/outbreaks/ice_cream_outbreak/pt_Listeria_monocytogenes_J0161_NC_017545_1.fa \ -n 2 -z 1.96 -k 17 -a
This is the parallel version of NDtree. -f is the file containing the list of raw reads, -n is the number of reads for one isolate (paired 2, single 1), -t is the template, -k is the k-mer size, -z is the Z-score threshold for base-calling (corresponding to a p-value of 0.05), and -a means that only those positions are used in the distance calculation, where none of the sequences have unknown bases. It runs for approx. 15 minutes.
nw_display tree.upgma.newick nw_display tree.nj.newick
Do the two trees agree with regards to the relatedness of the sequences? (Note: the floating 1-s are just part of the tip labels)
We can continue with a maximum likelihood based analysis, as we can easily construct a multiple alignment from the consensus sequences. Then we have two possible approaches. First, let's extract those columns from the alignment that have polymorphisms and feed that mini alignment to PhyML.
cat *.assimpler.con > ice_cream_wgs_align.fsa mkdir phyml_r /home/projects/pr_hackinars/src/NDtree/stripali_pos.py \ -i $PWD/ice_cream_wgs_align.fsa \ -o $PWD/phyml_r/ice_cream_snp.fsa cd phyml_r cat ice_cream_snp.fsa /home/projects/pr_hackinars/src/fasta2phy.py ice_cream_snp.fsa ice_cream_snp.phy PhyML -i ice_cream_snp.phy -d nt -m JC69 -s BEST -f m -b 100
How many polymorphic sites are in these whole-genome sequences? Very few, but still more than just in the 16S rRNA. Change the sequence identifiers back to the original ones before taking a look at the tree:
/home/projects/pr_hackinars/src/NDtree/chnam.py \ -i ice_cream_snp.phy_phyml_tree.txt -s ../names \ -o ice_cream_snp.phy_phyml_tree_names.nw -t2 nw_display ice_cream_snp.phy_phyml_tree_names.nw
Did the tree topology change? Are the clades more resolved compared to the distance based trees? The clade support (count) values are low, why?
Next, we use IQ-tree for inferring a phylogeny from the whole-genome alignments. IQ-tree is an approximately maximum likelihood based inference program, which means that it cuts some corners during the exploration of the tree-space, in order to finish in a reasonable time.
cd .. mkdir iqtree_r /home/projects/pr_hackinars/src/iqtree-1.6.beta4-Linux/bin/iqtree \ -s ice_cream_wgs_align.fsa \ -st DNA -m TEST -nt AUTO -b 100
-s is the input alignment, -st is the data type (nucleotide sequences), -m TEST option will perform a model test on the dataset, determining the substitution model, -b is bootstrapping (1000 would be better, but that runs for approx. 30 mins). IQ-tree is capable of parallelizing some steps, -nt AUTO will determine the optimal number of threads to use.
mv ice_cream_wgs_align.fsa.* iqtree_r cd iqtree_r /home/projects/pr_hackinars/src/NDtree/chnam.py \ -i ice_cream_wgs_align.fsa.treefile -s ../names \ -o ice_cream_wgs_align.fsa.treefile_names.nw -t2 nw_display ice_cream_wgs_align.fsa.treefile_names.nw
What substitution model did you use? Did the tree topology change? Are the clades more resolved now? The clade support percentage values are very high, why?
Some additional information for the samples:
Sampling time | Source | Run ID |
November, 2015 | Ice cream/Hospital X | SRR3052035 |
November, 2015 | Ice cream/Hospital X | SRR3053137 |
November, 2015 | Ice cream/Hospital X | SRR3086932 |
November, 2015 | Environmental/Hospital X | SRR3086935 |
November, 2015 | Environmental/Hospital X | SRR3086936 |
November, 2015 | Clinical | SRR2994642 |
March, 2015 | Environmental/Company A | SRR1974103 |
December, 2014 | Ice cream/Company A | SRR3091405 |
December, 2014 | Environmental/Company A | SRR3130409 |
April, 2015 | Environmental/Company B | SRR2035442 |
Is the infection truly hospital-acquired? Which company is more likely to be the one where the infecting strain originates from?
If you have more time, you can also run CSI Phylogeny on these isolates.