Assembly with Canu

We have begun testing the extracted .fasta files from both RSII and Sequel using the Canu assembly software.

The latest version of Canu is v1.6 (released on 15 August 2017)

https://github.com/marbl/canu/releases

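and the associated publication is Koren et al. (2017), "Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation", Genome Research.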

We installed v1.6 locally (under our own login) on the University HPC, because the version currently installed there is v1.3.

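To run it, we used the following PBS job specification (one chunk of 24 CPUs with 128 GB of memory, plus eight chunks of 4 CPUs with 32 GB each):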

#PBS -l select=1:ncpus=24:mem=128GB+8:ncpus=4:mem=32GB
#PBS -l walltime=48:00:00

# Load modules
module load perl
module load java

# Working directory
cd /project/…

/home/canu/canu-1.6/Linux-amd64/bin/canu -p name genomeSize=700m -pacbio-raw ../RSII/*.fasta ../sequel/*.fasta

Now it is just a matter of waiting to see how it goes. We will probably need a few more parameters, but this serves as the first test.
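
Since genomeSize is supplied to Canu, a rough read-coverage estimate (total bases divided by genome size) can be a useful sanity check before committing 48 hours of walltime. The following is a minimal sketch and was not part of our actual run; the function names total_bases and estimate_coverage are hypothetical, and it assumes plain, uncompressed FASTA input.

```shell
#!/usr/bin/env bash

# Sum the sequence lengths in one or more FASTA files and print the
# total number of bases. Header lines (starting with ">") are skipped.
total_bases() {
    awk '!/^>/ { n += length($0) } END { printf "%.0f\n", n }' "$@"
}

# Estimate coverage as total bases divided by genome size (both in bases).
# The first argument is the genome size in bases (hypothetical parameter);
# 700000000 corresponds to genomeSize=700m.
estimate_coverage() {
    local genome_size_bp=$1
    shift
    local bases
    bases=$(total_bases "$@")
    awk -v b="$bases" -v g="$genome_size_bp" 'BEGIN { printf "%.1f\n", b / g }'
}
```

For example, `estimate_coverage 700000000 ../RSII/*.fasta ../sequel/*.fasta` would print the approximate coverage for the genome size and input files used in the Canu command.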

False starts with PacBio reads

As mentioned previously, the raw PacBio reads from the RSII and Sequel come as bax.h5 and subreads.bam files respectively. Both the Falcon and Canu assemblers take FASTA or FASTQ files as input.

After reading a bit more, we realized that we needed to extract both the FASTA files (for assembly) and the arrow files (for subsequent polishing). dextract, from the Dazzler suite, is the tool that generates these files from the bax.h5 and subreads.bam inputs.

The following is what happened when we attempted to install dextract via FALCON-integrate on the HPC.

We found that DEXTRACTOR was not on the HPC. We thought it might have been installed as part of FALCON, which is there, but this was not the case. We then decided to install FALCON-integrate, which includes DEXTRACTOR. We followed the instructions here:
export GIT_SYM_CACHE_DIR=~/.git-sym-cache # to speed things up
cd FALCON-integrate
git checkout master  # or whatever version you want
git submodule update --init # Note: You must do this yourself! No longer via `make init`.
make init
source env.sh
make config-edit-user
make -j all
make test  # to run a simple one
We needed to run "module load python" to get the correct Python version (2.7.9).
We found that dextract did not build. This was fixed by changing the hdf5 include and lib paths in the makefile (DEXTRACTOR/Makefile).
We then found that FALCON-integrate does not include the later version of DEXTRACTOR, which can work with .bam files, so we decided to do a standalone install of the latest DEXTRACTOR.
cd DEXTRACTOR
git checkout master
make -f Makefile
The makefile needed to be edited to point to different zlib and hdf5 include and lib paths
(we possibly could have avoided this by running "module load hdf5/1.8.16" and "module load zlib" beforehand).
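
Before editing the makefile, it can help to confirm that a candidate install prefix really contains the header and library the build needs. The following is a rough sketch under stated assumptions: check_prefix is a hypothetical helper, and it assumes the conventional layout of headers under include/ and libraries under lib/ (some systems use lib64/ instead).

```shell
#!/usr/bin/env bash

# Report whether a library prefix contains the header and library a build needs.
# Usage: check_prefix <prefix> <header> <libname>
# Returns 0 if both are found, 1 otherwise.
check_prefix() {
    local prefix=$1 header=$2 lib=$3
    local status=0
    if [ ! -f "$prefix/include/$header" ]; then
        echo "missing $prefix/include/$header"
        status=1
    fi
    # Accept either a shared (.so) or a static (.a) library.
    if ! ls "$prefix"/lib/lib"$lib".* >/dev/null 2>&1; then
        echo "missing $prefix/lib/lib$lib.*"
        status=1
    fi
    return $status
}
```

For instance, `check_prefix /path/to/hdf5 hdf5.h hdf5` and `check_prefix /path/to/zlib zlib.h z` (both paths are placeholders) would indicate whether those prefixes are suitable values for the include and lib paths in the makefile.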
Running dextract
Before running dextract, load the required modules:
module load hdf5/1.8.16
module load zlib
Example:
~/bin/DEXTRACTOR/dextract -f -a m54078_170627_071131.subreads.bam
This generates m54078_170627_071131.fasta and m54078_170627_071131.arrow in the current directory.
-f outputs a fasta file
-a outputs an arrow file
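
With several Sequel cells, the same command can be looped over every subreads.bam file. The following is a minimal sketch, assuming the hdf5 and zlib modules are already loaded; the extract_all function and the DRY_RUN and DEXTRACT variables are hypothetical names introduced for illustration, and only the -f and -a flags shown in the example are used.

```shell
#!/usr/bin/env bash

# Path to the dextract binary; the default mirrors the example command.
DEXTRACT=${DEXTRACT:-$HOME/bin/DEXTRACTOR/dextract}

# Run dextract on every *.subreads.bam in the given directory.
# Output lands in the current directory, so cells whose .fasta file
# already exists here are skipped. With DRY_RUN=1 the commands are
# printed instead of executed.
extract_all() {
    local dir=$1 bam out
    for bam in "$dir"/*.subreads.bam; do
        [ -e "$bam" ] || continue   # the glob matched nothing
        out=$(basename "$bam" .subreads.bam).fasta
        if [ -e "$out" ]; then
            echo "skip $bam (found $out)"
            continue
        fi
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "$DEXTRACT -f -a $bam"
        else
            "$DEXTRACT" -f -a "$bam"
        fi
    done
}
```

Running `DRY_RUN=1 extract_all ../sequel` first prints the commands that would be executed, which is a cheap way to check the file list before starting the real extraction.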