False starts with PacBio reads

As mentioned previously, the raw reads from RSII and Sequel PacBio come as bax.h5 and subreads.bam respectively. Both the Falcon and Canu assembler take input files of Fasta or Fastq.

After reading a bit more we realized that we needed to extract the fasta as well as the arrow files for assembly and then polishing. Dextract from the Dazzler suite is the software to extract and make these files from the bax.h5 and subreads.bam.

The following explanation is what happened when we attempted to install dextract via the Falcon-integrate onto the hpc.

Found DEXTRACTOR was not on the HPC – thought it may have been installed as part of FALCON, which is there, but this was not the case. We then decided to install FALCON-integrate which includes DEXTRACTOR. We followed the intructions here:
export GIT_SYM_CACHE_DIR=~/.git-sym-cache # to speed things up
cd FALCON-integrate
git checkout master  # or whatever version you want
git submodule update –init # Note: You must do this yourself! No longer via `make init`.
make init
source env.sh
make config-edit-user
make -j all
make test  # to run a simple one
Needed to run “module load python” to get the correct python version (2.7.9.)
Found that dextract did not build. This was fixed by changing the hdf5 include and lib path in the makefile (DEXTRACTOR/Makefile)
Then found FALCON-integrate doesn’t have the later version of DEXTRACTOR which can work with .bam files, so we decided to do a stand alone install of the latest DEXTRACTOR.
git checkout master
make -f Makefile
The makefile needed to be edited to work with different zlib and hdf5 include and lib paths
(possibly could have avoided this by running “module load hdf5/1.8.16” and “module load zlib” beforehand)
Running dextract
Need to run
module load hdf5/1.8.16
module load zlib
~/bin/DEXTRACTOR/dextract -f -a m54078_170627_071131.subreads.bam
generates a m54078_170627_071131.fasta and a m54078_170627_071131.arrow in the current directory
-f outputs a fasta file
-a outputs an arrow file

2 thoughts on “False starts with PacBio reads”

    1. Hi Ric, we haven’t yet polished the assembled genome with the quiver files. I’ll write a post when I do this. We did use Pilon with Illumina reads but need to re-do this.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s