Canu unitigging/1-overlapper

Using Canu v1.6, this stage has taken 2 months so far, with 351 jobs completed and 28 still to run on the University of Sydney HPC, Artemis. Each job requires around 10 GB of memory and 4 CPUs, with a progressively increasing amount of walltime: from only about 5 hours for the early jobs up to about 150 hours for the last 100 jobs.
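For context, the per-job resource request translates to something like the following submission script (a sketch only, assuming Artemis's PBS Pro scheduler; the project code, job name, walltime and job index are placeholders, and Canu normally generates and submits these overlap jobs itself when its grid support is enabled):

#!/bin/bash
# Resource request matching one of the later 1-overlapper jobs described above.
#PBS -P my_project
#PBS -N ovl_MR_1805a
#PBS -l select=1:ncpus=4:mem=10GB
#PBS -l walltime=150:00:00

# Run one overlap batch using the helper script Canu writes into 1-overlapper;
# the job index 351 here is only a placeholder.
cd /scratch/directory/run-MR_1805a/unitigging/1-overlapper
./overlap.sh 351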

I have detailed this previously, but briefly: the genome is predicted to be 1.3-1.4 Gbp, and we started with 50-60X coverage of PacBio Sequel and RS II data.

The trimmedReads.fasta.gz output was 12 GB.

The Canu script (NB: the genome size was set to 1 Gbp for the script):

canu -p MR_1805a \
-d /scratch/directory/run-MR_1805a \
gnuplot="/usr/local/gnuplot/5.0.0/bin/gnuplot" \
genomeSize=1g \
correctedErrorRate=0.040 \
batOptions="-dg 3 -db 3 -dr 1 -ca 500 -cp 50" \

A previous assembly was started 6 months ago using the same raw data and is still incomplete. The same script was run, except for one additional parameter:

corOutCoverage=200
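corOutCoverage tells Canu to output more of the corrected reads than its default of 40x coverage, so that run carried a larger corrected/trimmed read set forward. For reference, its command looked something like this (a sketch only: the output directory is a placeholder, and the read-input options are again omitted):

canu -p MR_1805a \
-d /scratch/directory/run-MR_1805a_cor200 \
gnuplot="/usr/local/gnuplot/5.0.0/bin/gnuplot" \
genomeSize=1g \
correctedErrorRate=0.040 \
corOutCoverage=200 \
batOptions="-dg 3 -db 3 -dr 1 -ca 500 -cp 50" \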

The trimmedReads.fasta.gz output from that assembly run was 17 GB.
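Since the gzipped file size is only a rough proxy, a more direct comparison between the two runs is to count reads and total bases in each trimmedReads.fasta.gz, for example with standard tools (the path is a placeholder; repeat for the other run's file):

# Count reads and total bases in a Canu trimmedReads file.
zcat /scratch/directory/run-MR_1805a/MR_1805a.trimmedReads.fasta.gz \
  | awk '/^>/ {n++; next} {b += length($0)} END {printf "reads=%d bases=%d\n", n, b}'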
