Promising Canu output

After nearly two and a half months of tweaking and restarting, we have an assembled genome. It came in at 1.37 Gb of assembled bases in 19,771 contigs (a bit high, I guess, but not too bad given a coverage of around 21X after trimming and correction), with an NG50 of 134,701 bp.
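For anyone unfamiliar with NG50: it is the contig length at which the running total of contigs, sorted longest first, reaches half of the estimated genome size (N50 uses the assembly size instead). A minimal sketch of the calculation follows. The function name `ng50` and the toy contig lengths and genome size are made up for illustration; they are not our data.

```python
def ng50(contig_lengths, genome_size):
    """Return the NG50 for a list of contig lengths.

    Contigs are sorted longest first and summed until the running total
    reaches half of genome_size; the length of the contig that crosses
    that threshold is the NG50. Returns None if the assembly is too
    small to ever reach half of the genome size.
    """
    half = genome_size / 2
    running_total = 0
    for length in sorted(contig_lengths, reverse=True):
        running_total += length
        if running_total >= half:
            return length
    return None


# Hypothetical toy values, not taken from the real assembly.
toy_contigs = [50, 40, 30, 20, 10]
toy_genome_size = 200  # assumed genome size estimate

print("NG50:", ng50(toy_contigs, toy_genome_size))
# Passing the assembly size instead of the genome size gives the N50.
print("N50:", ng50(toy_contigs, sum(toy_contigs)))
```

Because NG50 depends on the genome size estimate, it is only as good as that estimate, and an assembly inflated by uncollapsed haplotypes can still report a reasonable-looking value.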

The BUSCO results were also not too bad for a first assembly, using the 1335 conserved genes in the lineage dataset.

Results:
C:76.7%[S:47.9%,D:28.8%],F:7.6%,M:15.7%,n:1335
1024 Complete BUSCOs (C)
640 Complete and single-copy BUSCOs (S)
384 Complete and duplicated BUSCOs (D)
101 Fragmented BUSCOs (F)
210 Missing BUSCOs (M)
1335 Total BUSCO groups searched
BUSCO analysis done. Total running time: 5979.202003479004 seconds
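As a quick sanity check, the percentages in the one-line summary can be recomputed from the raw counts. The sketch below does that with only the numbers printed above. The helper names `parse_busco_summary` and `percent` are my own and are not part of BUSCO.

```python
import re


def parse_busco_summary(line):
    """Parse a BUSCO one-line summary into a dict of percentages plus n."""
    pattern = (
        r"C:([\d.]+)%\[S:([\d.]+)%,D:([\d.]+)%\],"
        r"F:([\d.]+)%,M:([\d.]+)%,n:(\d+)"
    )
    match = re.search(pattern, line)
    if match is None:
        raise ValueError("not a BUSCO summary line: " + line)
    keys = ["C", "S", "D", "F", "M"]
    result = {key: float(value) for key, value in zip(keys, match.groups())}
    result["n"] = int(match.group(6))
    return result


def percent(count, total):
    """Percentage of total, rounded to one decimal place like the summary."""
    return round(100 * count / total, 1)


summary = parse_busco_summary("C:76.7%[S:47.9%,D:28.8%],F:7.6%,M:15.7%,n:1335")

# Raw counts copied from the results above.
counts = {"C": 1024, "S": 640, "D": 384, "F": 101, "M": 210}

for key, count in counts.items():
    print(key, count, percent(count, summary["n"]), summary[key])

# Share of the complete BUSCOs that were found more than once.
duplicated_share = 100 * counts["D"] / counts["C"]
print("Duplicated share of complete BUSCOs:", duplicated_share)
```

The counts are internally consistent: single-copy plus duplicated equals complete (640 + 384 = 1024), and complete plus fragmented plus missing equals the 1335 groups searched. The duplicated share of complete BUSCOs works out to 37.5%.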

 

I think that the duplicated genes, and maybe the fragmented ones too, are due to heterozygosity: where the two haplotypes differ enough, they are probably being assembled as separate contigs. This will have inflated the assembled genome size too. With revised assembly parameters (to phase the haplotypes) we might get a more realistic assembly.

There is more work to do, but I have heard a few bad comments about Canu and do not think they are warranted. The assembly itself took approximately 3 weeks once we had ironed out the grid options for the different stages.