Things to do better next time

NCBI found some errors in my assembled transcriptomes when I submitted them. It appears that I missed a couple of quality control steps at the trimming stage.

What I have learnt:

  • after running Trimmomatic with its default settings, check the data before proceeding.
  • supply a FASTA file of all Illumina adapter and primer sequences so they can be identified and trimmed from the reads.
  • use BBDuk to help with this process, as residual primers from previous runs on the machine may also be present (http://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/bbduk-guide/).
  • screen the raw data for contaminants, e.g. bacterial, human and other non-target sequences. It seems my transcriptomes included a few sequences from human, gorilla and chimp!!
  • none of this really affected my analysis, and the contaminants made up only a tiny proportion of the assembled transcriptomes, BUT the assemblies are now a bit questionable and should be redone.
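
Adapter trimming, the post-trimming check and contaminant screening all come down to matching reads against a reference of known sequences (adapters, primers or contaminant genomes). BBDuk does this with k-mers. The Python below is only a minimal sketch of that idea, not BBDuk's actual implementation. It assumes exact k-mer matches only (BBDuk can also allow mismatches and shorter k-mers at read ends), and the adapter string in the demo is just an illustrative Illumina TruSeq prefix. For real data, use the adapter files that ship with Trimmomatic or BBTools.

```python
"""A minimal, BBDuk-inspired k-mer screen (a sketch, not BBDuk itself).

Assumptions: reads are plain DNA strings (A/C/G/T/N) and matching is exact
(no mismatches allowed, no shorter k-mers at read ends).
"""
from typing import Iterable, Iterator, Set, TextIO

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_kmer_index(references: Iterable[str], k: int) -> Set[str]:
    """Collect every k-mer from the reference sequences, on both strands."""
    index: Set[str] = set()
    for ref in references:
        ref = ref.upper()
        for strand in (ref, revcomp(ref)):
            for i in range(len(strand) - k + 1):
                index.add(strand[i:i + k])
    return index


def first_hit(read: str, index: Set[str], k: int) -> int:
    """Start position of the leftmost read k-mer found in the index, or -1."""
    read = read.upper()
    for i in range(len(read) - k + 1):
        if read[i:i + k] in index:
            return i
    return -1


def right_trim(read: str, index: Set[str], k: int) -> str:
    """Drop the first matching k-mer and everything 3' of it (cf. BBDuk ktrim=r)."""
    pos = first_hit(read, index, k)
    return read if pos < 0 else read[:pos]


def contaminated_fraction(reads: Iterable[str], index: Set[str], k: int) -> float:
    """Fraction of reads containing at least one reference k-mer.

    As a post-trimming check, a non-zero value suggests adapter, primer or
    contaminant sequence is still present in the reads.
    """
    total = hits = 0
    for read in reads:
        total += 1
        if first_hit(read, index, k) >= 0:
            hits += 1
    return hits / total if total else 0.0


def fastq_sequences(handle: TextIO) -> Iterator[str]:
    """Yield the sequence line (2nd of 4) of each FASTQ record."""
    for line_no, line in enumerate(handle):
        if line_no % 4 == 1:
            yield line.strip()


if __name__ == "__main__":
    # Illustrative adapter only; real runs should use the tools' adapter files.
    ADAPTER = "AGATCGGAAGAGC"
    idx = build_kmer_index([ADAPTER], k=11)
    read = "ACGTACGTACGTTTGCA" + ADAPTER + "ACAC"
    print(right_trim(read, idx, 11))  # → ACGTACGTACGTTTGCA
```

Indexing both strands means a read carrying the reverse complement of an adapter (or of a contaminant sequence) is still caught. A sketch like this is only good for spot-checking a sample of trimmed reads; the real tools are far faster and more sensitive.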