NCBI flagged some errors in my assembled transcriptomes when I submitted them. It appears that I missed a couple of quality control steps at the trimming stage.
What I have learnt:
- after using the default Trimmomatic settings, check the data before proceeding.
- supply a file of all Illumina adapter and primer sequences so that these can be identified and trimmed from the reads.
- use BBDuk to help with this process, as residual primers from previous runs on the machine may also be present (http://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/bbduk-guide/).
- screen the raw data for contaminants, e.g. bacterial, human and other non-target sequences. It seems my transcriptomes included a few sequences from human, gorilla and chimp!!
- None of this really affected my analysis and, in fact, the contaminants made up only a small proportion of the assembled transcriptome, BUT the assemblies are now a bit questionable and should be done again.
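
As a rough illustration of the "check the data before proceeding" step, here is a minimal Python sketch that counts how many trimmed reads still contain a piece of a known adapter. It is not a replacement for BBDuk or Trimmomatic, just a quick sanity check. The function names (read_fastq, reverse_complement, count_adapter_hits) and the demo reads are made up for this example, and the single sequence AGATCGGAAGAGC (the common start of Illumina TruSeq adapters) is only a stand-in: a real check should load the full adapter/primer file used for trimming.

```python
from typing import Dict, Iterable, Iterator, Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def read_fastq(handle: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (read_id, sequence) from 4-line FASTQ records.

    An incomplete trailing record is silently dropped.
    """
    it = iter(handle)
    for header, seq, _plus, _qual in zip(it, it, it, it):
        yield header.strip()[1:], seq.strip().upper()


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def count_adapter_hits(
    reads: Iterable[Tuple[str, str]],
    adapters: Dict[str, str],
    k: int = 12,
) -> Dict[str, int]:
    """Count reads sharing at least one exact k-mer with each adapter.

    Both strands of each adapter are checked. Exact matching only, so
    adapters with sequencing errors will be missed (assumption of this sketch).
    """
    adapter_kmers = {}
    for name, adapter in adapters.items():
        strands = (adapter.upper(), reverse_complement(adapter))
        adapter_kmers[name] = {
            s[i:i + k] for s in strands for i in range(len(s) - k + 1)
        }

    hits = {name: 0 for name in adapters}
    for _read_id, seq in reads:
        read_kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        for name, kmers in adapter_kmers.items():
            if not kmers.isdisjoint(read_kmers):
                hits[name] += 1
    return hits


# Hypothetical demo data: two reads carry adapter sequence, one is clean.
demo_reads = [
    ("read1", "ACGTACGTACGT" + "AGATCGGAAGAGC" + "TTTT"),  # forward adapter
    ("read2", "ACGTACGTACGTACGTACGTACGTACGT"),              # clean
    ("read3", "GGGG" + "GCTCTTCCGATCT" + "AAAA"),           # reverse complement
]
demo_fastq = [
    line
    for read_id, seq in demo_reads
    for line in (f"@{read_id}", seq, "+", "I" * len(seq))
]

if __name__ == "__main__":
    adapters = {"truseq_prefix": "AGATCGGAAGAGC"}
    print(count_adapter_hits(read_fastq(demo_fastq), adapters))
```

On real data the same function could be fed an open FASTQ file handle instead of demo_fastq. Any non-zero count after trimming is a sign that the trimming settings or the adapter file need another look before assembly.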