Things to do better next time

NCBI found some errors in my assembled transcriptomes when I submitted them. It appears that I missed a couple of quality control steps at the trimming stage.

What I have learnt:

  • After running Trimmomatic with the default settings, check the data before proceeding.
  • Include a file with all Illumina primers so that they can be identified and trimmed from the reads.
  • Use bbduk to help with this process, as residual primers from previous runs on the machine may also be present (http://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/bbduk-guide/).
  • Screen the raw data for contaminants, e.g. bacterial, human and other non-target sequences. It seems my transcriptomes included a few sequences from human, gorilla and chimp!
  • None of this really affected my analysis, and the affected sequences were a very small proportion of the assembled transcriptome, BUT the assemblies are now a bit questionable and should be done again.
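As a rough illustration of the "check the data" step, here is a minimal sketch that counts how many reads in a trimmed FASTQ file still contain an adapter sequence. It assumes plain, uncompressed 4-line FASTQ records. The function name `count_adapter_reads` and the `ADAPTER` variable are my own for illustration, and the default sequence is only the common Illumina TruSeq adapter prefix; sequences from your own primer file should be substituted. The commented BBDuk call is adapted from the adapter-trimming example in the BBDuk guide linked above, with hypothetical file names.

```shell
#!/usr/bin/env bash
# Sketch only: report "total_reads reads_with_adapter" for one FASTQ file.
# ADAPTER defaults to the common Illumina TruSeq adapter prefix (an assumption;
# replace it with sequences from your own primer file).
ADAPTER="${ADAPTER:-AGATCGGAAGAGC}"

count_adapter_reads() {
    local fastq="$1"
    # NR % 4 == 2 selects only the sequence line of each 4-line record,
    # so header and quality lines are never searched.
    awk -v a="$ADAPTER" '
        NR % 4 == 2 { total++; if (index($0, a) > 0) hits++ }
        END { printf "%d %d\n", total, hits }' "$fastq"
}

# Example call on one of the trimmed outputs from this post:
#   count_adapter_reads AP0_R1_trimpaired.fq
#
# If adapters are still found, one option is an extra pass with BBDuk
# (output and reference file names below are hypothetical):
#   bbduk.sh in1=AP0_R1.fastq in2=AP0_R2.fastq \
#       out1=AP0_R1_clean.fq out2=AP0_R2_clean.fq \
#       ref=adapters.fa ktrim=r k=23 mink=11 hdist=1 tpe tbo
```

A non-zero second number does not prove contamination on its own, but it is a cheap signal that the trimming settings deserve a second look before assembly.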

Trimming the reads

The raw sequence reads still contain Illumina adaptors, and some reads may be of low quality. Trimmomatic is commonly used for trimming reads (I used Trimmomatic v0.33). The manual I referred to is TrimmomaticManual_V0.32.pdf.

Trimmomatic requires Java to be installed and loaded. The default Java version on Artemis (HPC) is currently 1.8.0, and this works fine. I did not change any default settings for Trimmomatic.

Here is the PBS script for my paired-end data.

Note: the term ‘module’ means software. A default version is generally loaded for your job unless you specify a version in the Load modules section of the PBS script.

#PBS -P (project directory)
#PBS -N (job name eg. AP0 trim)
#PBS -l nodes=1:ppn=16
#PBS -l walltime=01:00:00
#PBS -l pmem=4gb
#PBS -M email@sydney.edu.au
#PBS -m abe

#Load modules
module load java
module load trimmomatic

# Working directory
cd /(pathname of working directory where files are located)

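# Run trimmomatic in paired-end mode.
# Arguments after PE: two input files, then paired/unpaired outputs for R1 and R2.
# TruSeq3-PE.fa must be in the working directory, or give its full path.
java -jar /usr/local/trimmomatic/0.33/trimmomatic-0.33.jar PE -phred33 \
    AP0_R1.fastq AP0_R2.fastq \
    AP0_R1_trimpaired.fq AP0_R1_trimunpaired.fq \
    AP0_R2_trimpaired.fq AP0_R2_trimunpaired.fq \
    ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36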

So the input files are:

AP0_R1.fastq and AP0_R2.fastq (or gzipped versions)

and the output files are:

AP0_R1_trimpaired.fq, AP0_R1_trimunpaired.fq, AP0_R2_trimpaired.fq and AP0_R2_trimunpaired.fq (or gzipped versions)

These output files can be used for the assembly process with Trinity software.
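Before handing the paired outputs to an assembler, it may be worth confirming that the two trimpaired files hold the same number of reads, since paired-end tools generally expect matched files. The sketch below is one simple way to do that, assuming uncompressed 4-line FASTQ records; `count_reads` and `check_pairs` are hypothetical helper names, not part of Trimmomatic or Trinity.

```shell
#!/usr/bin/env bash
# Sketch only: compare read counts between the two paired output files.

# Count FASTQ records, assuming 4 lines per record and uncompressed input.
count_reads() {
    awk 'END { print int(NR / 4) }' "$1"
}

# Print both counts; return 0 if they match and 1 if they differ.
check_pairs() {
    local n1 n2
    n1=$(count_reads "$1")
    n2=$(count_reads "$2")
    echo "R1: $n1 reads, R2: $n2 reads"
    [ "$n1" -eq "$n2" ]
}

# Example call on the outputs from this post:
#   check_pairs AP0_R1_trimpaired.fq AP0_R2_trimpaired.fq
```

For gzipped outputs the files would need to be decompressed first (for example with `gzip -dc`) before counting lines.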