Since the last post, I have re-run the assembly using both the paired and the unpaired trimmed reads. The new transcriptome is not very different in size, but I have yet to determine whether there are other differences that matter for downstream analysis.
To obtain summary statistics for the Trinity output, including the contig N50, you can run the TrinityStats.pl script from Trinity's util directory:
# Run TrinityStats.pl (reports N50 and other assembly statistics)
/usr/local/trinity/2.1.1/util/TrinityStats.pl AP.fasta
This produces output like the following:
################################
## Counts of transcripts, etc.
################################
Total trinity 'genes': 67231
Total trinity transcripts: 82911
Percent GC: 45.44
########################################
Stats based on ALL transcript contigs:
########################################
Contig N10: 3700
Contig N20: 2863
Contig N30: 2350
Contig N40: 1979
Contig N50: 1654
Median contig length: 585
Average contig: 983.42
Total assembled bases: 81535957
#####################################################
## Stats based on ONLY LONGEST ISOFORM per 'GENE':
#####################################################
Contig N10: 3463
Contig N20: 2612
Contig N30: 2127
Contig N40: 1758
Contig N50: 1408
Median contig length: 476
Average contig: 834.77
Total assembled bases: 56122482
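TrinityStats.pl reports each Nx as the contig length L at which contigs of length L or longer together hold at least x% of all assembled bases, once over all transcripts and once over only the longest isoform of each 'gene'. To see where these numbers come from, here is a minimal POSIX shell/awk sketch of the standard calculation. It is not the TrinityStats.pl code: it assumes Trinity-style headers such as >TRINITY_DN1000_c0_g1_i1 len=..., treats a 'gene' as the transcript ID with its trailing _i<N> isoform suffix removed, and the function names (contig_lengths, longest_per_gene, nx) are hypothetical. TrinityStats.pl may differ in edge cases such as ties or rounding.

```shell
#!/bin/sh
# Minimal sketch (not TrinityStats.pl itself): Nx over all transcripts and over the
# longest isoform per "gene". ASSUMPTION: Trinity-style headers such as
# ">TRINITY_DN1000_c0_g1_i1 len=...", where the gene ID is the transcript ID with its
# trailing "_i<N>" isoform suffix removed. Function names are hypothetical.

# Print "<transcript_id> <length>" for every record in a (possibly line-wrapped) FASTA.
contig_lengths() {
  awk '
    /^>/ { if (id != "") print id, len; id = substr($1, 2); len = 0; next }
         { len += length($0) }
    END  { if (id != "") print id, len }
  ' "$1"
}

# Read "<id> <length>" lines on stdin; keep only the longest isoform of each gene.
longest_per_gene() {
  awk '
    { g = $1; sub(/_i[0-9]+$/, "", g); l = $2 + 0
      if (!(g in best) || l > best[g]) best[g] = l }
    END { for (g in best) print g, best[g] }
  '
}

# Read "<id> <length>" lines on stdin and print Nx for x given as $1 (e.g. 50):
# sort lengths in descending order and report the length at which the running
# total first reaches x% of all assembled bases.
nx() {
  sort -k2,2nr | awk -v x="$1" '
    { len[NR] = $2; total += $2 }
    END {
      target = total * x / 100
      for (i = 1; i <= NR; i++) {
        cum += len[i]
        if (cum >= target) { print len[i]; exit }
      }
    }
  '
}

# Example (AP.fasta is the assembly from this post):
#   contig_lengths AP.fasta | nx 50                     # N50, all transcript contigs
#   contig_lengths AP.fasta | longest_per_gene | nx 50  # N50, longest isoform per gene
```

Dropping shorter isoforms removes bases from the total, which is why the longest-isoform figures above come out lower than the all-contig ones. If the header assumption holds, the two example commands should land close to the 1654 and 1408 reported by TrinityStats.pl.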
For comparison, here are the statistics for the previous assembly:
################################
## Counts of transcripts, etc.
################################
Total trinity 'genes': 67123
Total trinity transcripts: 82776
Percent GC: 45.44
########################################
Stats based on ALL transcript contigs:
########################################
Contig N10: 3687
Contig N20: 2864
Contig N30: 2350
Contig N40: 1978
Contig N50: 1652
Median contig length: 586
Average contig: 983.62
Total assembled bases: 81419959
#####################################################
## Stats based on ONLY LONGEST ISOFORM per 'GENE':
#####################################################
Contig N10: 3456
Contig N20: 2612
Contig N30: 2126
Contig N40: 1757
Contig N50: 1407
Median contig length: 476
Average contig: 834.73
Total assembled bases: 56029878
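Because the two reports are so close, it is easier to judge the differences with the metrics lined up side by side. The sketch below assumes each TrinityStats.pl report has been saved to a text file (the file names old_stats.txt and new_stats.txt and the helper name compare_stats are hypothetical). It also assumes that metric lines look like "Label: number" and that section headers contain "Stats based on ALL" or "Stats based on ONLY LONGEST". It prints the old value, the new value and the difference for every metric present in both reports.

```shell
#!/bin/sh
# Minimal sketch: compare two saved TrinityStats.pl reports metric by metric.
# ASSUMPTIONS: metric lines look like "Label: number" (tabs/indentation tolerated),
# and section headers contain "Stats based on ALL" / "Stats based on ONLY LONGEST".
# compare_stats is a hypothetical helper, not part of Trinity.
compare_stats() {   # usage: compare_stats OLD_REPORT NEW_REPORT
  awk '
    BEGIN    { FS = ":[ \t]*" }
    FNR == 1 { file++; sec = "counts" }            # new input file: reset section
    /Stats based on ALL/          { sec = "all"; next }
    /Stats based on ONLY LONGEST/ { sec = "longest"; next }
    /^[^#].*:[ \t]*[0-9.]+[ \t]*$/ {               # a "Label: number" metric line
      label = $1; sub(/^[ \t]+/, "", label)        # drop any indentation
      key = sec "/" label                          # e.g. "all/Contig N50"
      if (file == 1) { prev[key] = $2 + 0; order[++n] = key }
      else           { curr[key] = $2 + 0 }
    }
    END {
      for (i = 1; i <= n; i++) {
        k = order[i]
        if (!(k in curr)) continue                 # metric missing from the new report
        printf "%-40s %12s %12s %+12.2f\n", k, prev[k], curr[k], curr[k] - prev[k]
      }
    }
  ' "$1" "$2"
}

# Example (hypothetical file names):
#   /usr/local/trinity/2.1.1/util/TrinityStats.pl AP.fasta > new_stats.txt
#   compare_stats old_stats.txt new_stats.txt
```

Applied by hand to the two reports in this post, the differences are small. The new assembly has 108 more 'genes' and 135 more transcripts. Its N50 over all contigs is 2 bp higher, and it has 115,998 more assembled bases, roughly 0.14% more.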