Re-working the analysis

I have been going back into my transcriptomes for each plant and pulling out ‘genes’ that match Hidden Markov Models (HMMs) I built using HMMER (http://hmmer.org/). I used the specific domains from resistance gene models previously identified in Eucalyptus grandis (http://journal.frontiersin.org/article/10.3389/fpls.2015.01238/full) and from chitinases (https://academic.oup.com/treephys/article-abstract/37/5/565/3067625/Identification-of-the-Eucalyptus-grandis-chitinase) to initially find putative genes in my clustered Syzygium. Then I used the aligned genes to build species-specific nucleotide HMMs for several defence-related genes.

Armed with lists of potential genes in each transcriptome, I now want to find out whether there are actually transcript variations between my resistant and susceptible plants.

This has meant aligning the raw reads for each plant, at each time point, against that plant's own transcriptome. A big job again, but I hope it is worth it.

Another thing that has been useful recently is regular expressions for sorting, extracting and so on in Notepad++ (http://www.rexegg.com/regex-quickstart.html). With large datasets there always seems to be something that needs extracting or amending.
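As a rough illustration of the kind of extraction regular expressions make easy, here is a minimal Python sketch. The header format (`contig_…`, `len=`, `sample=`) is made up for the example and is not the naming used in my transcriptomes; the pattern would need adjusting to whatever your assembler writes.

```python
import re

# Hypothetical FASTA headers; real names depend on the assembler used.
headers = [
    ">contig_0012 len=1543 sample=R1",
    ">contig_0450 len=812 sample=S2",
    ">contig_0007 len=2210 sample=R1",
]

# Capture the sequence ID and its length from each header line.
pattern = re.compile(r"^>(\S+)\s+len=(\d+)")

records = []
for header in headers:
    match = pattern.match(header)
    if match:
        records.append((match.group(1), int(match.group(2))))

# Sort by length, longest first, and keep just the IDs.
records.sort(key=lambda rec: rec[1], reverse=True)
ids_longest_first = [seq_id for seq_id, _ in records]
```

The building blocks used here (the `^` anchor, `\S`, `\d` and capture groups in parentheses) are common to most regex flavours, so a similar pattern should carry over to a text editor's find-and-replace with little change.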

I have a bunch of primers ready as well to check against all samples. Back to the lab soon for a lot of qRT-PCR.

 

md5 hash

When transferring large data sets, file corruption can occur.

To ensure that the file uploaded from one location is the same as the file downloaded at another location, it is useful to have a way to validate it. I have only just become aware of this kind of utility, and it makes a lot of sense to check that your data is what you think it is.

I just installed the File Checksum Integrity Verifier (FCIV) utility for use in Windows, which computes MD5 or SHA-1 hashes, from here:

https://support.microsoft.com/en-us/kb/841290

If all the files are in one folder you can then just use the cmd window to run a single command, for example:

C:\FCIV>fciv.exe -md5 Z:\Syzygium\Rawdata > Rawdata_hash.txt

or in Unix (forward slashes, and a wildcard because md5sum works on files rather than folders):

md5sum /Syzygium/Rawdata/* > Rawdata_hash.txt

The output looks like this: one checksum per file, which should match the MD5 calculated for the downloaded copy of that file:

//
// File Checksum Integrity Verifier version 2.05.
//
cbd0f2a14e19042d6b4be355381f9775 z:\syzygium\rawdata\rawdata1.fastq.gz
c9714ac42c5c10c7732514d322b50c21 z:\syzygium\rawdata\rawdata2.fastq.gz
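Comparing the hashes by eye gets tedious with many files, so the check can be scripted. Below is a minimal Python sketch, assuming a hash list in the `<md5> <path>` layout shown above (FCIV or md5sum style) and that the downloaded files all sit in one folder. The function names `md5_of_file`, `read_hash_list` and `verify_folder` are my own for illustration, not part of FCIV or md5sum.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def read_hash_list(hash_file):
    """Parse lines of the form '<md5> <path>' into {file name: md5}.

    Comment lines starting with '//' (as written by FCIV) are skipped.
    """
    expected = {}
    for line in Path(hash_file).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("//"):
            continue
        checksum, _, name = line.partition(" ")
        # Keep only the file name so Windows and Unix paths both match.
        name = name.strip().lstrip("*").replace("\\", "/").rsplit("/", 1)[-1]
        expected[name.lower()] = checksum.lower()
    return expected


def verify_folder(folder, hash_file):
    """Return {file name: True/False/None}; None means the file is missing."""
    results = {}
    for name, checksum in read_hash_list(hash_file).items():
        matches = [p for p in Path(folder).iterdir() if p.name.lower() == name]
        results[name] = md5_of_file(matches[0]) == checksum if matches else None
    return results
```

Calling `verify_folder` with the download folder and `Rawdata_hash.txt` returns a dictionary in which any `False` entry marks a file whose checksum differs from the one in the list and is worth transferring again. File names are compared in lower case because FCIV writes its paths in lower case.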