Sunday, November 27, 2016

A new powerful visualization of DNA intertranscript Sequences

Visualizations are very important for interpreting and understanding data sets. In bioinformatics it is often challenging to visualize certain elements in DNA sequences. Our method offers a powerful technique for visualizing DNA sequences that can be tuned to one's needs.

Viewing and analyzing differences between the transcripts of the same gene can be an important way to compare DNA sequences. Building on this, advances in image processing and machine learning can be used to gain insight and find patterns at scale.

To visualize our data, all one needs to do is find the center of each sequence, which indicates that sequence's middle, or average, position.

For example, if a sequence starts at position 310 and ends at 350, then the middle is (310 + 350)/2 = 330. For each transcript one can plot the Position in iVector value relative to the sequence's center value. One can also add another dimension, such as Diff, as the colour variable of each point.
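The center calculation can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names `center` and `relative_positions` are made up for this example, and it assumes "relative to the center" means subtracting the center from each position. The positions passed in are sample values, not real transcript data.

```python
def center(start, end):
    """Middle (average) position of a sequence spanning start..end."""
    return (start + end) / 2


def relative_positions(positions, start, end):
    """Offset each position by the center of its sequence.

    Assumption: 'relative to the center' is taken to mean position - center.
    """
    mid = center(start, end)
    return [p - mid for p in positions]


# The worked example from the text: a sequence from 310 to 350.
print(center(310, 350))                                # 330.0
print(relative_positions([310, 330, 350], 310, 350))   # [-20.0, 0.0, 20.0]
```

The resulting offsets could then be handed to any scatter plot routine, with a value such as Diff supplied as the colour of each point.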

The images below show the results for the 6 SET Ensembl transcripts and the 15 MEN1 Ensembl transcripts. In these images the maximum length was 50.

SET Transcripts

MEN1 Transcripts

In the image above one can see the MEN1 transcripts. Certain groupings were made based on image similarity. The numbers indicate transcripts with the same protein amino acid sequence: there are five ones (1), three twos (2), and two threes (3). One can see that sequences with the same protein sequence produce similar images.

