Viewing and analyzing the differences between transcripts of the same gene can be a useful way to compare DNA sequences. Advances in image processing and machine learning can then be used to gain insight and find patterns in these views at scale.
To visualize the data, all one needs to do is find the center of each sequence, which stands for that sequence's middle or average position.
For example, if a sequence starts at position 310 and ends at position 350, then its middle is (310 + 350)/2 = 330. For each transcript one can plot the Position in iVector value relative to the sequence's center value. One can also add another dimension, such as Diff, as the colour variable of each point.
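As a rough sketch of the center calculation described here, the snippet below turns each matched sequence into one plot point. The `Match` class and its field names (`start`, `end`, `ivector_position`, `diff`) are hypothetical names chosen for illustration, and the second sample row is made up; only the 310 to 350 example comes from the text.

```python
from dataclasses import dataclass


@dataclass
class Match:
    """One matched sequence. All field names are assumptions for this sketch."""
    start: int             # first position of the sequence
    end: int               # last position of the sequence
    ivector_position: int  # the "Position in iVector" value
    diff: float            # the "Diff" value, used as the colour variable


def sequence_center(start, end):
    """Middle (average) position of a sequence."""
    return (start + end) / 2


def plot_points(matches):
    """Return (center, ivector_position, diff) tuples, one per match."""
    return [
        (sequence_center(m.start, m.end), m.ivector_position, m.diff)
        for m in matches
    ]


# Toy input: the first row is the example from the text, the second is invented.
matches = [Match(310, 350, 12, 0.5), Match(400, 420, 13, 1.0)]
points = plot_points(matches)
```

Each tuple can then be handed to whatever plotting tool one prefers, with the first two values as the point coordinates and the third as its colour.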
The images below show the results for the 6 SET Ensembl transcripts and the 15 MEN1 Ensembl transcripts. In these images the maximum length was 50.
The image above shows the MEN1 transcripts. Certain groupings were made based on image similarity. The numbers indicate transcripts with the same protein amino acid sequence: five transcripts are labelled 1, three are labelled 2, and two are labelled 3. One can see that transcripts with the same protein sequence also produce similar images.
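The numbering of transcripts that share a protein sequence could be produced along these lines. This is only a sketch: the function name, the toy transcript IDs and the toy sequences are invented for illustration, and giving label 1 to the largest group is an assumption, not something stated for the original images.

```python
from collections import defaultdict


def label_shared_proteins(proteins):
    """Map transcript ID -> group number for transcripts whose protein
    sequence is shared with at least one other transcript.

    `proteins` maps a transcript ID to its amino acid sequence.
    Assumption: groups are numbered from 1 in order of decreasing size.
    """
    groups = defaultdict(list)
    for transcript_id, sequence in proteins.items():
        groups[sequence].append(transcript_id)

    # Keep only sequences shared by two or more transcripts.
    shared = [ids for ids in groups.values() if len(ids) > 1]
    shared.sort(key=len, reverse=True)

    labels = {}
    for label, ids in enumerate(shared, start=1):
        for transcript_id in ids:
            labels[transcript_id] = label
    return labels


# Toy data, not real MEN1 transcripts.
toy = {"T1": "MKV", "T2": "MKV", "T3": "MGL", "T4": "MKV", "T5": "MGL", "T6": "MAA"}
labels = label_shared_proteins(toy)
```

Transcripts with a unique protein sequence, like `T6` in the toy data, simply get no number.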