Sunday, November 27, 2016

k-mers, Vectors, ncDNA (Intron) and Protein Relativity

Following a particular theory of distributed computing, we developed a method that uses vectors to expose relative differences in the sort order of subsequences (k-mers) of transcripts of non-coding DNA. Our expectation is that vectors, which have been used successfully to partition data and determine redundancy, can also be used to isolate ncDNA k-mers that played a role in the protein that the gene's coding DNA encoded.

We first associated each k-mer with the transcript's protein/mRNA sequence signature. Then we computed the vector at each [k-mer/protein signature] pair for all transcripts, referencing each k-mer at the same start or end position in its base sequence. This provided a novel comparison for ranking k-mers of transcripts.
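As a rough illustration of the pairing step, the sketch below pulls the k-mer at one shared offset (counted from the start or from the end) out of every transcript and attaches that transcript's signature to it. The function names, the sequences, and the signature strings are all hypothetical placeholders; how the real protein/mRNA signature is computed is not covered here.

```python
def kmer_at(seq, k, offset, from_end=False):
    """Return the k-mer of length k at the given offset, or None if it does not fit.

    With from_end=True the offset is counted back from the end of the sequence,
    so offset 0 is the last k bases.
    """
    start = len(seq) - k - offset if from_end else offset
    if start < 0 or start + k > len(seq):
        return None
    return seq[start:start + k]


def kmer_signature_pairs(transcripts, k, offset, from_end=False):
    """Pair the k-mer at one shared position with each transcript's signature.

    transcripts maps a transcript name to (base sequence, signature).
    Transcripts too short to have a k-mer at that position are skipped.
    """
    pairs = {}
    for name, (seq, signature) in transcripts.items():
        kmer = kmer_at(seq, k, offset, from_end)
        if kmer is not None:
            pairs[name] = (kmer, signature)
    return pairs


# Made-up sequences and signatures, for illustration only.
example_transcripts = {
    "T1": ("ACGTACGT", "sigA"),
    "T2": ("TTGCA", "sigB"),
}
example_pairs = kmer_signature_pairs(example_transcripts, k=3, offset=1)
```

Referencing from the end instead of the start only changes how the offset is interpreted; the pairing itself stays the same.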

There are two ways to derive a vector from a k-mer. The transcripts labeled T1-T4 in the image below are shown under two different ordering methods. The first (left) is constant, because every k-mer of a given transcript has the same protein signature. The second (right) is established using the iScore of the k-mer. A max-order rule is applied once to re-order transcripts with equal iScores, provided the re-order agrees with the protein signature order.

The image above represents a k-mer of four different transcripts. On the left, the protein signature sort yields a k-mer iScore vector (T3, T2, T4, T1) that differs from the protein signature vector (T2, T3, T4, T1). We refer to each transcript k-mer's position in this ordering as its Protein hash Vector (PhV). On the right is the iScore vector (T1, T2, T3, T4), based on the k-mer sort and adjusted by the max-order rule to the protein signature order. We refer to a transcript's position in this ordering as its Position (in) iScore Vector (PiV). Using these vectors and their differences, we rank the k-mer/protein entries for each transcript in the set.
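To make the two positions concrete, here is a minimal sketch under several assumptions: the protein signature is any sortable value, the iScore is a plain number (its actual computation is not defined in this post), and the max-order rule is approximated by breaking iScore ties in protein signature order. The function name `position_vectors` and all values are invented, chosen only so that the orderings come out as (T2, T3, T4, T1) and (T1, T2, T3, T4).

```python
def position_vectors(signatures, iscores):
    """Return (PhV order, PiV order, per-transcript PiV - PhV difference).

    signatures: transcript name -> protein signature (sortable placeholder)
    iscores:    transcript name -> iScore of the k-mer (numeric placeholder)
    """
    # PhV: order transcripts by protein signature alone.
    phv_order = sorted(signatures, key=lambda t: signatures[t])
    phv = {t: i for i, t in enumerate(phv_order)}

    # PiV: order by iScore; equal iScores fall back to PhV position,
    # standing in for the max-order adjustment (an assumption).
    piv_order = sorted(iscores, key=lambda t: (iscores[t], phv[t]))
    piv = {t: i for i, t in enumerate(piv_order)}

    # Difference in position between the two orderings.
    diff = {t: piv[t] - phv[t] for t in phv_order}
    return phv_order, piv_order, diff


# Toy values only; T2 and T3 share an iScore to exercise the tie-break.
toy_signatures = {"T1": "d", "T2": "a", "T3": "b", "T4": "c"}
toy_iscores = {"T1": 1, "T2": 2, "T3": 2, "T4": 3}
phv_order, piv_order, diff = position_vectors(toy_signatures, toy_iscores)
```

In this toy case T1 moves from last place in the signature ordering to first place in the iScore ordering, which is the kind of difference the ranking is meant to surface.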

The actual result for the 1000th k-mer of 15 men1 transcripts can be seen here. Each k-mer can be queried independently. 
