Following a particular theory of distributed computing, we developed a method that uses vectors to expose relative differences in the sort order of subsequences (k-mers) of transcripts of non-coding DNA (ncDNA). Our expectation is that vectors, which have been used successfully to partition data and determine redundancy, can also be used to isolate ncDNA k-mers that played a role in the protein encoded by the gene's coding DNA.
We first associated each k-mer with its transcript's protein/mRNA sequence signature. Then, we computed the vector at each [k-mer/protein signature] pair for all transcripts. We referenced each k-mer at the same start or end position in its transcript's base sequence. This provided a novel comparison for ranking the k-mers of transcripts.
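The pairing step above can be sketched in Python as follows. This is a minimal illustration, not the method's actual implementation: the names `KmerRecord`, `kmer_at` and `kmers_at_position`, the signature labels, and the toy sequences are all hypothetical, and the sketch assumes that "the same start or end position" means a fixed offset counted from either end of each transcript. It does not compute the vector itself.

```python
from typing import NamedTuple


class KmerRecord(NamedTuple):
    transcript_id: str
    signature: str  # hypothetical protein/mRNA signature label
    position: int   # offset from the chosen reference end
    kmer: str


def kmer_at(sequence: str, position: int, k: int, from_end: bool = False) -> str:
    """Return the k-mer at `position`, counted from the start or from the end."""
    start = len(sequence) - k - position if from_end else position
    if start < 0 or start + k > len(sequence):
        raise IndexError("k-mer window falls outside the sequence")
    return sequence[start:start + k]


def kmers_at_position(transcripts, position, k, from_end=False):
    """Collect one [k-mer/signature] record per transcript at the same position.

    `transcripts` maps a transcript id to a (signature, sequence) pair.
    """
    records = []
    for transcript_id, (signature, sequence) in transcripts.items():
        kmer = kmer_at(sequence, position, k, from_end)
        records.append(KmerRecord(transcript_id, signature, position, kmer))
    return records
```

Each record keeps the k-mer next to its transcript's signature, so every transcript can be compared at the same reference position.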
There are two ways to derive a vector from a k-mer. The image below shows transcripts T1-T4 under the two ordering methods. The first (left) is constant, because every k-mer of a given transcript has the same protein signature. The second (right) is established using the iScore of the k-mer. A max-order rule is applied once to re-order transcripts with equal iScores, provided the re-ordering agrees with the protein signature order.
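As a rough illustration of the two orderings, the Python sketch below sorts a handful of transcripts first by signature and then by score. It rests on several assumptions that the text does not confirm: the iScore is treated as a plain number supplied by the caller, higher scores are placed first, and the max-order rule is read as "break ties between equal scores by falling back to the signature order". The function names and sample values are hypothetical.

```python
def signature_order(records):
    """Order transcript ids by signature rank.

    `records` holds (transcript_id, signature_rank, score) tuples. The
    signature rank does not depend on the k-mer, so this order is the
    same at every k-mer position.
    """
    return [tid for tid, _rank, _score in sorted(records, key=lambda r: r[1])]


def score_order(records):
    """Order transcript ids by descending score (assumed direction).

    Ties between equal scores are resolved in a single pass by signature
    rank, which is one possible reading of the max-order rule.
    """
    ordered = sorted(records, key=lambda r: (-r[2], r[1]))
    return [tid for tid, _rank, _score in ordered]


# Hypothetical values for four transcripts at one k-mer position.
sample = [("T1", 0, 5), ("T2", 1, 7), ("T3", 2, 5), ("T4", 3, 7)]
print(signature_order(sample))  # ['T1', 'T2', 'T3', 'T4']
print(score_order(sample))      # ['T2', 'T4', 'T1', 'T3']
```

Comparing the two lists position by position shows where the score-based order departs from the constant signature order.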
The actual result for the 1000th k-mer of 15 men1 transcripts can be seen here. Each k-mer can be queried independently.