Monday, August 13, 2018

Deep k-mer - new dimension analysis

Some time ago we published a paper that expanded on the TP53 intron1 k-mer relationships we have been investigating with our algorithms. We described how TP53 response elements bind p53 monomers, and how more complex response elements bind and form p53 tetramers that support known transcription events.

Since then we have been conducting numerous laboratory tests with Professor Noam Shomron at Tel Aviv University to confirm that the sequences we identified from our TP53 bioinformatic analysis produce predictable, precisely directed results in cells.

From our initial results, it appears we can elicit an important relationship between intron1 and the sequenced proteins of the same transcripts. Further, these relationships appear to be non-random, and they can be used to identify the highly specific intron1 DNA sequences that drive this non-randomness.

We previously published the chart below, which indicates that men1 k-mers ordered into a 15-variant transcript vector produced a length bias despite our algorithm being length agnostic. The scatter graph plots k-mers (15 variants) by intra-transcript repeats:length (horizontal axis); charting the k-mer repeats gathers the points into colored vector bands.

Repeats are written as (length)k-mer(count). For ATCG, for example, the repeats would be expressed as (4)ATCG(3), (3)ATC(1), (3)TCG(1), (2)TC(3) and (2)CG(2), so the count for (4)ATCG(3) equates with the count for (2)TC(3).


15-variant men1 intron1 transcript - k-mer repeats
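
The (length)k-mer(count) notation can be illustrated with a minimal Python sketch. It assumes simple overlapping-window counting over a single sequence, which may differ from how the counts in the example above were derived; the toy sequence and the function names `kmer_repeats` and `format_repeat` are hypothetical and are not taken from our algorithm.

```python
from collections import Counter


def kmer_repeats(seq, min_len=2, max_len=4):
    """Count every k-mer of each length in seq using overlapping windows."""
    counts = Counter()
    for k in range(min_len, max_len + 1):
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def format_repeat(kmer, count):
    """Render one entry in the (length)k-mer(count) notation."""
    return f"({len(kmer)}){kmer}({count})"


# Hypothetical toy sequence, not a real intron1 fragment.
seq = "ATCGTCATCG"
counts = kmer_repeats(seq)

# Print longest k-mers first, then alphabetically within each length.
for kmer, n in sorted(counts.items(), key=lambda kv: (-len(kv[0]), kv[0])):
    print(format_repeat(kmer, n))
```

In this toy sequence a short k-mer such as TC repeats more often than the longer ATCG that contains it, which is the kind of repeats-versus-length relationship the chart plots.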

Relying on the unique ordering of each variant's k-mers in the transcript vector, we selected TP53 k-mers where the variant order in a vector changed most significantly compared to the previous vector. We found that the most disrupted vectors were caused by very short k-mers. By comparison, almost all vector positions in most vectors remained stable.

12 TP53 Intron1 Transcripts
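
Selecting the k-mers whose vectors are most disrupted relative to the previous vector could be sketched as follows. This is only an illustration under stated assumptions: disruption is measured here as the number of vector positions whose variant changed, which is an assumed metric, and the k-mers, variant labels and function names are hypothetical.

```python
def positions_changed(prev, curr):
    """Number of vector positions whose variant differs between two orderings."""
    return sum(1 for a, b in zip(prev, curr) if a != b)


def rank_disruption(vectors):
    """Score each k-mer's ordering against the previous one.

    vectors: list of (kmer, ordering) pairs in sequential order.
    Returns (kmer, positions_changed) pairs, most disrupted first.
    """
    scores = []
    for (_, prev), (kmer, curr) in zip(vectors, vectors[1:]):
        scores.append((kmer, positions_changed(prev, curr)))
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Hypothetical orderings of four variants (v1..v4) for successive k-mers.
vectors = [
    ("ATCG", ["v1", "v2", "v3", "v4"]),
    ("TCGA", ["v1", "v2", "v3", "v4"]),
    ("TC",   ["v3", "v1", "v2", "v4"]),
    ("CGAT", ["v3", "v1", "v4", "v2"]),
]

ranked = rank_disruption(vectors)
for kmer, changed in ranked:
    print(kmer, changed)
```

Other measures of order change, such as a rank correlation between successive orderings, would fit the same structure; the position count is used here only because it is the simplest to read.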

Ordering in our vectors is a way to represent transcript k-mers where computation is sequential from the first oligo of intron1. Each k-mer contributes its vector ordering based on its relationship to the transcripts' constant, protein or mRNA signature for each variant.