Thursday, May 13, 2021

Non-Coding DNA Key Sequences

DNA Structural Inherency

Wind two strands of elastic together: eventually they will knot, and ultimately the band will double up on itself. Separate the strands. From the point of unwinding, forces will be directed to different regions and the separation will approximately return to the wound state of the band. Do the same with each of 10 different bands or strings of any type, and they will all behave in much the same way. For a given section of DNA being transcribed, the effect of separation will be much the same. For a given gene, there will be sequences that can tolerate force to greater or lesser degrees. For different transcripts of a gene, variation at those sequences may be crucial to the integrity of the transcription machinery that separates DNA strands to initiate transcription to RNA, and to the outcome.

Cellular biology is enormously complex in all regards. The physics of molecular interaction, fluid dynamics, and chemistry combine in a system where cause and effect are nearly impossible to predict. At the most elementary level, we hypothesize that some non-coding DNA (ncDNA) possesses structural inherencies that can be deployed to direct gene proteins and cell function for diagnosis or therapy.

Coding DNA and its regulatory, non-coding complement are transcribed from a gene and then spliced. Transcription to RNA, editing to mRNA, splicing of non-coding RNA, and ultimately translation of mRNA to protein can produce wide-ranging, variable outcomes that may not be recaptured experimentally.

A single nucleotide polymorphism (SNP), or a combination of SNPs, within a gene may affect the finely tuned balance that results. Under different environmental conditions this could be material to the protein produced. Additionally, other mutations of the gene could add complexity to the environment and/or the resulting protein translation.

At this level of cellular biology, DNA stores instructions for protein assemblies to produce the new protein required for a fully functional cell. However, mutations stored in DNA can lead to different functional or non-functional versions of a protein, depending on many different factors. Relationships between ncDNA, including its mutations, and the transcripts' edited, protein-coding mRNA may represent unexplored inherencies that can regulate the gene's mRNA or translated protein.

We built an algorithm to compare, in detail, the ncDNA sequences of multiple protein-coding transcripts of the same gene. For each transcript it steps through every variable-length ncDNA sequence (kmer), specifically in intron 1, computes a signature for each, and indexes it to the constant of that transcript's mRNA signature. At each step these signatures order the kmers of the transcripts. The order is represented as a vector of all the transcripts being compared.
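The stepping and ordering described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual algorithm: the signature used here (GC fraction) is a hypothetical stand-in, since the real signature is not described, and a single fixed kmer length `k` is used rather than variable lengths. The names `signature` and `order_vectors` are invented for this example.

```python
from typing import Dict, List, Tuple


def signature(seq: str) -> float:
    """Hypothetical stand-in signature: GC fraction of the sequence."""
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq.upper()) / len(seq)


def order_vectors(
    introns: Dict[str, str], mrnas: Dict[str, str], k: int
) -> List[Tuple[str, ...]]:
    """Return one ordering of transcript names per step.

    Each step is a kmer start position in intron 1. The kmer signature is
    divided by the transcript's constant mRNA signature (an assumed way of
    "indexing" it), and transcripts are ranked by the result.
    """
    mrna_sig = {name: signature(seq) for name, seq in mrnas.items()}
    # Only step as far as the shortest intron allows.
    steps = min(len(seq) for seq in introns.values()) - k + 1
    vectors = []
    for pos in range(max(steps, 0)):
        scores = {}
        for name, intron in introns.items():
            kmer = intron[pos:pos + k]
            base = mrna_sig[name]
            scores[name] = signature(kmer) / base if base else 0.0
        # Sort by score, breaking ties by name so the order is deterministic.
        vectors.append(tuple(sorted(scores, key=lambda n: (scores[n], n))))
    return vectors


# Toy input: two made-up transcripts with tiny sequences.
toy_introns = {"T1": "GGAT", "T2": "ATGC"}
toy_mrnas = {"T1": "GC", "T2": "GC"}
toy_vectors = order_vectors(toy_introns, toy_mrnas, k=2)
```

Repeating the call for each kmer length of interest would give one list of order vectors per length, which is where an ordering change at a length boundary would be expected.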

At millions of successive steps (depending on total intron 1 lengths), transcripts mostly retain their vector ordering except, as expected, at a kmer length change. Mostly the transcript order in the vector does not change; occasionally a few positions change; very rarely do all positions change. Position changes that cause another, like a domino effect, are filtered out. For the rarest position changes at a step, we look to the root causes in the kmer (sequence). We call this a Key Sequence because it is identified by the significance of changes to transcript positions in the vector compared to the vector at the next step.
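A minimal sketch of the step-to-step comparison might look like the following. It assumes each step's ordering is already available as a tuple of transcript names, counts how many positions differ from the next step, and keeps the steps at or above a chosen threshold. The function names and the `min_changed` threshold are hypothetical, and the domino-effect filtering is not modeled here.

```python
from typing import List, Sequence, Tuple


def position_changes(current: Sequence[str], nxt: Sequence[str]) -> int:
    """Count vector positions whose transcript differs at the next step."""
    return sum(a != b for a, b in zip(current, nxt))


def candidate_key_steps(
    vectors: Sequence[Sequence[str]], min_changed: int
) -> List[Tuple[int, int]]:
    """Return (step, positions_changed) for steps with large reorderings.

    Each vector is compared with the vector at the next step, so the last
    vector has nothing to be compared against.
    """
    hits = []
    for step in range(len(vectors) - 1):
        changed = position_changes(vectors[step], vectors[step + 1])
        if changed >= min_changed:
            hits.append((step, changed))
    return hits


# Toy orderings for three made-up transcripts A, B and C.
toy_orderings = [
    ("A", "B", "C"),
    ("A", "B", "C"),
    ("C", "A", "B"),
    ("C", "B", "A"),
]
toy_hits = candidate_key_steps(toy_orderings, min_changed=3)
```

In this toy input only the step where all three positions change is kept; the kmer at such a step would be the candidate to inspect for a root cause.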

Therefore, Key Sequences cause the most position changes between the transcripts being compared by the algorithm. This relative measure is step dependent, and Key Sequences are discovered by comparing transcript positions in the vector with those at the next step location. Logically, this implies either a structural inherency of the gene, discovered through ncDNA Key Sequence relationships to mRNA and to other transcripts, or an error in gene alignments, sequenced reads, or the algorithm.

In assay testing we were able to predict and synthesize non-coding RNA Key Sequences that significantly reduced proliferation of HeLa cells. In our pre-clinical work, based on comparisons to transcripts of the TP53 gene, we will be predicting the efficacy of cell and tissue selections that educate and activate Natural Killer cells.

If Key Sequences are inherent, they could open a new frontier for diagnosis and therapy.