Monday, November 26, 2018

Mathematical vectors in biology


We built a model to determine whether a random or non-random relationship existed between introns and proteins of a transcript. We determined the relationship was overwhelmingly non-random and progressed to study particular genes in more detail.

Based on our studies, we suggested that TP53 readily encodes specific isoform concentrations that alter next generation transcriptions, and that introns play significant roles. Here we validate our selection logic and describe its proposed use in immunotherapy. From intron1 we computed more than 400,000 k-mers across 12 TP53 and 29 BRCA1 transcripts, and from these we selected 8 short k-mers. We synthesized the sequences, and in subsequent transfection experiments 3/3 TP53 and 3/5 BRCA1 sequences significantly (p<0.05) reduced the proliferation rate of HeLa cells.

Selection Background

For each transcript we first computed the intron1 k-mers longer than 7 oligos (see image below; each k-mer has an Offset#). For every k-mer we computed a signature and associated it with a signature of the transcript's protein. Then, in Offset# order, we ranked the transcripts at each k-mer according to the result of a k-mer:protein signature ordering. For each Offset# (k-mer) we recorded the resulting order of the transcripts in a vector.

Moving through the vectors in Offset# (computation) order, we compared each vector with the next to discover any changes in the ordering of transcripts. After filtering k-mers for a length change, more than 90% of transcript ordering remained stable. Occasionally one or two transcripts changed position; very rarely, more than 75% of the transcripts in a vector changed position. When we discovered the few vectors with >75% change, we extracted them and subjected them to a selection algorithm that identified 8 short, 28 oligo sequences from the 41 transcripts processed.
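The comparison of consecutive vectors can be illustrated with a minimal sketch. It assumes each vector is simply a list of transcript names in rank order, and that "change" means the fraction of transcripts whose rank differs from the previous Offset#. The function names, the toy transcript labels and the exact definition of change are assumptions for illustration, not the Codondex implementation.

```python
from typing import List, Sequence


def fraction_moved(prev: Sequence[str], curr: Sequence[str]) -> float:
    """Fraction of transcripts whose rank differs between two ordering vectors."""
    if sorted(prev) != sorted(curr):
        raise ValueError("both vectors must rank the same transcripts")
    if not curr:
        return 0.0
    prev_rank = {name: rank for rank, name in enumerate(prev)}
    moved = sum(1 for rank, name in enumerate(curr) if prev_rank[name] != rank)
    return moved / len(curr)


def flag_unstable_offsets(vectors: Sequence[Sequence[str]],
                          threshold: float = 0.75) -> List[int]:
    """Offsets whose vector differs from the previous one by more than threshold."""
    return [
        offset
        for offset in range(1, len(vectors))
        if fraction_moved(vectors[offset - 1], vectors[offset]) > threshold
    ]


# Toy data (hypothetical transcript labels): one ordering vector per Offset#.
vectors = [
    ["T1", "T2", "T3", "T4"],
    ["T1", "T2", "T3", "T4"],  # unchanged
    ["T2", "T1", "T3", "T4"],  # two of four transcripts swap (50%)
    ["T4", "T3", "T1", "T2"],  # every transcript moves (100%)
]

print(flag_unstable_offsets(vectors))  # [3]
```

In this toy run only the last vector exceeds the 75% threshold, so only that Offset# would be passed on to a selection step.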

Codondex iScore™ ordering, comparison and selection algorithms rest on the idea that comparing transcripts at sequential k-mers is a compelling method to identify sequences that “stand out from the crowd” because they may be inherent upstream of transcription. Any k-mer has the potential to aggregate or contribute to the formation of coacervates in a sequence- and length-dependent manner. In the image above, red text represents our computation of the first 14 of 135 potential k-mers of the identical 23 oligo sequence. For each k-mer, all k-mers of the 23 oligo sequence are queried (in both directions) and repetitions counted. For example, of the 14 k-mers, the k-mer at Offset#0 can also be found in Offsets #2, 5, 9 and 14.

In the compound computation of the 23 letter sequence, Offset#0 GTGGGAAT is repeated in 16 other k-mers and Offset#135 GTGGGAATCTTATCCATGACCCA has 136 k-mers repeated in it (including itself). When looking at the entire intron sequence (or any long sequence) there is never a linear progression of k-mers; inevitably the counts become disordered.
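The k-mer bookkeeping for the 23 letter example can be sketched as follows. The enumeration order is an assumption: k-mers are grouped by end position and listed from shortest to longest, chosen only because it places GTGGGAAT at Offset#0 and the full sequence at Offset#135, as in the text. The function names are hypothetical, and the sketch searches in the forward direction only, so it does not reproduce the "both directions" query.

```python
from typing import List

SEQ = "GTGGGAATCTTATCCATGACCCA"  # the 23 letter example sequence from the text
MIN_LEN = 8                      # k-mers longer than 7


def enumerate_kmers(seq: str, min_len: int = MIN_LEN) -> List[str]:
    """List k-mers by end position, shortest first (assumed Offset# order)."""
    kmers = []
    for end in range(min_len, len(seq) + 1):
        for start in range(end - min_len, -1, -1):
            kmers.append(seq[start:end])
    return kmers


def offsets_containing(kmers: List[str], offset: int) -> List[int]:
    """Offsets of the other k-mers that contain the k-mer at `offset`."""
    target = kmers[offset]
    return [i for i, k in enumerate(kmers) if i != offset and target in k]


def contained_count(kmers: List[str], offset: int) -> int:
    """Number of listed k-mers (including itself) found inside the k-mer at `offset`."""
    host = kmers[offset]
    return sum(1 for k in kmers if k in host)


kmers = enumerate_kmers(SEQ)
print(len(kmers))                         # 136 k-mers, Offset#0 to Offset#135
print(kmers[0])                           # GTGGGAAT
print(offsets_containing(kmers, 0)[:4])   # [2, 5, 9, 14]
print(contained_count(kmers, 135))        # 136
```

Under this assumed ordering the first offsets that contain the Offset#0 k-mer are 2, 5, 9 and 14, and the full sequence at Offset#135 contains all 136 k-mers, including itself.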

In the following example of ordering transcript computations for a single Offset#, each result has been ordered in a vector of 15 MEN1 transcripts. Each protein signature is constant for every Offset# because the signature is computed from the entire string. Some protein signatures are identical, but the intron signatures are not. Transcripts with identical protein signatures are preferentially sorted to give a final order to the transcripts in the vector.

15 transcripts, for a single k-mer (Offset#) in a vector

In our detailed review of each transcript we discovered that the compound effect of k-mer repeats described an inherent structure of relationships between nucleotide lengths. We considered how varied transcription events would alter the representation of these non-coding oligo lengths in their ncRNA form. For example, Offset#135 included 136 repeats of k-mers, which statistically suggests that it has a greater chance of survival and/or function in any of its constituent parts than a k-mer with fewer repeats.

As stated in the opening paragraph, we synthesized the 8 short RNA selections made using our vectors to discover how they translated in biology. In future we intend to compute p53 (or other gene) transcripts from multiple samples of a patient biopsy. We would do this by separating cells into multiple wells, running RNAseq on each well and computing the transcript position of each well in our p53 vectors. Once we identify the logarithmic proximity of each well's transcript to other transcripts, we will select a well. We will use the selected cells to educate natural killer cells extracted from the patient, and return only the immune cells to the patient to reduce proliferation of diseased cells. We hope to bring this therapy to the market in the next few years.