Sunday, May 26, 2019

The Deep Data of Cell Selection.

We identified 16 DNA sequences, each comprising 28 nucleotides, from 15 TP53 transcripts, each comprising ~10,000 intron1 nucleotides and an mRNA isoform. To make the identification we computed and analyzed relationships between more than 225,000 derived sequences for each transcript's ~10,000 intron1 nucleotides and mRNA, comparing them with the same for each of the 15 transcripts.

During analysis we first discovered shorter sequences (fewer than 28 nucleotides), iterated from the same sequence start position in each transcript, and compared them using a highly ordered vector. The order of the sequences for each of the 15 transcripts in each vector was compared with the vector computed for the sequences at the next start position. The final selection of shorter sequences was made from sequences at the most disordered vectors. From these we identified any 28 consecutive nucleotides, from intron1 of all 15 TP53 transcripts, that fully incorporated more than one of these shorter sequences, each no fewer than 8 nucleotides long.
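The final windowing step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not our production pipeline: the function name find_windows, the per-transcript scan, and plain substring matching are all hypothetical choices made for the example. It slides a 28-nucleotide window along one intron1 string and keeps every window that fully contains at least two distinct short sequences of 8 or more nucleotides.

```python
from typing import Iterable, List, Tuple

WINDOW = 28     # window length stated in the text
MIN_SHORT = 8   # minimum short-sequence length stated in the text


def find_windows(intron: str, shorts: Iterable[str],
                 window: int = WINDOW,
                 min_hits: int = 2) -> List[Tuple[int, str, List[str]]]:
    """Return (start, window_text, contained_shorts) for each qualifying window.

    Assumption: a short sequence counts only if it lies entirely inside the
    window, and short sequences under MIN_SHORT nucleotides are ignored.
    """
    # Deduplicate and drop short sequences below the minimum length.
    usable = sorted({s for s in shorts if len(s) >= MIN_SHORT})
    results = []
    for start in range(len(intron) - window + 1):
        text = intron[start:start + window]
        # Substring test: the short sequence must be fully inside the window.
        contained = [s for s in usable if s in text]
        if len(contained) >= min_hits:
            results.append((start, text, contained))
    return results
```

Running the function once per transcript and intersecting the resulting window texts would be one way to require that a window occurs in all 15 transcripts; that step is left out here because the exact rule is an assumption.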

In each of the 16 DNA sequences, 4 or 5 (complex) shorter length sequences were discovered in their identical nucleotide combinations, suggesting a broad sequence affinity with these shorter length intron1 sequences.

We ran a series of 8 sequence alignment tests to determine whether there was anything special about the 16 DNA sequences of length 28 and the shorter sequences used to identify them. Each test used an algorithm to optimize the ordering of the sequences according to a sort score. This score assigned points to each A, T, C or G character that was aligned with the next of the 16 length-28 sequences or with the next sequence of any length in the ordering. Each of the 8 tests varied the points weighting assigned to the length-28 alignment, while the points assigned to the next-sequence alignment were kept constant. This weighting was expressed as a ratio but left unnormalized. As a control we scrambled the order of the letters in each sequence and applied the same algorithm to optimize for a sort score, obtaining the following results.
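To make the scoring idea concrete, here is a minimal Python sketch. It is not the algorithm we ran: the position-by-position, left-aligned match count, the random-swap hill climb used as the optimizer, and the names matches, sort_score, optimize and scramble are all assumptions introduced for illustration. Each sequence earns next_weight points per character matching the next sequence in the ordering, plus l28_weight points per character matching the next length-28 sequence.

```python
import random
from typing import List, Sequence, Tuple

L28 = 28  # length of the 16 identified sequences


def matches(a: str, b: str) -> int:
    """Count positions where two left-aligned sequences share a character."""
    return sum(1 for x, y in zip(a, b) if x == y)


def sort_score(order: Sequence[str], l28_weight: float,
               next_weight: float = 1.0) -> float:
    """Unnormalized score of an ordering (assumed scoring rule)."""
    total = 0.0
    for i, seq in enumerate(order[:-1]):
        # Points for characters aligned with the next sequence of any length.
        total += next_weight * matches(seq, order[i + 1])
        # Points for characters aligned with the next length-28 sequence.
        nxt28 = next((s for s in order[i + 1:] if len(s) == L28), None)
        if nxt28 is not None:
            total += l28_weight * matches(seq, nxt28)
    return total


def optimize(seqs: Sequence[str], l28_weight: float,
             steps: int = 2000, seed: int = 0) -> Tuple[List[str], float]:
    """Hill climb by random swaps; a stand-in for the real optimizer."""
    rng = random.Random(seed)
    order = list(seqs)
    best = sort_score(order, l28_weight)
    if len(order) < 2:
        return order, best
    for _ in range(steps):
        i, j = rng.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]
        score = sort_score(order, l28_weight)
        if score >= best:
            best = score                                  # keep the swap
        else:
            order[i], order[j] = order[j], order[i]       # undo the swap
    return order, best


def scramble(seq: str, rng: random.Random) -> str:
    """Control: shuffle the letters of one sequence, keeping its composition."""
    letters = list(seq)
    rng.shuffle(letters)
    return "".join(letters)
```

Under these assumptions, a test at ratio 2:1 would call optimize(seqs, l28_weight=2.0) on the selected sequences and again on [scramble(s, rng) for s in seqs], then compare the two best scores.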


Scoring Ratio (L28:Next)    Sequence Score    Randomized Score
0.5:1                       922.5             742.5
1:1                         1288              1036
1.5:1                       1734              1373.5
2:1                         2144              1072
3:1                         2923              2367
5:1                         4539              3631
7.5:1                       6563.5            5362
10:1                        8517              6917
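One simple way to read these numbers is to divide each Sequence Score by the matching Randomized Score. The short Python sketch below does only that arithmetic on the values reported above; the name advantage is hypothetical, and the quotient is an illustrative summary, not a statistic we used in the study. With these figures the quotient sits between roughly 1.22 and 1.26 for seven of the eight ratios, while the 2:1 row gives exactly 2.0 because its Randomized Score, copied here as reported, is half the Sequence Score.

```python
from typing import List

# Values copied from the results table above.
ratios = ["0.5:1", "1:1", "1.5:1", "2:1", "3:1", "5:1", "7.5:1", "10:1"]
sequence = [922.5, 1288, 1734, 2144, 2923, 4539, 6563.5, 8517]
randomized = [742.5, 1036, 1373.5, 1072, 2367, 3631, 5362, 6917]


def advantage(seq: List[float], rnd: List[float]) -> List[float]:
    """Sequence Score divided by Randomized Score, row by row."""
    return [s / r for s, r in zip(seq, rnd)]


# Print one line per scoring ratio with the quotient to three decimals.
for label, value in zip(ratios, advantage(sequence, randomized)):
    print(f"{label:>6}  {value:.3f}")
```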


Chart: 8 organizations of 16 x 28 oligonucleotide sequences and shorter lengths

The order bias toward Sequence Score (resulting from our selection process) is evident in the chart and numbers above. It indicates that the 16 identified DNA sequences, and the shorter sequences used to select them, align better than their randomized alternatives. In previous randomization studies we determined that the vector performs similarly against two methods of randomization, which are described in detail at the link.

These methods form part of our neural network initiative and will be used during the process of cell selection for autologous immune therapy using patient-derived Natural Killer cells.