A representation of a nucleic acid sequence encodes a particular gene having at least one intron. An intron signature value for the at least one intron is determined by applying a first computational function to at least one portion of the representation that corresponds to that intron. A protein signature value is determined by applying a second computational function to a representation of a protein. An association between the intron and protein signature values is then formed in a database. This process is repeated for each of a plurality of nucleic acid sequences. The nucleic acid sequences in the database are then ordered by sorting their corresponding intron signature values, and the resulting order is used to determine or confirm a role or function of a portion of a given nucleic acid sequence.
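The abstract leaves both computational functions unspecified. The following is a minimal Python sketch of the workflow under loudly stated assumptions: the intron signature is taken to be the GC fraction of the concatenated intron portions, the protein signature is taken to be the fraction of hydrophobic residues, and the database is an in-memory SQLite table. These choices, the `GeneRecord` type, and all function names are hypothetical illustrations, not the patented method.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class GeneRecord:
    gene_id: str
    sequence: str                   # nucleic acid sequence (DNA letters)
    introns: list[tuple[int, int]]  # hypothetical 0-based [start, end) intron coordinates
    protein: str                    # protein sequence (one-letter codes)


def intron_signature(sequence: str, introns: list[tuple[int, int]]) -> float:
    """Hypothetical first function: GC fraction of the concatenated intron portions."""
    intron_seq = "".join(sequence[s:e] for s, e in introns).upper()
    if not intron_seq:
        raise ValueError("gene must have at least one non-empty intron")
    gc = sum(1 for base in intron_seq if base in "GC")
    return round(gc / len(intron_seq), 4)


def protein_signature(protein: str) -> float:
    """Hypothetical second function: fraction of hydrophobic residues (A, I, L, M, F, V, W)."""
    protein = protein.upper()
    if not protein:
        raise ValueError("protein must be non-empty")
    hydrophobic = sum(1 for aa in protein if aa in "AILMFVW")
    return round(hydrophobic / len(protein), 4)


def build_and_sort(records: list[GeneRecord]) -> list[tuple[str, float, float]]:
    """Store one (intron, protein) signature association per gene, then sort by intron signature."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE signatures (gene_id TEXT PRIMARY KEY, intron_sig REAL, protein_sig REAL)"
    )
    for rec in records:  # repeat for each nucleic acid sequence
        conn.execute(
            "INSERT INTO signatures VALUES (?, ?, ?)",
            (rec.gene_id, intron_signature(rec.sequence, rec.introns), protein_signature(rec.protein)),
        )
    # Ties on intron_sig are broken by gene_id so the order is deterministic.
    rows = conn.execute(
        "SELECT gene_id, intron_sig, protein_sig FROM signatures ORDER BY intron_sig, gene_id"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    # Toy, made-up sequences; each has a single intron at positions 3..8.
    demo = [
        GeneRecord("geneA", "ATGGCGCGCTAG", [(3, 9)], "MKLV"),
        GeneRecord("geneB", "ATGATATATTAG", [(3, 9)], "MSTD"),
        GeneRecord("geneC", "ATGGCATATTAG", [(3, 9)], "MAIL"),
    ]
    for row in build_and_sort(demo):
        print(row)  # order: geneB (0.0), geneC (0.3333), geneA (1.0)
```

Sorting is delegated to the database (`ORDER BY intron_sig`), so entries with similar intron signatures end up adjacent. How that ordering is then used to assign or confirm a role or function is not specified in the abstract, so the sketch stops at producing the ordered list.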
TP53 Intron-Derived Concentrations Implicate p53
TP53 functions as a tumor suppressor and has been described as the "Guardian of the Genome." p53, the protein encoded by TP53, predominantly binds regulatory response elements in the introns of multiple target genes. We compared identical oligo subsequences in the introns of multiple genes/transcripts, including TP53, BRCA1, MEN1, PELP1, SET, HIF1A, ULBP1/2 and IRF3. Some of these shared oligo subsequences also contain the core response elements of known p53 binding sites or of TP53 intronic autoregulatory response elements, and they strongly correlate with known miRNAs. We describe how TP53 intronic autoregulatory response elements contribute to the binding preference of p53, and we propose a computational method to identify p53 targets.
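The comparison of identical oligo subsequences across introns can be illustrated with a short k-mer sketch. It assumes that intron sequences are available as plain strings keyed by gene, that an "identical oligo subsequence" means an exact shared k-mer of a chosen length `k`, and it screens the shared k-mers for the CWWG core of the p53 half-site consensus RRRCWWGYYY. The helper names (`kmers`, `shared_oligos`, `with_p53_core`) and the toy sequences mentioned below are hypothetical, not code or data from this work.

```python
import re
from collections import defaultdict

# Core of the p53 half-site consensus RRRCWWGYYY: C, then two A/T bases (W), then G.
P53_CORE = re.compile(r"C[AT][AT]G")


def kmers(seq: str, k: int) -> set[str]:
    """All distinct length-k subsequences (oligos) of seq; empty if seq is shorter than k."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_oligos(introns: dict[str, str], k: int, min_genes: int = 2) -> dict[str, set[str]]:
    """Map each k-mer to the genes whose intron sequence contains it,
    keeping only k-mers that occur identically in at least min_genes genes."""
    owners: dict[str, set[str]] = defaultdict(set)
    for gene, seq in introns.items():
        for oligo in kmers(seq, k):
            owners[oligo].add(gene)
    return {oligo: genes for oligo, genes in owners.items() if len(genes) >= min_genes}


def with_p53_core(shared: dict[str, set[str]]) -> dict[str, set[str]]:
    """Keep only the shared oligos that contain a p53 core (CWWG) motif."""
    return {oligo: genes for oligo, genes in shared.items() if P53_CORE.search(oligo)}
```

For example, with two made-up strings, `shared_oligos({"TP53": "GGACATGTCC", "BRCA1": "TTACATGTAA"}, k=6)` returns `{"ACATGT": {"TP53", "BRCA1"}}`, and that oligo passes `with_p53_core` because it contains CATG. Real use would need actual intron coordinates, a justified choice of `k`, and a statistical test before calling any overlap meaningful.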
We set out to determine whether small intron:protein signatures can triangulate to identify functional sequences. We tested biological outcomes using these signatures and intend to screen cells for use in autologous and allogeneic immunotherapies.