Research suggests that retrovirus insertions evolved from a type of transposon called a retrotransposon. The appearance of inherited endogenous retroviruses (ERVs) and of the zinc-finger genes that bind their unique sequences occurred over the same time scales of primate evolution. Additionally, the zinc-finger genes that inactivate transposable elements are commonly located on chromosome 19. Recurrent, independent ERV invasions can be countered by a reservoir of zinc-finger repressors that are continuously generated at copy number variant (CNV) formation hotspots.
Frequently occurring DNA breaks can cause genomic instability, which is a hallmark of cancer. These breaks are overrepresented at G4 DNA quadruplexes within hominid-specific SVA retrotransposons and generally occur in tumors with mutations in tumor suppressor genes such as TP53. Cancer mutational burden is shaped by G4 DNA, replication stress, and mitochondrial dysfunction, which in lung adenocarcinoma downregulates SPATA18, the gene encoding the mitochondria-eating protein (MIEAP) that contributes to mitophagy.
Genetic variations in non-coding regions can control the activity of conserved protein-coding genes, resulting in the establishment of species-specific transcriptional networks. A chromosome 19 zinc finger, ZNF558, evolved as a suppressor of LINE-1 transposons but has since been co-opted to regulate a single gene, SPATA18. These variations are evident in a panel of 409 human lymphoblastoid cell lines, where the length of the ZNF558 variable number tandem repeat (VNTR) correlated negatively with ZNF558 expression.
Colon cancer cells with a p53 deletion were used to identify deregulated p53 target genes by comparing HCT116 p53-null cells with HCT116 p53 +/+ cells. SPATA18 was the most upregulated gene in the differential expression analysis, providing further insight into the link between p53 and mitophagy via SPATA18-MIEAP.
p53 response elements (p53REs) can be shaped by long terminal repeats from endogenous retroviruses, long interspersed nuclear elements, and Alu repeats in humans, and by fuzzy tandem repeats in mice. Further, p53 pervasively binds to p53REs derived from retrotransposons or other mobile genetic elements and can suppress transcription of retroelements. The p53-mediated mechanisms conferring protection from retroelements are also conserved through evolution. p53 has also been shown to have other roles in the DNA context, such as an important role in replication restart and replication fork progression. The absence of these p53-dependent processes can lead to further genomic instability.
The frequency of variable-length long or short nucleotide repeats, and their locations within a gene, may be key to repressing DNA sequences that would otherwise cause genomic instability, or protein expression that would eat bacterial mitochondria or destroy the host cell.
The complexity of variable-length insertions becomes evident when a simple sequence of length 12 is exhaustively analyzed for the potential frequency of each of its variable-length repeats, starting from a minimum length of 8.
Consider the sequence TGTGGGCCCACA (12).
All possible internal variable-length combinations of length 8 or greater:
TGTGGGCC(8)|GTGGGCCC(8)|TGTGGGCCC(9)|TGGGCCCA(8)|GTGGGCCCA(9)|TGTGGGCCCA(10)|GGGCCCAC(8)|TGGGCCCAC(9)|GTGGGCCCAC(10)|TGTGGGCCCAC(11)|GGCCCACA(8)|GGGCCCACA(9)|TGGGCCCACA(10)|GTGGGCCCACA(11)|TGTGGGCCCACA(12)
For example, reviewing the length-8 repeats only and counting how many of the 15 combinations above contain each one:
TGTGGGCC (8) occurs 5 times
GTGGGCCC (8) occurs 8 times
TGGGCCCA (8) occurs 9 times
GGGCCCAC (8) occurs 8 times
GGCCCACA (8) occurs 5 times
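The enumeration and counts above can be reproduced with a short Python sketch. The function names `variable_length_repeats` and `count_occurrences` are hypothetical and chosen for illustration only. The sketch assumes that "occurs" means the number of combinations that contain the repeat at least once, which matches the counts listed above for this sequence.

```python
def variable_length_repeats(sequence, min_length):
    """List every contiguous subsequence of `sequence` with length >= min_length.

    Combinations are grouped by end position, shortest first, which mirrors
    the order of the list in the text. (Hypothetical helper, for illustration.)
    """
    n = len(sequence)
    return [
        sequence[start:end]
        for end in range(min_length, n + 1)
        for start in range(end - min_length, -1, -1)
    ]


def count_occurrences(repeat, combinations):
    """Count how many combinations contain `repeat` at least once.

    Assumption: a repeat is counted once per combination that contains it.
    """
    return sum(1 for combination in combinations if repeat in combination)


sequence = "TGTGGGCCCACA"
combinations = variable_length_repeats(sequence, 8)

# Report only the length-8 repeats, as in the example in the text.
for repeat in combinations:
    if len(repeat) == 8:
        print(f"{repeat} (8) occurs {count_occurrences(repeat, combinations)} times")
```

Run against TGTGGGCCCACA with a minimum length of 8, the sketch yields 15 combinations and the same five counts listed above (5, 8, 9, 8, 5).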
Any repeat can be ranked by its occurrence within all possible combinations of a given sequence; this is known as the repeat's iScore rank. This illustrates a potentially useful statistical ranking that, subject to biology, may describe a repeat's inherent tendency to be more or less effective, in increments of the gene sequence.
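One way such a ranking might be computed is sketched below in Python. The text does not define the iScore rank precisely, so this is an assumption rather than the definitive method: each repeat is scored by the number of combinations that contain it, and repeats with equal scores share a rank (dense ranking, rank 1 being the most frequent). The function name `iscore_ranks` is hypothetical.

```python
def iscore_ranks(sequence, min_length):
    """Map each repeat to (occurrence count, rank) within one sequence.

    Assumptions (not specified in the text): the score is the number of
    combinations of length >= min_length that contain the repeat, and
    ties share a rank, with rank 1 assigned to the highest score.
    """
    n = len(sequence)
    # Every contiguous subsequence of length >= min_length.
    combinations = [
        sequence[start:end]
        for start in range(n)
        for end in range(start + min_length, n + 1)
    ]
    # Score each distinct repeat by how many combinations contain it.
    counts = {
        repeat: sum(1 for combination in combinations if repeat in combination)
        for repeat in set(combinations)
    }
    # Dense ranking: distinct scores sorted from highest to lowest.
    distinct_scores = sorted(set(counts.values()), reverse=True)
    rank_of_score = {score: i + 1 for i, score in enumerate(distinct_scores)}
    return {repeat: (count, rank_of_score[count]) for repeat, count in counts.items()}


ranks = iscore_ranks("TGTGGGCCCACA", 8)
# Print repeats from best rank to worst, breaking ties alphabetically.
for repeat, (count, rank) in sorted(ranks.items(), key=lambda item: (item[1][1], item[0])):
    print(f"rank {rank}: {repeat} ({len(repeat)}) occurs {count} times")
```

Under these assumptions, TGGGCCCA receives the top rank for TGTGGGCCCACA with 9 occurrences, and the full length-12 sequence, which occurs only once, receives the lowest rank.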