Saturday, February 13, 2021

Cells with an Index like Google?

It's been a while since I last wrote about DNA repeats or their RNA descendants. In that time, advanced research has emerged relating repeats to a growing number of viral and other diseases. Generally, the repeats of interest here can be either long or short sequences of nucleotides that form part of an unspliced gene. Logically, long sequences would repeat less often than short ones, but when counts are normalized to their respective nucleotide lengths, the indexed results can shift the relative order of repeating sequences quite dramatically.
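To make the normalization concrete, here is a minimal sketch in Python. It rests on an assumption of mine: that "normalized to length" means the fraction of the sequence covered by a repeat (count times length, divided by total length). Other normalizations are possible. The toy sequence, the motifs and the function name `rank_repeats` are invented for illustration and are not real genomic data.

```python
def rank_repeats(sequence, motifs):
    """Rank motifs by raw count and by length-normalized coverage.

    Coverage here is an assumed normalization: the fraction of the
    sequence occupied by non-overlapping copies of the motif.
    """
    rows = []
    for motif in motifs:
        # str.count counts non-overlapping occurrences
        count = sequence.count(motif)
        coverage = count * len(motif) / len(sequence)
        rows.append((motif, count, coverage))

    by_count = [r[0] for r in sorted(rows, key=lambda r: r[1], reverse=True)]
    by_coverage = [r[0] for r in sorted(rows, key=lambda r: r[2], reverse=True)]
    return by_count, by_coverage


# Toy sequence (hypothetical): three copies of a 7-base motif,
# followed by four copies of a 2-base motif.
toy_sequence = "GATTACA" * 3 + "CG" * 4
by_count, by_coverage = rank_repeats(toy_sequence, ["GATTACA", "CG"])
print("ranked by count:   ", by_count)
print("ranked by coverage:", by_coverage)
```

In this toy case the short motif wins on raw count (4 copies against 3), while the long motif wins once length is taken into account, which is the kind of reordering described above.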

In most knowledge systems, repeats in low-level data represent redundancy and an opportunity to improve the efficiency of local or global upstream processes acting on that data. We see this in the structure of efficient alphabets, which had a significant impact on whether or not a language survived continuous use. Why use ten words when precise meaning, including abstractions, can be derived from three? Or why use a purely alphabetic script when, at least for some period in the language's history, an alphanumeric one made it more effective?

Search engines reduce their primary index to the least redundant data set needed to drive efficient data access, so that upstream requests and processes can satisfy any query. At the storage level, however, data redundancy is permitted because it gains energy efficiency. Similarly, genetic DNA is massively redundant. Redundant data stores can make highly indexed systems more efficient, because frequently accessed data elements are available at multiple locations and parallel processes can satisfy upstream requests more efficiently.

Repetitive sequences constitute 50%–70% of the human genome. Some of these can transpose positions; these transposable elements (TEs) comprise DNA transposons and retrotransposons. The latter are predominant in most mammals and can be further divided into long terminal repeat (LTR)-containing endogenous retrovirus transposons and non-LTR transposons, including short interspersed nuclear elements (SINEs) and long interspersed nuclear elements (LINEs). In humans, the most abundant subclass of SINEs comprises the primate-specific Alu elements, which are more abundant in GC-rich DNA. Humans have up to 1.4 million copies of these repeats, which constitute about 10.6% of the genomic DNA. Long interspersed element-1 (LINE1 or L1) repeats are abundant in AT-rich DNA, constitute 19% of the human genome and make up the largest proportion of transposable element-derived sequences.

Most TE classes are primarily involved in reduced gene expression, but Alu elements are associated with upregulated gene expression. Intronic Alu elements are capable of generating alternative splice variants in protein-coding genes, which illustrates how Alu elements can alter protein function or gene expression levels. Non-coding regions were found to have a high density of TEs within regulatory sequences, most notably in repressors. Together, these findings indicate a significant association between repetitive elements and gene regulation, with TEs having a global impact.

In liquid systems, phase separation is one of the most fundamental phase-transition phenomena and is ubiquitous in nature. De-mixing of oil and water in salad dressing is a typical example. The discovery of biological phase separation in living cells led to the finding that phase-separation dynamics are controlled by mechanical relaxation of the network-forming dense phase, where the limiting process is permeation flow of the solvent for colloidal suspensions and heat transport for pure fluids. Applying this universal governing law is a step toward understanding and defining the liquid, biological equivalent of indexing in data-processing systems, and the genetic redundancy inherent in it.

Repeats have been widely implicated in immunity, transcriptional regulation and disease. In plant immunity, a TE has been domesticated through histone marks and generation of alternative mRNA isoforms, both of which were directly linked to the immune response to a particular pathogen. p53 transcription sites evolved through epigenetic methylation, deamination and histone regulation, which together constituted a universal mechanism found to generate various transcription-factor binding sites in short TEs or Alu repeats. In disease, cytoplasmic synthesis of Alu cDNA was implicated in age-related macular degeneration, and there is a transient increase of nearly 20-fold in the levels of Alu RNA during stress, viral infection and cancer.

In chromosomal DNA, each sequence, relative to its length, may conveniently describe a phase-separated indexed location and a method for its discovery. Repeats within genetic DNA may present precisely sensitive, phase-separated guidance that drives histone, epigenetic and transcription factors to specific genetic locations at the cell's 'end-of-line', from where the genetic response to upstream membrane-bound changes begins.