The origin of life as a probabilistic event in the Universe

Using a probabilistic mathematical model, we discuss the origin of life as a stochastic process. We consider only the chance of information emergence in the proteome and genome under ideal thermodynamic and chemical conditions. For a more realistic model, we used as a parameter the amount of information in the genome of Nanoarchaeum equitans, the simplest cell known today, taken as equivalent to the first living cell that could have emerged on the primitive Earth. We estimated the probability of information emergence by chance at about $10^{-500,000}$. Since the ideal conditions assumed for information emergence would themselves have to arise, the probability of the origin of life would be even smaller.

Given the complexity of a cell, the probability that one emerges by chance must be very small. Skepticism about Darwin's warm pond is as old as the hypothesis itself, which defies logical thinking (Follmann & Brownson, 2009). In Evolution from Space (Hoyle & Wickramasinghe, 1984), the famous astrophysicist Fred Hoyle estimated the probability of emergence of a small set of enzymes necessary for life at about $10^{-40,000}$. Hoyle's coauthor, Dr. Wickramasinghe (Wickramasinghe, 2010), has advocated cosmic panspermia as a possible answer to the Original Problem.
Even if illogical, is the stochastic emergence of life physically compatible with the dynamics of the Universe as a whole? Considering the estimated number of bits in the Universe (on the order of $10^{90}$), Seth Lloyd (Lloyd, 2002) calculated that the Universe has performed a total of about $10^{120}$ transformations (e.g., spin flips), or computational operations, since the Big Bang. Thus, any particular transformation among all those that have occurred in the Universe has a probability of about $1/10^{120}$.
In this essay, we present a probabilistic model to discuss the origin of life as a stochastic event. The model considers the emergence of the information (encoded in the DNA and also present in the proteome) that is necessary and sufficient for a biomolecular machinery to express a self-replicating living cell (see The Model). For that, we take the information in the simplest known living cell as the parameter.
Therefore, the model concerns only the information of the cell. The emergence of the whole chemical, structural and environmental context necessary for the putative organization of genomic/proteomic information, called fundamental resources (4), is not addressed here. Unlike these fundamental resources, the probability of the emergence of genomic/proteomic information is theoretically calculable, because this information is a combination of predefined "bits", namely the nucleotides and amino acids arranged in informative sequences along their respective polymers.
For the subsequent discussion, we adopt two assumptions: (1) the Universe is the only existing thermodynamically closed system, since there is nothing encompassing the Universe and there is no perfectly closed system within it; (2) the Universe's dynamics is probabilistic, i.e., it is the result of chance.
These assumptions yield two logical consequences: (1) if the Universe is the only closed system, everything has its origin inside the Universe and depends solely on the Universe's resources; (2) if the Universe is a probabilistic system, every deterministic process is also ultimately probabilistic, because the first process in the Universe, the precursor of all other processes, happened purely by chance. Thus, even if a process X is set in motion by a chain of causes, it is still probabilistic per se, because it is statistically dependent on that first process.
In light of these assumptions and their logical consequences, non-living (non-self-replicating) entities in the Universe originated from chains of probabilistically dependent events, $P(A \mid B)$. In such a chain, if A is determined by B and B is determined by C, the probability that A happens depends on the probability that B happens, which in turn depends on the probability that C happens, and so on. Therefore, the probability that the final event A happens at the end of a chain of consecutively dependent events is the product of the probabilities of each step, each conditioned on the preceding one: $P(A \cap B \cap C) = P(A \mid B)\,P(B \mid C)\,P(C)$.
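The chain rule above can be checked numerically with a small Monte Carlo sketch. The three probabilities below are hypothetical values chosen only for illustration, and the chain is assumed to be Markovian (A depends only on B, B only on C); the simulated frequency of reaching A should match the product of the conditional probabilities.

```python
import random

# Hypothetical three-step chain C -> B -> A; the values are illustrative only.
P_C = 0.5          # P(C)
P_B_GIVEN_C = 0.4  # P(B | C)
P_A_GIVEN_B = 0.3  # P(A | B)


def chain_probability(p_c: float, p_b_given_c: float, p_a_given_b: float) -> float:
    """Probability that A occurs at the end of the chain C -> B -> A (chain rule)."""
    return p_c * p_b_given_c * p_a_given_b


def simulate(trials: int, seed: int = 1) -> float:
    """Monte Carlo estimate: B can only occur after C, and A only after B."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Short-circuit evaluation: a later step is only attempted if the earlier one succeeded.
        if rng.random() < P_C and rng.random() < P_B_GIVEN_C and rng.random() < P_A_GIVEN_B:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    exact = chain_probability(P_C, P_B_GIVEN_C, P_A_GIVEN_B)
    print(f"product of conditional probabilities: {exact:.4f}")
    print(f"simulated frequency:                  {simulate(200_000):.4f}")
```

With more trials the simulated frequency converges on the product, which is the sense in which the probability of the final event is the product of the step probabilities.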
Self-replicating entities, by producing many copies of themselves, increase the probabilities of further transformations, so their conditional probabilities do not follow the linear chain of statistical dependency described above.
Thus, before the origin of life, the probability of emergence of genomic information equals the probability that all the nucleotides forming the genome are in the correct sequence, whether the genome appeared at once or in N steps, each probabilistically dependent on the preceding ones.
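A short derivation shows why assembling the genome in steps leaves this probability unchanged. Assume, as in The Model, that each nucleotide is fixed independently with probability b, and (as a hypothetical decomposition) that step i of N fixes $g_i$ nucleotides, with $g_1 + \dots + g_N = g$. The step probabilities then multiply back to the all-at-once value:

```latex
P(g) \;=\; \prod_{i=1}^{N} b^{g_i} \;=\; b^{\sum_{i=1}^{N} g_i} \;=\; b^{g}
```

The exponents add, so only the total number of nucleotides g matters, not how the assembly is split into steps.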
The present model explores the probability that the information encoded in a genome/proteome emerges among all possible combinations of monomers (that is, among all possible codes), in order to discuss whether the origin of life as a stochastic event is admissible.

The Model
The model assumes that the genome and proteome are formed by combinations of any nucleotides and amino acids, with equal binding probability among them. The genome/proteome chosen was that of the thermophilic prokaryote Nanoarchaeum equitans, an archaeon and the simplest known living being (Huber et al., 2002; Waters et al., 2003; Das et al., 2006), which is regarded as a living fossil (Di Giulio, 2006). N. equitans lives in extreme conditions, similar to those of the primitive Earth environment one billion years ago. This cell has a simple metabolome and does not synthesize lipids, amino acids or nucleotides; however, it is evidently able to self-replicate. N. equitans holds 466,340 informative base pairs (95% of all the cell's DNA), which form its 540 genes. This microorganism uses all twenty standard amino acids in its proteome.
The model estimates, as an order of magnitude, the chance that the genome and proteome emerge in parallel, as follows: $P(g) = b^{g} = 4^{-g}$, where g is the number of informative DNA nucleotides in the genome and b is the chance for any given DNA base to be drawn (b = 0.25, for four nucleotide types).
Revista Brasileira de Zoociências 19(1): 25-30. 2018.
$P(p) = a^{p} = 20^{-p}$, where p is the number of amino acids in the proteome, corresponding to the number of pieces of information encoded in the genome (p = g/3), and a is the chance for any given amino acid to be drawn (a = 0.05, for twenty amino acids).
Therefore, we calculated the total probability of the parallel emergence of the N. equitans genome and proteome:
$$E = P(g)\,P(p) = b^{g}\,a^{p}$$
where E is the probability for the information in the N. equitans genome/proteome to emerge.
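Because numbers such as $0.25^{466340}$ underflow ordinary floating point, a minimal sketch can evaluate the model in log space instead. It uses only the parameters given in the text (g = 466,340, b = 0.25, a = 0.05, p = g/3) and assumes, as the product form implies, that genome and proteome emerge independently. Under these assumptions the exponent of E comes out near 483,000, which the text rounds to about 500,000.

```python
import math

G = 466_340   # informative DNA nucleotides in N. equitans (from the text)
B = 0.25      # chance of drawing a given base (four nucleotide types)
A = 0.05      # chance of drawing a given amino acid (twenty amino acids)


def log10_p_genome(g: int, b: float = B) -> float:
    """log10 of P(g) = b**g, computed in log space to avoid underflow."""
    return g * math.log10(b)


def log10_p_proteome(p: float, a: float = A) -> float:
    """log10 of P(p) = a**p."""
    return p * math.log10(a)


def log10_e(g: int) -> float:
    """log10 of E = P(g) * P(p), with p = g / 3 as in the text."""
    return log10_p_genome(g) + log10_p_proteome(g / 3)


if __name__ == "__main__":
    print(f"log10 P(g) = {log10_p_genome(G):,.0f}")
    print(f"log10 P(p) = {log10_p_proteome(G / 3):,.0f}")
    print(f"log10 E    = {log10_e(G):,.0f}")
```

Working with logarithms turns the product of probabilities into a sum of exponents, which is also how order-of-magnitude comparisons with Lloyd's $10^{120}$ are most naturally made.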
Given the known constitution of cellular genomes and proteomes in general, based on DNA and polypeptides, the probability that the origin of life was a stochastic event, assessed through the chance of information emergence in the simplest known cell inside an ideal dissipative system with all fundamental resources available, is about $1/10^{500,000}$, a number astronomically close to zero.
As we pointed out above, this model is informational, i.e., it considers only the probability of information emergence and treats all the required structural and dynamic conditions (the fundamental resources) as ideal, since these cannot be estimated. We also disregard the number of individuals the emerging species would need in order to persist: it is unlikely that one single emerging cell could originate an ecologically stable population capable of starting biotic evolution. Therefore, the probability of the origin of life is a product of E terms: the independent emergence of ten cells would have a probability of $E^{10}$, an exponent ten times larger.
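If each of n cells must emerge independently of the others, each with probability E, the joint probability is $E^n$, so its base-10 exponent grows n-fold. The sketch below recomputes log10 E from the parameters stated in the text; the independence of the cells is an assumption made for illustration.

```python
import math

# log10 of E for N. equitans, recomputed from the model's parameters:
# g = 466,340 nucleotides, b = 0.25, p = g / 3, a = 0.05.
LOG10_E = 466_340 * math.log10(0.25) + (466_340 / 3) * math.log10(0.05)


def log10_n_independent_cells(n: int, log10_e: float = LOG10_E) -> float:
    """log10 of E**n: n cells emerging independently, each with probability E."""
    return n * log10_e


if __name__ == "__main__":
    for n in (1, 10):
        print(f"{n:>2} cell(s): probability ~ 10^{log10_n_independent_cells(n):,.0f}")
```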
The main challenge to the validity of this simple probabilistic model arises if we consider that putative prebiotic processes created a proto-metabolism and a proto-genome as intermediate steps toward the emergence of life (Wachtershauser, 1990; Lee et al., 1997; Aono et al., 2015). Before the proto-system became self-replicating, the final probability is equal to the product of the probabilities of those steps, each conditioned on the preceding ones. After the first self-organized, self-replicating system emerged, the probabilities of subsequent transformations of the system increase as a function of the number of self-replicating units. However, that emergent self-organized, self-replicating system would already be the first life form, and the Original Problem would have been solved.
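The increase in transformation probability with the number of self-replicating units can be sketched by assuming (a simplification not taken from the text) that each of n units independently undergoes a given transformation with a small probability q. The chance that at least one unit does so is then $1 - (1 - q)^n$. The value of q below is purely hypothetical.

```python
import math


def prob_at_least_one(q: float, n: int) -> float:
    """Probability that at least one of n independent units undergoes a
    transformation of per-unit probability q, i.e. 1 - (1 - q)**n.

    log1p/expm1 keep the result accurate when q is tiny."""
    return -math.expm1(n * math.log1p(-q))


if __name__ == "__main__":
    q = 1e-9  # hypothetical per-unit probability of a given transformation
    for n in (1, 10**3, 10**6, 10**9):
        print(f"n = {n:>13,}: P(at least one) = {prob_at_least_one(q, n):.3e}")
```

Computing $(1 - q)^n$ directly would lose all precision for q near $10^{-9}$, which is why the sketch goes through `math.log1p` and `math.expm1`.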
Therefore, if the first life form was equivalent to N. equitans, the probability of the origin of life would be considerably smaller than E.
Could a self-organized, self-replicating system substantially simpler than N. equitans be possible? First, we must define the threshold for "substantially simpler": an emergent system whose emergence probability is equivalent to the Universe's computational load, Lloyd's number (Lloyd, 2002), which is near Dembski's universal probability bound. A genome with a probability of $10^{-120}$ would have only about 200 nucleotides, equivalent to about 23% of an average N. equitans gene. The smallest known viral genome, from the family Circoviridae (Belyi et al., 2010), has about 2,000 nucleotides, and viruses are not self-replicating. Therefore, it is plausible that the information contained in about 200 nucleotides is not sufficient to generate any life form. Even a system one hundred times simpler than N. equitans would still be far too complex: its probability (about $10^{-5,000}$) remains incompatible with the estimated magnitude of the Universe. In light of Lloyd's number (Lloyd, 2002), the mathematical probability of the origin of life as a stochastic phenomenon could be regarded as a physical impossibility.
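The threshold comparison can be sketched in the same log-space terms. Under the genome-only formula $P(g) = b^{g}$ with b = 0.25, a probability of $10^{-120}$ corresponds to roughly 200 nucleotides, of the same order as the few hundred nucleotides discussed above. Two readings here are our assumptions, labeled in the code: "one hundred times simpler" is taken to mean one hundredth of the information (the log10 probability divided by 100), and each of Lloyd's $10^{120}$ operations is treated as an independent trial when estimating expected occurrences.

```python
import math

LLOYD_LOG10_OPS = 120        # Lloyd (2002): ~10^120 operations since the Big Bang
LOG10_B = math.log10(0.25)   # per-nucleotide chance in the model
# log10 E for N. equitans (g = 466,340, p = g / 3, a = 0.05)
LOG10_E = 466_340 * LOG10_B + (466_340 / 3) * math.log10(0.05)


def genome_length_for(log10_p: float, log10_b: float = LOG10_B) -> float:
    """Number of nucleotides g such that b**g equals 10**log10_p (genome only)."""
    return log10_p / log10_b


def log10_expected_occurrences(log10_p: float, log10_trials: float = LLOYD_LOG10_OPS) -> float:
    """log10 of the expected number of successes in 10**log10_trials attempts,
    assuming (hypothetically) that every operation is an independent trial."""
    return log10_trials + log10_p


if __name__ == "__main__":
    print(f"genome with P = 10^-120: ~{genome_length_for(-120):.0f} nucleotides")
    simpler = LOG10_E / 100  # assumed reading: one hundredth of the information
    print(f"one-hundredth of N. equitans: P ~ 10^{simpler:,.0f}")
    print(f"expected occurrences in 10^120 trials: 10^{log10_expected_occurrences(simpler):,.0f}")
```

A negative log10 of the expected count means the event would not be expected to occur even once within Lloyd's budget of operations.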
As an alternative, we considered England's theory of the emergent adaptation of complex systems through energy dissipation (England, 2013; Perunov et al., 2016): while the Universe as a whole tends toward thermodynamic equilibrium (maximum entropy), open systems inside the Universe would tend toward spontaneous processes of dissipation-driven self-organization and adaptation. These processes result in a progressive increase in diversity and complexity. In nature, the replication of complex dissipative systems seems to be the rule: reproduction is a way to dissipate energy that also improves the stability of the species and increases the probability of new state transitions (i.e., evolution).
Another alternative is the RNA World theory, which proposes a prebiotic evolution of self-replicators made of RNA molecules with catalytic properties, capable of exponential growth and potentially of Darwinian evolution (Robertson & Joyce, 2012; Robertson & Joyce, 2014). However, it remains speculative and lacks evidence to explain life forms based on DNA and typical enzymes.
Thus, we consider the origin of life as a stochastic phenomenon to be incompatible with the assumptions presented here and their logical consequences. Therefore, unknown properties of the organization of the Universe could be necessary for life to originate under conditions that allow survival and evolution.