# Seven Theses on Intelligence and Science

March 22, 2023

***Gregory F. Coppola***

Apocalypse Like Right Now

# Introduction

We will present seven theses on the topics of *science* and *intelligence*.

**The Common Theme of Intelligence**

These theses explore links between the concepts of *intelligent design*, *human intelligence* and *artificial intelligence*. The common theme between these topics is: *intelligence*.

**Intelligent Design is Such a Science *Right Now***

There is a widespread perception that "intelligent design is not a science". For example, in the **ACLU**'s handy primer *Frequently Asked Questions About "Intelligent Design"*, we read:

> A scientific theory makes predictions about occurrences in the natural world that can then be tested through scientific experimentation. ID makes no predictions and cannot be scrutinized using the scientific method.

The overarching thesis of the enclosed seven theses is that the ACLU's statement just quoted is *wrong*, because intelligent design *is a science*. In more detail, the overarching thesis is:

- *intelligent design* is a *science*
    - it can make forward-looking predictions
    - as well as model the past
    - it has a better fit to the past than Darwin's random/gradual evolution
    - it has finite size
    - it is provably more parsimonious than the "infinite universes hypothesis" invoked by atheists to counter the "finely tuned universe" argument

# Background

## Science and its Prerequisites

We begin with a meditation on the notion of "science".

### The Vienna Circle

The Vienna Circle met in the city of Vienna over the period 1924-1936 to discuss the "philosophy of science". The goal of the group was to separate the intuitively "meaningful" or "productive" pursuits going on at a university, like science and mathematics, from the "fuzzy" or "unproductive" fields, like "metaphysics". For example, the existentialist philosopher Martin Heidegger famously asserted that *the nothing nothings*. According to the Vienna Circle, *the nothing nothings* was intuitively a "meaningless" statement, and they sought a criterion by which to say so formally.

The conclusion of the Vienna Circle was that scientific pursuits are those that predict "empirical quantities", or, in other words, "observable quantities", and make "verifiable statements".

### Science

Science is, intuitively, the business of predicting "empirically observable quantities".

**Data Collection**

We might imagine that the stereotypical scientist is a person in a laboratory, wearing a lab coat, holding perhaps a beaker, collecting **data**. But the collection of data is perhaps the least controversial part of science.

**Modeling and Predictions**

The ultimate goal of science is to make *models*, which make *predictions*. In other words, science is done scribbling at the chalkboard as much as in the lab coat collecting data. The models scientists conceive of are inspired by past data, and their predictions are evaluated against both **past** and **future** data. Here, *past data* is data we have already seen, and *future data* is data we have not yet seen.
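To make this concrete, here is a minimal sketch (our own toy illustration; the data points and the one-parameter model are made up) of the workflow just described: a model is inspired by, and fit to, past data, and its predictions are then scored against future data it has never seen.

```python
# A minimal sketch (our own toy illustration) of "model, then predict":
# fit a one-parameter model on past data, then score its predictions on
# future data that the model never saw.

past_data = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8)]   # (x, observed y)
future_data = [(5, 10.1), (6, 11.8)]                    # held out

# Model: y = a * x, with a chosen to fit the past data (least squares).
a = sum(x * y for x, y in past_data) / sum(x * x for x, y in past_data)

def predict(x: float) -> float:
    return a * x

# The model is judged by how well it predicts data it has not yet seen.
future_error = sum((predict(x) - y) ** 2 for x, y in future_data)
print(f"fitted a = {a:.2f}, squared error on future data = {future_error:.2f}")
```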
**A Theory can be Falsified but never Proven "True"**

Science can never prove a theory to be **true**; it can only, at best, conclude that one theory is the *most likely*. Typically, we expect all theories to be proven "false" in the future, in the sense that they must at least be refined, and made more precise. Thus, it is said that "all models are wrong, but some are useful". Finally, we note that speaking of statements as "true" or "false" is somewhat pre-statistical, because the more general way would be to talk in terms of probabilistic observations.

**Non-Intuitive Nature of Science**

The experience of the 20th century in physics showed that reality turned out to be much more *unusual* than would have been expected at the end of the 19th century. It turned out that time and space were related, energy and mass were the same, and truth was in some quantum sense "probabilistic". Each of these realizations was an upheaval, to the point where physicists were forced to realize that the philosophers' practice of reasoning from "reasonable assumptions" is of no use in science. Niels Bohr is said to have told his colleague Wolfgang Pauli:

> We all agree that your theory is crazy. The only question is whether it is *crazy enough* to be correct.

Scientific theories are not required to be, nor rewarded for being, "reasonable". The only criterion is the *empirical fit* of the theory (traded against theory complexity). Atheist philosophers would do well to remember that "reasonable assumptions" have no place in science when considering the problem of intelligent design.

### Mathematics

Mathematics involves the creation of abstract objects, like numbers or sets, and the proving of truths that follow from the definitions of such objects. If something is true in mathematics, it is always true, because it follows from definitions, no matter what observations are ever made.

**Some Mathematics is Inspired by Science**

In some cases, the inspiration for a branch of mathematics might come from science, or the need for applications. For example, it is speculated that the Pythagorean theorem was first known empirically before it was formalized as a theorem by the Ancient Greeks and recorded by the mathematician Euclid around 300 BCE.

**Sometimes Science Leverages Previously Unused Math**

In other cases, including important cases in physics, the mathematics is invented first, and then, by an act of true "luck", someone finds a way to apply those results. So, it is actually an *empirical question* in and of itself which mathematics will be useful to describe nature. Thus:

- any branch of mathematics *could*, in principle, be useful in modeling the "world"
- no branch of mathematics can be ruled out of describing "nature" *a priori*

Thus, in principle, all mathematics (the proving of theorems based on definitions) is considered legitimate, from the perspective of science. This does not preclude the expert judgement that a given branch of mathematics is not worth pursuing given finite resources; it only says that, in principle, no branch of mathematics can be definitively judged useless.

### Legitimate Non-Scientific Philosophy

To single out science and its prerequisites as special is **not** to say that non-scientific intellectual endeavors are not important. For example, art, music and literature are clearly very important to human civilization.
By the same token, the critical analysis of art, music and literature is important. Religious thought and "revelation", like the Bible, the Torah, the Vedas, the Dao De Jing, shamanic traditions, the Quran, etc., are valuable because, we suppose, the collective "revelations" of human civilization are a better source of data than the "revelations" made to any individual. It is possible that someday there will be a scientific interpretation of these various "revelations". But, until that time, we are not advocating against the study of historical "revelations", whatever that word ultimately means. As William James argued in "The Will to Believe", we have to act, especially with regard to religious questions, without having all of the data available.

The non-scientific intellectual pursuits can be called *philosophy*. There are many legitimate uses of philosophy.

### The Philosophy of Science

But, there are also many cases in which "philosophers" are just wasting time. There is *a priori* no way to say exactly which philosophy is useful, in general. There is, however, one kind of "philosophy" which is special for the practice of science. This is the so-called "philosophy of science". This branch of philosophy answers the question, "how do we do science?" In other words, the output of the "philosophy of science" is a *procedure* for doing science. While all other forms of philosophy must be evaluated on their own merits, the "philosophy of science" was included by the Vienna Circle as a special kind of philosophy, because it is the one branch of "philosophy" that is a prerequisite for science.

## The Modeling of Intelligence

The trend in cognitive science, starting in the 20th century, has been steadily toward greater use of concepts from computer science and statistics in the study of the brain or mind. We examine this story of the role of computer science in cognitive science through the story of the most influential cognitive scientist of the 20th century, Professor Noam Chomsky, of the Massachusetts Institute of Technology.

### A Fascinating Case Study in Stagnation

The story of Chomskyan linguistics, we believe, is an interesting example of "stagnation" in a scientific discipline (linguistics), and is a further data point in favor of Max Planck's aphoristic theory that science proceeds (if it must) "one funeral at a time". This is relevant to the question of intelligent design, because Darwinism is a theory that one could say is "stagnating", with still no clear story of how exactly evolution happened, over 160 years after the theory was first proposed in 1859. Thus, we propose that the stagnation of Chomskyan linguistics may be a model for the stagnation of the Darwinian theory.

### Vienna Circle Revisited

The Vienna Circle's conclusion was popularly taken to be:

- in order for a statement to be meaningful, there must be a way to **verify that statement**

This rule, as stated, has a *fatal* flaw, because it is not *individual statements* that make verifiable predictions, but rather *theories* which make predictions.

### Behaviorism

Behaviorists like B. F. Skinner were limited by their incorrect ideas of what was required to do "science" in the field of psychology and cognition. Skinner interpreted the conclusions of the Vienna Circle to mean that, if one was going to make a science of "psychology", this would require that *each individual statement* be verifiable.
Skinner would discourage talk of "unseen entities", or "mental events". Instead, the only way to do psychology, in his interpretation, was with simple experiments that could, Skinner thought, be "completely" described, like rats in cages doing simple stimulus-response experiments. Skinner argued in *Verbal Behavior* (1957) that it was not only possible to understand **human language** through stimulus-response experiments with rats, but that he himself had *already* understood human language through such experiments. Of course, this was wrong: it is not possible to understand human language through experiments with rats.

### Chomsky's 1957 Revolution in Linguistics

**Overview**

Chomsky's innovation in cognitive science was to take concepts from computer science, especially from Turing and Church, and apply these to the study of language. Chomsky (1957) was able to show that this kind of theory can be "empirical", by showing that his theory, though it posited "unseen entities" like a grammatical faculty, could predict syntactic "judgements", which are a kind of data. In Chomsky's review of *Verbal Behavior*, he made a variety of arguments, e.g., about the "poverty of the stimulus". But, from our point of view, the implicit argument in Chomsky's review is that it is *theories* that make predictions, not individual statements. Chomsky did not worry that **each statement** on its own needed to be "verifiable". Thus, Chomsky's expanded notion of what "science is" was what made progress possible.

**Summary**

From our perspective, the main contributions Chomsky made in 1957 were:

- use of computer science
    - Chomsky introduced the use of computer science to empirically model natural language
    - this turned out to be *very* effective
- expanded notion of "science"
    - Chomsky was able to realize that Skinner's restrictions on the enterprise of psychology were too strict

### The Empirical Failure of Chomsky's Post-1957 Linguistics

**Post-1957 Stagnation**

Chomsky's 1957 *Syntactic Structures* was so important it was called a "Copernican revolution" in linguistics (Putnam). Chomsky was considered the leading linguist in the field until at least the 1990s, and some still think he is the leader in the field. Ironically, however, in retrospect, 1957 could actually be the **last** year that Noam Chomsky did any work in the field of linguistics that has had any practical use, judged in terms of history. His subsequent ideas of "transformational grammar" and "principles and parameters" have found no use that we can see in 2023.

**Principles and Parameters**

We can generously (to Chomsky) date the proposal of Chomsky's "principles and parameters" thesis to 1981 (its roots are presumably earlier):

- Chomsky's "principles and parameters" thesis
    - all human languages vary only in a finite number of finitely-valued parameters

Chomsky and his disciples have pursued this thesis, under various guises, including a rebranding as "the minimalist program", over the last 42 years, at least. The "finite number of finitely valued parameters" has never, it seems, been found. Already in the 1980s, many linguists predicted that language could not work in this "finite principles and parameters" way. But, Chomsky plowed ahead anyway, and seems still never to have clearly acknowledged the failure of this theory.
This is a potentially vivid example of the powerful role that ego plays in science, with the power to cause stagnation among large sections of the academic system.

### The Rise of Artificial Intelligence for Language Processing

It has been apparent to many linguists since at least the 1980s that Chomsky's program of searching for a "finite number of finitely valued parameters" would never succeed.

**Computationally Realistic Symbolic Grammars**

More realistic, and more practically computational, alternatives to Chomsky's "principles and parameters" theory were developed already in the 1980s, especially HPSG (Pollard and Sag), CCG (Steedman) and TAG (Joshi). However, for practical purposes in natural language processing engineering, it has turned out that "attention is all you need", and products like ChatGPT do not make use of traditional symbolic grammars at all, though symbolic grammars may still be useful for understanding human language, or for further developing A.I.

**Statistical Natural Language Processing**

The 1980s and 1990s also saw the dawn of statistical natural language processing, where researchers like Jelinek, Charniak, Collins and Pereira showed that one could successfully apply machine learning and artificial intelligence to the problem of **natural language processing**. The early 2000s brought a renewed interest in neural networks for artificial intelligence, at last realizing the potential in the work of Hinton, LeCun and Bengio. Neural networks can not only find patterns in the data, but can even find their own *representations* of the data, representations that their programmers did not anticipate. The combination of neural networks with computational linguistics has resulted in powerful applications like ChatGPT. In contrast to Chomsky's "principles and parameters" theory, which has, as far as we are aware, no serious implementation, ChatGPT can both understand **and generate** human language. By any objective metric, the artificial intelligence-driven approach to language is infinitely more successful in modeling both the production and the interpretation of language than anything Chomsky has ever produced. But, since Chomsky is a Professor Emeritus, no one can literally *force* him to recognize this.

### Chomsky Refuses to Recognize the Scientific Importance of Artificial Intelligence

Chomsky has always insisted that statistical approaches to natural language processing are, at most, "technology", and are not relevant to the study of human language, according to how Chomsky defines the field. In his March 8, 2023 essay in *The New York Times*, entitled *Noam Chomsky: The False Promise of ChatGPT*, Chomsky is on his usual tip:

> Roughly speaking, [GPT models] take huge amounts of data, search for patterns in it and become increasingly proficient at generating statistically probable outputs — such as seemingly humanlike language and thought.

This is meant as a *criticism*. But, to anyone who *isn't* Noam Chomsky, the goal of cognitive science would seem to be *precisely* to understand how the *brain* can "take huge amounts of data", "search for patterns in it" and generate "statistically probable outputs" like "human language and thought". If ChatGPT can do these things, then that would seem to be pretty relevant indeed to the study of cognition!
Chomsky goes on to make it explicit:

> The human mind is not, like ChatGPT and its ilk, a lumbering statistical engine for pattern matching, gorging on hundreds of terabytes of data and extrapolating the most likely conversational response or most probable answer to a scientific question.

This passage is an excellent illustration of the error in Chomsky's thinking:

- Chomsky's bizarre thesis
    - the human mind *is not* a ("lumbering"?) statistical engine
    - note: it is not clear what the empirical meaning of "lumbering" is

Virtually all data from cognitive science, artificial intelligence, natural language processing, and common everyday experience itself refute this bizarre thesis:

- the brain is precisely a statistical engine

### Lessons for the Future from the Story of Noam Chomsky

**The Role of the Philosophy of Science**

Chomsky's intellectual victory over Skinner's behaviorism was largely a result, we have posited, of the fact that Chomsky realized that Skinner's assumptions about "what constitutes science" were overly strict. The ongoing intellectual triumph of artificial intelligence over Chomsky's pre-statistical ways is, in some sense, the reverse story. Now, it is Chomsky who is imposing unnecessary constraints on what "constitutes linguistics", and thus he has been unable to make any empirical progress for 40, if not 60, years. Thus, we see that the question of "what constitutes science" or "what constitutes cognitive science" can actually be quite controversial, even among leaders in the field, even up until the present day. And, in each case, it is the group that is less encumbered by unnecessary constraints that was able to make progress. With respect to the question of intelligent design, we will thus closely scrutinize exactly what is necessary in order to "do science", and ask whether the constraints some atheist philosophers propose are actually necessary.

**The Possibility of Lengthy Stagnation in Science**

The story of Noam Chomsky shows the potential for a field, e.g., post-1957 Chomskyan linguistics, to totally stagnate for a period of even 40-60 years. It seems we will never find Chomsky's "finite number of finitely valued parameters", but Chomsky has never explicitly admitted this fact, and neither have many in the generations of grad students he has produced in his line. This is despite the fact that not only has "principles and parameters" failed to materialize as a falsifiable theory, but a competing theory (artificial intelligence, as demonstrated by ChatGPT) is succeeding wildly, and drawing massive financial investment and talent inflow as well. If Chomsky's followers admit that the "principles and parameters" theory must be abandoned, in favor of an artificial intelligence-based approach, they must then accommodate the fact that their primary skill (knowledge of "principles and parameters") is not marketable once the theory has officially collapsed. This is a general problem in science: what to do with scientists who have specialized in a field that has collapsed? And, if it is true that Chomsky has not contributed to linguistics post-1957, the realization of this fact must be a difficult one to accept for Chomsky, who has invested so much of his time and effort since then.
**Science Proceeds (if it Must) "One Funeral at a Time"**

Max Planck is said (perhaps apocryphally) to have (darkly) remarked:

> A new scientific truth does not triumph by convincing its opponents and making them see the light, but rather because its opponents eventually die, and a new generation grows up that is familiar with it.

The story of Noam Chomsky versus statistical methods (artificial intelligence) for cognitive science seems to be a data point in favor of Planck's macabre but apparently evergreen aphorism. Skinner apparently went to his grave without ever admitting that Chomsky was right to bring the methods of computer science to the empirical study of cognition. Chomsky may well go to his grave without acknowledging the statistical contribution that artificial intelligence theorists have made to the empirical study of cognition. But, in the end, the science train keeps rolling down the track, one funeral at a time, if need be.

### Computer Science and the Modeling of Intelligence

In conclusion, the seven theses that we will review below adopt the following framework for the role of computer science in the modeling of intelligence:

- intelligence is to be modeled using the tools of computer science
    - there is an abstract model of computation called the **Turing Machine** (Turing, 1936)
    - according to the Church-Turing thesis, all conceivable universal computers are equivalent to the Turing Machine
    - the Turing Machine can be used to model **any** intelligence, including
        - human intelligence
        - artificial intelligence
        - the potential "intelligence" which designed humans and our universe
- intelligence crucially involves *statistical* reasoning, modeling and information
    - this is closely related to Shannon's (1948) definition of *information* in statistical terms
    - artificial intelligence models, especially *neural networks* (Hinton, LeCun, Bengio), are relevant for the study of intelligence

# "Optimal" Science and the Minimum Description Length Principle

## Summary

We reiterate the already existing proposal that "minimum description length", in the sense of Kolmogorov or Solomonoff, is the optimal principle for doing science. Though this idea dates back to the 1960s, the fact that it is evidently not universally accepted, or even widely known, outside the field of computer science suggests to us that it needs restating. This formal understanding of the "optimal theory for a data set" will be the basis for several original results that follow.

## Details

### Informal Understanding of Science

The intuitive principles of the *scientific method* are well-known and agreed upon, especially:

- an emphasis on empirical observations
- the formulation of models that make predictions
- the testing of models using experiments
- openness to changing one's mind
- parsimony in explanations (Ockham's razor)

However, it is the formalization of these principles that, we believe, is not universally known, or agreed upon. In order to formalize the notion of science, one would have to be able to explain, as an algorithm, how to do science. But, the vast majority of people who practice science are able to practice science perfectly well without being able to give an algorithm for doing so.
For example, in 1977 the philosopher Hilary Putnam expressed, in a televised interview with Bryan Magee, his surprise at the fact that there was then no known way of expressing the scientific method as an algorithm, but that the vast majority of scientists were not bothered or encumbered by this at all.

### Ockham's Razor as an Informal Principle

William of Ockham, the 14th-century English philosopher, is credited with the following principle for comparing theories:

- Ockham's razor
    - among theories which *fit the facts equally*, prefer the **simplest** theory

However, there have always been two major difficulties in implementing this principle precisely:

1. How does one measure the "simplicity" of a theory?
2. What do we do when one theory is less simple but fits more facts?
    - In other words, how does one assess the benefit of fitting more facts versus the cost of a more complex theory?

### Minimum Description Length

The principle of *minimum description length* dates back to the work of Kolmogorov (1965) and Solomonoff (1964). It says that the optimal theory is the one which compresses the data the most. In particular, the **description length** of a data set is broken into two parts:

- model size
    - the amount of "disk space" needed to encode the model
    - a larger model will need more space, and be penalized for this
- data compression
    - the amount of "disk space" needed to encode the data, given the model
    - a better model will need less space, and be rewarded for this

This provides a principled way to trade off model complexity versus data fit. The optimal theory is:

- optimal theory using MDL
    - the theory that uses the least total space to encode the data

### Intuitive Optimality of Minimum Description Length

The concept of Kolmogorov complexity is one of the central concepts in computer science. Intuitively, as scientists, we feel better knowing that we can reuse core concepts. In the words of Richard Feynman:

> Nature uses only the longest threads to weave her patterns, so that each small piece of her fabric reveals the organization of the entire tapestry.

The idea that we prefer to reuse concepts is actually just a statement of Ockham's razor, in a different guise.

### Formal Optimality of Minimum Description Length

Minimum description length can be shown to be optimal under the following conditions (see *Advances in Minimum Description Length*, Grünwald et al., 2005):

1. The true data-generating process is included in the set of models under consideration.
2. The set of models under consideration is finite.
3. The data are generated independently and identically according to the true data-generating process.

The "no free lunch theorem" in machine learning (Wolpert and Macready, 1997) says that, without making assumptions, we cannot predict the future. Thus, it seems that the assumptions required by the minimum description length optimality proof are as good as can be expected.

## Conclusion

The minimum description length principle allows us to formalize Ockham's razor, and many other informal aspects of the "philosophy of science". We stress that we are **not** the first to propose the minimum description length principle. The novelty is in the *use* of this framework to assess the complexity of the "God hypothesis". Not only is "minimum description length" the only provably optimal framework for general theorizing, it is, as far as we are aware, the only practical formal framework for measuring theory complexity that exists at all.
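Before adopting this principle, it may help to see the two-part cost in action. The following is a minimal sketch (our own toy illustration, not taken from Grünwald et al.): two candidate "theories" of a small binary data set are scored by model bits plus the bits needed to encode the data under each theory, and the lower total wins.

```python
import math

def total_description_length(model_code: str, p_one: float, data: str) -> float:
    """Two-part MDL cost in bits: space to store the model, plus space to
    encode the data given the model. The "model" here is just a Bernoulli
    parameter p_one for a binary string, and its size is crudely approximated
    by the length of its source text (8 bits per character)."""
    model_bits = 8 * len(model_code)
    data_bits = sum(-math.log2(p_one if symbol == "1" else 1.0 - p_one)
                    for symbol in data)
    return model_bits + data_bits

data = "1" * 120 + "0" * 8   # a binary data set that is mostly ones

# Theory A: "the coin is fair" -- a tiny model that fits the data poorly.
cost_a = total_description_length("p=0.5", 0.5, data)

# Theory B: "the coin lands 1 about 15/16 of the time" -- a slightly larger
# model that fits the data much better.
cost_b = total_description_length("p=15/16", 15.0 / 16.0, data)

print(f"theory A: {cost_a:.1f} total bits")   # roughly 168 bits
print(f"theory B: {cost_b:.1f} total bits")   # roughly 99 bits: preferred under MDL
```

This is exactly the trade-off described above: theory B pays a few extra bits of model size and is repaid many times over in data compression.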
We therefore adopt in the following theses:

- principle
    - the minimum description length criterion is the optimal one for measuring theory complexity in science

# The Finite Complexity of the God Hypothesis

## Summary

Using the minimum description length framework, we show that the "God hypothesis" has finite complexity, on three different levels.

## Details

### Complexity of a Theoretical Entity

The cost of a theory using minimum description length is just the sum of:

- the space needed to express that theory
- the space needed to encode the mis-predictions of the theory

As such, the important point to understand is:

- important note about theory cost
    - the cost of a theoretical entity is a function of the *theory itself*
    - the cost of a theoretical entity **is not** inherent in the complexity of the object itself

### Even Ordinary Objects have Infinite Complexity

Atheist philosophers will object that the theory of God is "infinitely" complex, because God must be infinitely complex. However, using the minimum description length framework, we see that we are not penalized for the infinite complexity that an object has, but only for the complexity of the **theory** we use to model this object. Thus, we are not penalized for the complexity of "God" the object, only for the complexity of *our theory* of God. But, even ordinary objects have "infinite" complexity, in principle. William Blake famously wrote (in a line later borrowed by Aldous Huxley):

> If the doors of perception were cleansed everything would appear to man as it is, infinite.

In other words, even ordinary objects have "infinite complexity", but this does not preclude us from dealing with them using finite theories. Atheist philosophers like Dawkins, Harris, Dennett or Hitchens (typically not the strongest on the topic of mathematical understanding) have not proposed any other rigorous theory to measure the size of theories, as far as we are aware.

### Finite Complexity of the God Hypothesis

**Argument From Intelligent Design**

The argument made by intelligent design theorists like Meyer, Lennox or Behe would be of the following form:

- argument for intelligence from DNA
    - we see that human DNA is "non-trivial" computer code
    - only an "intelligence" can create non-trivial computer code
        - in particular, computer code cannot evolve randomly
    - therefore, human DNA must be the result of an intelligent cause

**A Minimal Theory of an Intelligence**

The complexity of the God (intelligent design) hypothesis is only the amount of space needed to write the theory down:

- an intelligent being can write computer code
- DNA is computer code
- therefore, an intelligent being can be the cause of computer code

Measured in ASCII, this little theory takes up 132 bytes on disk. This is not only finite but very small! Of course, we would use a different coding scheme than ASCII, for both theoretical and technical reasons. But the point remains: this "minimal" theory of intelligence is very small.

**A Theory to Recognize Intelligence**

It may seem like a trick to use the word "intelligence" without actually specifying how to *recognize* intelligence. This would be an interesting point to philosophize on. But, it is no problem, because we can, in practice, also specify a *recognition* function for intelligence.
Throughout human history, all humans have been able to recognize other intelligent objects, even though, except for perhaps a handful of scientists, and only recently, no one would claim to "understand" intelligence. Assuming humans are finitely specifiable programs, as a materialistic viewpoint would necessitate, humans themselves prove that we can give a finite specification of a theory to "recognize" intelligence.

**A Theory to Explain Intelligence**

Though it is in no way necessary to do so for the purposes of intelligent design, it is even possible to *explain* intelligence with a finite theory. That is, to the extent that ChatGPT "is intelligent", we can say that artificial intelligence theory in 2023 gives us the ability to "explain intelligence" to a high degree. Since ChatGPT takes up finite disk space, this is a finite theory as well. This is arguably limited by the fact that ChatGPT is not yet "full intelligence". However, current trends suggest that "full intelligence" will eventually be achieved by these methods. This would imply that even "full intelligence" has a finite description.

## Conclusion

Though atheist philosophers fret about the "infinite complexity" of the God hypothesis, we have seen three different levels at which the theory of intelligence is actually finite.

# Infinite Universes is More Complex than the God Hypothesis

## Summary

Using the framework for measuring theory complexity implied by "minimum description length", we can show that the "infinite universes hypothesis" commonly invoked by atheists to explain the "finely tuned" universe, if modeled faithfully, is mathematically more expensive (in fact infinitely more expensive) than the "God hypothesis".

## Details

### The Problem for Atheism of the Finely-Tuned Universe

Empirical investigation into physics, chemistry and biology has shown that the "parameters" of the physical universe are "finely tuned" in the sense that, if the constants were changed slightly, the universe would not support life (Barrow, Davies, Rees). In fact, if one were sampling physical parameters entirely at random, the chance of selecting a universe that would support life would be, in ordinary parlance, effectively zero. Assuming this is true, it is in conflict with the "atheist hypothesis", which states that life can plausibly evolve "randomly".

### Atheist Resort to "Infinite Universes"

The atheist response to the finely tuned universe is that there must be "infinite" universes, each running with its own set of physical parameters. We just happen to be in one universe that supports life, so we are here to investigate our own origins. In infinitely many other universes, there is no life, and so no one to wonder. Informally, this is an obvious violation of Ockham's razor, because these extra universes don't do anything except save the atheist theory from a misprediction. However, using the minimum description length framework, we can quantify the size of this "infinite universes" hypothesis, and see that it is infinite, which makes it strictly worse than the finitely specifiable God hypothesis.

### A Simulation Model of the Universe

We note that there are believed to be a finite number of atoms in the universe, each containing a finite number of indivisible particles. Thus, the total number of indivisible particles in the universe is finite.
Assuming each particle can be represented by a vector of real values with finite dimension, the entire state of the universe can be represented on a Turing machine (a computer) with finite storage space. There is believed to be a "smallest possible unit of time", called *Planck time* (1899), smaller than which no interval of time can be detected. If this is true, then it is possible to represent the universe as a state machine in the following way:

- the universe as a state machine
    - state
        - the state is a finite number of real-valued (or perhaps just scalar-valued) parameters
    - discrete state transitions
        - the state transitions take place at discrete intervals, at a frequency that makes it impossible for observers inside the universe to notice that there are discrete updates going on

Thus, we can represent the universe with a finite amount of space, and the minimum description length description of the entire universe could, in principle, be given with a finite size. We also saw in the last thesis that the "God hypothesis" has finite size.

### Infinite Universes is Infinitely Complex

A theory that posits an "infinite number" of universes might seem *intuitively* simpler than the God hypothesis, because it is, one could argue, made up of "simpler parts". But, this analysis is based on a *vague* notion of how to assess theories. We have agreed that minimum description length is the optimal criterion for measuring theory complexity. We saw that we can represent one universe with finite space. But, an infinite number of universes, modeled in the same way, would require infinite space. In general, $K$ universes will take $K$ times more space than one universe to represent. Comparing the infinite size of the infinite universes hypothesis to the finitely-sized God hypothesis, we can say that, from the perspective of minimum description length:

- the theory of infinite universes is more complex (infinitely more complex) than the God hypothesis

## Conclusion

From the perspective of minimum description length, the theory of "infinite universes" requires infinite theory complexity. Since an intelligent designer has already been shown to have finite complexity, the infinite universes theory is mathematically a more complex theory, according to minimum description length, than the intelligent design theory.

# Predicting the Future with Intelligent Design

## Summary

One of the most serious criticisms of intelligent design is the idea that intelligent design is "not a science" because intelligent design theory "does not predict the future". To remedy this, we present a forward-looking, *falsifiable* prediction (i.e., a prediction about the future that could be wrong) based on intelligent design.

## Background

### Predicting the Future versus Modeling the Past

Predicting the future and modeling the past are two fundamentally different things. The no free lunch theorem suggests that, without assumptions about the predictability of the future, we can't predict anything. History is an example of an empirical inquiry where it is not always possible to use the future to adjudicate between theories. However, for all laws in the "hard sciences", it is possible to predict the future, and thus we consider it a requirement that intelligent design predict the future in order to be a "fully fledged" theory.

### Past Work on Predictions of Intelligent Design

**Fitting Past Data**

Most aspects of the "intelligent design" hypothesis effectively fit past data (see, e.g.,
Meyer's *Darwin's Doubt*), including:

- the ability for new fossil forms to arise suddenly
    - this is what we find in the fossil record (Gould, 1995)
- the ability for new animal forms to arise with arbitrary complexity
    - DNA code seems to jump in large gaps (Behe, 1996)
- the ability for irreducibly complex structures to arise (Behe, 1996)
    - e.g., that a machine of separate parts including A and B can arise, when neither A nor B on its own would confer an advantage

**Future Data**

Meyer has suggested the following "prediction" of intelligent design:

- the amount of junk DNA is expected to be "small or zero"

That is, non-coding portions of DNA were presumed by Darwinists to be "junk", and Darwinian theory basically requires that "lots" of DNA be junk, or the probabilistic pressure on random mutation to find human DNA would be too great. It has turned out that "less DNA is junk" than was previously thought, and Meyer says this is a correct prediction of intelligent design. Perhaps it is more accurate to say that the junk DNA data point is more of a *misprediction* of Darwinism than a positive prediction of intelligent design. While the junk DNA argument may certainly add value in the long run, there are two drawbacks to this prediction:

- not easy to say how much DNA is junk
    - individual nucleotides are not labeled in nature as "junk"
    - it will presumably be a long time before every last nucleotide can be proven to have, or not have, a function
    - and this is the kind of question where debate can be made to drag out
- not an exact prediction
    - we intuitively think that "God" would leave less junk DNA than Darwin's random process
    - but, how much "junk" DNA would we expect God to leave?
    - this is not clear from any theory, so this prediction is somewhat fuzzy

## Novel Theory

### Past Data

**Stasis**

The pattern in the fossil record is *lengthy periods of stasis* (Gould, Meyer). This is also commonly accepted to be the pattern in human history. That is, humans have not evolved throughout human history. Neither have we observed, we believe, any truly "meaningful" macro-evolution in any other animals.

**DNA Complexity**

DNA's "complexity", measured in bits, would be between 6 and 7 billion bits; a rough calculation is sketched below. The age of the Earth is thought to be around 4.5 billion years. A time frame on the order of tens of billions of years is nowhere near enough time to "create" 7 billion bits worth of functional computer software. The Darwinian hope is that most of this DNA will turn out to be "junk", and only a very small part of DNA might be "meaningful". The realization that DNA was 7 billion bits of non-reducible information would surely sink the Darwinian process outright, because the odds that a program this complex could evolve are, in ordinary parlance, effectively zero. In any case, the apparent high complexity of DNA puts heavy pressure on both the random and gradual aspects of Darwin's theory.
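To make the bit count concrete, here is a minimal back-of-the-envelope sketch (our own; the 3.2 billion base-pair figure is an assumed round number for the human genome) of where an estimate in the "6 to 7 billion bits" range comes from: two bits per nucleotide, times the number of base pairs.

```python
import math

# A rough, back-of-the-envelope sketch (our own; the genome size is an
# assumed round number) of the "DNA complexity in bits" figure quoted above.

base_pairs = 3.2e9             # approximate length of the human genome
bits_per_base = math.log2(4)   # four possible nucleotides (A, C, G, T) = 2 bits each

total_bits = base_pairs * bits_per_base
print(f"approximate raw DNA 'complexity': {total_bits / 1e9:.1f} billion bits")
# -> roughly 6.4 billion bits, in line with the figure above.
# Note: this is raw storage, not a measure of irreducible (Kolmogorov)
# information, which is exactly the question debated in the text.
```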
### Darwin's Random and/or Gradual Evolution

There are two standard, but in some ways separable, dimensions of modern interpretations of the Darwinian thesis:

- undirected, or random
    - the "undirected" aspect of evolution has created humans and other species "by accident"
    - this would be equivalent to choosing mutations "at random"
- gradual
    - evolution takes place in very small increments
    - if this isn't happening randomly, then it would probably be happening through small accumulations in the normal course of genetic variation that we see, otherwise called "micro-evolution"

### Intelligent Design

Intelligent design solves the problem of the origin of life by postulating that the source of the computer code of DNA is actually an "intelligent cause". By "intelligent cause", we mean "something that is able to write computer code". Since the intelligence can write computer code, we are not surprised to find the computer code that is DNA. One problem for "intelligent design" is explaining who this "intelligence" is. Two options for the intelligent source of Earth life are:

- aliens
- God

This paper is agnostic as to which it is: we are only considering the hypothesis that the source of life is "intelligent" in an abstract sense.

### A Falsifiable Future Prediction of Intelligent Design

Consider the following prediction of stasis *into the future*:

- theory based on intelligent design
    - suppose that the only way for a "new form of life" to arise is an "intelligent cause"
    - then, in the future, we expect that any newly observed "new form of life" will have an "intelligent cause"

In other words:

- we do *not* expect to see species just randomly "evolving" into different species, as Darwin's theory predicts
    - that is, we do not expect to see new genes with N different co-ordinated nucleotide changes, where N is non-trivially large, e.g., 12, or even 1000
- we **do** expect that, if there is a "meaningfully new species", this will have an intelligent cause, either:
    - humans
    - God
    - aliens

In other words, this theory can be falsified if:

- falsifiability criterion
    - we observe any "new form of life" that has arisen from a non-intelligent (i.e., random, or undirected) cause (i.e., Darwinian **macro-species** evolution)

**Past Performance**

Suppose we had made this prediction at any time in the last 2500 years. We have assumed that there has been no "meaningful" macro-evolution of any species in this time. Thus, on a historical basis, the prediction of intelligent design **is not** falsified. In contrast, Darwin's prediction that we will always have "new species" arising has never actually been observed to come true, and so has been false every year for the last 2500 years.

## Conclusion

We have provided a prediction of intelligent design that can be investigated in the future. On a historical basis, the theory of intelligent design is a better fit for the data than the Darwinian hypothesis.

# The Mathematical Relationship Between Truth and God

## Summary

It is intuitively grasped that there is a relationship between God and the concept of objective Truth, and that, e.g., the post-modernist rejection of Truth is somehow a consequence of their rejection of God. We use a mathematical result due to Gödel and Tarski to show that this relationship between Truth and God holds in a precise sense.
## Details

### Intuitive Relationship Between God and Truth

Intuitively, if there is to be a "true" answer to an empirical question, the only person that could actually know the answer for sure is an omniscient being. If we assume that all "mortal" beings in "our ordinary experience" are fallible, and that infallibility is required in order to accurately judge what is "true", then we can, intuitively, only have a notion of "truth" if someone in the system is infallible. However, this kind of thinking relies on an informal notion of truth, and we propose to strengthen it using the mathematical notion of "truth".

### The Wait for a Theory of Truth

The concept of "truth" is extremely slippery, from both an empirical and a mathematical perspective. Words like "truth", "lies", "correct", "incorrect" are ones that are used by all people from all walks of life, without requiring any technical background. But, to give non-trivial and non-vacuous accounts of these words took until the 20th century, relatively late compared to, e.g., physics.

### Tarski's Definition of Truth

The modern definition of "truth" that is used in mathematics is the notion of *model-theoretic truth* proposed by Tarski, based on Gödel. This theory says that, in order to evaluate the "truth" of a statement $s$, we need:

- some other language, called a *meta-language*, in which to interpret $s$
- a *universe* $U$ of objects onto which to map the objects and predicates named in $s$
- a way to map sentences from the language that contains $s$ to sentences about the universe $U$

### Mathematical Requirement for God to have Truth

Thus, we cannot, according to this view of "truth", speak of truth coherently unless there is a being who:

- has its own language
- has its own universe of objects, which is the "real universe"
- has a way to map human utterances into its own language, about its own universe

Thus, we have a formal notion of what kind of "omniscience" would be required in order to speak of "truth". This is what would colloquially be referred to as "God".

## Conclusion

Based on the work of Tarski, we have given precise properties that an omniscient being would need to have in order to define the concept of "truth".

# A Scientific Theory of the Meaning of Natural Language

## Summary

Concepts get their meanings from:

- their relation to other concepts
- their relations to the sensory functions
- the mechanism of inference

A scientific theory of "meaning" can be built on these concepts, without reference to the relation between a word and the "outside world", with applications to cognitive science and artificial intelligence.

## Details

### Background on Meaning

**Semantic Primitives**

Katz and Fodor (1963) proposed that the meanings of words can be built out of semantic primitives. But, no such primitives have been found to exist. Indeed, it would seem that no set of primitives available to humans before the start of human history could explain the new, exotic, and manifold concepts available in the information age.

**Meaning in the Outside World**

Another approach to meaning that we feel is not fruitful, from a scientific perspective, is to take the "meaning" of a word to be the set of things which the word names "in the outside world". That is, the meaning of the word *apple* is the actual "set of all apples" in the "real world".
This has been a popular view among philosophers such as Russell, Frege and Montague. But, the problem is that only an omniscient being can know the extension of a word in the "real world", and we cannot appeal to omniscience in testing or applying a theory, so this is not a scientific way of thinking about language.

**Holistic Theories of Meaning**

Holistic theories of meaning are ones in which the "meaning" of a word comes from its relationships to other words. This, some philosophers worry, could cause an "infinite regress" in the definition, because A gets its meaning from B, and B gets its meaning from A. The challenge is to show how a "holistic" theory of meaning can be coherent.

### A Computational Theory of Meaning

A theory of meaning can be given in purely computational terms as follows:

- concepts
    - concepts are like nodes in a (graph-theoretic) graph
    - concepts are related to one another
    - concepts are *linguistic* in nature
- sense organs
    - sense organs label sensory input with the linguistic concepts that are sensed
- inference
    - some kind of logical or inferential system allows a certain set of "true" or "active" propositions to activate additional "true" or "active" propositions
    - this can be thought of in logical terms, or in neural network terms

For example, suppose we have a theory $T$ and a sentence $s$. Assuming $T$ along with $s$ allows for more inferences than assuming $T$ alone. We can then say that the "meaning" of $s$, relative to $T$, is the **difference** between what is implied by knowing $T$ and $s$, compared to knowing $T$ alone.

## Conclusion

We propose that this theory has the following applications:

- it allows us to settle long-running philosophical confusions around the word *meaning*
- it gives a model for doing cognitive science
- it gives a model for understanding the behavior of artificial neural networks
    - i.e., interpreting them as logical inference

# Quantifying the Amount of Information in a Computer Program

## Summary

One conundrum in the field of intelligent design is how to describe the "information" that is in the "computer program" of DNA, since it is the assumed "high quantity of information" in DNA that points to an intelligent designer. Shannon's "noisy channel" model of entropy is not suitable for measuring the amount of information in a computer program in this case, because God, if there is one, would presumably not need to work through a noisy channel. We propose a scheme for measuring the information in a computer program relative to the tests that that computer program passes, i.e., in terms of its **function**, and discuss an example application of this.

## Details

### Shannon's Information isn't for Intelligent Design

Shannon summarized the problem of **communication** that he was trying to solve as follows:

> The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point. Frequently the messages have meaning; that is they refer to or are correlated according to some system with certain physical or conceptual entities. These semantic aspects of communication are irrelevant to the engineering problem.
This is an inappropriate model for the study of "intelligent design" for two reasons:

- no noisy channel
    - the assumption of a God who is "all powerful", or at least very powerful, and perhaps omniscient, calls into question whether God would really be limited by "communicating" DNA through a noisy channel
    - presumably an infinite God could communicate exactly the program that they want
- any given program is arbitrary
    - Shannon explicitly denies the role of "meaning" in the bits that are communicated
    - however, infinitely many computer programs have the same meaning, so it is unclear which program God would have theoretically given us, and whether this program is the shortest possible

In a sense, Shannon's model is inappropriate precisely because it **does not** model the meaning, whereas in the field of intelligent design we are interested in the complexity of the underlying "message" that DNA represents.

### Formalizing the Notion "The Meaning of a Program is its Function"

For use in the field of intelligent design, we propose to formalize the following intuition:

- the meaning of a computer program is its behavior

We can formalize this by speaking of, for any program $P$, the set of all tests that $P$ can pass. Then:

- definition of the amount of information in a program
    - the amount of information contained in a program $P$ is the size of the smallest program (i.e., the one with the lowest Kolmogorov complexity) that can pass all the same tests that $P$ can pass

This solves the problems that Shannon's model has when applied for intelligent design purposes:

- no noisy channel
    - this definition does not depend on a noisy channel
- no unique program
    - we are taking the shortest program, and we know there must be **some** shortest program
- quantifies the "meaning"
    - Shannon explicitly said he was **not** interested in the actual "meaning" of the underlying communication
    - we equate the "meaning" with the passing of tests

### On the Application of this Theory

We believe this theory can be very productive. To apply the theory, one must note that it has a certain inherent one-way boundedness:

- no upper limit on the number of tests that a program can empirically pass
    - the full set of tests that a program (e.g., a human) can pass is an empirical question
    - at any given time, we only know a **part** of what humans can do
    - thus, at any given time, we can only give a **lower bound** on the set of tests that a program (e.g., a human) can pass
- no known lower limit on the shortest program that can pass a given set of tests
    - Kolmogorov showed that we cannot know exactly what the shortest program that will pass a set of tests is
    - thus, given a set of tests, we can only ever **upper bound** the shortest program, by providing an example program
    - but, we are free to always look for shorter programs that achieve the same result

### A Method for Practical Application

Here is one application of the framework just given. We can't know all of the tests that a human can pass, but we **can** know that a human can speak language. And, we have programs like ChatGPT that, almost, can speak language. If we assume that ChatGPT is basically the smallest program that can do what ChatGPT does, then we can use the size of the ChatGPT program as a lower bound on the complexity of language use, and thus on the amount of information contained in human DNA. A minimal sketch of this bounding logic appears below.
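The following is a minimal sketch (our own toy example; the tests, candidate programs, and byte counts are all made up for illustration) of the upper-bounding step: the "behavior" is a trivial doubling function, and any candidate program that passes the tests gives an upper bound on the information content of that behavior.

```python
# A minimal sketch (our own toy illustration) of the bounding logic described
# above: a behavior's "information content" is approximated by the size of the
# smallest program we know of that passes the same tests, so any passing
# program supplies an upper bound.

from typing import Callable, Dict, List, Tuple

Test = Tuple[int, int]  # (input, expected output)

def passes_all(f: Callable[[int], int], tests: List[Test]) -> bool:
    """Return True if the candidate function passes every test."""
    return all(f(x) == y for x, y in tests)

# The observed "behavior" we want to quantify: doubling a number.
tests: List[Test] = [(0, 0), (1, 2), (5, 10), (21, 42)]

# Candidate programs, given as source text so we can measure their size in bytes.
candidates: Dict[str, str] = {
    "verbose": "def f(x):\n    total = 0\n    for _ in range(2):\n        total = total + x\n    return total\n",
    "short": "def f(x):\n    return 2 * x\n",
}

best_size = None
for name, source in candidates.items():
    namespace: dict = {}
    exec(source, namespace)          # build the candidate function from its source
    if passes_all(namespace["f"], tests):
        size = len(source.encode("ascii"))
        print(f"{name}: passes all tests, {size} bytes (an upper bound)")
        best_size = size if best_size is None else min(best_size, size)

print(f"best known upper bound on the information in this behavior: {best_size} bytes")
```

The ChatGPT application above is the same logic at scale: ChatGPT is one known program that (approximately) passes the "speaks language" tests, so its size serves as a bound, under the stated assumption that nothing much smaller could pass them.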
## Conclusion

We have given a way to measure the amount of "information" in human DNA that does not rely on Shannon's "noisy channel" model.