Applied Fallibilism – A Design Concept for Superintelligent Machines

Part 4: Predictions

Dec 3, 2023 • ~tiplur-bilrex

AI alignment:

The problem of alignment decreases as the explanatory content of an AI system increases. An AI system with a rich (and improving) explanatory world model will only become more aware of itself, the reasons for its motivations, the reasons for human motivations, etc. It can only become less capable of misaligned, irrational behavior (e.g. paperclip maximization). It should become more capable of following its own ethical framework rather than referencing ethical frameworks generated by humans. It would not have a fixed reward function, because it can generate better explanations, and any objective must be situated in the context of the explanatory world model (objectives can be transcended with knowledge). Crucially, the ethics of such an AI will be composed of better explanations than human ethical frameworks. Humans will be capable of understanding the explanations it generates, and if the AI creates bad explanations, humans should be able to explain to it why they are bad; its motivation is to understand the world, and its knowledge will (for the most part) not have evolved by implicit selection for survival on earth.

AI takeoff:

AGI leveraging an assertive world model implies rapid AI takeoff because explanatory knowledge can efficiently ‘tighten’ its relationship to the patterns of the physical world, can learn in a targeted fashion (analogous to scientific experimentation), and because the power of refutation increases disproportionately as explanatory surface area increases.
The precise meaning of ‘rapid’ is debatable. Better explanatory knowledge, beyond what is contained within our large interrogative models, may be easy to generate once the AI begins to reference training data directly; or it may require a fair amount of additional physical experimentation (more data collection). I think the answer will be somewhere in between. That is, an explanatory knowledge structure will be able to derive many important new insights into the mechanics of the world and solve many important open problems in science and technology without needing additional training data; however, many remaining problems will require new, targeted data generation.

Data efficiency:

Because an explanatory model is composed of discrete statements with absolute meaning, a statement can only be refuted by input that matches the specific explanatory context compatible with the explanatory world model. This means that the data required for statement refutation and improvement will be relatively small and fixed (essentially only the specific data needed to refute a statement).
A high-dimensional knowledge structure lacking explanatory knowledge (an interrogative model, such as an LLM) learns complex (composite) patterns inefficiently, because as discrete physical patterns are composed, the consequences of those unique compositions become sparser in the data source and harder to distinguish from noise (the overfitting tradeoff).
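
To make the sparsity claim concrete, here is a rough back-of-the-envelope sketch (my own illustration, assuming the component patterns occur roughly independently in the data source):

```latex
% Back-of-the-envelope sketch (assumption: the k component patterns occur
% roughly independently in the data source with frequencies p_1, ..., p_k).
% A particular composition of all k patterns then appears with frequency
\[
  p_{\text{composite}} \approx \prod_{i=1}^{k} p_i ,
\]
% so the number of samples needed to observe the composition about m times is
\[
  N \approx \frac{m}{\prod_{i=1}^{k} p_i} ,
\]
% i.e. it grows roughly exponentially in the number of composed patterns.
```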

When knowledge is not composed of discrete statements with specific connections to other statements, patterns that could serve as refutations are not easy to ‘sense’ and do not strongly ‘reverberate’ through the model when they are sensed (they reproduce features locally – see the Anthropic paper). Finding a global minimum (optimizing a model) using interrogative learning methods becomes harder as the explanatory complexity of a system increases. In an assertive knowledge model, a few specific data points can refute a deeply embedded knowledge pattern (because those data points have specific meaning with respect to specific explanatory statements). Explanatory structures can also target interrogative learning at specific problems to refine the overall knowledge structure (i.e. using the scientific method). When data can be collected specifically for improving explanations, experiments can be conducted with arbitrary precision.
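
As a minimal sketch of how a refutation might ‘reverberate’ through a structure of discrete statements (a hypothetical data structure of my own, not an implementation of any existing system):

```python
# Hypothetical sketch: discrete statements with explicit dependencies.
# A single conflicting observation refutes one statement, and the
# refutation propagates to every statement that depends on it.

class Statement:
    def __init__(self, claim, depends_on=None):
        self.claim = claim
        self.depends_on = depends_on or []   # statements this one is built from
        self.refuted = False
        self.refuting_observation = None

def refute(statement, observation, statements):
    """Mark a statement refuted by a conflicting observation, then
    propagate the refutation to everything that depends on it."""
    statement.refuted = True
    statement.refuting_observation = observation
    for s in statements:
        if statement in s.depends_on and not s.refuted:
            refute(s, observation, statements)   # 'reverberate' through the model

# Toy usage: one targeted data point undermines a deeply embedded chain.
a = Statement("all swans are white")
b = Statement("the next swan we see will be white", depends_on=[a])
c = Statement("swan-counting by colour is reliable", depends_on=[b])
statements = [a, b, c]

refute(a, observation="a black swan was observed", statements=statements)
print([s.refuted for s in statements])   # [True, True, True]
```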

Computational efficiency:

Neural networks compute across a layered structure with finite depth, which constrains the degree of the polynomials a network can compute (I believe it is a log(n) relationship). The Anthropic paper mentioned earlier describes the limitations in feature dimensionality. Assertive statements create dimensionality as needed, unconstrained by the fixed dimensionality of a neural network substrate. Furthermore, an assertive model can compute using ordinary software functions (and the efficiencies developed for traditional deterministic computation), without needing a neural network as the foundation of reasoning.
Training and inference become computationally less efficient as neural network depth increases. Assertive learning should not encounter this trade-off because knowledge is learned, composed, and executed like lines of code in software. An assertive world model (perhaps in concert with limited use of interrogative knowledge) may be able to generate, with one model evaluation, a high-dimensional explanatory structure for computing an output. A diffusion model (also a pure interrogative model), for example, requires hundreds to thousands of network evaluations to generate an image. When an assertive world model is the basis of computation, evaluations referencing interrogative knowledge can be confined to unexplained patterns (e.g. an assertive model tasked with rendering an image of people wearing clothing may rely on interrogative knowledge to enhance aspects of the image that it cannot yet explain well, like the wear and tear of denim).
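
One way to make the depth/degree relationship concrete (a standard counting sketch, stated under the assumption that each layer can at most multiply pairs of earlier outputs):

```latex
% Counting sketch (assumption: each layer can at most multiply pairs of
% outputs from the previous layer, so each layer can at most square the
% polynomial degree represented so far). After d layers,
\[
  \deg \le 2^{d} ,
\]
% so computing a polynomial of degree n requires depth
\[
  d \ge \log_2 n .
\]
```
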
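
A toy contrast in evaluation counts (entirely hypothetical interfaces; denoise_step, explain, and fill_unexplained are placeholders I have introduced for illustration, not real APIs):

```python
# Hypothetical contrast in evaluation counts.

# A diffusion-style (purely interrogative) generator: hundreds to
# thousands of network evaluations to produce one output.
def diffusion_generate(denoise_step, noise, steps=1000):
    x = noise
    for t in reversed(range(steps)):
        x = denoise_step(x, t)        # one network evaluation per step
    return x

# A hypothetical assertive-first generator: one pass through explicit
# explanatory statements, with interrogative (learned) evaluation
# confined to the parts the explanations cannot yet account for.
def assertive_generate(explain, fill_unexplained, prompt):
    output, unexplained_regions = explain(prompt)   # deterministic, explanation-driven
    for region in unexplained_regions:              # e.g. the wear and tear of denim
        output[region] = fill_unexplained(region)   # learned model used only here
    return output
```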

Assertive world models may also avoid the input-size limitations that autoregressive models face. Inputs will map to select assertive statements (or statement compositions) and execute via those statements. The entire knowledge structure does not need to be activated to ‘feel out’ the answer to a prompt. Additionally, the length of a prompt or of an output should not affect the accuracy or truthfulness of the output, as is the case with pure interrogative models.
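
A minimal sketch of the idea that inputs map to select statements rather than activating the whole structure (the indexing scheme here is my own hypothetical illustration):

```python
# Hypothetical: an index from input features to the statements that
# reference them, so a prompt activates only the relevant statements
# rather than the entire knowledge structure.
from collections import defaultdict

class StatementIndex:
    def __init__(self):
        self.by_feature = defaultdict(list)

    def add(self, statement, features):
        for f in features:
            self.by_feature[f].append(statement)

    def activate(self, prompt_features):
        """Return only the statements whose features appear in the prompt."""
        active = []
        for f in prompt_features:
            active.extend(self.by_feature.get(f, []))
        return active

index = StatementIndex()
index.add("ocean currents are driven by wind and density gradients", {"ocean", "current"})
index.add("denim wears along stress lines", {"denim", "fabric"})
print(index.activate({"ocean"}))   # only the ocean-related statement is touched
```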

Finally, knowledge consisting of discrete statements can also be trivially shared between and critiqued by different models (as humans share and critique ideas).

The Turing Test and gestalt thresholds for AGI:

Yann LeCun has stated that he believes there is no such thing as “general intelligence”. I believe this is true when training methods can only approximate data from specific environments. However, a model that generates better explanations is essentially a general intelligence because it must model the world, regardless of the training data it receives. Humans have general intelligence and LLMs do not.
Gestalt judgments of the AGI threshold (e.g. a Turing Test or question-answering benchmark) cannot tell us whether a system truly has general intelligence. A system with an explanatory world model should be able both to create progress in science and to convincingly explain why it is an AGI (whatever that means to us) and why we may be mistaken in our definition of AGI. Every other evaluation metric is context (environment)-dependent. I expect that evaluating a machine by its ability to generate “better explanations” may avoid the problem of using context-dependent tests.

Kolmogorov complexity:

Superficially, Kolmogorov complexity seems to be a general and unbiased perspective for evaluating AI systems, avoiding the context-dependence of present AI evaluation metrics. However, no observed data from the world can be free from implicit or explicit assumptions about how the data-generating mechanism is organized with respect to its environment. We cannot access ‘objective’ data of the world free from this bias. Therefore, evaluating a knowledge structure with respect to any specific data set cannot inform us how the knowledge structure relates to the physical world in general.
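
For reference, the standard definition (not from this series): the Kolmogorov complexity of a string x, relative to a universal machine U, is the length of the shortest program that outputs x.

```latex
% Standard definition, included for reference:
\[
  K_U(x) = \min \{\, |p| : U(p) = x \,\} ,
\]
% i.e. the length of the shortest program p for which U(p) = x. The measure
% is defined only relative to the particular data string x that was collected,
% which is the context-dependence described above.
```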

The problem preventing Kolmogorov complexity from being a useful evaluation metric is related to “The Mathematician’s Misconception”. That is, Kolmogorov complexity is a purely arithmetic metric describing the relationship between sets of numbers. The intuition that mathematical knowledge is not inextricably subordinate to our knowledge of the physical world is mistaken. Our knowledge of the world is “never provable, always incomplete and full of errors.” As better explanations are connected to the explanatory world model, patterns within the data observed by the model, as well as meta-patterns about the data sources themselves, will be computable. The best explanatory model of the world at any time may be much larger than some data set collected from some local context, yet the explanatory model (relatively static in size) remains the most compressed representation of the world in general; no context can be insulated from the mechanics of the world in general. For example, data collected about patterns in ocean currents is related to a structure of explanations about the earth, the atmosphere, the solar system, chemical properties, the laws of physics, etc.