Today (Dec 3, 2023), prominent AI systems develop high-dimensional interrogative knowledge structures from their training environments. The knowledge in auto-regressive generative models (ChatGPT by OpenAI, Llama-2 by FAIR, LaMDA/Bard by Google, Alpaca by Stanford) adjusts to stochastically predict the future from past patterns (token prediction). The structure of knowledge in these models is a continuous vector representation in a high-dimensional space. They can approximate any computable function to an arbitrary degree of accuracy [ref]. Despite the absence of absolute discreteness, features can be identified within this continuous structure. Some of these models (particularly LLMs) can give an impression of systematic reasoning abilities. However, because formal assertive statements are present within the training sets (and prompts) of LLMs, the observed ability is a kind of approximate mimicry.
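To make "adjusts to stochastically predict the future from past patterns" concrete, here is a minimal sketch of the next-token training objective (in PyTorch; the toy dimensions and the LSTM stand-in for a transformer are my own illustrative assumptions, not any particular vendor's implementation):

```python
import torch
import torch.nn.functional as F

# Toy autoregressive setup: the model maps a context of token ids to a
# probability distribution over the next token.
vocab_size, d_model = 1000, 64
embed = torch.nn.Embedding(vocab_size, d_model)
core = torch.nn.LSTM(d_model, d_model, batch_first=True)  # stand-in for a transformer
head = torch.nn.Linear(d_model, vocab_size)

tokens = torch.randint(0, vocab_size, (8, 32))   # batch of 8 sequences, 32 tokens each
inputs, targets = tokens[:, :-1], tokens[:, 1:]  # predict token t+1 from tokens <= t

hidden, _ = core(embed(inputs))
logits = head(hidden)                            # shape (8, 31, vocab_size)

# Cross-entropy over next tokens: the only training signal is predictive
# accuracy on the sequence itself, not explanatory adequacy.
loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()
```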
Progress via scaling
Many people predict that, given enough parameters, data, and computing power, stochastic gradient descent will suffice for inductive models to acquire the capability of reasoning [ref]. Indeed, reasoning may be an essential method of optimizing the objective of making accurate sequence predictions. I conjecture that, because of the conflict between training for predictive success and training for explanatory power, LLMs lacking explanatory models cannot learn to reason arbitrarily well.
The recent interpretability paper from Anthropic seemed to indicate this limitation within their model's knowledge structure. Anthropic demonstrated that their autoregressive model did not contain a knowledge structure with “compositional representations” of features (i.e. the composability inherent in assertive knowledge). The features they identified often consisted of “local representations” of token-context pairs, replicated many times in different contexts:
Why do we see hundreds of different features for "the" (such as "the" in Physics, as distinct from "the" in mathematics). We also observe this for other common words (e.g. "a", "of"), and for punctuation like periods. These features are not what we expected to find when we set out to investigate one-layer models!
The researchers at Anthropic expected the model to contain “compositional representations”, where a feature like "the" is independent of a context like “Physics”. They hypothesized that “The underlying transformer is genuinely using a local code (at least in part)”, and that the reason is that models require many local representations of features in various contexts to produce “sharper” predictions. I believe this is because developing an explanatory world model requires a distinct method of learning (the formation of assertive knowledge, which has distinct properties from interrogative knowledge) that autoregressive models do not natively perform.
This is why the researchers at Anthropic found that LLM features are over-represented; why neurons and features are not discrete and evade disentanglement (forming abstract high-dimensional surfaces); why they mostly replicate regardless of the LLM training (the interrogative structure is a regressive measurement of the environment, sensitive to patterns in the data and insensitive to internal statement structure); and why, despite this, some features demonstrate constrained algorithmic relationships to other features. The knowledge in the LLM is an impression of a high-dimensional deterministic structure. The formation of the interrogative knowledge structure has neither the incentive nor the means to explain or understand the mechanics of the system it learns. It seems to have explanatory knowledge, but those explanation-like features are merely impressions of explanatory structures made by humans; the LLM does not have a reasoning structure for observing the world. Despite demonstrating some reasoning abilities, autoregressive models have a limited ability to plan answers and a limited ability to produce factual and consistent answers (they hallucinate).
I believe LLMs are the most impressive form of AI to date because they are trained on language, which contains explicit deductive structure. Although the knowledge in the LLM remains implicit, it can ‘intuit’ explanatory structure. The phenomenon is even more evident in the case of code generation, where the autoregressive model is trained on a strictly formal, deterministic environment. I think there is a much more efficient path to AGI available to us than further scaling interrogative models.
Bias reduction via multi-modality
Data-generating (measuring) systems interact with their environment via mechanistic patterns; all measurement is theory-laden. The meaning of data from a scale is contingent on patterns of interaction between the weighing mechanism and the environment (how the scale works; whether it is being used on Earth, on Jupiter, or under water, etc.). The data generated by humans is contingent on the relationship between human perceptual and cognitive ‘biases’ and the physical environment (e.g. how our retina interacts with light in the room, or what it means to a Texas ER physician when she learns there will be freezing rain this weekend). Explanatory models can separate the context of the data they receive from the data itself. Unlike interrogative models, they are robust to the specific distortions caused by the system generating their data. Interrogative models do not interpret data through an explanatory structure and cannot distinguish reality from the data set (they lack a reality model to contextualize the data); an interrogative model does not understand the meaning of what it has learned. Moreover, interrogative knowledge (because it is continuous) does not transfer efficiently to different contexts.
Multi-modal interrogative training is an effective technique for decreasing the distortion imposed by data sources on the knowledge structure. Training an interrogative model on multiple data contexts can help the model ‘see’ and approximate patterns of the world that exist between the environments that generated its data. A term used for this phenomenon is “inter-subjectivity”; I think of it as analogous to the phenomenon of parallax. However, because an assertive knowledge structure has fundamentally different properties from an interrogative knowledge structure (as discussed above), I would not expect a coherent unifying explanatory world model to emerge from multi-modal training alone.
For example, the multi-modal generalist agent Gato [5] from DeepMind was trained using data including vision and language, robotics, and simulated control tasks. As would be expected from the interrogative-assertive perspective of knowledge, the Gato embeddings from different tasks remained largely segregated. The model can solve problems in many environments, but the knowledge is not well integrated. The model was susceptible to confusing tasks when they shared similar observation and action specifications. The designers of Gato “hypothesize that [a generalist] agent can be obtained through scaling data, compute and model parameters, continually broadening the training distribution while maintaining performance, towards covering any task, behavior and embodiment of interest.” I think this effect would not manifest in the absence of an assertive knowledge structure. I do agree with the Gato team that “natural language can act as a common grounding across otherwise incompatible embodiments, unlocking combinatorial generalization to new behaviors.” I agree because I think natural language serves as an essential intermediary substrate for translating interrogative knowledge patterns into formal assertive statements.
A recent review paper by Ravid Shwartz-Ziv and Yann LeCun (Meta’s FAIR) seems to support the theory that a conflict exists between the predictive performance and the descriptive accuracy of interrogative knowledge structures. The performance of an interrogative model in a specific environment relies on the ability of the model to create an impression of the patterns in the environment: “the objective for obtaining an optimal representation is to maximize the mutual information between each input and its representation”, yet “we cannot separate relevant and irrelevant information… compressing irrelevant information when the Multiview assumption [essentially asserting that a single general world model must be sufficient for all tasks] does not hold presents one of the most significant challenges in self-supervised learning”. One can compress multimodal interrogative knowledge into a tighter structure at the cost of losing knowledge (information) relevant to domain-specific decision making. An interrogative knowledge structure, absent an explanatory world model, cannot distinguish its learning environments from the world generally. Because explanatory knowledge optimizes toward better explanations, which have the property of reach, an explanatory model must become more general as it improves. I do not believe that compressing the approximate knowledge in an interrogative model can give it this property.
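The trade-off Shwartz-Ziv and LeCun describe can be restated schematically (in my own notation, not the paper's). For an input $X$, a representation $Z$, and a downstream task variable $T$ with the Markov structure $T \to X \to Z$, the chain rule of mutual information gives

$$I(X;Z) = I(Z;T) + I(X;Z \mid T),$$

so maximizing $I(X;Z)$ inflates both the task-relevant term $I(Z;T)$ and the superfluous term $I(X;Z \mid T)$. Compressing away the superfluous term is safe only if it is irrelevant to every task of interest, which is exactly what the Multiview assumption asserts and what fails when one model must serve many domains at once.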
Deductive die for inductive models
Many improvements to the performance of inductive models have come from methods that increase the deductive capability of the models (see below) without building an assertive knowledge structure. In a way, these methods attempt to “harden” the “soft” interrogative knowledge structures or force outputs through a “hard” deductive structure.
Some examples I have encountered from my brief review include:
- “Hardening” the interrogative model:
    - Multi-modal and multiview representation learning (as described above)
    - Consistency models [a distillation technique]
    - Joint embedding architectures
    - Nested and multi-agent models
    - RLHF
- “Hardening” the output:
    - “Chain-of-thought” prompting [1] [2]
    - “Tree of thoughts” prompting
    - Faithful reasoning using LLMs
I will not go into further detail of these various approaches in this essay. While these approaches may improve some capabilities of interrogative knowledge structures, I do not expect them to yield AGI because they cannot construct a coherent world model with the properties of assertive knowledge.
Referencing an assertive world model
AI models have been built around assertive structures and have had success in mastering simple worlds with known rules. These models are designed with predefined rules, domain knowledge, or specific instructions programmed into them. They rely on human expertise to define decision boundaries or logic. AlphaZero is an example of such a model; it rapidly achieved superhuman performance at a number of games (the rules of which were available to the model during training). These models were essentially provided an assertive world model and did not have the capability of learning one.
MuZero (a successor of AlphaZero) and Dreamer v3 are examples of AI systems capable of learning the rules (mechanics) of an environment. They learn and store knowledge of environment mechanics in neural networks. MuZero learns interrogative models which represent the game state, the dynamics of states, and predictions of future position values. These models, while able to generate distinct models of their environments, still lack an explanatory knowledge structure with the properties described above; therefore, I do not expect these approaches to be able to understand or explain the worlds they learn, to distinguish between their training environment and the world beyond it, to generate statements with the property of reach, to be capable of refuting statements, to reorganize their knowledge structures by incorporating better explanations, etc. MuZero and Dreamer v3 are capable of learning close approximations of the rules of simple worlds. Interestingly, Dreamer v3 uses a dynamics predictor with discrete latent representations (which performed better than continuous representations); this may be because discrete world representations can replicate some properties of assertive knowledge (e.g. reach, composability, absolute predictions), but I am not confident I understand their architecture well enough to make this conclusion.
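For readers unfamiliar with MuZero's structure, the following is a schematic sketch (my own simplification in PyTorch, not DeepMind's code) of the three learned functions referenced above: a representation of the game state, a dynamics model over latent states, and a prediction of values and policies:

```python
import torch
import torch.nn as nn

class MuZeroStyleModel(nn.Module):
    """Schematic of the three learned functions in a MuZero-like agent."""
    def __init__(self, obs_dim: int, num_actions: int, latent_dim: int = 64):
        super().__init__()
        # h: representation -- encodes an observation into a latent state
        self.representation = nn.Sequential(nn.Linear(obs_dim, latent_dim), nn.Tanh())
        # g: dynamics -- predicts the next latent state and the immediate reward
        self.dynamics = nn.Linear(latent_dim + num_actions, latent_dim + 1)
        # f: prediction -- predicts policy logits and a value from a latent state
        self.prediction = nn.Linear(latent_dim, num_actions + 1)

    def initial_inference(self, obs):
        state = self.representation(obs)
        out = self.prediction(state)
        return state, out[..., :-1], out[..., -1]            # state, policy logits, value

    def recurrent_inference(self, state, action_onehot):
        out = self.dynamics(torch.cat([state, action_onehot], dim=-1))
        next_state, reward = out[..., :-1], out[..., -1]
        pred = self.prediction(next_state)
        return next_state, reward, pred[..., :-1], pred[..., -1]

# Usage: roll the learned mechanics forward without touching the real environment.
model = MuZeroStyleModel(obs_dim=16, num_actions=4)
state, policy, value = model.initial_inference(torch.randn(1, 16))
action = torch.nn.functional.one_hot(torch.tensor([2]), num_classes=4).float()
next_state, reward, next_policy, next_value = model.recurrent_inference(state, action)
```

Note that all three functions are continuous vector maps; the 'rules' of the game live implicitly in their weights rather than as explicit, refutable statements, which is the limitation described above.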
Creating an assertive knowledge structure using an LLM
The project I have found which is most similar to the explanatory world model concept is this work led by Subbarao Kambhampati. Subbarao et al. have demonstrated that GPT-4 can extract a high quality assertive “PDDL” world model from a natural language description of a physical environment [ref]. They used the extracted world model to plan actions using an automated, deterministic local-search planner; the environment and actions are therefore fully observable. They also utilized the LLM to correct errors in the PDDL world model. World model correction was accomplished, in part, using automated model validators; however, their system also required human feedback. Model errors identified by the automated validator and by humans were communicated to the LLM, which translated the feedback into formal PDDL statements.
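The construct-validate-correct loop described above can be sketched roughly as follows (a hypothetical outline in Python; the `ask_llm` and `validate` callables are placeholders I am assuming, not the authors' code):

```python
from typing import Callable, List

def build_pddl_domain(
    nl_description: str,
    ask_llm: Callable[[str], str],         # e.g. a wrapper around a GPT-4 API call
    validate: Callable[[str], List[str]],  # e.g. an automated PDDL validator plus human notes
    max_rounds: int = 5,
) -> str:
    """Sketch of extracting a PDDL domain from natural language and repairing it iteratively."""
    domain = ask_llm(
        "Translate this environment description into a PDDL domain:\n" + nl_description
    )
    for _ in range(max_rounds):
        errors = validate(domain)  # syntax errors, missing preconditions, contradictory effects, ...
        if not errors:
            break
        # Feedback is handed back to the LLM, which translates it into corrected formal statements.
        domain = ask_llm(
            "Fix these problems in the PDDL domain:\n" + "\n".join(errors)
            + "\n\nCurrent domain:\n" + domain
        )
    return domain  # the finished domain is given to a classical local-search planner
```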
Subbarao et al. did not attempt to create an explanatory model of the physical world. They constructed assertive knowledge models of narrow environments by providing the LLM with a “detailed description of a specific domain including contextual information about the agent’s tasks and physical constraints due to the specific embodiment of the agent” and “a description of the agent’s action”. Importantly, however, they demonstrated that a high dimensional interrogative knowledge structure can produce assertive knowledge statements, which can be assembled into an assertive world model. This is a basic demonstration of the general process that I believe will be essential for achieving AGI. Creating an explanatory model of the physical world through an automated process of questioning the knowledge already present within an LLM (perhaps using the design principles I have described) may be the next step toward superhuman general intelligence.
A few more comments about the work by Subbarao et al. and how it relates to the ideas described in this text:
- Classical PDDL systems may not have adequate generality for supporting a model of the physical world. I expect the system supporting the model will require both discrete and continuous dynamics, and will benefit from a kind of flexible divided memory database for maintaining information about the state (descriptive assertive knowledge) and mechanics (explanatory assertive knowledge) of the world. A generic programming language may be better suited for kickstarting the model (see the sketch after this list).
- As in Subbarao’s method, I expect that automating the creation of an explanatory world model will require detailed instructions for the formal model generation task, including example input and output formats. Outputs will have retrievable lists of explicit supporting statements as well as information about state changes.
- Automated creation of an explanatory world model will require a kind of “VAL”, as in the case of the PDDL model creation. The system will help detect syntax errors and should be able to detect internal logical conflicts.
- Subbarao et al. observed that GPT-4 had difficulty constructing formal models of actions involving spatial reasoning (e.g. when asked to construct the action of “pick up an object from a furniture piece”, it failed to generate the precondition that other objects are not stacked on top of the target object), and on rare occasions GPT-4 output contradictory effects. This observation may indicate that optimal action planning will consist of a kind of interrogative intuition learned on top of a dynamic explanatory assertive world model (somewhat like the Dreamer v3 and MuZero systems), where planning is performed neither through pure interrogative intuition (solving by LLM) nor pure assertive model search, but through a combination of both.
- Finally, automated explanatory world model creation should provide an LLM multiple opportunities to learn the statements which represent common emergent patterns of the physical world. That is, the LLM will not only be prompted to create formal statements describing an environment (e.g. a block stacking game) from one perspective (e.g. the perspective of the game environment); it will be prompted from many directions to learn the properties of physical reality shared among many physical environments (e.g. learning statements describing force vectors, friction, momentum, etc. in many scenarios responsive to such dynamics), and will therefore be less likely to exclude such statements from its world model.
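As a sketch of the ‘divided memory’ idea from the first bullet above (the class, field, and rule names are my own illustrative choices, mixing discrete facts with a continuous rule):

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Union

# Descriptive assertive knowledge: discrete facts and continuous quantities
# describing the current state of the world.
StateValue = Union[bool, float, str]
State = Dict[str, StateValue]

@dataclass
class WorldModel:
    """Illustrative split between state (descriptive) and mechanics (explanatory) knowledge."""
    state: State = field(default_factory=dict)
    # Explanatory assertive knowledge: named rules that map one state to the next.
    mechanics: Dict[str, Callable[[State], State]] = field(default_factory=dict)
    # Each rule keeps a retrievable list of the statements that justify it.
    support: Dict[str, List[str]] = field(default_factory=dict)

    def step(self, rule_name: str) -> None:
        self.state = self.mechanics[rule_name](self.state)

# Usage: a block-stacking fact (discrete) alongside a falling-body rule (continuous).
wm = WorldModel()
wm.state = {"block_a_on_table": True, "height_m": 2.0, "velocity_mps": 0.0}

def gravity(s: State, dt: float = 0.1, g: float = 9.81) -> State:
    v = s["velocity_mps"] + g * dt
    return {**s, "velocity_mps": v, "height_m": max(0.0, s["height_m"] - v * dt)}

wm.mechanics["gravity_step"] = gravity
wm.support["gravity_step"] = ["Unsupported objects accelerate downward at ~9.81 m/s^2."]
wm.step("gravity_step")
```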
Evolving AGI:
Induction of deduction:
An assertive knowledge structure can be closely approximated with interrogative learning (as I believe is demonstrated by MuZero). An interrogative system can intuitively learn when and where to compute using refined deduction-like features, despite the model not using assertive statements (which are discrete, absolutely constrained, modular, and explicitly observable in the presence of explanatory knowledge). Training in a narrow deterministic environment (e.g. the simple rule-based world of an Atari game) can help an interrogative model develop an effective assertive-like knowledge structure that simulates the mechanics of the training environment. Because our physical world also seems to have a structure of deterministic rules instantiated by emergent and fundamental phenomena, LMs may eventually, organically, intuit that modes of thinking which approximate deduction consistently work, and that rigid, precise specifications of phenomena can be computed following formal rules (i.e. they can learn that ‘thinking’ using assertive knowledge structures is effective).
I believe LMs specifically can possibly ‘evolve’ to create a deductive knowledge structure for two reasons. First, their training data (unlike vision training data, for example) consists of the symbols of natural language, which are the building blocks for creating formal assertive statements. Second, the natural language data they learn from already contains many examples of formally structured knowledge encoded in Turing-complete symbolic languages (e.g. some of ordinary language, statements from the physical sciences, perhaps all of software). LMs can already demonstrate superficially convincing deductive reasoning abilities. Trained on sufficient quantities of deductive knowledge, a pure interrogative model may closely approximate human reasoning abilities and, combined with its advantages in computational speed and attention, may surpass human deductive abilities in some domains. Yet, as long as these models can only approximate assertive knowledge, we should expect that their deductive abilities will remain bounded and that these systems will continue to hallucinate (not being able to identify logical inconsistencies absolutely). Although present LLMs have quite limited ability to solve complicated or extended reasoning problems [ref], the possibility remains that they will learn to build explanatory knowledge structures of their own, as humans have. I believe that if we can recognize the different properties of the two categories of knowledge, AGI can be built by intelligent design rather than waiting for it to develop through a process of natural selection.
Building from a deductive seed (and the WolframAlpha project):
Many humans have tried to formalize all of our knowledge. A notable modern example is Stephen Wolfram and his WolframAlpha project, an answer engine with the mission “to collect and curate all objective data; implement every known model, method and algorithm; and make it possible to compute whatever can be computed about anything… to provide a single source that can be relied on by everyone for definitive answers to factual queries.”
Like humans referring to WolframAlpha software, interrogative AI systems can be furnished with tools (which compute using assertive knowledge structures) to improve their reasoning abilities. Again, I suspect that, lacking an explanatory world model of its own (integrated with its interrogative knowledge) that the AI system can independently improve, its knowledge of the world will remain fundamentally intuitive, implicit, and vague. It may have a sense of when to use a calculator, which can generate precise answers, but it will not know precisely why or how those answers were produced or what they mean.
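To make the tool-use point concrete, here is a minimal sketch of the pattern (the routing heuristic, the `answer_with_tool` name, and the stand-in LLM are illustrative assumptions, not any production system):

```python
import re
from typing import Callable

def answer_with_tool(question: str, llm: Callable[[str], str]) -> str:
    """Route simple arithmetic to an exact rule-following tool; defer everything else to the LLM."""
    m = re.fullmatch(r"\s*what is\s+(-?\d+)\s*([+\-*])\s*(-?\d+)\s*\??\s*", question, re.I)
    if m:
        a, op, b = int(m.group(1)), m.group(2), int(m.group(3))
        # The tool computes deterministically from the formal rules of arithmetic;
        # the model only 'sensed' that a calculation was needed.
        return str({"+": a + b, "-": a - b, "*": a * b}[op])
    return llm(question)  # fall back to the model's intuitive (interrogative) answer

# Usage, with a stand-in "LLM":
print(answer_with_tool("What is 37 * 43?", llm=lambda q: "Probably around 1500."))  # exact: 1591
```

The exact answer comes from the assertive tool, while the decision to invoke it, and any interpretation of the result, remain intuitive; this is the division the paragraph above describes.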
Some tool that contains an assertive knowledge structure, used to augment an LLM, may be the ‘seed’ from which AGI evolves (once the LLM learns how to improve the tool by creating better explanatory statements). That seed may be a complex deductive system like WolframAlpha or NVIDIA’s PhysX, or it may be a single seed statement (as I propose) used to construct a world model from scratch. Once the model can make progress in generating better explanations, I expect it will quickly surpass the best human-generated deductive models (whether in the minds of scientists or written by teams like the one behind WolframAlpha). An alternative path may be to train an assertive model from scratch, alongside inductive learning (similar to the approach used in developing Consistency Models); however, it may be much easier to leverage a large interrogative model as a learning orthotic to more quickly find the assertive statements needed to assemble the world model. One day, a coherent automated knowledge curator will be built (it will mostly build itself), and it will not rely on a committee to decide what it believes (as in the case of the respectable effort of the “world-class team and... top outside experts in countless fields” who have created WolframAlpha).