Artificial Human Intelligence: The Programmer's Apprentice
Tom Dean and Rishabh Singh, Google Research

SLIDE 1

Artificial Human Intelligence: The Programmer’s Apprentice

Tom Dean and Rishabh Singh, Google Research


In this presentation, the phrase “Artificial Human Intelligence” refers to AI systems that employ architectures modeled after the human brain, by leveraging ideas from developmental psychology and cognitive neuroscience. We start off with a discussion of the overall project with an emphasis on the underlying cognitive architecture, motivated in large part by the basic natural-language-processing and problem-solving skills required of an assistant in collaborating with a software engineer. Our primary objective is to build an end-to-end system for an individualized personal assistant that focuses on a specific area of expertise, namely software engineering, that learns from experience, works collaboratively with an expert programmer and that provides value from day one.

SLIDE 2

The Programmer’s Apprentice was the name of a project started by Charles Rich and Richard Waters at the MIT AI lab in 1987. The goal of the project was to develop a theory of how expert programmers analyze, synthesize, modify, explain, specify, verify and document programs, and, if possible, implement it. Their research plan was to build prototypes of the apprentice incrementally. Our research plan also involves making incremental steps. However, we will be able to make substantially larger steps by exploiting and contributing to the powerful AI technologies developed during the intervening 30 years with our primary focus on recent advances in applied machine learning and artificial neural networks.

SLIDE 3

Artificial Human Intelligence

The fields of cognitive and systems neuroscience provide clues to engineers interested in applying what we've learned about how humans think about and solve problems. Fundamental to our understanding of human cognition is the essential tradeoff between fast, highly-parallel, context-sensitive, distributed connectionist-style computations and slow, essentially-serial, systematic, combinatorial symbolic computations. Human intelligence is considered to be a hybrid of these two complementary computational strategies.

SLIDE 4

Artificial Human Intelligence

Our goal in developing systems that incorporate characteristics of human intelligence is twofold: humans provide a complete solution that we can use as a basic blueprint and then improve upon, and the resulting AI systems are likely to be well suited to serve as assistants that complement and extend human intelligence while operating in a manner comprehensible to humans.

The study of human cognitive function at the systems level primarily consists of human subjects performing cognitive tasks in an fMRI scanner that measures brain activity reflected in changes associated with blood flow. These studies localize cognitive activity in space and time to construct cognitive models by measuring the correlation between regions of observed brain activity and the steps carried out by subjects in solving problems. In addition to localizing brain activity correlated with cognitive functions, Diffusion Tensor Imaging (DTI) can be used for tractographic reconstructions to infer white matter connections between putative functional regions.
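As a concrete illustration of the correlation-based localization just described, here is a minimal Python sketch; the task design, voxel signals and dimensions are entirely synthetic, not from any actual study:

```python
# Sketch: localize task-related activity by correlating each voxel's
# time series with a task regressor. All signals are simulated.
import numpy as np

rng = np.random.default_rng(0)
n_timepoints, n_voxels = 200, 5

# Task regressor: alternating 20-timepoint blocks of rest (0) and task (1).
task = np.tile(np.r_[np.zeros(20), np.ones(20)], 5)

# Simulated voxel signals: voxel 0 tracks the task; the rest are noise.
voxels = rng.normal(size=(n_timepoints, n_voxels))
voxels[:, 0] += 2.0 * task

def task_correlation(signals, regressor):
    """Pearson correlation of each voxel column with the task regressor."""
    z_sig = (signals - signals.mean(0)) / signals.std(0)
    z_reg = (regressor - regressor.mean()) / regressor.std()
    return z_sig.T @ z_reg / len(regressor)

r = task_correlation(voxels, task)
best = int(np.argmax(np.abs(r)))  # the voxel "localized" to the task
```

A real analysis would convolve the regressor with a hemodynamic response function and correct for multiple comparisons; this only shows the correlational core of the method.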

SLIDE 5

Conscious Attention & Short Term Memory

There is a tendency to think the neocortex is the epitome of human cognition. The evolution of the human cerebral cortex — or neocortex — occurred in parallel with substantial enhancements in the cerebellar cortex and many subcortical areas including the basal ganglia, thalamus and hippocampus. In the following, we will be primarily concerned with circuits that involve the cortex, hippocampus and basal ganglia. We begin with a straightforward interpretation of conscious attention developed by Stanislas Dehaene and his colleagues at Collège de France in Paris that provides a basis for short-term memory and executive function in the prefrontal cortex.

SLIDE 6

[Figure: Global Workspace Model — bottom-up propagation, global workspace activation, feed-forward and feed-back connections in a thalamo-cortical column]

Stanislas Dehaene. Consciousness and the Brain: Deciphering How the Brain Codes Our Thoughts. Viking Press, 2014. Figure 27.

Global Workspace Theory

In the Global Workspace Theory developed by Bernard Baars and extended by Dehaene, sensory data is initially processed in the primary sensory areas located in posterior cortex, propagates forward and is further processed in increasingly-abstract multi-modal association areas. Even as information flows forward toward the front of the brain, the results of abstract computations performed in the association areas are fed back toward the primary sensory cortex. This basic pattern of activity is common in all mammals.
SLIDE 7

We have developed a simple architecture that represents the apprentice’s global workspace and incorporates a model of attention that surveys activity throughout somatosensory and motor cortex, identifies the activity relevant to the current focus of attention and then maintains this state of activity so that it can readily be utilized in problem solving. In the case of the apprentice, new information is ingested into the model at the system interface, including dialog in the form of text, visual information in the form of editor screen images, and a collection of programming-related signals originating from a fully-instrumented integrated development environment called FIDE. Single-modality sensory information then feeds into multi-modal association areas to create rich abstract representations. Attentional networks in the prefrontal cortex take as input activations occurring throughout the posterior

  • cortex. These networks are trained by reinforcement learning to identify areas

worth attending to and the resulting policy selects specific areas to actively maintain in short-term memory. In keeping with a model suggested by Yoshua Bengio, this attentional process is guided by a prior that prefers relatively-low-dimensional abstract thought vectors corresponding information useful for making decisions. While humans can sustain only a few such activations at a time, the apprentice has no such limitations.
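To make the attentional gating concrete, here is a minimal Python sketch; the linear scorer stands in for the RL-trained Q-network and is left untrained, and the region names, dimensions and weights are illustrative placeholders:

```python
# Sketch: an attentional policy keeps the top-k scoring cortical
# activations actively maintained in short-term memory. The scoring
# weights are random stand-ins for a learned Q-network.
import numpy as np

rng = np.random.default_rng(1)
dim, n_regions, k = 8, 6, 2

# Posterior-cortex activations: one "thought vector" per region.
activations = {f"region_{i}": rng.normal(size=dim) for i in range(n_regions)}

# Stand-in for the learned Q-network: a random linear scorer.
W = rng.normal(size=dim)

def attend(acts, weights, top_k):
    """Score each region and keep the top_k for short-term maintenance."""
    scores = {name: float(weights @ v) for name, v in acts.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

stm = attend(activations, W, k)  # regions held in short-term memory
```

In the full system the scorer would be trained by reinforcement learning and k would be effectively unbounded, per the remark that the apprentice is not limited to a few activations.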

SLIDE 8

Episodic Memory & Long Term Memory

It is said that each of us is defined by a collection of episodic memories, a set of expectations about the future and a single thread of control directed by conscious attention. Whether or not this is an accurate characterization of a human being, it is our working model for the programmer's apprentice. Long-term memory storage and retrieval in humans involves most areas of the neocortex plus several subcortical circuits, the most important being the hippocampal formation in the medial temporal lobe of the brain.

SLIDE 9

On the upper left you see a cartoon drawing of the hippocampus and related cortical and subcortical areas. The primary components include the entorhinal cortex or EHC, the dentate gyrus or DG, and two hippocampal nuclei referred to as CA3 and CA1. The graphic in the bottom center is an anatomically more accurate artistic rendering showing the entorhinal cortex on the top right with green neural processes, the dentate gyrus in the top center with blue processes projecting to CA3 in the hippocampus, from which purple processes project to CA1 and from there back to the entorhinal cortex to complete a recurrent loop essential for storage and retrieval. The block diagram in the upper right summarizes the component circuits, along with their projections and reciprocal connections. In the process of forming new memories, a stimulus corresponding to compressed summaries of neural activity occurring throughout the cortex is projected onto the entorhinal cortex. This composite pattern of activity is sent to the dentate gyrus, where it undergoes extreme pattern separation by projecting the high-dimensional composite pattern of activity from the EHC onto a lower-dimensional representation in the dentate gyrus that is forwarded to CA3. CA3 is an auto-associative network that incorporates each new composite pattern of activity to construct an index that can be used to recover the stored memory from the original stimulus.

SLIDE 10

The auto-associative network performs pattern completion, enabling the hippocampus to reconstruct an appropriate index from a partial pattern consisting of a subset of the activity in the original stimulus. During retrieval, the index reconstructed from a stimulus pattern of activity is used to recover the pattern of activity of the original stimulus in CA1, which is projected back to the entorhinal cortex, which in turn uses its reciprocal connections to reactivate the original pattern of activity in the cortex. The result is not a perfect reconstruction of the original state of the cortex at the time the memory was recorded, but rather a potentially more relevant version of the original state that incorporates information from the current state in which the memory is reconstructed.
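A classic Hopfield-style auto-associator illustrates the pattern completion just described; this is a textbook stand-in for the CA3 circuit, not the apprentice's actual implementation, and all sizes are illustrative:

```python
# Sketch: store binary patterns with a Hebbian outer-product rule,
# then complete a degraded cue back to the stored pattern.
import numpy as np

rng = np.random.default_rng(2)
n = 64

patterns = np.sign(rng.normal(size=(3, n)))  # three stored +/-1 patterns
W = sum(np.outer(p, p) for p in patterns)    # Hebbian storage
np.fill_diagonal(W, 0)                       # no self-connections

def complete(cue, steps=5):
    """Iterate the network; each step pulls the state toward an attractor."""
    x = cue.copy()
    for _ in range(steps):
        x = np.sign(W @ x)
        x[x == 0] = 1
    return x

# Degrade the first pattern: flip 10 of its 64 units, then recover it.
cue = patterns[0].copy()
flip = rng.choice(n, size=10, replace=False)
cue[flip] *= -1
recovered = complete(cue)
```

The recovered state is the attractor nearest the partial cue, which is the sense in which a subset of the original activity suffices for retrieval.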

SLIDE 11

During the encoding of memory, the EHC uses its reciprocal connections to project the original stimulus back to CA1 in order to train the network linking CA1 and CA3 to accurately reconstruct the original state of the memory from the encoded index. The alignment of the newly encoded index produced by CA3 and the original pattern of activity from the EHC enables learning via long-term potentiation. We have provided somewhat more technical detail in discussing the episodic memory system in order to make a couple of observations about the general architecture of the programmer's apprentice. The first observation is that the recurrent process initiated in the EHC and then propagated through the hippocampus before returning to the EHC via CA1 is but one example of a type of recurrent connection that is motivated by unsupervised learning and quite common in the mammalian brain. A second observation is that learning in the programmer's apprentice takes place on multiple temporal scales; the two illustrated here correspond to (a) the use of what Geoff Hinton calls fast weights to train the CA3 auto-associator network, and (b) the importance of long-term potentiation in enabling the routine adjustment of weights during life-long learning, as in the case of training the connections between CA3 and CA1 to reproduce EHC stimuli during memory retrieval.

SLIDE 12

You can think of the episodic memory encoded in the hippocampus and entorhinal cortex as random-access memory, and the actively maintained memories in the prefrontal cortex as the contents of registers in a conventional von Neumann architecture. Since the activated memories have different temporal characteristics and functional relationships with the contents of the global workspace, we implement them as two separate Differentiable Neural Computer (DNC) memory systems, each with its own special-purpose controller. In the DNC paper appearing in Science, the authors point out that an associative key that only partially matches the content of a memory location can still be used to attend strongly to that location. This enables a form of pattern completion whereby the value recovered by reading the memory location includes information that is not present in the key. In general, key-value retrieval provides a rich mechanism for navigating associative data structures stored in the DNC memory, because the content of one address can effectively encode references to other addresses. The contents of memory consist of thought vectors that can be composed with other thought vectors to shape the global context for interpretation. This slide summarizes the architectural components introduced so far in a single model. Data in the form of text transcriptions of ongoing dialogue, source code and related documentation, and output from the integrated development environment are the primary input to the system and are handled

SLIDE 13

by relatively standard neural network models. The Q-network for the attentional reinforcement learning system is realized as a multi-layer convolutional network. The two DNC controllers are straightforward variations on existing network models, with a second controller responsible for maintaining a priority queue of encodings of relevant past experience retrieved from episodic memory. The nondescript box labeled “motor cortex” includes the machinery for managing dialogue and handling tasks related to programming and code synthesis. We ignore dialogue management in this talk, but Rishabh will provide an overview of our strategy for tackling code synthesis in the second part of this presentation.
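The partial-key attention described on the previous slide can be sketched as DNC-style content-based addressing over a memory matrix: cosine similarity between a key and each memory row, sharpened by a softmax, lets a partial key attend strongly to the best-matching location and read back information absent from the key. The memory contents, sizes and sharpness parameter are illustrative:

```python
# Sketch: content-based read. A key matching only part of a slot's
# content still attends strongly to that slot.
import numpy as np

rng = np.random.default_rng(4)
n_slots, width = 8, 16
memory = rng.normal(size=(n_slots, width))  # stand-in thought vectors

def content_read(M, key, beta=10.0):
    """Softmax over cosine similarities; read is a weighted blend of slots."""
    sims = M @ key / (np.linalg.norm(M, axis=1) * np.linalg.norm(key) + 1e-8)
    w = np.exp(beta * sims)
    w /= w.sum()
    return w, w @ M

# Partial key: only the first 12 of slot 3's 16 components, rest zeroed.
key = np.zeros(width)
key[:12] = memory[3, :12]
weights, read = content_read(memory, key)
```

The read vector carries slot 3's full content, including the components the key never specified, which is the pattern-completion behavior the DNC authors describe.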

SLIDE 14

Action Selection & Procedural Memory

Action selection is fundamental to most everything that the apprentice does, both in terms of carrying out specific instructions suggested by the programmer and in performing actions related to editing source code and using its built-in integrated development environment to execute and debug code. At a deeper level, action selection is required in problem solving of any sort. Here we consider the most basic machinery for action selection upon which all of the other tools and modes of problem-solving and executive function depend. This basic machinery relies on a collection of subcortical nuclei called the basal ganglia, with reciprocal connections to the cortex mediated by the thalamus and to primitive circuits in the limbic system responsible for emotional processing, such as the hypothalamus and amygdala. In a 2018 paper in Nature Neuroscience, Jane Wang of DeepMind along with Demis Hassabis and Matt Botvinick formulated a new theory of reward-based learning that explains not only the biology of action selection but also the success of the learning-to-learn meta-reasoning model used by AlphaGo.

SLIDE 15

The left panel provides a highly stylized anatomical drawing of the basal ganglia. The block diagram shown in the right panel depicts the primary components involved in action selection as functional blocks. The blocks shown in blue represent components in what is called the direct path. The blocks shown in light green with dashed borders represent additional components that contribute to the indirect path. The green arrows shown in the block diagram indicate excitatory connections and the red arrows inhibitory connections. GP sub “i” refers to the “internal” segment of the Globus Pallidus, whereas GP sub “e” refers to its “external” segment, and the initials STN refer to the Subthalamic Nucleus. Both segments of the Globus Pallidus play inhibitory roles, while the Subthalamic Nucleus is primarily excitatory. In the direct path, cortical projections emanating from the sensory cortex enter the striatum, which is the name for the combined Putamen and Caudate Nucleus. In what was until recently the prevailing theory for how the basal ganglia manage action selection, the direct path initially inhibits all input from the cortex, after which the indirect path plays a countervailing excitatory role that ultimately selects a single action; the combined paths were thought to implement a more or less conventional form of reinforcement learning.

SLIDE 16

However, it now appears there is an additional path called the hyperdirect pathway, so named because it receives input directly from the frontal cortex and sends excitatory projections directly to the basal ganglia output, corresponding to the internal segment of the Globus Pallidus, bypassing the striatum altogether. This may seem like a relatively small change, but it substantially alters our understanding of action selection and hence what we can expect from upstream cognitive capabilities that rely on this basic enabling functionality for executive control and complex problem solving. In the paper appearing in Nature Neuroscience, the team at DeepMind presents a new theory of reward-based learning, in which the dopamine system trains another part of the brain, the prefrontal cortex, to operate as its own free-standing learning system. This new perspective accommodates the findings that motivated the standard model, but also deals gracefully with a wider range of observations, and suggests that this expanded theory implements a form of “learning-to-learn” that can be realized by a form of meta-learning.
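The structural trick behind this meta-RL picture can be suggested with a minimal sketch: the recurrent "prefrontal" core receives its previous action and reward as part of each observation, which is what allows its recurrent dynamics, once trained across many tasks, to behave as a free-standing learning system even with its weights frozen. The bandit observation, network sizes and (untrained) weights below are illustrative:

```python
# Sketch: input construction and one core step for a meta-RL agent on a
# two-armed bandit. Training across tasks is omitted; weights are random.
import numpy as np

rng = np.random.default_rng(5)
n_actions, hidden = 2, 16

def make_input(obs, prev_action, prev_reward):
    """Concatenate observation, one-hot previous action, previous reward."""
    onehot = np.zeros(n_actions)
    onehot[prev_action] = 1.0
    return np.concatenate([obs, onehot, [prev_reward]])

W_in = rng.normal(scale=0.1, size=(hidden, 1 + n_actions + 1))
W_rec = rng.normal(scale=0.1, size=(hidden, hidden))

def core_step(h, x):
    """One recurrent update of the untrained 'prefrontal' core."""
    return np.tanh(W_in @ x + W_rec @ h)

h = np.zeros(hidden)
x = make_input(obs=np.array([0.0]), prev_action=1, prev_reward=1.0)
h = core_step(h, x)
```

Because reward history flows through the recurrent state, a trained version of this core can adjust its behavior within an episode without any weight changes, which is the "dopamine trains the prefrontal cortex to learn" claim in computational form.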

SLIDE 17

In particular, such meta-learning has been used to combine model-based and model-free learning by exploiting meta-reinforcement learning to choose between using a model to explore alternatives and using what was learned from such exploration in order to select actions. Several teams at DeepMind are using so-called imagination-based agents to augment deep reinforcement learning systems, borrowing ideas from the cognitive neuroscience of commonsense counterfactual reasoning developed by Felipe De Brigard, Demis Hassabis, Eleanor Maguire, Nicole Van Hoeck and others. This slide represents one of several approaches we are exploring for representing and reasoning about differentiable computer programs. Structured programs consisting of multiple procedures are emulated using a differentiable memory model such as a Neural Turing Machine that is partitioned to encode static programs in the form of abstract syntax trees and a dynamic run-time call stack to support program execution. Writing and debugging programs would be conducted using a variant of imagination-based planning, building on the work of Battaglia, Hamrick, Pascanu, Vinyals, Weber and their colleagues at DeepMind. We realize these are substantial aspirational steps, and they are included here primarily as an indication of the potential value of our long-term research plan. In the second part of this presentation, Rishabh will explore some of the more conservative, near-term objectives that we have been developing.