Neural Networks: Hopfield Nets and Auto Associators, Fall 2017 - PowerPoint PPT Presentation



SLIDE 1

Neural Networks

Hopfield Nets and Auto Associators, Fall 2017

SLIDE 2

Story so far

  • Neural networks for computation
  • All feedforward structures
  • But what about..

SLIDE 3

Loopy network

  • Each neuron is a perceptron with +1/-1 output
  • Every neuron receives input from every other neuron
  • Every neuron outputs signals to every other neuron

$y_i = \Theta\left(\sum_{j \neq i} w_{ji} y_j + b_i\right)$

$\Theta(z) = \begin{cases} +1 & z > 0 \\ -1 & z \le 0 \end{cases}$

The output of a neuron affects the input to the neuron

SLIDE 4

Loopy network

  • Each neuron is a perceptron with +1/-1 output
  • Every neuron receives input from every other neuron
  • Every neuron outputs signals to every other neuron

$y_i = \Theta\left(\sum_{j \neq i} w_{ji} y_j + b_i\right)$

$\Theta(z) = \begin{cases} +1 & z > 0 \\ -1 & z \le 0 \end{cases}$

A symmetric network: $w_{ij} = w_{ji}$

SLIDE 5

Hopfield Net

  • Each neuron is a perceptron with +1/-1 output
  • Every neuron receives input from every other neuron
  • Every neuron outputs signals to every other neuron

$y_i = \Theta\left(\sum_{j \neq i} w_{ji} y_j + b_i\right)$

$\Theta(z) = \begin{cases} +1 & z > 0 \\ -1 & z \le 0 \end{cases}$

A symmetric network: $w_{ij} = w_{ji}$

SLIDE 6

Loopy network

  • Each neuron is a perceptron with a +1/-1 output
  • Every neuron receives input from every other neuron
  • Every neuron outputs signals to every other neuron

$y_i = \Theta\left(\sum_{j \neq i} w_{ji} y_j + b_i\right)$

$\Theta(z) = \begin{cases} +1 & z > 0 \\ -1 & z \le 0 \end{cases}$

A neuron "flips" if the weighted sum of the other neurons' outputs is of the opposite sign. But this may cause other neurons to flip!

SLIDE 7

Loopy network

  • At each time each neuron receives a "field" $\sum_{j \neq i} w_{ji} y_j + b_i$
  • If the sign of the field matches its own sign, it does not respond
  • If the sign of the field opposes its own sign, it "flips" to match the sign of the field

$y_i = \Theta\left(\sum_{j \neq i} w_{ji} y_j + b_i\right)$

$\Theta(z) = \begin{cases} +1 & z > 0 \\ -1 & z \le 0 \end{cases}$
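In code, this asynchronous update might look like the following sketch (NumPy assumed; the function name and argument layout are mine, not from the slides):

```python
import numpy as np

def update_neuron(y, W, b, i):
    """One asynchronous update: y_i <- Theta(sum_{j != i} w_ji y_j + b_i)."""
    field = W[:, i] @ y - W[i, i] * y[i] + b[i]  # exclude the self term w_ii y_i
    return 1 if field > 0 else -1                # Theta: +1 if field > 0, else -1
```

A neuron "flips" exactly when the returned value differs from the current y[i].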

SLIDE 8

Example

  • Red edges are -1, blue edges are +1
  • Yellow nodes are +1, black nodes are -1

SLIDE 9

Example

  • Red edges are -1, blue edges are +1
  • Yellow nodes are +1, black nodes are -1

SLIDE 10

Example

  • Red edges are -1, blue edges are +1
  • Yellow nodes are +1, black nodes are -1

SLIDE 11

Example

  • Red edges are -1, blue edges are +1
  • Yellow nodes are +1, black nodes are -1

SLIDE 12

Loopy network

  • If the sign of the field at any neuron opposes its own sign, it "flips" to match the field
    – Which will change the field at other nodes
      • Which may then flip
    – Which may cause other neurons, including the first one, to flip…
      » And so on…

SLIDE 13

20 evolutions of a loopy net

  • All neurons which do not "align" with the local field "flip"

$y_i = \Theta\left(\sum_{j \neq i} w_{ji} y_j + b_i\right)$

$\Theta(z) = \begin{cases} +1 & z > 0 \\ -1 & z \le 0 \end{cases}$

A neuron "flips" if the weighted sum of the other neurons' outputs is of the opposite sign. But this may cause other neurons to flip!

SLIDE 14

120 evolutions of a loopy net

  • All neurons which do not "align" with the local field "flip"

SLIDE 15

Loopy network

  • If the sign of the field at any neuron opposes its own sign, it "flips" to match the field
    – Which will change the field at other nodes
      • Which may then flip
    – Which may cause other neurons, including the first one, to flip…
  • Will this behavior continue forever?

SLIDE 16

Loopy network

  • Let $y_i^-$ be the output of the $i$-th neuron just before it responds to the current field
  • Let $y_i^+$ be the output of the $i$-th neuron just after it responds to the current field
  • If $y_i^- = \text{sign}\left(\sum_{j \neq i} w_{ji} y_j + b_i\right)$, then $y_i^+ = y_i^-$
    – If the sign of the field matches its own sign, it does not flip

$y_i^+ \left(\sum_{j \neq i} w_{ji} y_j + b_i\right) - y_i^- \left(\sum_{j \neq i} w_{ji} y_j + b_i\right) = 0$

$y_i = \Theta\left(\sum_{j \neq i} w_{ji} y_j + b_i\right)$

$\Theta(z) = \begin{cases} +1 & z > 0 \\ -1 & z \le 0 \end{cases}$

SLIDE 17

Loopy network

  • If $y_i^- \neq \text{sign}\left(\sum_{j \neq i} w_{ji} y_j + b_i\right)$, then $y_i^+ = -y_i^-$

$y_i^+ \left(\sum_{j \neq i} w_{ji} y_j + b_i\right) - y_i^- \left(\sum_{j \neq i} w_{ji} y_j + b_i\right) = 2 y_i^+ \left(\sum_{j \neq i} w_{ji} y_j + b_i\right)$

    – This term is always positive, since $y_i^+$ now matches the sign of the field!

  • Every flip of a neuron is guaranteed to locally increase

$y_i \left(\sum_{j \neq i} w_{ji} y_j + b_i\right)$

$y_i = \Theta\left(\sum_{j \neq i} w_{ji} y_j + b_i\right)$

$\Theta(z) = \begin{cases} +1 & z > 0 \\ -1 & z \le 0 \end{cases}$

SLIDE 18

Globally

  • Consider the following sum across all nodes

$D(y_1, y_2, \ldots, y_N) = \sum_i y_i \left(\sum_{j < i} w_{ji} y_j + b_i\right) = \sum_{i, j < i} w_{ij} y_i y_j + \sum_i b_i y_i$

    – Definition same as earlier, but avoids double counting and assumes $w_{ii} = 0$

  • For any unit $l$ that "flips" because of the local field

$\Delta D(y_l) = D(y_1, \ldots, y_l^+, \ldots, y_N) - D(y_1, \ldots, y_l^-, \ldots, y_N)$

SLIDE 19

Upon flipping a single unit

$\Delta D(y_l) = D(y_1, \ldots, y_l^+, \ldots, y_N) - D(y_1, \ldots, y_l^-, \ldots, y_N)$

  • Expanding

$\Delta D(y_l) = (y_l^+ - y_l^-) \sum_{j \neq l} w_{jl} y_j + (y_l^+ - y_l^-) b_l$

    – All other terms that do not include $y_l$ cancel out

  • This is always positive!
  • Every flip of a unit results in an increase in $D$
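As a quick numerical sanity check of this claim (a sketch with arbitrary sizes and seed, not from the slides): repeatedly flip misaligned units in a random symmetric network and confirm that $D$ never decreases.

```python
import numpy as np

def D(y, W, b):
    # D = sum_{i, j<i} w_ij y_i y_j + sum_i b_i y_i
    # With a zero diagonal, the pair sum is half the full quadratic form.
    return 0.5 * (y @ W @ y) + b @ y

rng = np.random.default_rng(0)
N = 16
W = rng.standard_normal((N, N))
W = (W + W.T) / 2                      # symmetric weights
np.fill_diagonal(W, 0)                 # no self-connections
b = rng.standard_normal(N)
y = rng.choice([-1, 1], size=N)

for _ in range(1000):
    i = rng.integers(N)
    field = W[:, i] @ y + b[i]
    y_new = 1 if field > 0 else -1
    if y_new != y[i]:                  # a flip
        d_before = D(y, W, b)
        y[i] = y_new
        assert D(y, W, b) >= d_before  # every flip increases (never decreases) D
```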

SLIDE 20

Hopfield Net

  • Flipping a unit will result in an increase (non-decrease) of

$D = \sum_{i, j < i} w_{ij} y_i y_j + \sum_i b_i y_i$

  • $D$ is bounded:

$D_{max} = \sum_{i, j < i} |w_{ij}| + \sum_i |b_i|$

  • The minimum increment of $D$ in a flip is

$\Delta D_{min} = \min_{i, \{y_i,\, i = 1..N\}} 2 \left|\sum_{j \neq i} w_{ji} y_j + b_i\right|$

  • Any sequence of flips must converge in a finite number of steps

SLIDE 21

The Energy of a Hopfield Net

  • Define the Energy of the network as

$E = -\sum_{i, j < i} w_{ij} y_i y_j - \sum_i b_i y_i$

    – Just the negative of $D$

  • The evolution of a Hopfield network constantly decreases its energy
  • Where did this "energy" concept suddenly sprout from?

SLIDE 22

Analogy: Spin Glasses

  • Magnetic dipoles
  • Each dipole tries to align itself to the local field
    – In doing so it may flip
  • This will change fields at other dipoles
    – Which may flip
  • Which changes the field at the current dipole…

SLIDE 23

Analogy: Spin Glasses

  • $p_i$ is the vector position of the $i$-th dipole
  • The field at any dipole is the sum of the field contributions of all other dipoles
  • The contribution of a dipole to the field at any point falls off inversely with the square of distance

Total field at current dipole (intrinsic + external): $f(p_i) = \sum_{j \neq i} \frac{r x_j}{|p_i - p_j|^2} + b_i$

SLIDE 24

Analogy: Spin Glasses

  • A dipole flips if it is misaligned with the field in its location

Total field at current dipole: $f(p_i) = \sum_{j \neq i} \frac{r x_j}{|p_i - p_j|^2} + b_i$

Response of current dipole: $x_i = \begin{cases} x_i & \text{if } \text{sign}(x_i f(p_i)) = 1 \\ -x_i & \text{otherwise} \end{cases}$

SLIDE 25

Analogy: Spin Glasses

Total field at current dipole: $f(p_i) = \sum_{j \neq i} \frac{r x_j}{|p_i - p_j|^2} + b_i$

Response of current dipole: $x_i = \begin{cases} x_i & \text{if } \text{sign}(x_i f(p_i)) = 1 \\ -x_i & \text{otherwise} \end{cases}$

  • Dipoles will keep flipping
    – A flipped dipole changes the field at other dipoles
      • Some of which will flip
    – Which will change the field at the current dipole
      • Which may flip
    – Etc..

SLIDE 26

Analogy: Spin Glasses

  • When will it stop???

Total field at current dipole: $f(p_i) = \sum_{j \neq i} \frac{r x_j}{|p_i - p_j|^2} + b_i$

Response of current dipole: $x_i = \begin{cases} x_i & \text{if } \text{sign}(x_i f(p_i)) = 1 \\ -x_i & \text{otherwise} \end{cases}$

SLIDE 27

Analogy: Spin Glasses

  • The total potential energy of the system

$E = C - \frac{1}{2} \sum_i x_i f(p_i) = C - \sum_i \sum_{j > i} \frac{r x_i x_j}{|p_i - p_j|^2} - \sum_i b_i x_i$

  • The system evolves to minimize the PE
    – Dipoles stop flipping if any flips result in an increase of PE

Total field at current dipole: $f(p_i) = \sum_{j \neq i} \frac{r x_j}{|p_i - p_j|^2} + b_i$

Response of current dipole: $x_i = \begin{cases} x_i & \text{if } \text{sign}(x_i f(p_i)) = 1 \\ -x_i & \text{otherwise} \end{cases}$
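As a literal transcription of these formulas into code (a sketch; the function names, the constant r, and the inputs are all illustrative):

```python
import numpy as np

def dipole_field(p, x, b, r, i):
    """Field at dipole i: f(p_i) = sum_{j != i} r x_j / |p_i - p_j|^2 + b_i."""
    d2 = np.sum((p - p[i]) ** 2, axis=1)
    d2[i] = np.inf                         # drop the self term
    return r * np.sum(x / d2) + b[i]

def dipole_energy(p, x, b, r, C=0.0):
    """E = C - sum_i sum_{j>i} r x_i x_j / |p_i - p_j|^2 - sum_i b_i x_i."""
    E = C - b @ x
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):          # each pair counted once
            E -= r * x[i] * x[j] / np.sum((p[i] - p[j]) ** 2)
    return E
```

Here p would be an (n, 2) array of dipole positions, x an array of ±1 spins, and b the external field at each dipole.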

SLIDE 28

Spin Glasses

  • The system stops at one of its stable configurations
    – Where PE is a local minimum
  • Any small jitter from this stable configuration returns it to the stable configuration
    – I.e. the system remembers its stable state and returns to it

(Figure: PE as a function of state, with local minima as stable states)

SLIDE 29

Hopfield Network

$E = -\sum_{i, j < i} w_{ij} y_i y_j - \sum_i b_i y_i$

  • This is analogous to the potential energy of a spin glass
    – The system will evolve until the energy hits a local minimum

$y_i = \Theta\left(\sum_{j \neq i} w_{ji} y_j + b_i\right)$

$\Theta(z) = \begin{cases} +1 & z > 0 \\ -1 & z \le 0 \end{cases}$

SLIDE 30

Hopfield Network

$E = -\sum_{i, j < i} w_{ij} y_i y_j - \sum_i b_i y_i$

  • This is analogous to the potential energy of a spin glass
    – The system will evolve until the energy hits a local minimum

$y_i = \Theta\left(\sum_{j \neq i} w_{ji} y_j + b_i\right)$

$\Theta(z) = \begin{cases} +1 & z > 0 \\ -1 & z \le 0 \end{cases}$

Typically we will not utilize the bias: the bias is similar to having a single extra neuron that is pegged to 1.0. Removing the bias term does not affect the rest of the discussion in any manner. But it is not RIP; we will bring it back later in the discussion.

SLIDE 31

Hopfield Network

$E = -\sum_{i, j < i} w_{ij} y_i y_j$

  • This is analogous to the potential energy of a spin glass
    – The system will evolve until the energy hits a local minimum

$y_i = \Theta\left(\sum_{j \neq i} w_{ji} y_j\right)$

$\Theta(z) = \begin{cases} +1 & z > 0 \\ -1 & z \le 0 \end{cases}$

SLIDE 32

Evolution

  • The network will evolve until it arrives at a local minimum in the energy contour

$E = -\sum_{i, j < i} w_{ij} y_i y_j$

(Figure: PE as a function of state)

SLIDE 33

Content-addressable memory

  • Each of the minima is a "stored" pattern
    – If the network is initialized close to a stored pattern, it will inevitably evolve to the pattern
  • This is a content-addressable memory
    – Recall memory content from partial or corrupt values
  • Also called associative memory

(Figure: PE as a function of state)

SLIDE 34

Evolution

  • The network will evolve until it arrives at a local minimum in the energy contour

(Image pilfered from unknown source)

$E = -\sum_{i, j < i} w_{ij} y_i y_j$

SLIDE 35

Evolution

  • The network will evolve until it arrives at a local minimum in the energy contour
  • We proved that every change in the network will result in a decrease in energy
    – So the path to the energy minimum is monotonic

$E = -\sum_{i, j < i} w_{ij} y_i y_j$

SLIDE 36

Evolution

  • For threshold activations the energy contour is only defined on a lattice
    – Corners of a unit cube on $[-1, 1]^N$
  • For tanh activations it will be a continuous function

$E = -\sum_{i, j < i} w_{ij} y_i y_j$

SLIDE 37

Evolution

  • For threshold activations the energy contour is only defined on a lattice
    – Corners of a unit cube on $[-1, 1]^N$
  • For tanh activations it will be a continuous function

$E = -\sum_{i, j < i} w_{ij} y_i y_j$

SLIDE 38

Evolution

  • For threshold activations the energy contour is only defined on a lattice
    – Corners of a unit cube
  • For tanh activations it will be a continuous function

In matrix form (note the 1/2):

$E = -\frac{1}{2} \mathbf{y}^T \mathbf{W} \mathbf{y}$
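As a sketch in code (names mine), the matrix form is a one-liner; the 1/2 compensates for each pair $w_{ij} y_i y_j$ appearing twice in $\mathbf{y}^T \mathbf{W} \mathbf{y}$:

```python
import numpy as np

def energy(y, W, b=None):
    """E = -1/2 y^T W y (- b^T y if a bias is used); assumes symmetric W, zero diagonal."""
    E = -0.5 * y @ W @ y
    if b is not None:
        E = E - b @ y
    return E
```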

SLIDE 39

"Energy" contour for a 2-neuron net

  • Two stable states (tanh activation)
    – Symmetric, not at corners
    – Blue arc shows a typical trajectory for sigmoid activation

SLIDE 40

"Energy" contour for a 2-neuron net

  • Two stable states (tanh activation)
    – Symmetric, not at corners
    – Blue arc shows a typical trajectory for sigmoid activation

Why symmetric? Because $-\frac{1}{2} \mathbf{y}^T \mathbf{W} \mathbf{y} = -\frac{1}{2} (-\mathbf{y})^T \mathbf{W} (-\mathbf{y})$

If $\hat{\mathbf{y}}$ is a local minimum, so is $-\hat{\mathbf{y}}$

SLIDE 41

3-neuron net

  • 8 possible states
  • 2 stable states (hard-thresholded network)

SLIDE 42

Examples: Content-addressable memory

  • http://staff.itee.uq.edu.au/janetw/cmc/chapters/Hopfield/

SLIDE 43

Hopfield net examples

SLIDE 44

Computational algorithm

  • Very simple
  • Updates can be done sequentially, or all at once
  • Convergence: stop when $E = -\sum_i \sum_{j > i} w_{ji} y_j y_i$ does not change significantly any more

  1. Initialize network with initial pattern

     $y_i(0) = x_i, \quad 0 \le i \le N - 1$

  2. Iterate until convergence

     $y_i(t+1) = \Theta\left(\sum_{j \neq i} w_{ji} y_j(t)\right), \quad 0 \le i \le N - 1$
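A direct transcription of this procedure as a sketch (sequential updates; function and variable names are mine, not from the slides):

```python
import numpy as np

def hopfield_recall(W, x, max_iters=100):
    """Evolve a Hopfield net from initial pattern x until no unit flips."""
    y = np.array(x, dtype=int)          # step 1: y_i(0) = x_i
    N = len(y)
    for _ in range(max_iters):          # step 2: iterate until convergence
        changed = False
        for i in range(N):              # sequential (asynchronous) updates
            field = W[:, i] @ y         # assumes zero diagonal, so the self term vanishes
            y_new = 1 if field > 0 else -1
            if y_new != y[i]:
                y[i] = y_new
                changed = True
        if not changed:                 # no flips: energy can no longer change
            break
    return y
```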

SLIDE 45

Issues

  • How do we make the network store a specific pattern or set of patterns?
  • How many patterns can we store?

SLIDE 46

Issues

  • How do we make the network store a specific pattern or set of patterns?
  • How many patterns can we store?

SLIDE 47

How do we remember a specific pattern?

  • How do we teach a network to "remember" this image
  • For an image with $N$ pixels we need a network with $N$ neurons
  • Every neuron connects to every other neuron
  • Weights are symmetric (not mandatory)
  • $\frac{N(N-1)}{2}$ weights in all

SLIDE 48

Storing patterns: Training a network

  • A network that stores pattern $P$ also naturally stores $-P$
    – Symmetry: $E(P) = E(-P)$ since $E$ is a function of $y_i y_j$

$E = -\sum_i \sum_{j < i} w_{ji} y_j y_i$

(Figure: a pattern of ±1 node values and its negation)

SLIDE 49

A network can store multiple patterns

  • Every stable point is a stored pattern
  • So we could design the net to store multiple patterns
    – Remember that every stored pattern $P$ is actually two stored patterns, $P$ and $-P$

(Figure: PE as a function of state, with multiple minima)

SLIDE 50

Storing a pattern

  • Design $\{w_{ij}\}$ such that the energy is a local minimum at the desired pattern $P = \{y_i\}$

$E = -\sum_i \sum_{j < i} w_{ji} y_j y_i$

SLIDE 51

Storing specific patterns

  • Storing 1 pattern: We want

$\text{sign}\left(\sum_{j \neq i} w_{ji} y_j\right) = y_i \quad \forall i$

  • This is a stationary pattern

SLIDE 52

Storing specific patterns

  • Storing 1 pattern: We want

$\text{sign}\left(\sum_{j \neq i} w_{ji} y_j\right) = y_i \quad \forall i$

  • This is a stationary pattern

HEBBIAN LEARNING: $w_{ji} = y_j y_i$
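A minimal sketch of Hebbian storage of a single pattern, with a check that the pattern is stationary under the update rule (the example pattern is arbitrary):

```python
import numpy as np

y = np.array([1, -1, 1, 1, -1])      # pattern to store (+1/-1)
W = np.outer(y, y).astype(float)     # Hebbian rule: w_ji = y_j y_i
np.fill_diagonal(W, 0)               # no self-connections

# Stationarity: sign(sum_{j != i} w_ji y_j) = sign((N-1) y_i) = y_i for every i
assert np.array_equal(np.sign(W @ y), y)
```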

SLIDE 53

Storing specific patterns

$\text{sign}\left(\sum_{j \neq i} w_{ji} y_j\right) = \text{sign}\left(\sum_{j \neq i} y_j y_i y_j\right) = \text{sign}\left(\sum_{j \neq i} y_j^2 y_i\right) = \text{sign}(y_i) = y_i$

HEBBIAN LEARNING: $w_{ji} = y_j y_i$

SLIDE 54

Storing specific patterns

$\text{sign}\left(\sum_{j \neq i} w_{ji} y_j\right) = \text{sign}\left(\sum_{j \neq i} y_j y_i y_j\right) = \text{sign}\left(\sum_{j \neq i} y_j^2 y_i\right) = \text{sign}(y_i) = y_i$

HEBBIAN LEARNING: $w_{ji} = y_j y_i$

The pattern is stationary

SLIDE 55

Storing specific patterns

$E = -\sum_i \sum_{j < i} w_{ji} y_j y_i = -\sum_i \sum_{j < i} y_i^2 y_j^2 = -\sum_i \sum_{j < i} 1 = -0.5 N(N-1)$

  • This is the lowest possible energy value for the network

HEBBIAN LEARNING: $w_{ji} = y_j y_i$

SLIDE 56

Storing specific patterns

$E = -\sum_i \sum_{j < i} w_{ji} y_j y_i = -\sum_i \sum_{j < i} y_i^2 y_j^2 = -\sum_i \sum_{j < i} 1 = -0.5 N(N-1)$

  • This is the lowest possible energy value for the network

HEBBIAN LEARNING: $w_{ji} = y_j y_i$

The pattern is STABLE
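Continuing the earlier sketch (setup repeated so the snippet stands alone), the stored pattern indeed sits at the lowest achievable energy, $-0.5N(N-1)$:

```python
import numpy as np

y = np.array([1, -1, 1, 1, -1])        # the stored pattern
W = np.outer(y, y).astype(float)       # Hebbian weights w_ji = y_j y_i
np.fill_diagonal(W, 0)

N = len(y)
E = -0.5 * y @ W @ y                   # E = -sum_{i, j<i} w_ji y_j y_i
assert E == -0.5 * N * (N - 1)         # lowest possible energy for this network
```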

SLIDE 57

Storing multiple patterns

$w_{ji} = \sum_{p \in \{\mathbf{y}_p\}} y_i^p y_j^p$

  • $\{\mathbf{y}_p\}$ is the set of patterns to store
  • Superscript $p$ represents the specific pattern

SLIDE 58

Storing multiple patterns

  • Let $\mathbf{y}_p$ be the vector representing the $p$-th pattern
  • Let $\mathbf{Y} = [\mathbf{y}_1 \; \mathbf{y}_2 \; \ldots]$ be a matrix with all the stored patterns
  • Then..

$\mathbf{W} = \sum_p (\mathbf{y}_p \mathbf{y}_p^T - \mathbf{I}) = \mathbf{Y}\mathbf{Y}^T - N_p \mathbf{I}$

($N_p$: number of patterns)
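In code (a sketch; sizes and seed are arbitrary), the multi-pattern Hebbian weights follow directly from this matrix form:

```python
import numpy as np

rng = np.random.default_rng(1)
N, Np = 64, 4                             # N neurons, Np patterns (well below capacity)
Y = rng.choice([-1, 1], size=(N, Np))     # columns are the patterns to store

W = Y @ Y.T - Np * np.eye(N)              # W = Y Y^T - Np I; the -Np I zeroes the diagonal

# At this small pattern load, each stored pattern is (with high probability) stationary:
for p in range(Np):
    print(np.array_equal(np.sign(W @ Y[:, p]), Y[:, p]))
```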

SLIDE 59

Storing multiple patterns

  • Note: the behavior of $E(\mathbf{y}) = \mathbf{y}^T \mathbf{W} \mathbf{y}$ with $\mathbf{W} = \mathbf{Y}\mathbf{Y}^T - N_p \mathbf{I}$ is identical to the behavior with $\mathbf{W} = \mathbf{Y}\mathbf{Y}^T$
  • Since

$\mathbf{y}^T (\mathbf{Y}\mathbf{Y}^T - N_p \mathbf{I}) \mathbf{y} = \mathbf{y}^T \mathbf{Y}\mathbf{Y}^T \mathbf{y} - N N_p$

    – The energy landscape only differs by an additive constant

  • But the latter $\mathbf{W} = \mathbf{Y}\mathbf{Y}^T$ is easier to analyze. Hence in the following slides we will use $\mathbf{W} = \mathbf{Y}\mathbf{Y}^T$

SLIDE 60

Storing multiple patterns

  • Let $\mathbf{y}_p$ be the vector representing the $p$-th pattern
  • Let $\mathbf{Y} = [\mathbf{y}_1 \; \mathbf{y}_2 \; \ldots]$ be a matrix with all the stored patterns
  • Then..

$\mathbf{W} = \mathbf{Y}\mathbf{Y}^T$

Positive semidefinite!

SLIDE 61

Issues

  • How do we make the network store a specific pattern or set of patterns?
  • How many patterns can we store?

SLIDE 62

Consider the energy function

  • Reinstating the bias term for completeness' sake
    – Remember that we don't actually use it in a Hopfield net

$E = -\frac{1}{2} \mathbf{y}^T \mathbf{W} \mathbf{y} - \mathbf{b}^T \mathbf{y}$

SLIDE 63

Consider the energy function

  • Reinstating the bias term for completeness' sake
    – Remember that we don't actually use it in a Hopfield net

$E = -\frac{1}{2} \mathbf{y}^T \mathbf{W} \mathbf{y} - \mathbf{b}^T \mathbf{y}$

This is a quadratic! $\mathbf{W}$ is positive semidefinite

SLIDE 64

Consider the energy function

  • Reinstating the bias term for completeness' sake
    – Remember that we don't actually use it in a Hopfield net

$E = -\frac{1}{2} \mathbf{y}^T \mathbf{W} \mathbf{y} - \mathbf{b}^T \mathbf{y}$

This is a quadratic! For Hebbian learning $\mathbf{W}$ is positive semidefinite, so $E$ is concave

SLIDE 65

The energy function

  • $E$ is a concave quadratic

$E = -\frac{1}{2} \mathbf{y}^T \mathbf{W} \mathbf{y} - \mathbf{b}^T \mathbf{y}$

SLIDE 66

The energy function

  • $E$ is a concave quadratic
    – Shown from above (assuming 0 bias)
  • But components of $\mathbf{y}$ can only take values ±1
    – I.e. $\mathbf{y}$ lies on the corners of the unit hypercube

$E = -\frac{1}{2} \mathbf{y}^T \mathbf{W} \mathbf{y} - \mathbf{b}^T \mathbf{y}$

SLIDE 67

The energy function

  • $E$ is a concave quadratic
    – Shown from above (assuming 0 bias)
  • But components of $\mathbf{y}$ can only take values ±1
    – I.e. $\mathbf{y}$ lies on the corners of the unit hypercube

$E = -\frac{1}{2} \mathbf{y}^T \mathbf{W} \mathbf{y} - \mathbf{b}^T \mathbf{y}$

SLIDE 68

The energy function

  • The stored values of $\mathbf{y}$ are the ones where all adjacent corners are higher on the quadratic
    – Hebbian learning attempts to make the quadratic steep in the vicinity of stored patterns

$E = -\frac{1}{2} \mathbf{y}^T \mathbf{W} \mathbf{y} - \mathbf{b}^T \mathbf{y}$

(Figure: stored patterns at corners of the hypercube)

SLIDE 69

Patterns you can store

  • 4-bit patterns
  • Stored patterns would be the corners where the value of the quadratic energy is lowest

SLIDE 70

Patterns you can store

  • Ideally must be maximally separated on the hypercube
    – The number of patterns we can store depends on the actual distance between the patterns

(Figure: stored patterns and their "ghosts", i.e. negations)

SLIDE 71

How many patterns can we store?

  • Hopfield: a network of $N$ neurons can store up to ~0.15$N$ patterns
    – Provided the patterns are random and "far apart"
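A small experiment in the spirit of this claim (a sketch; sizes and seed are arbitrary): store increasing numbers of random patterns and count how many remain exactly stationary. Failures start to appear as the load approaches roughly 0.15N.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100
for Np in (5, 10, 15, 20, 30):
    Y = rng.choice([-1, 1], size=(N, Np))            # random patterns as columns
    W = Y @ Y.T - Np * np.eye(N)                     # Hebbian weights, zero diagonal
    stable = sum(np.array_equal(np.sign(W @ Y[:, p]), Y[:, p]) for p in range(Np))
    print(f"N={N}: stored {Np} patterns, {stable} exactly stationary")
```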

SLIDE 72

How many patterns can we store?

  • Problem with Hebbian learning: Focuses on patterns that must be stored
    – What about patterns that must not be stored?
  • More recent work: can actually store up to $N$ patterns
  • Non-Hebbian learning
  • $\mathbf{W}$ loses positive semidefiniteness

SLIDE 73

Storing N patterns

  • Non-Hebbian
  • Requirement: Given $\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_P$
    – Design $\mathbf{W}$ such that
      • $\text{sign}(\mathbf{W} \mathbf{y}_p) = \mathbf{y}_p$ for all target patterns
      • There are no other binary vectors for which this holds
  • I.e. $\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_P$ are the only binary eigenvectors of $\mathbf{W}$, and the corresponding eigenvalues are positive

SLIDE 74

Storing N patterns

  • Simple solution: Design $\mathbf{W}$ such that $\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_P$ are the eigenvectors of $\mathbf{W}$
  • Easily achieved if $\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_P$ are orthogonal to one another
    – Let $\mathbf{Y} = [\mathbf{y}_1 \; \mathbf{y}_2 \; \ldots \; \mathbf{y}_P \; \mathbf{r}_{P+1} \; \ldots \; \mathbf{r}_N]$
    – $N$ is the number of bits
    – $\mathbf{r}_{P+1} \ldots \mathbf{r}_N$ are "synthetic" non-binary vectors
    – $\mathbf{y}_1 \ldots \mathbf{y}_P, \mathbf{r}_{P+1} \ldots \mathbf{r}_N$ are all orthogonal to one another

$\mathbf{W} = \mathbf{Y} \Lambda \mathbf{Y}^T$

  • Eigenvalues $\lambda$ in the diagonal matrix $\Lambda$ determine the steepness of the energy function around the stored values
    – What must the eigenvalues corresponding to the $\mathbf{y}_p$s be?
    – What must the eigenvalues corresponding to the "$\mathbf{r}$"s be?
  • Under no condition can more than $N$ patterns be stored
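A sketch of this construction under the stated assumptions (mutually orthogonal ±1 patterns; the names and the particular eigenvalue choices are mine): complete the patterns to an orthonormal basis, then give the stored directions large positive eigenvalues and the synthetic directions small ones.

```python
import numpy as np

# Two mutually orthogonal 4-bit patterns as columns (a Hadamard-style pair):
Yp = np.array([[1.,  1.],
               [1., -1.],
               [1.,  1.],
               [1., -1.]])
N, P = Yp.shape

# Complete to an orthonormal basis; the first P columns span the stored patterns.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(np.hstack([Yp, rng.standard_normal((N, N - P))]))

# Positive eigenvalues for stored directions (steep energy), small ones elsewhere.
lam = np.concatenate([np.full(P, 1.0), np.full(N - P, 0.1)])
W = Q @ np.diag(lam) @ Q.T                    # W = Y Lambda Y^T

for p in range(P):                            # sign(W y_p) = y_p for each stored pattern
    assert np.array_equal(np.sign(W @ Yp[:, p]), Yp[:, p])
```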

SLIDE 75

Storing N patterns

  • For non-orthogonal patterns, the solution is less simple
  • $\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_P$ can no longer be eigenvectors, but now represent quantized eigendirections

$\text{sign}(\mathbf{W} \mathbf{y}_p) = \mathbf{y}_p$

    – Note that this is not an exact eigenvalue equation

  • Optimization algorithms can provide $\mathbf{W}$s for many patterns
  • Under no condition can we store more than $N$ patterns

SLIDE 76

Alternate Approach to Estimating the Network

  • Estimate $\mathbf{W}$ (and $\mathbf{b}$) such that
    – $E$ is minimized for $\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_P$
    – $E$ is maximized for all other $\mathbf{y}$
  • We will encounter this solution again soon
  • Once again, we cannot store more than $N$ patterns

$E = -\frac{1}{2} \mathbf{y}^T \mathbf{W} \mathbf{y} - \mathbf{b}^T \mathbf{y}$

SLIDE 77

Storing more than N patterns

  • How do we even solve the problem of storing $N$ patterns in the first place?
  • How do we increase the capacity of the network?
    – Store more patterns
  • Common answer to both problems..

SLIDE 78

Lookahead..

  • Adding capacity to a Hopfield network

SLIDE 79

Expanding the network

  • Add a large number of neurons whose actual values you don't care about!

(Figure: N neurons plus K additional neurons)

SLIDE 80

Expanded Network

  • New capacity: ~(N+K) patterns
    – Although we only care about the pattern of the first N neurons
    – We're interested in N-bit patterns

(Figure: N neurons plus K additional neurons)

SLIDE 81

Introducing…

  • The Boltzmann machine…
  • Next regular class…

(Figure: N neurons plus K additional neurons)