Neural Networks: Hopfield Nets and Auto Associators, Fall 2017 - PowerPoint PPT Presentation



SLIDE 1

Neural Networks

Hopfield Nets and Auto Associators, Fall 2017

SLIDE 2

Story so far

  • Neural networks for computation
  • All feedforward structures
  • But what about..

SLIDE 3

Loopy network

  • Each neuron is a perceptron with +1/-1 output
  • Every neuron receives input from every other neuron
  • Every neuron outputs signals to every other neuron

$y_i = \Theta\left(\sum_{j \neq i} w_{ji} y_j + b_i\right)$

$\Theta(z) = \begin{cases} +1 & z > 0 \\ -1 & z \le 0 \end{cases}$

The output of a neuron affects the input to the neuron

SLIDE 4

Loopy network

  • Each neuron is a perceptron with +1/-1 output
  • Every neuron receives input from every other neuron
  • Every neuron outputs signals to every other neuron

$y_i = \Theta\left(\sum_{j \neq i} w_{ji} y_j + b_i\right)$

$\Theta(z) = \begin{cases} +1 & z > 0 \\ -1 & z \le 0 \end{cases}$

A symmetric network: $w_{ij} = w_{ji}$

SLIDE 5

Hopfield Net

  • Each neuron is a perceptron with +1/-1 output
  • Every neuron receives input from every other neuron
  • Every neuron outputs signals to every other neuron

$y_i = \Theta\left(\sum_{j \neq i} w_{ji} y_j + b_i\right)$

$\Theta(z) = \begin{cases} +1 & z > 0 \\ -1 & z \le 0 \end{cases}$

A symmetric network: $w_{ij} = w_{ji}$

SLIDE 6

Loopy network

  • Each neuron is a perceptron with a +1/-1 output
  • Every neuron receives input from every other neuron
  • Every neuron outputs signals to every other neuron

$y_i = \Theta\left(\sum_{j \neq i} w_{ji} y_j + b_i\right)$

$\Theta(z) = \begin{cases} +1 & z > 0 \\ -1 & z \le 0 \end{cases}$

A neuron "flips" if the weighted sum of the other neurons' outputs is of the opposite sign. But this may cause other neurons to flip!

SLIDE 7

Loopy network

  • At each time each neuron receives a "field" $\sum_{j \neq i} w_{ji} y_j + b_i$
  • If the sign of the field matches its own sign, it does not respond
  • If the sign of the field opposes its own sign, it "flips" to match the sign of the field

$y_i = \Theta\left(\sum_{j \neq i} w_{ji} y_j + b_i\right)$

$\Theta(z) = \begin{cases} +1 & z > 0 \\ -1 & z \le 0 \end{cases}$
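In code, this asynchronous update might look like the following sketch (NumPy assumed; the function name and argument layout are mine, not from the slides):

```python
import numpy as np

def update_neuron(y, W, b, i):
    """One asynchronous update: y_i <- Theta(sum_{j != i} w_ji y_j + b_i)."""
    field = W[:, i] @ y - W[i, i] * y[i] + b[i]  # exclude the self term w_ii y_i
    return 1 if field > 0 else -1                # Theta: +1 if field > 0, else -1
```

A neuron "flips" exactly when the returned value differs from the current y[i].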

SLIDE 8

Example

  • Red edges are -1, blue edges are +1
  • Yellow nodes are +1, black nodes are -1

SLIDE 9

Example

  • Red edges are -1, blue edges are +1
  • Yellow nodes are +1, black nodes are -1

SLIDE 10

Example

  • Red edges are -1, blue edges are +1
  • Yellow nodes are +1, black nodes are -1

SLIDE 11

Example

  • Red edges are -1, blue edges are +1
  • Yellow nodes are +1, black nodes are -1

SLIDE 12

Loopy network

  • If the sign of the field at any neuron opposes its own sign, it "flips" to match the field
    – Which will change the field at other nodes
      • Which may then flip
    – Which may cause other neurons, including the first one, to flip…
      » And so on…

SLIDE 13

20 evolutions of a loopy net

  • All neurons which do not "align" with the local field "flip"

$y_i = \Theta\left(\sum_{j \neq i} w_{ji} y_j + b_i\right)$

$\Theta(z) = \begin{cases} +1 & z > 0 \\ -1 & z \le 0 \end{cases}$

A neuron "flips" if the weighted sum of the other neurons' outputs is of the opposite sign. But this may cause other neurons to flip!

SLIDE 14

120 evolutions of a loopy net

  • All neurons which do not "align" with the local field "flip"

SLIDE 15

Loopy network

  • If the sign of the field at any neuron opposes its own sign, it "flips" to match the field
    – Which will change the field at other nodes
      • Which may then flip
    – Which may cause other neurons, including the first one, to flip…
  • Will this behavior continue forever?

SLIDE 16

Loopy network

  • Let $y_i^-$ be the output of the $i$-th neuron just before it responds to the current field
  • Let $y_i^+$ be the output of the $i$-th neuron just after it responds to the current field
  • If $y_i^- = \text{sign}\left(\sum_{j \neq i} w_{ji} y_j + b_i\right)$, then $y_i^+ = y_i^-$
    – If the sign of the field matches its own sign, it does not flip

$y_i^+ \left(\sum_{j \neq i} w_{ji} y_j + b_i\right) - y_i^- \left(\sum_{j \neq i} w_{ji} y_j + b_i\right) = 0$

$y_i = \Theta\left(\sum_{j \neq i} w_{ji} y_j + b_i\right)$

$\Theta(z) = \begin{cases} +1 & z > 0 \\ -1 & z \le 0 \end{cases}$

SLIDE 17

Loopy network

  • If $y_i^- \neq \text{sign}\left(\sum_{j \neq i} w_{ji} y_j + b_i\right)$, then $y_i^+ = -y_i^-$

$y_i^+ \left(\sum_{j \neq i} w_{ji} y_j + b_i\right) - y_i^- \left(\sum_{j \neq i} w_{ji} y_j + b_i\right) = 2 y_i^+ \left(\sum_{j \neq i} w_{ji} y_j + b_i\right)$

    – This term is always positive, since $y_i^+$ now matches the sign of the field!

  • Every flip of a neuron is guaranteed to locally increase

$y_i \left(\sum_{j \neq i} w_{ji} y_j + b_i\right)$

$y_i = \Theta\left(\sum_{j \neq i} w_{ji} y_j + b_i\right)$

$\Theta(z) = \begin{cases} +1 & z > 0 \\ -1 & z \le 0 \end{cases}$

SLIDE 18

Globally

  • Consider the following sum across all nodes

$D(y_1, y_2, \ldots, y_N) = \sum_i y_i \left(\sum_{j < i} w_{ji} y_j + b_i\right) = \sum_{i, j < i} w_{ij} y_i y_j + \sum_i b_i y_i$

    – Definition same as earlier, but avoids double counting and assumes $w_{ii} = 0$

  • For any unit $l$ that "flips" because of the local field

$\Delta D(y_l) = D(y_1, \ldots, y_l^+, \ldots, y_N) - D(y_1, \ldots, y_l^-, \ldots, y_N)$

SLIDE 19

Upon flipping a single unit

$\Delta D(y_l) = D(y_1, \ldots, y_l^+, \ldots, y_N) - D(y_1, \ldots, y_l^-, \ldots, y_N)$

  • Expanding

$\Delta D(y_l) = (y_l^+ - y_l^-) \sum_{j \neq l} w_{jl} y_j + (y_l^+ - y_l^-) b_l$

    – All other terms that do not include $y_l$ cancel out

  • This is always positive!
  • Every flip of a unit results in an increase in $D$
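As a quick numerical sanity check of this claim (a sketch with arbitrary sizes and seed, not from the slides): repeatedly flip misaligned units in a random symmetric network and confirm that $D$ never decreases.

```python
import numpy as np

def D(y, W, b):
    # D = sum_{i, j<i} w_ij y_i y_j + sum_i b_i y_i
    # With a zero diagonal, the pair sum is half the full quadratic form.
    return 0.5 * (y @ W @ y) + b @ y

rng = np.random.default_rng(0)
N = 16
W = rng.standard_normal((N, N))
W = (W + W.T) / 2                      # symmetric weights
np.fill_diagonal(W, 0)                 # no self-connections
b = rng.standard_normal(N)
y = rng.choice([-1, 1], size=N)

for _ in range(1000):
    i = rng.integers(N)
    field = W[:, i] @ y + b[i]
    y_new = 1 if field > 0 else -1
    if y_new != y[i]:                  # a flip
        d_before = D(y, W, b)
        y[i] = y_new
        assert D(y, W, b) >= d_before  # every flip increases (never decreases) D
```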

SLIDE 20

Hopfield Net

  • Flipping a unit will result in an increase (non-decrease) of

$D = \sum_{i, j < i} w_{ij} y_i y_j + \sum_i b_i y_i$

  • $D$ is bounded:

$D_{max} = \sum_{i, j < i} |w_{ij}| + \sum_i |b_i|$

  • The minimum increment of $D$ in a flip is

$\Delta D_{min} = \min_{i, \{y_i,\, i = 1..N\}} 2 \left|\sum_{j \neq i} w_{ji} y_j + b_i\right|$

  • Any sequence of flips must converge in a finite number of steps

SLIDE 21

The Energy of a Hopfield Net

  • Define the Energy of the network as

$E = -\sum_{i, j < i} w_{ij} y_i y_j - \sum_i b_i y_i$

    – Just the negative of $D$

  • The evolution of a Hopfield network constantly decreases its energy
  • Where did this "energy" concept suddenly sprout from?

SLIDE 22

Analogy: Spin Glasses

  • Magnetic dipoles
  • Each dipole tries to align itself to the local field
    – In doing so it may flip
  • This will change fields at other dipoles
    – Which may flip
  • Which changes the field at the current dipole…

SLIDE 23

Analogy: Spin Glasses

  • $p_i$ is the vector position of the $i$-th dipole
  • The field at any dipole is the sum of the field contributions of all other dipoles
  • The contribution of a dipole to the field at any point falls off inversely with the square of distance

Total field at current dipole (intrinsic + external): $f(p_i) = \sum_{j \neq i} \frac{r x_j}{|p_i - p_j|^2} + b_i$

SLIDE 24

Analogy: Spin Glasses

  • A dipole flips if it is misaligned with the field in its location

Total field at current dipole: $f(p_i) = \sum_{j \neq i} \frac{r x_j}{|p_i - p_j|^2} + b_i$

Response of current dipole: $x_i = \begin{cases} x_i & \text{if } \text{sign}(x_i f(p_i)) = 1 \\ -x_i & \text{otherwise} \end{cases}$

SLIDE 25

Analogy: Spin Glasses

Total field at current dipole: $f(p_i) = \sum_{j \neq i} \frac{r x_j}{|p_i - p_j|^2} + b_i$

Response of current dipole: $x_i = \begin{cases} x_i & \text{if } \text{sign}(x_i f(p_i)) = 1 \\ -x_i & \text{otherwise} \end{cases}$

  • Dipoles will keep flipping
    – A flipped dipole changes the field at other dipoles
      • Some of which will flip
    – Which will change the field at the current dipole
      • Which may flip
    – Etc..

SLIDE 26

Analogy: Spin Glasses

  • When will it stop???

Total field at current dipole: $f(p_i) = \sum_{j \neq i} \frac{r x_j}{|p_i - p_j|^2} + b_i$

Response of current dipole: $x_i = \begin{cases} x_i & \text{if } \text{sign}(x_i f(p_i)) = 1 \\ -x_i & \text{otherwise} \end{cases}$

SLIDE 27

Analogy: Spin Glasses

  • The total potential energy of the system

$E = C - \frac{1}{2} \sum_i x_i f(p_i) = C - \sum_i \sum_{j > i} \frac{r x_i x_j}{|p_i - p_j|^2} - \sum_i b_i x_i$

  • The system evolves to minimize the PE
    – Dipoles stop flipping if any flips result in an increase of PE

Total field at current dipole: $f(p_i) = \sum_{j \neq i} \frac{r x_j}{|p_i - p_j|^2} + b_i$

Response of current dipole: $x_i = \begin{cases} x_i & \text{if } \text{sign}(x_i f(p_i)) = 1 \\ -x_i & \text{otherwise} \end{cases}$
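As a literal transcription of these formulas into code (a sketch; the function names, the constant r, and the inputs are all illustrative):

```python
import numpy as np

def dipole_field(p, x, b, r, i):
    """Field at dipole i: f(p_i) = sum_{j != i} r x_j / |p_i - p_j|^2 + b_i."""
    d2 = np.sum((p - p[i]) ** 2, axis=1)
    d2[i] = np.inf                         # drop the self term
    return r * np.sum(x / d2) + b[i]

def dipole_energy(p, x, b, r, C=0.0):
    """E = C - sum_i sum_{j>i} r x_i x_j / |p_i - p_j|^2 - sum_i b_i x_i."""
    E = C - b @ x
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):          # each pair counted once
            E -= r * x[i] * x[j] / np.sum((p[i] - p[j]) ** 2)
    return E
```

Here p would be an (n, 2) array of dipole positions, x an array of ±1 spins, and b the external field at each dipole.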

SLIDE 28

Spin Glasses

  • The system stops at one of its stable configurations
    – Where PE is a local minimum
  • Any small jitter from this stable configuration returns it to the stable configuration
    – I.e. the system remembers its stable state and returns to it

(Figure: PE as a function of state, with local minima as stable states)

SLIDE 29

Hopfield Network

$E = -\sum_{i, j < i} w_{ij} y_i y_j - \sum_i b_i y_i$

  • This is analogous to the potential energy of a spin glass
    – The system will evolve until the energy hits a local minimum

$y_i = \Theta\left(\sum_{j \neq i} w_{ji} y_j + b_i\right)$

$\Theta(z) = \begin{cases} +1 & z > 0 \\ -1 & z \le 0 \end{cases}$

SLIDE 30

Hopfield Network

$E = -\sum_{i, j < i} w_{ij} y_i y_j - \sum_i b_i y_i$

  • This is analogous to the potential energy of a spin glass
    – The system will evolve until the energy hits a local minimum

$y_i = \Theta\left(\sum_{j \neq i} w_{ji} y_j + b_i\right)$

$\Theta(z) = \begin{cases} +1 & z > 0 \\ -1 & z \le 0 \end{cases}$

Typically we will not utilize the bias: the bias is similar to having a single extra neuron that is pegged to 1.0. Removing the bias term does not affect the rest of the discussion in any manner. But it is not RIP; we will bring it back later in the discussion.

SLIDE 31

Hopfield Network

$E = -\sum_{i, j < i} w_{ij} y_i y_j$

  • This is analogous to the potential energy of a spin glass
    – The system will evolve until the energy hits a local minimum

$y_i = \Theta\left(\sum_{j \neq i} w_{ji} y_j\right)$

$\Theta(z) = \begin{cases} +1 & z > 0 \\ -1 & z \le 0 \end{cases}$

SLIDE 32

Evolution

  • The network will evolve until it arrives at a local minimum in the energy contour

$E = -\sum_{i, j < i} w_{ij} y_i y_j$

(Figure: PE as a function of state)

SLIDE 33

Content-addressable memory

  • Each of the minima is a "stored" pattern
    – If the network is initialized close to a stored pattern, it will inevitably evolve to the pattern
  • This is a content-addressable memory
    – Recall memory content from partial or corrupt values
  • Also called associative memory

(Figure: PE as a function of state)

SLIDE 34

Evolution

  • The network will evolve until it arrives at a local minimum in the energy contour

(Image pilfered from unknown source)

$E = -\sum_{i, j < i} w_{ij} y_i y_j$

SLIDE 35

Evolution

  • The network will evolve until it arrives at a local minimum in the energy contour
  • We proved that every change in the network will result in a decrease in energy
    – So the path to the energy minimum is monotonic

$E = -\sum_{i, j < i} w_{ij} y_i y_j$

SLIDE 36

Evolution

  • For threshold activations the energy contour is only defined on a lattice
    – Corners of a unit cube on $[-1, 1]^N$
  • For tanh activations it will be a continuous function

$E = -\sum_{i, j < i} w_{ij} y_i y_j$

SLIDE 37

Evolution

  • For threshold activations the energy contour is only defined on a lattice
    – Corners of a unit cube on $[-1, 1]^N$
  • For tanh activations it will be a continuous function

$E = -\sum_{i, j < i} w_{ij} y_i y_j$

SLIDE 38

Evolution

  • For threshold activations the energy contour is only defined on a lattice
    – Corners of a unit cube
  • For tanh activations it will be a continuous function

In matrix form (note the 1/2):

$E = -\frac{1}{2} \mathbf{y}^T \mathbf{W} \mathbf{y}$
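As a sketch in code (names mine), the matrix form is a one-liner; the 1/2 compensates for each pair $w_{ij} y_i y_j$ appearing twice in $\mathbf{y}^T \mathbf{W} \mathbf{y}$:

```python
import numpy as np

def energy(y, W, b=None):
    """E = -1/2 y^T W y (- b^T y if a bias is used); assumes symmetric W, zero diagonal."""
    E = -0.5 * y @ W @ y
    if b is not None:
        E = E - b @ y
    return E
```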

SLIDE 39

"Energy" contour for a 2-neuron net

  • Two stable states (tanh activation)
    – Symmetric, not at corners
    – Blue arc shows a typical trajectory for sigmoid activation

SLIDE 40

"Energy" contour for a 2-neuron net

  • Two stable states (tanh activation)
    – Symmetric, not at corners
    – Blue arc shows a typical trajectory for sigmoid activation

Why symmetric? Because $-\frac{1}{2} \mathbf{y}^T \mathbf{W} \mathbf{y} = -\frac{1}{2} (-\mathbf{y})^T \mathbf{W} (-\mathbf{y})$

If $\hat{\mathbf{y}}$ is a local minimum, so is $-\hat{\mathbf{y}}$

SLIDE 41

3-neuron net

  • 8 possible states
  • 2 stable states (hard-thresholded network)

SLIDE 42

Examples: Content-addressable memory

  • http://staff.itee.uq.edu.au/janetw/cmc/chapters/Hopfield/

SLIDE 43

Hopfield net examples

SLIDE 44

Computational algorithm

  • Very simple
  • Updates can be done sequentially, or all at once
  • Convergence: stop when $E = -\sum_i \sum_{j > i} w_{ji} y_j y_i$ does not change significantly any more

  1. Initialize network with initial pattern

     $y_i(0) = x_i, \quad 0 \le i \le N - 1$

  2. Iterate until convergence

     $y_i(t+1) = \Theta\left(\sum_{j \neq i} w_{ji} y_j(t)\right), \quad 0 \le i \le N - 1$
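A direct transcription of this procedure as a sketch (sequential updates; function and variable names are mine, not from the slides):

```python
import numpy as np

def hopfield_recall(W, x, max_iters=100):
    """Evolve a Hopfield net from initial pattern x until no unit flips."""
    y = np.array(x, dtype=int)          # step 1: y_i(0) = x_i
    N = len(y)
    for _ in range(max_iters):          # step 2: iterate until convergence
        changed = False
        for i in range(N):              # sequential (asynchronous) updates
            field = W[:, i] @ y         # assumes zero diagonal, so the self term vanishes
            y_new = 1 if field > 0 else -1
            if y_new != y[i]:
                y[i] = y_new
                changed = True
        if not changed:                 # no flips: energy can no longer change
            break
    return y
```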

SLIDE 45

Issues

  • How do we make the network store a specific pattern or set of patterns?
  • How many patterns can we store?

SLIDE 46

Issues

  • How do we make the network store a specific pattern or set of patterns?
  • How many patterns can we store?

SLIDE 47

How do we remember a specific pattern?

  • How do we teach a network to "remember" this image
  • For an image with $N$ pixels we need a network with $N$ neurons
  • Every neuron connects to every other neuron
  • Weights are symmetric (not mandatory)
  • $\frac{N(N-1)}{2}$ weights in all

SLIDE 48

Storing patterns: Training a network

  • A network that stores pattern $P$ also naturally stores $-P$
    – Symmetry: $E(P) = E(-P)$ since $E$ is a function of $y_i y_j$

$E = -\sum_i \sum_{j < i} w_{ji} y_j y_i$

(Figure: a pattern of ±1 node values and its negation)

SLIDE 49

A network can store multiple patterns

  • Every stable point is a stored pattern
  • So we could design the net to store multiple patterns
    – Remember that every stored pattern $P$ is actually two stored patterns, $P$ and $-P$

(Figure: PE as a function of state, with multiple minima)

SLIDE 50

Storing a pattern

  • Design $\{w_{ij}\}$ such that the energy is a local minimum at the desired pattern $P = \{y_i\}$

$E = -\sum_i \sum_{j < i} w_{ji} y_j y_i$

SLIDE 51

Storing specific patterns

  • Storing 1 pattern: We want

$\text{sign}\left(\sum_{j \neq i} w_{ji} y_j\right) = y_i \quad \forall i$

  • This is a stationary pattern

SLIDE 52

Storing specific patterns

  • Storing 1 pattern: We want

$\text{sign}\left(\sum_{j \neq i} w_{ji} y_j\right) = y_i \quad \forall i$

  • This is a stationary pattern

HEBBIAN LEARNING: $w_{ji} = y_j y_i$
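A minimal sketch of Hebbian storage of a single pattern, with a check that the pattern is stationary under the update rule (the example pattern is arbitrary):

```python
import numpy as np

y = np.array([1, -1, 1, 1, -1])      # pattern to store (+1/-1)
W = np.outer(y, y).astype(float)     # Hebbian rule: w_ji = y_j y_i
np.fill_diagonal(W, 0)               # no self-connections

# Stationarity: sign(sum_{j != i} w_ji y_j) = sign((N-1) y_i) = y_i for every i
assert np.array_equal(np.sign(W @ y), y)
```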

SLIDE 53

Storing specific patterns

$\text{sign}\left(\sum_{j \neq i} w_{ji} y_j\right) = \text{sign}\left(\sum_{j \neq i} y_j y_i y_j\right) = \text{sign}\left(\sum_{j \neq i} y_j^2 y_i\right) = \text{sign}(y_i) = y_i$

HEBBIAN LEARNING: $w_{ji} = y_j y_i$

SLIDE 54

Storing specific patterns

$\text{sign}\left(\sum_{j \neq i} w_{ji} y_j\right) = \text{sign}\left(\sum_{j \neq i} y_j y_i y_j\right) = \text{sign}\left(\sum_{j \neq i} y_j^2 y_i\right) = \text{sign}(y_i) = y_i$

HEBBIAN LEARNING: $w_{ji} = y_j y_i$

The pattern is stationary

SLIDE 55

Storing specific patterns

$E = -\sum_i \sum_{j < i} w_{ji} y_j y_i = -\sum_i \sum_{j < i} y_i^2 y_j^2 = -\sum_i \sum_{j < i} 1 = -0.5 N(N-1)$

  • This is the lowest possible energy value for the network

HEBBIAN LEARNING: $w_{ji} = y_j y_i$

SLIDE 56

Storing specific patterns

$E = -\sum_i \sum_{j < i} w_{ji} y_j y_i = -\sum_i \sum_{j < i} y_i^2 y_j^2 = -\sum_i \sum_{j < i} 1 = -0.5 N(N-1)$

  • This is the lowest possible energy value for the network

HEBBIAN LEARNING: $w_{ji} = y_j y_i$

The pattern is STABLE
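Continuing the earlier sketch (setup repeated so the snippet stands alone), the stored pattern indeed sits at the lowest achievable energy, $-0.5N(N-1)$:

```python
import numpy as np

y = np.array([1, -1, 1, 1, -1])        # the stored pattern
W = np.outer(y, y).astype(float)       # Hebbian weights w_ji = y_j y_i
np.fill_diagonal(W, 0)

N = len(y)
E = -0.5 * y @ W @ y                   # E = -sum_{i, j<i} w_ji y_j y_i
assert E == -0.5 * N * (N - 1)         # lowest possible energy for this network
```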

SLIDE 57

Storing multiple patterns

$w_{ji} = \sum_{p \in \{\mathbf{y}_p\}} y_i^p y_j^p$

  • $\{\mathbf{y}_p\}$ is the set of patterns to store
  • Superscript $p$ represents the specific pattern

SLIDE 58

Storing multiple patterns

  • Let $\mathbf{y}_p$ be the vector representing the $p$-th pattern
  • Let $\mathbf{Y} = [\mathbf{y}_1 \; \mathbf{y}_2 \; \ldots]$ be a matrix with all the stored patterns
  • Then..

$\mathbf{W} = \sum_p (\mathbf{y}_p \mathbf{y}_p^T - \mathbf{I}) = \mathbf{Y}\mathbf{Y}^T - N_p \mathbf{I}$

($N_p$: number of patterns)
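In code (a sketch; sizes and seed are arbitrary), the multi-pattern Hebbian weights follow directly from this matrix form:

```python
import numpy as np

rng = np.random.default_rng(1)
N, Np = 64, 4                             # N neurons, Np patterns (well below capacity)
Y = rng.choice([-1, 1], size=(N, Np))     # columns are the patterns to store

W = Y @ Y.T - Np * np.eye(N)              # W = Y Y^T - Np I; the -Np I zeroes the diagonal

# At this small pattern load, each stored pattern is (with high probability) stationary:
for p in range(Np):
    print(np.array_equal(np.sign(W @ Y[:, p]), Y[:, p]))
```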

SLIDE 59

Storing multiple patterns

  • Note: the behavior of $E(\mathbf{y}) = \mathbf{y}^T \mathbf{W} \mathbf{y}$ with $\mathbf{W} = \mathbf{Y}\mathbf{Y}^T - N_p \mathbf{I}$ is identical to the behavior with $\mathbf{W} = \mathbf{Y}\mathbf{Y}^T$
  • Since

$\mathbf{y}^T (\mathbf{Y}\mathbf{Y}^T - N_p \mathbf{I}) \mathbf{y} = \mathbf{y}^T \mathbf{Y}\mathbf{Y}^T \mathbf{y} - N N_p$

    – The energy landscape only differs by an additive constant

  • But the latter $\mathbf{W} = \mathbf{Y}\mathbf{Y}^T$ is easier to analyze. Hence in the following slides we will use $\mathbf{W} = \mathbf{Y}\mathbf{Y}^T$

SLIDE 60

Storing multiple patterns

  • Let $\mathbf{y}_p$ be the vector representing the $p$-th pattern
  • Let $\mathbf{Y} = [\mathbf{y}_1 \; \mathbf{y}_2 \; \ldots]$ be a matrix with all the stored patterns
  • Then..

$\mathbf{W} = \mathbf{Y}\mathbf{Y}^T$

Positive semidefinite!

SLIDE 61

Issues

  • How do we make the network store a specific pattern or set of patterns?
  • How many patterns can we store?

SLIDE 62

Consider the energy function

  • Reinstating the bias term for completeness' sake
    – Remember that we don't actually use it in a Hopfield net

$E = -\frac{1}{2} \mathbf{y}^T \mathbf{W} \mathbf{y} - \mathbf{b}^T \mathbf{y}$

SLIDE 63

Consider the energy function

  • Reinstating the bias term for completeness' sake
    – Remember that we don't actually use it in a Hopfield net

$E = -\frac{1}{2} \mathbf{y}^T \mathbf{W} \mathbf{y} - \mathbf{b}^T \mathbf{y}$

This is a quadratic! $\mathbf{W}$ is positive semidefinite

SLIDE 64

Consider the energy function

  • Reinstating the bias term for completeness' sake
    – Remember that we don't actually use it in a Hopfield net

$E = -\frac{1}{2} \mathbf{y}^T \mathbf{W} \mathbf{y} - \mathbf{b}^T \mathbf{y}$

This is a quadratic! For Hebbian learning $\mathbf{W}$ is positive semidefinite, so $E$ is concave

SLIDE 65

The energy function

  • $E$ is a concave quadratic

$E = -\frac{1}{2} \mathbf{y}^T \mathbf{W} \mathbf{y} - \mathbf{b}^T \mathbf{y}$

SLIDE 66

The energy function

  • $E$ is a concave quadratic
    – Shown from above (assuming 0 bias)
  • But components of $\mathbf{y}$ can only take values ±1
    – I.e. $\mathbf{y}$ lies on the corners of the unit hypercube

$E = -\frac{1}{2} \mathbf{y}^T \mathbf{W} \mathbf{y} - \mathbf{b}^T \mathbf{y}$

SLIDE 67

The energy function

  • $E$ is a concave quadratic
    – Shown from above (assuming 0 bias)
  • But components of $\mathbf{y}$ can only take values ±1
    – I.e. $\mathbf{y}$ lies on the corners of the unit hypercube

$E = -\frac{1}{2} \mathbf{y}^T \mathbf{W} \mathbf{y} - \mathbf{b}^T \mathbf{y}$

SLIDE 68

The energy function

  • The stored values of $\mathbf{y}$ are the ones where all adjacent corners are higher on the quadratic
    – Hebbian learning attempts to make the quadratic steep in the vicinity of stored patterns

$E = -\frac{1}{2} \mathbf{y}^T \mathbf{W} \mathbf{y} - \mathbf{b}^T \mathbf{y}$

(Figure: stored patterns at corners of the hypercube)

SLIDE 69

Patterns you can store

  • 4-bit patterns
  • Stored patterns would be the corners where the value of the quadratic energy is lowest

SLIDE 70

Patterns you can store

  • Ideally must be maximally separated on the hypercube
    – The number of patterns we can store depends on the actual distance between the patterns

(Figure: stored patterns and their "ghosts", i.e. negations)

SLIDE 71

How many patterns can we store?

  • Hopfield: a network of $N$ neurons can store up to ~0.15$N$ patterns
    – Provided the patterns are random and "far apart"
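A small experiment in the spirit of this claim (a sketch; sizes and seed are arbitrary): store increasing numbers of random patterns and count how many remain exactly stationary. Failures start to appear as the load approaches roughly 0.15N.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100
for Np in (5, 10, 15, 20, 30):
    Y = rng.choice([-1, 1], size=(N, Np))            # random patterns as columns
    W = Y @ Y.T - Np * np.eye(N)                     # Hebbian weights, zero diagonal
    stable = sum(np.array_equal(np.sign(W @ Y[:, p]), Y[:, p]) for p in range(Np))
    print(f"N={N}: stored {Np} patterns, {stable} exactly stationary")
```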

SLIDE 72

How many patterns can we store?

  • Problem with Hebbian learning: Focuses on patterns that must be stored
    – What about patterns that must not be stored?
  • More recent work: can actually store up to $N$ patterns
  • Non-Hebbian learning
  • $\mathbf{W}$ loses positive semidefiniteness

SLIDE 73

Storing N patterns

  • Non-Hebbian
  • Requirement: Given $\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_P$
    – Design $\mathbf{W}$ such that
      • $\text{sign}(\mathbf{W} \mathbf{y}_p) = \mathbf{y}_p$ for all target patterns
      • There are no other binary vectors for which this holds
  • I.e. $\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_P$ are the only binary eigenvectors of $\mathbf{W}$, and the corresponding eigenvalues are positive

SLIDE 74

Storing N patterns

  • Simple solution: Design $\mathbf{W}$ such that $\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_P$ are the eigenvectors of $\mathbf{W}$
  • Easily achieved if $\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_P$ are orthogonal to one another
    – Let $\mathbf{Y} = [\mathbf{y}_1 \; \mathbf{y}_2 \; \ldots \; \mathbf{y}_P \; \mathbf{r}_{P+1} \; \ldots \; \mathbf{r}_N]$
    – $N$ is the number of bits
    – $\mathbf{r}_{P+1} \ldots \mathbf{r}_N$ are "synthetic" non-binary vectors
    – $\mathbf{y}_1 \ldots \mathbf{y}_P, \mathbf{r}_{P+1} \ldots \mathbf{r}_N$ are all orthogonal to one another

$\mathbf{W} = \mathbf{Y} \Lambda \mathbf{Y}^T$

  • Eigenvalues $\lambda$ in the diagonal matrix $\Lambda$ determine the steepness of the energy function around the stored values
    – What must the eigenvalues corresponding to the $\mathbf{y}_p$s be?
    – What must the eigenvalues corresponding to the "$\mathbf{r}$"s be?
  • Under no condition can more than $N$ patterns be stored
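A sketch of this construction under the stated assumptions (mutually orthogonal ±1 patterns; the names and the particular eigenvalue choices are mine): complete the patterns to an orthonormal basis, then give the stored directions large positive eigenvalues and the synthetic directions small ones.

```python
import numpy as np

# Two mutually orthogonal 4-bit patterns as columns (a Hadamard-style pair):
Yp = np.array([[1.,  1.],
               [1., -1.],
               [1.,  1.],
               [1., -1.]])
N, P = Yp.shape

# Complete to an orthonormal basis; the first P columns span the stored patterns.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(np.hstack([Yp, rng.standard_normal((N, N - P))]))

# Positive eigenvalues for stored directions (steep energy), small ones elsewhere.
lam = np.concatenate([np.full(P, 1.0), np.full(N - P, 0.1)])
W = Q @ np.diag(lam) @ Q.T                    # W = Y Lambda Y^T

for p in range(P):                            # sign(W y_p) = y_p for each stored pattern
    assert np.array_equal(np.sign(W @ Yp[:, p]), Yp[:, p])
```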

SLIDE 75

Storing N patterns

  • For non-orthogonal patterns, the solution is less simple
  • $\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_P$ can no longer be eigenvectors, but now represent quantized eigendirections

$\text{sign}(\mathbf{W} \mathbf{y}_p) = \mathbf{y}_p$

    – Note that this is not an exact eigenvalue equation

  • Optimization algorithms can provide $\mathbf{W}$s for many patterns
  • Under no condition can we store more than $N$ patterns

SLIDE 76

Alternate Approach to Estimating the Network

  • Estimate $\mathbf{W}$ (and $\mathbf{b}$) such that
    – $E$ is minimized for $\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_P$
    – $E$ is maximized for all other $\mathbf{y}$
  • We will encounter this solution again soon
  • Once again, we cannot store more than $N$ patterns

$E = -\frac{1}{2} \mathbf{y}^T \mathbf{W} \mathbf{y} - \mathbf{b}^T \mathbf{y}$

SLIDE 77

Storing more than N patterns

  • How do we even solve the problem of storing $N$ patterns in the first place?
  • How do we increase the capacity of the network?
    – Store more patterns
  • Common answer to both problems..

SLIDE 78

Lookahead..

  • Adding capacity to a Hopfield network

SLIDE 79

Expanding the network

  • Add a large number of neurons whose actual values you don't care about!

(Figure: N neurons plus K additional neurons)

SLIDE 80

Expanded Network

  • New capacity: ~(N+K) patterns
    – Although we only care about the pattern of the first N neurons
    – We're interested in N-bit patterns

(Figure: N neurons plus K additional neurons)

SLIDE 81

Introducing…

  • The Boltzmann machine…
  • Next regular class…

(Figure: N neurons plus K additional neurons)