SLIDE 1

SATNet: Bridging deep learning and logical reasoning using a differentiable satisfiability solver

Po-Wei Wang 1, Priya L. Donti 1, Bryan Wilder 2, J. Zico Kolter 1,3

1 School of Computer Science, Carnegie Mellon University
2 School of Engineering and Applied Sciences, Harvard University
3 Bosch Center for Artificial Intelligence

SLIDE 2

Integrating deep learning and logic

Deep Learning
  • No constraints on output
  • Differentiable
  • Solved via gradient optimizers

+

Logical Inference
  • Rich constraints on output
  • Discrete input/output
  • Solved via tree search

Sudoku image: "12 Jan 2006" by SudoFlickr is licensed under CC BY-SA 2.0

SLIDE 3

This talk is not about...

Not about learning to find SAT solutions [Selsam et al. 2019]

  • but about learning both the constraints and the solutions from examples

Not about using DL and SAT in a multi-staged manner

  • doing so requires prior knowledge of the structure and constraints
  • further, current SAT solvers cannot accept probabilistic inputs

SLIDE 4

This talk is about

  • A layer that enables end-to-end learning of both the constraints and the solutions of logic problems within deep networks
  • A smoothed, differentiable (maximum) satisfiability solver that can be integrated into the loop of deep learning systems

SLIDE 5

Review of SAT problems

Example SAT problem:

v2 ∧ (v1 ∨ ¬v2) ∧ (v2 ∨ ¬v3)

⇓

S = [  0   1   0 ]   ← v2
    [  1  −1   0 ]   ← v1 ∨ ¬v2
    [  0   1  −1 ]   ← v2 ∨ ¬v3
      (columns: v1, v2, v3)

Typical SAT: the clause matrix is given; find a satisfying assignment.
Our setting: the clause matrix is the parameters of the layer (to be learned).
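To make the encoding concrete, here is a minimal NumPy sketch (mine, not from the talk) that builds this clause matrix from a signed-literal clause list:

```python
import numpy as np

# Build the clause matrix S for the example formula
#   v2 ∧ (v1 ∨ ¬v2) ∧ (v2 ∨ ¬v3)
# Each clause is a list of signed 1-indexed variable ids:
# +j means v_j appears positively, -j means it appears negated.
clauses = [[2], [1, -2], [2, -3]]
n_vars = 3

S = np.zeros((len(clauses), n_vars))
for i, clause in enumerate(clauses):
    for lit in clause:
        S[i, abs(lit) - 1] = np.sign(lit)

print(S)
# [[ 0.  1.  0.]
#  [ 1. -1.  0.]
#  [ 0.  1. -1.]]
```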

SLIDE 6

MAXSAT Problem

MAXSAT is the optimization variant of SAT.

SAT: find a feasible assignment vi s.t. v2 ∧ (v1 ∨ ¬v2) ∧ (v2 ∨ ¬v3)
MAXSAT: maximize the number of satisfied clauses

Relax the binary variables to the smooth, continuous sphere:

vi ∈ {+1, −1}  —(equiv)→  |vi| = 1, vi ∈ R^1  —(relax)→  ∥vi∥ = 1, vi ∈ R^k

Semidefinite relaxation (Goemans-Williamson, 1995), with X = VᵀV:

minimize ⟨SᵀS, X⟩  s.t.  X ⪰ 0, diag(X) = 1
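As a toy illustration (mine, not from the talk), the sketch below evaluates the relaxed objective ⟨SᵀS, X⟩ and applies Goemans-Williamson-style randomized rounding; the unit vectors here are random stand-ins rather than an actual SDP optimum:

```python
import numpy as np

rng = np.random.default_rng(0)

# Clause matrix from the previous slide (rows: clauses, cols: variables).
S = np.array([[0, 1, 0],
              [1, -1, 0],
              [0, 1, -1]], dtype=float)
m, n = S.shape
k = int(np.ceil(np.sqrt(2 * n))) + 1    # relaxation rank, k > sqrt(2n)

# Relaxed variables: unit vectors v_i in R^k (random stand-ins here).
V = rng.standard_normal((k, n))
V /= np.linalg.norm(V, axis=0)

# SDP objective <S^T S, X> with X = V^T V.
X = V.T @ V
objective = np.trace(S.T @ S @ X)

# Goemans-Williamson-style rounding: project the vectors onto a random
# hyperplane direction and take signs to recover a Boolean assignment.
r = rng.standard_normal(k)
assignment = np.sign(r @ V)             # entries in {+1, -1}
print(objective, assignment)
```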


SLIDE 7

SATNet: MAXSAT SDP as a layer

[Figure: the SATNet layer]

SLIDE 8

Fast solution to MAXSAT SDP approximation

Efficiently solve via the low-rank factorization X = VᵀV, V ∈ R^(k×n), ∥vi∥ = 1 (a.k.a. the Burer-Monteiro method) and block coordinate descent iterations

vi = −normalize(V Sᵀ si − ∥si∥² vi)

For k > √(2n), the non-convex iterates are guaranteed to converge to a global optimum of the SDP [Wang et al., 2018; Erdogdu et al., 2018].

Complexity is reduced from O(n^6 log log(1/ϵ)) for interior-point methods to O(n^1.5 m log(1/ϵ)) for our method, where m is the number of clauses.
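A minimal NumPy sketch of this block coordinate descent (my own rendering of the update above, not the paper's implementation; the real SATNet solver additionally uses a truth-direction column, input clamping, and careful initialization):

```python
import numpy as np

def normalize(x):
    return x / np.linalg.norm(x)

def mixing_method(S, k, n_iters=100, seed=0):
    """Block coordinate descent for the low-rank MAXSAT SDP.

    Repeatedly applies the update from the slide,
        v_i = -normalize(V S^T s_i - ||s_i||^2 v_i),
    where s_i is the i-th column of S.
    """
    rng = np.random.default_rng(seed)
    m, n = S.shape
    V = rng.standard_normal((k, n))
    V /= np.linalg.norm(V, axis=0)          # columns v_i on the unit sphere
    for _ in range(n_iters):
        for i in range(n):                  # Gauss-Seidel sweep over variables
            s_i = S[:, i]                   # clause signs touching v_i
            g = V @ S.T @ s_i - (s_i @ s_i) * V[:, i]
            V[:, i] = -normalize(g)
    return V

S = np.array([[0, 1, 0], [1, -1, 0], [0, 1, -1]], dtype=float)
V = mixing_method(S, k=4)
print(np.trace(S.T @ S @ (V.T @ V)))        # relaxed objective after descent
```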

SLIDE 9

Differentiate through the optimization problem

When converged, the procedure satisfies the fixed-point equation

vi = −normalize(V Sᵀ si − ∥si∥² vi), ∀i

This fixed-point equation of the block coordinate descent provides an implicit definition of the solution [Amos et al. 2017]:

Fi(S, V(S)) = vi + normalize(V Sᵀ si − ∥si∥² vi) = 0, ∀i

Thus, we can apply the implicit function theorem to the total derivative:

dF(S, V(S))/dS = 0  ⟹  ∂F(S, V)/∂S + (∂F(S, V)/∂V) · (∂V/∂S) = 0

Solve the above linear system for ∂V/∂S to backpropagate.
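The same recipe on a toy fixed point (my own illustration; v = tanh(Wv + s) stands in for SATNet's F) shows the mechanics: differentiate the fixed-point residual, then solve the resulting linear system for the solution's Jacobian.

```python
import numpy as np

# Toy implicit differentiation: fixed point v = tanh(W v + s),
# residual F(s, v) = v - tanh(W v + s) = 0 at the solution.
rng = np.random.default_rng(0)
n = 4
W = 0.1 * rng.standard_normal((n, n))   # small weights so iteration converges
s = rng.standard_normal(n)

def solve_fixed_point(s, n_iters=200):
    v = np.zeros(n)
    for _ in range(n_iters):
        v = np.tanh(W @ v + s)
    return v

v = solve_fixed_point(s)

d = 1 - np.tanh(W @ v + s) ** 2         # tanh' at the fixed point
dF_dv = np.eye(n) - d[:, None] * W      # partial F / partial v
dF_ds = -np.diag(d)                     # partial F / partial s
dv_ds = np.linalg.solve(dF_dv, -dF_ds)  # the linear system from the slide

# Sanity check against finite differences on the first coordinate of s.
eps = 1e-6
fd = (solve_fixed_point(s + eps * np.eye(n)[0]) - v) / eps
print(np.allclose(dv_ds[:, 0], fd, atol=1e-4))   # True
```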

SLIDE 10

SATNet: MAXSAT SDP as a layer

[Figure: the SATNet layer]

SLIDE 11

Other ingredients in SATNet

Low-rank regularization on S

  • There are doubly-exponentially many possible Boolean functions!
  • Low rank ⇒ regularizes complexity through the number of clauses

Auxiliary variables (hidden nodes)

  • The SDP has only diagonal constraints, limiting its representational power
  • Adding auxiliary variables (a gadget) increases representational power

SLIDE 12

Illustration: Learning Parity from single-bit supervision

  • The parity problem is surprisingly hard for most deep networks to learn [Shalev-Shwartz et al., 2017]
  • A chained (recurrent) SATNet-based network learns the parity function for strings of up to length 40 from 10K examples (chaining sketched below)

[Plot: error vs. epoch at L = 40, comparing SATNet and an LSTM]

SLIDE 13

Illustration: Learning Sudoku

  • Learning 9x9 Sudoku from 9K examples
  • Single SATNet layer on one-hot-encoded input puzzles
  • Free parameters are the S matrix of clauses, randomly initialized

Original Sudoku:

Model           Train    Test
ConvNet         72.6%    0.04%
SATNet (ours)   99.8%    98.3%

Permuted Sudoku:

Model           Train    Test
ConvNet         0%       0%
SATNet (ours)   99.7%    98.3%

SLIDE 14

Illustration: MNIST Sudoku

Model           Train    Test
ConvNet         0.31%    0%
SATNet (ours)   93.6%    63.2%

  • Getting an example "correct" requires a correct Sudoku solution and predicting all MNIST test digits correctly
  • 85% accuracy on correct ConvNet input

SLIDE 15

Code and Colab

Code available at https://github.com/locuslab/SATNet
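For orientation, a hypothetical usage sketch of the released layer follows; the constructor arguments and forward signature (`SATNet(n, m, aux)` taking soft assignments plus an input mask) are my assumptions about the package, so consult the repository README for the actual API:

```python
import torch
import satnet  # installation per the repository README

# Hypothetical usage sketch: the argument names and shapes below are
# assumptions, not a verified API; see https://github.com/locuslab/SATNet.
n, m, aux = 9, 32, 16           # variables, clauses, auxiliary variables
layer = satnet.SATNet(n, m, aux)

probs = torch.rand(4, n)        # batch of soft variable assignments in [0, 1]
is_input = torch.randint(0, 2, (4, n), dtype=torch.int32)   # 1 = given bit

out = layer(probs, is_input)    # solver output for the non-input variables
loss = out.sum()
loss.backward()                 # gradients flow through the MAXSAT solver
```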

SLIDE 16

Conclusion

We presented

  • SATNet, the first differentiable MAXSAT solver as a layer
  • It can be integrated into the loop of deep learning systems whenever neurons have logical constraints, and it learns both the constraints and the solutions solely from examples

Possible extensions:

  • Incorporating known rules into the system
  • Exploiting the structure of the clause matrix

Poster at Pacific Ballroom #26