SLIDE 1

SATNet: Bridging deep learning and logical reasoning using a differentiable satisfiability solver

Po-Wei Wang 1, Priya L. Donti 1, Bryan Wilder 2, J. Zico Kolter 1,3

1 School of Computer Science, Carnegie Mellon University
2 School of Engineering and Applied Sciences, Harvard University
3 Bosch Center for Artificial Intelligence

SLIDE 2

Integrating deep learning and logic

Deep Learning
  • No constraints on output
  • Differentiable
  • Solved via gradient optimizers

+

Logical Inference
  • Rich constraints on output
  • Discrete input/output
  • Solved via tree search

Sudoku image: "12 Jan 2006" by SudoFlickr is licensed under CC BY-SA 2.0

SLIDE 3

This talk is not about...

Not about learning to find SAT solutions [Selsam et al. 2019]

  • but about learning both the constraints and the solutions from examples

Not about using DL and SAT in a multi-staged manner

  • doing so requires prior knowledge of the structure and constraints
  • further, current SAT solvers cannot accept probabilistic inputs

SLIDE 4

This talk is about

  • A layer that enables end-to-end learning of both the constraints and the solutions of logic problems within deep networks
  • A smoothed, differentiable (maximum) satisfiability solver that can be integrated into the loop of deep learning systems

SLIDE 5

Review of SAT problems

Example SAT problem:

v2 ∧ (v1 ∨ ¬v2) ∧ (v2 ∨ ¬v3)

⇓

S = [  0   1   0 ]   ← v2
    [  1  −1   0 ]   ← v1 ∨ ¬v2
    [  0   1  −1 ]   ← v2 ∨ ¬v3
      (columns: v1, v2, v3)

Typical SAT: the clause matrix is given; find a satisfying assignment.
Our setting: the clause matrix is the parameters of the layer (to be learned).
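To make the encoding concrete, here is a minimal NumPy sketch (mine, not from the talk) that builds this clause matrix from a signed-literal clause list:

```python
import numpy as np

# Build the clause matrix S for the example formula
#   v2 ∧ (v1 ∨ ¬v2) ∧ (v2 ∨ ¬v3)
# Each clause is a list of signed 1-indexed variable ids:
# +j means v_j appears positively, -j means it appears negated.
clauses = [[2], [1, -2], [2, -3]]
n_vars = 3

S = np.zeros((len(clauses), n_vars))
for i, clause in enumerate(clauses):
    for lit in clause:
        S[i, abs(lit) - 1] = np.sign(lit)

print(S)
# [[ 0.  1.  0.]
#  [ 1. -1.  0.]
#  [ 0.  1. -1.]]
```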

SLIDE 6

MAXSAT Problem

MAXSAT is the optimization variant of SAT.

SAT: find a feasible assignment vi s.t. v2 ∧ (v1 ∨ ¬v2) ∧ (v2 ∨ ¬v3)
MAXSAT: maximize the number of satisfied clauses

Relax the binary variables to the smooth, continuous sphere:

vi ∈ {+1, −1}  —(equiv)→  |vi| = 1, vi ∈ R^1  —(relax)→  ∥vi∥ = 1, vi ∈ R^k

Semidefinite relaxation (Goemans-Williamson, 1995), with X = VᵀV:

minimize ⟨SᵀS, X⟩  s.t.  X ⪰ 0, diag(X) = 1
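As a toy illustration (mine, not from the talk), the sketch below evaluates the relaxed objective ⟨SᵀS, X⟩ and applies Goemans-Williamson-style randomized rounding; the unit vectors here are random stand-ins rather than an actual SDP optimum:

```python
import numpy as np

rng = np.random.default_rng(0)

# Clause matrix from the previous slide (rows: clauses, cols: variables).
S = np.array([[0, 1, 0],
              [1, -1, 0],
              [0, 1, -1]], dtype=float)
m, n = S.shape
k = int(np.ceil(np.sqrt(2 * n))) + 1    # relaxation rank, k > sqrt(2n)

# Relaxed variables: unit vectors v_i in R^k (random stand-ins here).
V = rng.standard_normal((k, n))
V /= np.linalg.norm(V, axis=0)

# SDP objective <S^T S, X> with X = V^T V.
X = V.T @ V
objective = np.trace(S.T @ S @ X)

# Goemans-Williamson-style rounding: project the vectors onto a random
# hyperplane direction and take signs to recover a Boolean assignment.
r = rng.standard_normal(k)
assignment = np.sign(r @ V)             # entries in {+1, -1}
print(objective, assignment)
```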


SLIDE 7

SATNet: MAXSAT SDP as a layer

[Figure: the SATNet layer]

SLIDE 8

Fast solution to MAXSAT SDP approximation

Efficiently solve via the low-rank factorization X = VᵀV, V ∈ R^(k×n), ∥vi∥ = 1 (a.k.a. the Burer-Monteiro method) and block coordinate descent iterations

vi = −normalize(V Sᵀ si − ∥si∥² vi)

For k > √(2n), the non-convex iterates are guaranteed to converge to a global optimum of the SDP [Wang et al., 2018; Erdogdu et al., 2018].

Complexity is reduced from O(n^6 log log(1/ϵ)) for interior-point methods to O(n^1.5 m log(1/ϵ)) for our method, where m is the number of clauses.
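A minimal NumPy sketch of this block coordinate descent (my own rendering of the update above, not the paper's implementation; the real SATNet solver additionally uses a truth-direction column, input clamping, and careful initialization):

```python
import numpy as np

def normalize(x):
    return x / np.linalg.norm(x)

def mixing_method(S, k, n_iters=100, seed=0):
    """Block coordinate descent for the low-rank MAXSAT SDP.

    Repeatedly applies the update from the slide,
        v_i = -normalize(V S^T s_i - ||s_i||^2 v_i),
    where s_i is the i-th column of S.
    """
    rng = np.random.default_rng(seed)
    m, n = S.shape
    V = rng.standard_normal((k, n))
    V /= np.linalg.norm(V, axis=0)          # columns v_i on the unit sphere
    for _ in range(n_iters):
        for i in range(n):                  # Gauss-Seidel sweep over variables
            s_i = S[:, i]                   # clause signs touching v_i
            g = V @ S.T @ s_i - (s_i @ s_i) * V[:, i]
            V[:, i] = -normalize(g)
    return V

S = np.array([[0, 1, 0], [1, -1, 0], [0, 1, -1]], dtype=float)
V = mixing_method(S, k=4)
print(np.trace(S.T @ S @ (V.T @ V)))        # relaxed objective after descent
```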

SLIDE 9

Differentiate through the optimization problem

When converged, the procedure satisfies the fixed-point equation

vi = −normalize(V Sᵀ si − ∥si∥² vi), ∀i

This fixed-point equation of the block coordinate descent provides an implicit definition of the solution [Amos et al. 2017]:

Fi(S, V(S)) = vi + normalize(V Sᵀ si − ∥si∥² vi) = 0, ∀i

Thus, we can apply the implicit function theorem to the total derivative:

dF(S, V(S))/dS = 0  ⟹  ∂F(S, V)/∂S + (∂F(S, V)/∂V) · (∂V/∂S) = 0

Solve the above linear system for ∂V/∂S to backpropagate.
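The same recipe on a toy fixed point (my own illustration; v = tanh(Wv + s) stands in for SATNet's F) shows the mechanics: differentiate the fixed-point residual, then solve the resulting linear system for the solution's Jacobian.

```python
import numpy as np

# Toy implicit differentiation: fixed point v = tanh(W v + s),
# residual F(s, v) = v - tanh(W v + s) = 0 at the solution.
rng = np.random.default_rng(0)
n = 4
W = 0.1 * rng.standard_normal((n, n))   # small weights so iteration converges
s = rng.standard_normal(n)

def solve_fixed_point(s, n_iters=200):
    v = np.zeros(n)
    for _ in range(n_iters):
        v = np.tanh(W @ v + s)
    return v

v = solve_fixed_point(s)

d = 1 - np.tanh(W @ v + s) ** 2         # tanh' at the fixed point
dF_dv = np.eye(n) - d[:, None] * W      # partial F / partial v
dF_ds = -np.diag(d)                     # partial F / partial s
dv_ds = np.linalg.solve(dF_dv, -dF_ds)  # the linear system from the slide

# Sanity check against finite differences on the first coordinate of s.
eps = 1e-6
fd = (solve_fixed_point(s + eps * np.eye(n)[0]) - v) / eps
print(np.allclose(dv_ds[:, 0], fd, atol=1e-4))   # True
```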

SLIDE 10

SATNet: MAXSAT SDP as a layer

[Figure: the SATNet layer]

SLIDE 11

Other ingredients in SATNet

Low-rank regularization on S

  • There are doubly-exponentially many possible Boolean functions!
  • Low rank ⇒ regularizes complexity through the number of clauses

Auxiliary variables (hidden nodes)

  • The SDP has only diagonal constraints, limiting its representational power
  • Adding auxiliary variables (a gadget) increases representational power

SLIDE 12

Illustration: Learning Parity from single-bit supervision

  • The parity problem is surprisingly hard for most deep networks to learn [Shalev-Shwartz et al., 2017]
  • A chained (recurrent) SATNet-based network learns the parity function for strings of up to length 40 from 10K examples (chaining sketched below)

[Plot: error vs. epoch at L = 40, comparing SATNet and an LSTM]

SLIDE 13

Illustration: Learning Sudoku

  • Learning 9x9 Sudoku from 9K examples
  • Single SATNet layer on one-hot-encoded input puzzles
  • Free parameters are the S matrix of clauses, randomly initialized

Original Sudoku:

Model           Train    Test
ConvNet         72.6%    0.04%
SATNet (ours)   99.8%    98.3%

Permuted Sudoku:

Model           Train    Test
ConvNet         0%       0%
SATNet (ours)   99.7%    98.3%

SLIDE 14

Illustration: MNIST Sudoku

Model           Train    Test
ConvNet         0.31%    0%
SATNet (ours)   93.6%    63.2%

  • Getting an example "correct" requires a correct Sudoku solution and predicting all MNIST test digits correctly
  • 85% accuracy on correct ConvNet input

SLIDE 15

Code and Colab

Code available at https://github.com/locuslab/SATNet
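For orientation, a hypothetical usage sketch of the released layer follows; the constructor arguments and forward signature (`SATNet(n, m, aux)` taking soft assignments plus an input mask) are my assumptions about the package, so consult the repository README for the actual API:

```python
import torch
import satnet  # installation per the repository README

# Hypothetical usage sketch: the argument names and shapes below are
# assumptions, not a verified API; see https://github.com/locuslab/SATNet.
n, m, aux = 9, 32, 16           # variables, clauses, auxiliary variables
layer = satnet.SATNet(n, m, aux)

probs = torch.rand(4, n)        # batch of soft variable assignments in [0, 1]
is_input = torch.randint(0, 2, (4, n), dtype=torch.int32)   # 1 = given bit

out = layer(probs, is_input)    # solver output for the non-input variables
loss = out.sum()
loss.backward()                 # gradients flow through the MAXSAT solver
```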

SLIDE 16

Conclusion

We presented

  • SATNet, the first differentiable MAXSAT solver as a layer
  • It can be integrated into the loop of deep learning systems whenever neurons have logical constraints, and it learns both the constraints and the solutions solely from examples

Possible extensions:

  • Incorporating known rules into the system
  • Exploiting the structure of the clause matrix

Poster at Pacific Ballroom #26