Neural Architecture Search (CS 4803 / 7643 Deep Learning, Erik Wijmans)



SLIDE 1

Erik Wijmans, 10/29/2020

Neural Architecture Search

CS 4803 / 7643 Deep Learning

SLIDE 2

Background

SLIDES 3-4

Background

min_θ E_(x,y)∼D [L(f(x; θ), y)]
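As a concrete toy illustration of the inner objective min_θ E[L(f(x; θ), y)], the sketch below fits one fixed model f (a 1-D linear function) by gradient descent on a squared loss. The dataset, learning rate, and step count are invented for illustration.

```python
# Toy empirical risk minimization: min_theta mean (f(x; theta) - y)^2
# for the fixed model f(x; theta) = theta * x. All numbers are made up.

def train(data, theta=0.0, lr=0.1, steps=100):
    """Minimize mean squared error over the dataset by gradient descent."""
    for _ in range(steps):
        # d/dtheta of mean (theta*x - y)^2 is mean 2*x*(theta*x - y)
        grad = sum(2 * x * (theta * x - y) for x, y in data) / len(data)
        theta -= lr * grad
    return theta

def mse(data, theta):
    return sum((theta * x - y) ** 2 for x, y in data) / len(data)

data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)]  # roughly y = 2x
theta_star = train(data)
```

Note that the architecture f is fixed here; NAS is about also choosing f, which the next slides add as an outer minimization.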

SLIDE 5

Background

SLIDES 6-7

Background

min_{f∈F} min_θ E_(x,y)∼D [L(f(x; θ), y)]

where F is the set of networks
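For a tiny hand-made F, the outer minimization over f ∈ F can be done by brute force: fit each candidate, keep the one with the lowest achievable loss. The candidate set, dataset, and fitting routine below are illustrative assumptions, not any real NAS search space.

```python
# Brute-force sketch of min over f in F of min over theta of the loss.
# F is a hand-made set of 1-D models; real search spaces are far too
# large to enumerate, which is the point of NAS.

data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)]  # roughly y = 2x

candidates = {                      # the "set of networks" F
    "linear":    lambda x, th: th * x,
    "quadratic": lambda x, th: th * x * x,
    "constant":  lambda x, th: th,
}

def fit(f, lr=0.01, steps=2000):
    """Inner minimization: gradient descent on theta for a fixed f,
    using a numeric gradient so any candidate f works."""
    def loss(t):
        return sum((f(x, t) - y) ** 2 for x, y in data) / len(data)
    theta, eps = 0.0, 1e-5
    for _ in range(steps):
        grad = (loss(theta + eps) - loss(theta - eps)) / (2 * eps)
        theta -= lr * grad
    return loss(theta)

best = min(candidates, key=lambda name: fit(candidates[name]))
```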

SLIDE 8

Neural Architecture Search

SLIDE 9

Neural Architecture Search

High Level Overview

SLIDES 10-11

Neural Architecture Search

High Level Overview

Search Space

min_{f∈F} min_θ E_(x,y)∼D [L(f(x; θ), y)]

The search space is the set of networks F.

SLIDE 12

Neural Architecture Search

High Level Overview

Search Space → Search Method

SLIDES 13-14

Neural Architecture Search

High Level Overview

Search Space → Search Method → Proposed Architecture → Evaluation Method

SLIDES 15-16

Neural Architecture Search

High Level Overview

Search Space → Search Method → Proposed Architecture → Evaluation Method → Best Model
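The high-level pipeline can be sketched as a loop: a search method proposes architectures from the search space, an evaluation method scores each proposal, and the best model is kept. Random search stands in for the search method, and a made-up scoring function stands in for (partially) training and evaluating; every name and value below is hypothetical.

```python
# Minimal NAS loop: search space -> search method -> proposed
# architecture -> evaluation method -> best model.
import random

search_space = {
    "depth": [2, 4, 8],
    "width": [16, 32, 64],
    "op":    ["conv3x3", "conv5x5", "sep3x3"],
}

def propose(rng):
    """Search method: sample one architecture from the search space."""
    return {k: rng.choice(v) for k, v in search_space.items()}

def evaluate(arch):
    """Evaluation method: a made-up stand-in for (partially) training
    the network and measuring held-out performance."""
    return (arch["depth"] * 0.1 + arch["width"] * 0.01
            - (arch["op"] == "conv5x5") * 0.2)

def nas(trials=20, seed=0):
    rng = random.Random(seed)
    best_arch, best_score = None, float("-inf")
    for _ in range(trials):
        arch = propose(rng)
        score = evaluate(arch)
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score

best_arch, best_score = nas()
```

The methods on the following slides differ mainly in what replaces `propose` (RL, gradients, scoring) and how cheap `evaluate` can be made.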

SLIDE 17

Neural Architecture Search

Evaluation Method

SLIDE 18

Neural Architecture Search

Evaluation Method

  • Generally, this is performance on held-out data.

SLIDE 19

Neural Architecture Search

Evaluation Method

  • Generally, this is performance on held-out data.
  • Evaluation is typically done by (partially) training the network and evaluating its performance on held-out data.

SLIDES 20-21

Neural Architecture Search

High Level Overview

Search Space → Search Method → Proposed Architecture → Evaluation Method

SLIDE 22

Search via Reinforcement Learning

SLIDE 23

Search via Reinforcement Learning

NAS-RL

SLIDE 24

Search via Reinforcement Learning

NAS-RL

  • Motivated by the observation that a DNN architecture can be specified by a string of variable length (e.g., a breadth-first traversal of its DAG)

SLIDE 25

Search via Reinforcement Learning

NAS-RL

  • Motivated by the observation that a DNN architecture can be specified by a string of variable length (e.g., a breadth-first traversal of its DAG)
  • Use reinforcement learning to train an RNN controller that builds the network
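The variable-length-string idea can be sketched as follows: flatten the architecture's DAG into a token string by breadth-first traversal. The graph and token names here are invented for illustration.

```python
# Flatten a small architecture DAG into a variable-length string of
# tokens via breadth-first traversal. The DAG is a made-up example.
from collections import deque

dag = {                       # node -> children (adjacency list)
    "input":   ["conv3x3"],
    "conv3x3": ["relu", "skip"],
    "relu":    ["add"],
    "skip":    ["add"],
    "add":     ["softmax"],
    "softmax": [],
}

def encode_bfs(graph, root):
    """Breadth-first traversal producing a variable-length token string."""
    seen, order, queue = {root}, [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in graph[node]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return " ".join(order)

encoding = encode_bfs(dag, "input")
```

A real encoding would also carry per-node attributes (filter sizes, strides, which nodes each edge connects), but the point is that a sequence model can emit such strings token by token.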

SLIDES 26-27

Search via Reinforcement Learning

NAS-RL

Input → Op 1 → Op 2 → … → Op N → Softmax
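A heavily stripped-down sketch of the controller idea: a parameterized policy samples an operation, receives the candidate's evaluation score as reward, and is updated with the REINFORCE (score-function) gradient. A single categorical decision stands in for the full RNN controller, and the reward table is made up.

```python
# One-step REINFORCE: the policy is a softmax over op logits; sampled
# ops are reinforced in proportion to their (made-up) reward.
import math
import random

OPS = ["conv3x3", "conv5x5", "maxpool"]
REWARD = {"conv3x3": 1.0, "conv5x5": 0.4, "maxpool": 0.1}  # invented scores

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def train_controller(steps=500, lr=0.1, seed=0):
    rng = random.Random(seed)
    logits = [0.0] * len(OPS)
    for _ in range(steps):
        probs = softmax(logits)
        i = rng.choices(range(len(OPS)), weights=probs)[0]
        r = REWARD[OPS[i]]
        # REINFORCE: d log pi(i) / d logit_j = 1[j == i] - probs[j]
        for j in range(len(OPS)):
            logits[j] += lr * r * ((1.0 if j == i else 0.0) - probs[j])
    return softmax(logits)

probs = train_controller()
```

The policy concentrates on the highest-reward op; the real method does this per decision of a long RNN-emitted string, with each reward costing a full candidate training run.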

SLIDES 28-30

Search via Reinforcement Learning

NAS-RL

SLIDE 31

Search via Reinforcement Learning

NAS-RL

  • Performance is on-par with other CNNs of the time
SLIDE 32

Search via Reinforcement Learning

NAS-RL

  • This is a very general method

SLIDE 33

Search via Reinforcement Learning

NAS-RL

  • This is a very general method
  • The cost of that generality is compute: this used 800 GPUs (for an unspecified amount of time) and trained >12,000 candidate architectures

SLIDE 34

Search via Reinforcement Learning

NASNet

  • Instead, limit the search space with “blocks”

SLIDE 35

Search via Reinforcement Learning

NASNet

  • Instead, limit the search space with “blocks”
  • This is similar to “Human Neural Architecture Search”

SLIDES 36-39

Search via Reinforcement Learning

NASNet

  • Instead, limit the search space with “blocks”

SLIDE 40

Search via Reinforcement Learning

NASNet

Normal Cell (figure): takes hi and hi-1 as inputs; pairs of operations (sep 3x3, sep 5x5, avg 3x3, identity) are combined with add nodes, and the results are concatenated to produce hi+1.

SLIDE 41

Search via Reinforcement Learning

NASNet

Reduction Cell (figure): takes hi and hi-1 as inputs; pairs of operations (sep 3x3/5x5/7x7, avg 3x3, max 3x3, identity) are combined with add nodes, and the results are concatenated to produce hi+1.

SLIDE 42

Search via Reinforcement Learning

NASNet

  • Performance is on-par with other CNNs at the time, but with fewer parameters and less compute

SLIDE 43

Application

Efficient Neural Networks (MnasNet)

SLIDES 44-47

Application

Efficient Neural Networks (MnasNet)

  • One benefit of search via RL is that validation performance need not be the only metric
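A sketch of folding a latency target into the RL reward, following the ACC × (latency/target)^w shape used by MnasNet. The accuracies, latencies, target, and exponent below are invented for illustration.

```python
# Multi-objective reward: validation accuracy scaled by a soft latency
# penalty. With w < 0, slower-than-target models are discounted and
# faster ones get a mild bonus. All numbers are made up.

def reward(accuracy, latency_ms, target_ms=80.0, w=-0.07):
    """MnasNet-style soft-constrained reward: ACC * (LAT / T)^w."""
    return accuracy * (latency_ms / target_ms) ** w

fast = reward(0.74, 60.0)    # under the latency target
slow = reward(0.76, 160.0)   # over the target, despite higher raw accuracy
```

Because the penalty is smooth rather than a hard cutoff, the controller is still rewarded for accuracy gains near the latency budget instead of discarding every over-budget candidate outright.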

SLIDES 48-52

Search via Gradient Optimization

Differentiable Architecture Search (DARTS)

SLIDES 53-55

Search via Gradient Optimization

Differentiable Architecture Search (DARTS)

min_α L_val(θ*(α), α)
s.t. θ*(α) = argmin_θ L_train(θ, α)
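Before this bilevel problem can be attacked with gradients, the discrete choice of operation must be relaxed: in DARTS each edge outputs a softmax(α)-weighted mixture of the candidate operations. The sketch below uses scalar toy ops in place of real layers.

```python
# Continuous relaxation: edge(x) = sum_o softmax(alpha)_o * o(x),
# so the architecture parameters alpha can receive gradients.
# The candidate ops are scalar stand-ins for real layers.
import math

OPS = [lambda x: 0.0,        # "zero" op (no connection)
       lambda x: x,          # identity / skip connection
       lambda x: 3.0 * x]    # stand-in for a learned transformation

def softmax(alpha):
    m = max(alpha)
    exps = [math.exp(a - m) for a in alpha]
    z = sum(exps)
    return [e / z for e in exps]

def mixed_op(x, alpha):
    """Edge output: softmax-weighted mixture of all candidate ops."""
    w = softmax(alpha)
    return sum(wi * op(x) for wi, op in zip(w, OPS))

# uniform alpha: every op contributes equally
uniform = mixed_op(1.0, [0.0, 0.0, 0.0])
# strongly skewed alpha: the edge behaves like the favored op,
# which is how a discrete architecture is read off after search
skewed = mixed_op(1.0, [0.0, 0.0, 10.0])
```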

SLIDE 56

Search via Gradient Optimization

Differentiable Architecture Search (DARTS)

∇_α L_val(θ*(α), α)
SLIDE 57

Search via Gradient Optimization

Differentiable Architecture Search (DARTS)

∇_α L_val(θ*(α), α) ≈ ∇_α L_val(θ − ξ ∇_θ L_train(θ, α), α)
SLIDE 58

Search via Gradient Optimization

Differentiable Architecture Search (DARTS)

∇_α L_val(θ*(α), α) ≈ ∇_α L_val(θ − ξ ∇_θ L_train(θ, α), α) ≈ ∇_α L_val(θ, α)  (first-order approximation, ξ = 0)
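With the first-order approximation, search reduces to alternating plain gradient steps: one on the weights θ against the training loss, one on the architecture parameters α against the validation loss. The scalar losses below are invented so both problems have obvious optima; this is a sketch of the update scheme, not of real DARTS losses.

```python
# First-order DARTS-style alternation on toy scalar losses:
# grad_alpha L_val(theta*(alpha), alpha) ~ grad_alpha L_val(theta, alpha).

def l_train(theta, alpha):
    return (theta - alpha) ** 2                  # weights track alpha

def l_val(theta, alpha):
    return (theta - 2.0) ** 2 + 0.1 * (alpha - 2.0) ** 2

def first_order_darts(steps=200, lr=0.1):
    theta, alpha = 0.0, 0.0
    for _ in range(steps):
        theta -= lr * 2.0 * (theta - alpha)      # d l_train / d theta
        alpha -= lr * 0.2 * (alpha - 2.0)        # d l_val / d alpha at current theta
    return theta, alpha

theta, alpha = first_order_darts()
```

Both parameters settle near 2, with θ continually chasing the moving α; skipping the inner argmin entirely is what makes the method cheap enough to run in about a GPU-day.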
SLIDE 59

Search via Gradient Optimization

Differentiable Architecture Search (DARTS)

  • Finds networks with very little computation cost (~1 GPU day) that perform better than or on-par with existing NAS methods

SLIDE 60

Search via Scoring

SLIDE 61

Search via Scoring (without training)

Neural Architecture Search without Training

SLIDE 62

Search via Scoring (without training)

Neural Architecture Search without Training

  • How well a given architecture will do when fully trained can be approximated by how “flexible” the network is.

SLIDES 63-64

Search via Scoring (without training)

Neural Architecture Search without Training

  • How well a given architecture will do when fully trained can be approximated by how “flexible” the network is. This “flexibility” can be determined without training.
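One way to sketch the score-without-training idea, loosely inspired by the activation-pattern statistic in that line of work (the network shapes, sampling scheme, and score here are all invented): push random inputs through an untrained ReLU network and count how many distinct on/off activation patterns appear. More distinct patterns is taken as a proxy for a more flexible architecture.

```python
# Training-free "flexibility" proxy: number of distinct ReLU on/off
# patterns an untrained network produces on random inputs.
import random

def random_net(widths, rng):
    """Random fully-connected ReLU net: one weight matrix per layer."""
    return [[[rng.gauss(0, 1) for _ in range(m)] for _ in range(n)]
            for m, n in zip(widths[:-1], widths[1:])]

def activation_pattern(net, x):
    """Binary on/off code of every ReLU unit for one input."""
    code = []
    for layer in net:
        x = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in layer]
        code.extend(1 if v > 0 else 0 for v in x)
    return tuple(code)

def flexibility_score(net, in_dim, n_samples=64, seed=1):
    """Count distinct activation patterns over random inputs."""
    rng = random.Random(seed)
    patterns = {activation_pattern(net, [rng.gauss(0, 1) for _ in range(in_dim)])
                for _ in range(n_samples)}
    return len(patterns)

rng = random.Random(0)
small = random_net([4, 2], rng)        # 2 ReLU units: at most 4 patterns
large = random_net([4, 16, 16], rng)   # 32 ReLU units
s_small = flexibility_score(small, 4)
s_large = flexibility_score(large, 4)
```

The score needs only forward passes on random weights, so thousands of candidate architectures can be ranked in minutes rather than GPU-days.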

SLIDES 65-66

Summary

  • Neural Architecture Search (NAS) focuses on automatically finding highly performant network architectures

SLIDE 67

Summary

  • Neural Architecture Search (NAS) focuses on automatically finding highly performant network architectures
  • Search is commonly done with either RL or gradient methods (e.g. DARTS)
SLIDE 68

Summary

  • Neural Architecture Search (NAS) focuses on automatically finding highly performant network architectures
  • Search is commonly done with either RL or gradient methods (e.g. DARTS)
  • One fruitful use has been searching for compute-efficient networks