@aaronrinehart @verica_io #chaosengineering @aaronrinehart - - PowerPoint PPT Presentation




slide-1
SLIDE 1 @aaronrinehart @verica_io #chaosengineering
slide-2
SLIDE 2
slide-3
SLIDE 3 CONFIDENTIAL
slide-4
SLIDE 4

Security Precognition

slide-5
SLIDE 5

“Resilience is the story of the outage that never happened.”
- John Allspaw
slide-6
SLIDE 6

About A.A.Ron

  • CTO of Stealthy Startup
  • Former Chief Security Architect @UnitedHealth, responsible for security engineering strategy
  • Led the DevOps and Open Source Transformation at UnitedHealth Group
  • Former (DOD, NASA, DHS, CollegeBoard)
  • Frequent speaker and author on Chaos Engineering & Security
  • Pioneer behind Security Chaos Engineering
  • Led ChaoSlingr team at UnitedHealth
slide-7
SLIDE 7

In this Session we will cover

slide-8
SLIDE 8

Our systems have evolved beyond human ability to mentally model their behavior.

slide-9
SLIDE 9

Our systems have evolved beyond human ability to mentally model their behavior.


everyone

slide-10
SLIDE 10
slide-11
SLIDE 11

Complex?

  • Circuit Breaker Patterns
  • Continuous Delivery
  • Distributed Systems
  • Blue/Green Deployments
  • Cloud Computing
  • Service Mesh
  • Containers
  • Immutable Infrastructure
  • Infracode
  • Continuous Integration
  • Microservice Architectures
  • API
  • Auto Canaries
  • CI/CD
  • DevOps
  • Automation Pipelines

slide-12
SLIDE 12

Security?

  • Mostly Monolithic
  • Requires Domain Knowledge
  • Prevention focused
  • Poorly Aligned
  • Defense in Depth
  • Stateful in nature
  • DevSecOps not widely adopted
  • Expert Systems
  • Adversary Focused

slide-13
SLIDE 13

Simplify?

slide-14
SLIDE 14

Software Only Increases in Complexity

slide-15
SLIDE 15

Accidental Complexity Essential Complexity

Software Complexity

slide-16
SLIDE 16

Woods' Theorem: “As the complexity of a system increases, the accuracy of any single agent’s own model of that system decreases.”
- Dr. David Woods
slide-17
SLIDE 17

How well do you really understand how your system works?

slide-18
SLIDE 18

Difficult to Grok behavior

slide-19
SLIDE 19

So what does all of this have to do with Security?

slide-20
SLIDE 20

Failure Happens.

slide-21
SLIDE 21

Incidents & System Outages are

Expensive

slide-22
SLIDE 22

Security Incidents are Subjective in Nature

slide-23
SLIDE 23

We really don't know very much

Where? Why? Who? What? How?

slide-24
SLIDE 24

Let's face it, when outages happen...

slide-25
SLIDE 25

Teams spend too much time reacting to outages instead of building more resilient systems.

slide-26
SLIDE 26

“Response” is the problem with Incident Response

slide-27
SLIDE 27

“Chaos Engineering is the discipline of experimenting on a distributed system in order to build confidence in the system’s ability to withstand turbulent conditions”

slide-28
SLIDE 28
slide-29
SLIDE 29
slide-30
SLIDE 30

[Diagram: Control group vs. Experiment group]
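The control/experiment split on this slide can be illustrated with a minimal sketch. It assumes a single steady-state metric (here a hypothetical detection rate per run) and an arbitrary tolerance; the function name, the sample values, and the threshold are illustrative assumptions, not something taken from the talk.

```python
from statistics import mean


def steady_state_holds(control, experiment, tolerance=0.05):
    """Return True if the experiment group's mean metric stays within
    `tolerance` (absolute difference) of the control group's mean."""
    if not control or not experiment:
        raise ValueError("both groups need at least one sample")
    return abs(mean(control) - mean(experiment)) <= tolerance


# Hypothetical samples: fraction of injected events detected per run.
control_runs = [0.99, 0.98, 1.00]      # no fault injected
experiment_runs = [0.97, 0.99, 0.98]   # fault injected

print(steady_state_holds(control_runs, experiment_runs))
```

A real experiment would likely use a proper statistical comparison and more than one metric; the point here is only that the experiment group is judged against a control group rather than against an assumption.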
slide-31
SLIDE 31

Who is doing Chaos?

slide-32
SLIDE 32
slide-33
SLIDE 33

Security Engineering

slide-34
SLIDE 34

Security Engineering

slide-35
SLIDE 35

People Operate Differently when they expect things to fail

slide-36
SLIDE 36
slide-37
SLIDE 37
slide-38
SLIDE 38

The Normal Condition of a Human & Systems they Build is to

FAIL

slide-39
SLIDE 39

We need failure to Learn & Grow

slide-40
SLIDE 40

Post Mortem = Preparation

Let's Flip the Model

slide-41
SLIDE 41

Bring Order through Chaos

slide-42
SLIDE 42

Use Chaos Engineering to initiate Objective Feedback Loops about Security Effectiveness

slide-43
SLIDE 43

Proactively Manage & Measure

  • Validate Runbooks
  • Measure Team Skills
  • Determine Control Effectiveness
  • Learn new insights into system behavior
  • Transfer knowledge
  • Build a learning culture

slide-44
SLIDE 44

Testing vs. Experimentation

slide-45
SLIDE 45

Security Crayon Differences

  • Noisy distributed system behavior
  • Not geared for Cascading Events
  • Point-in-time even if Automated
  • Performed by Security Teams with Specialized skill sets

slide-46
SLIDE 46

Security Chaos Differences

  • Distributed Systems Focus
  • Goal: Experimentation
  • Human Factors focused
  • Small Isolated Scope
  • Focus on Cascading Events
  • Performed by Mixed Engineering Teams in Gameday
  • During business hours

slide-47
SLIDE 47

2018 Causes of Data Breaches

slide-48
SLIDE 48

2018 Causes of Data Breaches

slide-49
SLIDE 49

2018 Causes of Data Breaches

slide-50
SLIDE 50

2018 Causes of Data Breaches

slide-51
SLIDE 51

‘Human Error’, Root Cause, & Blame Culture

slide-52
SLIDE 52

Proactively Manage & Measure

slide-53
SLIDE 53

Continuous

SECURITY

Validation

slide-54
SLIDE 54

Build Confidence in What Actually Works

slide-55
SLIDE 55

So how does it work?

slide-56
SLIDE 56

“Stop looking for better answers and start asking better questions.”
- John Allspaw
slide-57
SLIDE 57

What is the system actually doing?

slide-58
SLIDE 58

What is the system actually doing? Has it done this before?

slide-59
SLIDE 59

What is the system actually doing? Has it done this before? Why is it behaving that way?

slide-60
SLIDE 60

What is the system actually doing? Has it done this before? Why is it behaving that way? What is it supposed to do next?

slide-61
SLIDE 61

What is the system actually doing? Has it done this before? Why is it behaving that way? What is it supposed to do next? How did it get into this state?

slide-62
SLIDE 62

How does My Security Really Work?

slide-63
SLIDE 63

What evidence do I have to prove it?

slide-64
SLIDE 64

An Open Source Tool

slide-65
SLIDE 65

ChaoSlingr Product Features

  • Serverless App in AWS
  • 100% Native AWS
  • Configurable Operational Mode & Frequency
  • Opt-In | Opt-Out Model
  • ChatOps Integration
  • Configuration-as-Code
  • Example Code & Open Framework
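As a rough illustration of what an opt-in | opt-out model could look like, the sketch below filters candidate resources by a tag before any experiment runs. The tag key `chaos`, the tag values, and the resource dictionaries are all hypothetical; this is not ChaoSlingr's actual code or configuration format.

```python
def select_targets(resources, mode="opt-in", tag_key="chaos"):
    """Return the ids of resources eligible for an experiment.

    opt-in:  only resources explicitly tagged "opt-in" are eligible.
    opt-out: every resource is eligible unless tagged "opt-out".
    """
    if mode not in ("opt-in", "opt-out"):
        raise ValueError(f"unknown mode: {mode!r}")
    selected = []
    for resource in resources:
        tag = resource.get("tags", {}).get(tag_key)
        if mode == "opt-in" and tag == "opt-in":
            selected.append(resource["id"])
        elif mode == "opt-out" and tag != "opt-out":
            selected.append(resource["id"])
    return selected


# Hypothetical inventory of security groups.
inventory = [
    {"id": "sg-1", "tags": {"chaos": "opt-in"}},
    {"id": "sg-2", "tags": {}},
    {"id": "sg-3", "tags": {"chaos": "opt-out"}},
]

print(select_targets(inventory, mode="opt-in"))
print(select_targets(inventory, mode="opt-out"))
```

Opt-in is the more conservative default: untagged resources are never touched, which keeps the scope of an experiment small and isolated.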
slide-66
SLIDE 66

Hypothesis: If someone accidentally or maliciously introduced a misconfigured port, then we would immediately detect, block, and alert on the event.

[Diagram labels: Misconfigured Port Injection; Firewall?; Config Mgmt?; Log data?; Alert SOC?; IR Triage; Wait...]
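A misconfigured port injection needs a reliable rollback so the experiment never outlives its observation window. The sketch below shows one way to do that, assuming the security group is modeled as an in-memory set of `(port, cidr)` rules; the helper name and the data model are hypothetical and stand in for real cloud API calls.

```python
from contextlib import contextmanager


@contextmanager
def injected_port(rules, port, cidr="0.0.0.0/0"):
    """Temporarily add an ingress rule and remove it afterwards,
    even if the observation step raises an exception."""
    rule = (port, cidr)
    added = rule not in rules  # never remove a rule that pre-existed
    if added:
        rules.add(rule)
    try:
        yield rule
    finally:
        if added:
            rules.discard(rule)


# Hypothetical security group with one legitimate rule.
security_group = {(443, "10.0.0.0/8")}

with injected_port(security_group, 2222) as rule:
    # While the rule is in place, observe: did anything detect, block, or alert?
    assert rule in security_group

print(security_group)
```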
slide-67
SLIDE 67

Result: Hypothesis disproved. Firewall did not detect or block the change on all instances. Standard Port AAA security policy was out of sync on the Portal Team instances. Port change did not trigger an alert, and log data indicated a successful change audit. However, we unexpectedly learned that the configuration mgmt tool caught the change and alerted the SOC.

[Diagram labels: Misconfigured Port Injection; Firewall?; Config Mgmt?; Log data?; Alert SOC?; IR Triage; Wait...]
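The outcome on this slide can be recorded as data so the hypothesis is checked objectively rather than argued about. The sketch below is an assumption about how such a record might look: the `Observation` type, the control names, and the `evaluate` helper are hypothetical, and the observed values simply restate the result described above.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    control: str
    detected: bool
    blocked: bool
    alerted: bool


def evaluate(expected_control, observations):
    """Return (hypothesis_held, surprises).

    The hypothesis holds only if the expected control detected, blocked,
    and alerted. Surprises are other controls that detected or alerted.
    """
    by_control = {o.control: o for o in observations}
    target = by_control.get(expected_control)
    held = bool(target and target.detected and target.blocked and target.alerted)
    surprises = [
        o.control
        for o in observations
        if o.control != expected_control and (o.detected or o.alerted)
    ]
    return held, surprises


# Observations restating the slide: the firewall missed the change,
# while the configuration management tool caught it and alerted the SOC.
observed = [
    Observation("firewall", detected=False, blocked=False, alerted=False),
    Observation("config_mgmt", detected=True, blocked=False, alerted=True),
]

held, surprises = evaluate("firewall", observed)
print(held, surprises)
```

Keeping the surprises alongside the pass/fail verdict matters: the unexpected detection by configuration management is the kind of insight the experiment exists to surface.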
slide-68
SLIDE 68

More Experiment Examples

  • Internet exposed Kubernetes API
  • Unauthorized Bad Container Repo
  • Unencrypted S3 Bucket
  • Disable MFA
  • Bad AWS Automated Block Rule
  • Software Secret Clear Text Disclosure
  • Permission collision in Shared IAM Role Policy
  • Disabled Service Event Logging
  • Introduce Latency on Security Controls
  • API Gateway Shutdown
slide-69
SLIDE 69

Q&A

@aaronrinehart aaron@verica.io

slide-70
SLIDE 70

Thank you!

@aaronrinehart aaron@verica.io

slide-71
SLIDE 71 CONFIDENTIAL