Provenance-based Intrusion Detection Thomas Pasquier University of - - PowerPoint PPT Presentation

SLIDE 1

Provenance-based Intrusion Detection

Thomas Pasquier
University of Bristol
https://tfjmp.org
12/11/2020

SLIDE 2

Talk loosely based on following publications

  • Han et al. “SIGL: Securing Software Installations Through Deep Graph Learning”, USENIX Security 2021
  • Han et al. “UNICORN: Runtime Provenance-Based Detector for Advanced Persistent Threats”, NDSS 2020
  • Pasquier et al. “Runtime Analysis of Whole-System Provenance”, ACM CCS 2018
  • Pasquier et al. “Practical Whole-System Provenance Capture”, ACM SoCC 2017

SLIDE 3

Motivation: System call based intrusion detection

System Calls

SLIDE 4

Motivation: System call based intrusion detection

  • Identify abnormal patterns

System Calls

SLIDE 5

Motivation: System call based intrusion detection

  • Identify abnormal patterns
  • Hidden among benign actions

System Calls

SLIDE 6

Motivation: System call based intrusion detection

  • Identify abnormal patterns
  • Hidden among benign actions
  • Masquerading as benign actions

System Calls

SLIDE 7

Motivation: System call based intrusion detection

  • Identify abnormal patterns
  • Hidden among benign actions
  • Masquerading as benign actions
  • Over a long period of time

System Calls [...] [...]

SLIDE 8

What is provenance?

SLIDE 9

What is provenance?

  • From the French “provenir” meaning “coming from”
  • Formal set of documents describing the origin of an art piece
  • Sequence of
      ○ Formal ownership
      ○ Custody
      ○ Places of storage
  • Used for authentication

SLIDE 10

What is data-provenance?

  • Represents interactions between objects of different types
      ○ Data-items (entities)
      ○ Processing (activities)
      ○ Individuals and Organisations (agents)
  • Represented as a directed acyclic graph (think information flows)
      ○ Edges represent interactions between objects’ states as dependencies
  • It is a representation of the history of a system execution
      ○ Immutable (unless it’s 1984)
      ○ No dependency on the future
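
The model on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the data model of any particular capture system: the `Node` and `ProvenanceGraph` classes, the `seq` counter standing in for time, and the example names are all hypothetical, and the relation names are borrowed from W3C PROV. Edges point from a state to the state it depends on, and an edge towards a newer state is rejected, which is what keeps the graph acyclic.

```python
from dataclasses import dataclass, field
from itertools import count

KINDS = {"entity", "activity", "agent"}


@dataclass(frozen=True)
class Node:
    seq: int    # creation order; a stand-in for time (assumption)
    kind: str   # "entity", "activity" or "agent"
    name: str


@dataclass
class ProvenanceGraph:
    nodes: list = field(default_factory=list)
    # Each edge is (src, relation, dst), read as "src depends on dst".
    edges: list = field(default_factory=list)
    _ids: count = field(default_factory=count, repr=False)

    def add_node(self, kind, name):
        if kind not in KINDS:
            raise ValueError(f"unknown node kind: {kind}")
        node = Node(next(self._ids), kind, name)
        self.nodes.append(node)
        return node

    def add_edge(self, src, relation, dst):
        # A state may only depend on states that already existed:
        # no dependency on the future, hence no cycles.
        if dst.seq >= src.seq:
            raise ValueError("edge would depend on a future state")
        self.edges.append((src, relation, dst))


g = ProvenanceGraph()
user = g.add_node("agent", "alice")
conf_v1 = g.add_node("entity", "app.conf (v1)")
proc = g.add_node("activity", "editor")
conf_v2 = g.add_node("entity", "app.conf (v2)")   # a write creates a new state

g.add_edge(proc, "wasAssociatedWith", user)
g.add_edge(proc, "used", conf_v1)
g.add_edge(conf_v2, "wasGeneratedBy", proc)
g.add_edge(conf_v2, "wasDerivedFrom", conf_v1)
```

The sketch has no update or delete method, so history can only be appended to; nothing stronger than that convention enforces immutability here.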

SLIDE 11

How is this useful?

SLIDE 12

Provenance-based intrusion detection

▪ Intuition: provenance graph exposes causality relationships between events

SLIDE 14

Provenance-based intrusion detection

Related events are connected even across long periods of time
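
A small sketch of a backward (ancestry) query shows why this matters. The edge list, node names, timestamps and the `ancestors` helper are all made up for the example. The traversal follows dependencies only, so two steps of an attack remain linked however far apart in time they occurred, while unrelated activity in between is never visited.

```python
from collections import defaultdict, deque

# (dependent, dependency, timestamp): hypothetical events, far apart in time.
edges = [
    ("dropper.sh", "socket:203.0.113.7", 1_000),
    ("proc:bash", "dropper.sh", 1_050),
    ("cron.entry", "proc:bash", 1_060),
    ("report.pdf", "proc:libreoffice", 300_000),   # unrelated benign activity
    ("proc:cron-job", "cron.entry", 600_000),
    ("secrets.tar", "proc:cron-job", 600_010),
]


def ancestors(edges, start):
    """Return every node that `start` transitively depends on."""
    depends_on = defaultdict(list)
    for dependent, dependency, _timestamp in edges:
        depends_on[dependent].append(dependency)

    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for parent in depends_on[node]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


print(sorted(ancestors(edges, "secrets.tar")))
```

In a system call trace the same two steps would be separated by everything that ran in between; here the timestamps play no role in the query at all.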

SLIDE 15

How to perform detection?

SLIDE 16

Assumptions (and limitations)

  • Runtime detection
  • We target environments with minimal human intervention
      ○ relatively consistent behaviour
      ○ e.g. web servers, CI pipelines, etc.
  • Build a model of system behaviour (unsupervised training)
      ○ in a controlled environment
      ○ from a representative workload (this is hard!)
  • Detect deviation from the model
  • Several approaches being explored…

SLIDE 17

Example: UNICORN

▪ Han et al. “UNICORN: Runtime Provenance-Based Detector for Advanced Persistent Threats”, NDSS 2020

SLIDE 18

Example: UNICORN

1) Graph streamed in, converted to histogram, labelled using (modified) struct2vec
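
To give a feel for this step, here is a simplified neighbourhood relabelling in the spirit of Weisfeiler-Lehman label refinement. It is a stand-in chosen for brevity, not the modified struct2vec named on the slide, and it processes a whole batch rather than a stream. The `label_histogram` function, its `hops` parameter and the toy graph are assumptions made for the example. Each node starts with its type as label, then repeatedly absorbs the labels of its incoming neighbours; the histogram counts how often each resulting label occurs.

```python
import hashlib
from collections import Counter, defaultdict


def short_hash(text):
    """Compress a long neighbourhood description into a short label."""
    return hashlib.sha1(text.encode()).hexdigest()[:8]


def label_histogram(node_types, edges, hops=2):
    """Count structural labels describing each node's incoming neighbourhood.

    node_types: {node_id: type}; edges: iterable of (src, edge_type, dst).
    """
    incoming = defaultdict(list)
    for src, edge_type, dst in edges:
        incoming[dst].append((edge_type, src))

    labels = dict(node_types)              # hop 0: the node's own type
    histogram = Counter(labels.values())
    for _ in range(hops):
        new_labels = {}
        for node, own in labels.items():
            neighbourhood = sorted(
                f"{etype}:{labels[src]}" for etype, src in incoming[node]
            )
            new_labels[node] = short_hash(own + "|" + ",".join(neighbourhood))
        labels = new_labels
        histogram.update(labels.values())
    return histogram


node_types = {"p1": "process", "f1": "file", "s1": "socket", "p2": "process"}
edges = [("s1", "recv", "p1"), ("p1", "write", "f1"), ("f1", "read", "p2")]
hist = label_histogram(node_types, edges, hops=2)
```

Only node and edge types enter the labels, so two graphs with the same shape and types yield the same histogram whatever their node identifiers are.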

SLIDE 19

Example: UNICORN

2) At regular intervals, the histogram is converted to a fixed-size vector using similarity-preserving graph sketching
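
One way to picture similarity-preserving sketching is weighted sampling with shared randomness. The sketch below is a simplified illustration and not necessarily the algorithm UNICORN uses; `sketch`, `similarity`, the `size` parameter and the toy histograms are assumptions. Every slot of the output picks one label, heavier labels win more often, and because the pseudo-random draws depend only on the slot and the label, similar histograms tend to pick the same label in the same slot.

```python
import hashlib
import math


def _uniform(slot, label):
    """Deterministic pseudo-random number in (0, 1] for a (slot, label) pair."""
    digest = hashlib.sha256(f"{slot}:{label}".encode()).digest()
    return (int.from_bytes(digest[:8], "big") + 1) / (2**64 + 2)


def sketch(histogram, size=64):
    """Map a non-empty {label: positive count} histogram to `size` sampled labels."""
    result = []
    for slot in range(size):
        # Exponential race: the smallest -ln(u) / weight wins the slot,
        # so a label wins with probability proportional to its count.
        winner = min(
            histogram,
            key=lambda label: -math.log(_uniform(slot, label)) / histogram[label],
        )
        result.append(winner)
    return result


def similarity(a, b):
    """Fraction of slots on which two equal-length sketches agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


h1 = {"a": 10, "b": 5, "c": 1}
h2 = {"a": 9, "b": 6, "c": 1}      # slightly different from h1
h3 = {"x": 10, "y": 5, "z": 1}     # shares no label with h1

print(similarity(sketch(h1), sketch(h2)))
print(similarity(sketch(h1), sketch(h3)))
```

The output size is fixed by `size` however many distinct labels the histogram holds, which is what makes the later clustering step practical on a growing graph.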

SLIDE 20

Example: UNICORN

3) Feature vectors are clustered

SLIDE 21

Example: UNICORN

4) Clusters form “meta-states”; transitions between them are modelled

In deployment, anomalies are detected via clustering and the “meta-state” model
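
Steps 3 and 4 can be sketched together. The slides do not say which clustering algorithm is used, so this example substitutes simple leader clustering as a stand-in; the `train` and `is_anomalous` functions, the `radius` threshold and the toy sketches are assumptions. Training turns a benign sequence of sketches into clusters (meta-states) and records which transitions between them were observed. Detection flags a run if a sketch falls outside every cluster or if it moves between meta-states in a way never seen in training.

```python
def distance(a, b):
    """Fraction of positions on which two equal-length sketches differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def nearest(sketch, centres):
    """Index of the closest cluster centre and the distance to it."""
    return min(
        ((i, distance(sketch, c)) for i, c in enumerate(centres)),
        key=lambda pair: pair[1],
    )


def train(sketches, radius=0.3):
    """Cluster a benign sequence of sketches and record meta-state transitions."""
    centres, transitions, previous = [], set(), None
    for sk in sketches:
        state, dist = nearest(sk, centres) if centres else (None, 1.0)
        if state is None or dist > radius:
            centres.append(sk)              # open a new cluster led by this sketch
            state = len(centres) - 1
        if previous is not None:
            transitions.add((previous, state))
        previous = state
    return centres, transitions


def is_anomalous(sketches, centres, transitions, radius=0.3):
    """Flag a run that leaves all clusters or takes an unseen transition.

    Assumes a trained model, i.e. `centres` is not empty.
    """
    previous = None
    for sk in sketches:
        state, dist = nearest(sk, centres)
        if dist > radius:
            return True                     # matches no known meta-state
        if previous is not None and previous != state \
                and (previous, state) not in transitions:
            return True                     # known states, unknown ordering
        previous = state
    return False


A, B, C = list("aaaaaaaaaa"), list("bbbbbbbbbb"), list("cccccccccc")
centres, transitions = train([A, A, B, B])

print(is_anomalous([A, B], centres, transitions))
print(is_anomalous([B, A], centres, transitions))
print(is_anomalous([A, C], centres, transitions))
```

Modelling transitions as well as clusters means a run can be flagged even when every individual snapshot looks familiar, because the order in which the system moves between states is itself part of the model.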

SLIDE 22

Relatively simple

Labelled directed acyclic graph

– node/edge types
– security context (when available)

Modification and combination of existing algorithms

– struct2vec
– similarity preserving hashing
– clustering

Right combination + domain knowledge

SLIDE 23

Some insights from this work

SLIDE 24

We can build practical provenance-based IDSs

We can detect intrusions from graph structure with little metadata

– Vertex type (thread, file, socket, etc.)
– Edge type (read, write, connect, etc.)

Processing speed

– Current prototype
– Data generation speed < processing speed!

SLIDE 25

Proper evaluation is hard!

  • Datasets are hard to generate
      ○ What is a good quality dataset?
  • Hard to compare across papers, a lot is not available
      ○ Experiments (i.e. attacks)
      ○ Capture mechanisms
      ○ Analysis pipelines
  • Leads to unsatisfactory evaluation
      ○ I may be able to compare to similar techniques (may reuse dataset)
      ○ … very hard for unrelated ones (i.e. those ingesting a different data type)
  • Adversarial ML?

SLIDE 26

Identifying threats: explainability is a problem

There is a problem within the last batch of X graph elements

– 2,000 in previous figures

Good luck finding out what went wrong

Provenance forensics is an active field of research

– Promising work out of the DARPA programme

… but could we do better during detection?

SLIDE 27

Ongoing projects

SLIDE 28

Towards more interpretable provenance-based IDSs

  • PhD student project (Xueyuan “Michael” Han)
  • Collaborators

      ○ Harvard University
      ○ UBC
      ○ NEC Labs America

  • Deep graph learning techniques
  • Precisely identifying attacks within a provenance-graph
  • Generating actionable reports

SLIDE 29

A framework for Provenance-based forensics

  • PhD student project (Priyanka Badva)
  • Collaborators

○ SRI International

  • Provenance graphs are large and complex (several million nodes)
  • Designing tools and techniques to identify/explain attacks
  • Working with my colleague Ryan

SLIDE 30

Distributed IDS

  • Edge network
  • Collaboration with Toshiba (£4M)
  • Exploring distributed learning
  • Poisoning
  • Mechanism
  • Etc.
  • Large testbed planned (work starting January)
  • Hiring 2 postdocs at Bristol
  • Money available for a short-term intern (COVID permitting)

SLIDE 31

Kernel partitioning

  • PhD student project (Soo Yee Lim)
  • Collaborators

      ○ HP Labs Bristol
      ○ Royal Holloway, University of London
      ○ University of Otago

  • Leveraging CHERI/ARM Morello hardware

○ Hardware capabilities

  • Implement kernel partitioning in the Linux OS

SLIDE 32

Thank you! Questions?

https://tfjmp.org
thomas.pasquier@bristol.ac.uk

SLIDE 33

How to evaluate?

SLIDE 34

Comparison with the state of the art

Manzoor et al. "Fast memory-efficient anomaly detection in streaming heterogeneous graphs", ACM KDD, 2016.

R -> neighborhood size for struct2vec algorithm

SLIDE 35

Evaluation with DARPA datasets

SLIDE 36

Evaluation with DARPA datasets

SUCH GOOD RESULTS ARE NOT NORMAL

SLIDE 37

Building our own dataset

▪ Attack designed to look similar to background activity

SLIDE 38

Building our own dataset

▪ Attack designed to look similar to background activity

▪ Is that enough?

SLIDE 39

Runtime performance

SLIDE 40

Runtime performance

SLIDE 41

Runtime performance

Memory usage: ~500MB

CPU usage: 15% on 1 core
