Improved Reconstruction Attacks on Encrypted Data Using Range Query Leakage


Improved Reconstruction Attacks on Encrypted Data Using Range Query Leakage. Marie-Sarah Lacharité, Brice Minaud, Kenny Paterson. Information Security Group. IEEE Symposium on Security and Privacy, May 21, 2018.


slide-1
SLIDE 1

IEEE Symposium on Security and Privacy, May 21, 2018

Information Security Group

Marie-Sarah Lacharité, Brice Minaud, Kenny Paterson

Improved Reconstruction Attacks on Encrypted Data Using Range Query Leakage

slide-5
SLIDE 5

Outsourcing Data with Search Capabilities

[Diagram: the client uploads data to the server, sends it search queries, and receives the matching records.]

For an encrypted database management system:

  • Data = collection of records in a database, e.g. health records.
  • Search query examples:
      – find records with a given value, e.g. patients aged 57.
      – find records within a given range, e.g. patients aged 55-65.

slide-6
SLIDE 6

Security of Data Outsourcing Solutions

Adversaries:

  • Snapshot: breaks into the server and gets a snapshot of memory.
  • Persistent: corrupts the server and sees all communication transcripts. Can be the server itself.

Security goal = privacy → the adversary learns as little as possible about the client’s data and queries.

[Diagram: the client sends search queries to an adversarial server, which returns the matching records.]

slide-9
SLIDE 9

Solutions

  • Structure-preserving encryption.
      – Vulnerable to snapshot attackers.
  • Second-generation schemes:
      – Aim to protect against snapshot and persistent attackers.
  • Very active research topic.
      – [AKSX04], [BCLO09], [PKV+14], [BLR+15], [NKW15], [KKNO16], [LW16], [FVY+17], [SDY+17], [DP17], [HLK18], [PVC18], [MPC+18]…

slide-14
SLIDE 14

Schemes Supporting Range Queries

[Diagram: the client sends the range query [40,100]; the server stores four encrypted records whose values are 45, 6, 83 and 28, and returns the two matching records, 45 and 83.]

  • Most schemes leak the set of matching records = access pattern leakage.
      – OPE, ORE schemes, POPE, [HK16], BlindSeer, [Lu12], [FJ+15], …
  • Some schemes also leak the number of records below the queried endpoints = rank leakage.
      – FH-OPE, Lewi-Wu, Arx, Cipherbase, EncKV, …
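To make the two kinds of leakage concrete, here is a minimal sketch of what a persistent adversary would observe for the query [40,100] on the four-record example. The record IDs, the function name `range_query_leakage` and the exact rank convention (records strictly below the lower endpoint, records at or below the upper endpoint) are assumptions made for illustration; real schemes define rank leakage in their own way.

```python
from bisect import bisect_left, bisect_right

def range_query_leakage(records, lo, hi):
    """Simulate the leakage of the range query [lo, hi].

    records maps a record ID to its plaintext value. The adversary never
    sees the values, only the two outputs returned here.
    """
    # Access pattern leakage: which record IDs match the query.
    access_pattern = {rid for rid, v in records.items() if lo <= v <= hi}
    # Rank leakage (assumed convention): number of records with value < lo,
    # and number of records with value <= hi.
    values = sorted(records.values())
    ranks = (bisect_left(values, lo), bisect_right(values, hi))
    return access_pattern, ranks

# Hypothetical record IDs for the values shown on the slide.
records = {1: 45, 2: 6, 3: 83, 4: 28}
access_pattern, ranks = range_query_leakage(records, 40, 100)
print(access_pattern, ranks)
```

Note that the adversary learns which records matched without learning any value; the attacks in this talk work from many such observations.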

slide-17
SLIDE 17

Exploiting Leakage

  • Most schemes prove that nothing more leaks than their leakage model allows.
      – For example, leakage = access pattern + rank. What can we really learn from this leakage?
  • Our goal: full reconstruction = recovering the exact value of every record.
  • [KKNO16]: O(N² log N) queries suffice for full reconstruction using only access pattern leakage!
      – where N is the number of possible values (e.g. 125 for age in years).
slide-18
SLIDE 18

Assumptions for our Analysis

  • Data is dense: all values appear in at least one record.
  • Queries are uniformly distributed.
      – Our algorithms don’t actually care, though: the assumption is only used to compute upper bounds on the amount of data (number of queries) needed.

slide-22
SLIDE 22

Our Main Results

  • Full reconstruction with O(N · log N) queries from access pattern leakage
      – in fact, N · (3 + log N).
  • Approximate reconstruction with relative accuracy ε with O(N · log(1/ε)) queries.
  • Approximate reconstruction using an auxiliary distribution and access pattern + rank leakage.

slide-23
SLIDE 23

Full reconstruction

slide-24
SLIDE 24

Full Reconstruction Algorithm

Assume N = 7 values and 5 queries. Mi = set of records matched by the i-th query.

[Diagram: the set of all records, with the query result sets M1, M2, M3, M4 and M5 drawn inside it.]

slide-27
SLIDE 27

Step 1: Partitioning

[Diagram: the query result sets M1 to M5 split the set of all records into minimal subsets.]

If there are N minimal subsets → each of them corresponds to a single value.
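One way to sketch the partitioning step: two records end up in the same minimal subset exactly when they were returned by the same set of queries. The code below groups records by that "signature". The toy data, the helper `matched` and the six example queries are hypothetical and chosen only so that the result is easy to check; this is an illustration, not the authors' implementation.

```python
from collections import defaultdict

def partition(all_records, query_results):
    """Group records by the set of queries that returned them."""
    groups = defaultdict(set)
    for rid in all_records:
        # Signature = indices of the queries whose result contains this record.
        signature = frozenset(i for i, m in enumerate(query_results) if rid in m)
        groups[signature].add(rid)
    return [frozenset(g) for g in groups.values()]

# Toy data (hypothetical): record ID -> hidden value in 1..7, dense.
hidden = {0: 1, 1: 2, 2: 2, 3: 3, 4: 4, 5: 5, 6: 5, 7: 6, 8: 7, 9: 7}

def matched(lo, hi):
    """Access pattern of the range query [lo, hi]: matching record IDs only."""
    return frozenset(r for r, v in hidden.items() if lo <= v <= hi)

queries = [matched(6, 7), matched(5, 6), matched(4, 6),
           matched(3, 5), matched(2, 4), matched(1, 2)]
minimal_sets = partition(hidden.keys(), queries)
print(len(minimal_sets))  # N = 7 minimal subsets: one per value
```

The adversary only ever uses `queries` (sets of record IDs); `hidden` exists here just to generate them.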

slide-30
SLIDE 30

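Step 2a: Finding an Endpoint

M1 ∪ M3 covers all but 1 minimal set → that minimal set is an endpoint! It gets the value 7.

[Diagram: the query result sets M1 to M5, with the one minimal set outside M1 ∪ M3 labeled 7.]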
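The reasoning behind this step: a union of overlapping range results is itself a range, so if such a union covers everything except one minimal set, that set can only sit at an end of the domain. A minimal sketch of this test follows, generalised from the two sets M1 and M3 on the slide to any chain of overlapping results. The function names and toy data are hypothetical, and the minimal sets are built directly from the hidden values as a stand-in for the output of Step 1.

```python
def merge_overlapping(sets):
    """Merge sets that share at least one record; return the merged unions."""
    unions = []
    for s in sets:
        s = set(s)
        disjoint = []
        for u in unions:
            if u & s:
                s |= u          # absorb every union that overlaps s
            else:
                disjoint.append(u)
        unions = disjoint + [s]
    return unions

def find_endpoints(all_records, minimal_sets, query_results):
    """Minimal sets E such that overlapping results avoiding E cover all other records."""
    all_records = set(all_records)
    endpoints = []
    for candidate in minimal_sets:
        avoiding = [m for m in query_results if not (m & candidate)]
        target = all_records - candidate
        if any(u == target for u in merge_overlapping(avoiding)):
            endpoints.append(candidate)
    return endpoints

# Toy data (hypothetical): record ID -> hidden value in 1..7.
hidden = {0: 1, 1: 2, 2: 2, 3: 3, 4: 4, 5: 5, 6: 5, 7: 6, 8: 7, 9: 7}
ranges = [(6, 7), (5, 6), (4, 6), (3, 5), (2, 4), (1, 2)]
queries = [frozenset(r for r, v in hidden.items() if lo <= v <= hi) for lo, hi in ranges]

# Stand-in for Step 1: one minimal set per value.
by_value = {}
for rid, v in hidden.items():
    by_value.setdefault(v, set()).add(rid)
minimal_sets = [frozenset(s) for s in by_value.values()]

endpoints = find_endpoints(hidden.keys(), minimal_sets, queries)
print(endpoints)
```

On this toy data both ends of the domain are detected. Access pattern alone cannot distinguish an ordering from its mirror image, so either endpoint can be picked and given the value N.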

slide-35
SLIDE 35

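Step 2b: Propagating

  • Intersect
  • Trim

Next point! It gets the value 6.

[Diagram: the query result sets M1 to M5 with M1 highlighted; the minimal sets found so far are labeled 7 and 6.]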
slide-40
SLIDE 40

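Done!

  • Intersect
  • Trim

[Diagram: repeating the intersect and trim steps labels the remaining minimal sets 5, 4, 3, 2 and 1 in turn, so every minimal set of M1 to M5 now carries a value from 1 to 7.]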
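The intersect-and-trim loop can be sketched as follows. This is a simplified reading of the slides, not the authors' exact procedure: at each step it intersects every query result that overlaps the already-labeled region and also reaches beyond it, then trims away the labeled records; if what remains is a single minimal set, that set is the next point. The real algorithm may extract more from the same queries. All names and the toy data are hypothetical.

```python
def propagate(endpoint, minimal_sets, query_results):
    """Order the minimal sets starting from a known endpoint, or return None."""
    minimal = set(minimal_sets)
    ordered = [endpoint]
    known = set(endpoint)
    while len(ordered) < len(minimal):
        # Results that touch the labeled region and extend past it.
        candidates = [m for m in query_results if m & known and not m <= known]
        if not candidates:
            return None                                  # cannot move forward
        common = frozenset.intersection(*candidates)     # Intersect
        next_point = common - known                      # Trim
        if next_point not in minimal:
            return None                                  # not enough queries yet
        ordered.append(next_point)
        known |= next_point
    return ordered

# Toy data (hypothetical): record ID -> hidden value in 1..7.
hidden = {0: 1, 1: 2, 2: 2, 3: 3, 4: 4, 5: 5, 6: 5, 7: 6, 8: 7, 9: 7}
ranges = [(6, 7), (5, 6), (4, 6), (3, 5), (2, 4), (1, 2)]
queries = [frozenset(r for r, v in hidden.items() if lo <= v <= hi) for lo, hi in ranges]

# Stand-ins for Step 1 (minimal sets) and Step 2a (endpoint labeled N = 7).
by_value = {}
for rid, v in hidden.items():
    by_value.setdefault(v, set()).add(rid)
minimal_sets = [frozenset(s) for s in by_value.values()]
endpoint = frozenset({8, 9})

ordered = propagate(endpoint, minimal_sets, queries)
n = len(minimal_sets)
recovered = {rid: n - i for i, point in enumerate(ordered) for rid in point}
print(recovered == hidden)
```

In this sketch each step succeeds only if some query result starts exactly at the next unlabeled value, which is why the toy example uses six queries rather than the five on the slides.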
slide-41
SLIDE 41

Full Reconstruction: Conclusion

  • Generic setting: only access pattern leakage.
  • Partitioning, then sorting steps.
  • Expectation of #queries sufficient for reconstruction:
      N · (3 + log N) for N ≥ 26
  • Expectation of #queries necessary for reconstruction:
      1/2 · N · log N − O(N) for any algorithm.
  • Our algorithm is data-optimal.
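To get a feel for these bounds, the short computation below evaluates them for N = 125 (age in years). Two assumptions are made loudly here: log is taken as the natural logarithm, since the slides do not state the base, and the constants hidden inside the O(·) terms are ignored, so the [KKNO16] and lower-bound figures are orders of magnitude only.

```python
import math

def sufficient_queries(n):
    """Upper bound from the slides: N * (3 + log N), natural log assumed."""
    return n * (3 + math.log(n))

def kkno16_order(n):
    """Order of magnitude of the earlier bound, N^2 log N (constant ignored)."""
    return n * n * math.log(n)

def necessary_leading_term(n):
    """Leading term of the lower bound, 1/2 * N * log N (the O(N) term is dropped)."""
    return 0.5 * n * math.log(n)

n = 125
print(round(sufficient_queries(n)))      # roughly a thousand queries
print(round(kkno16_order(n)))            # tens of thousands
print(round(necessary_leading_term(n)))  # a few hundred
```

Under these assumptions the new bound is smaller than the earlier one by a factor on the order of N, and within a small constant factor of the lower bound's leading term.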

slide-42
SLIDE 42

Reconstruction with Auxiliary Data + Rank Leakage

slide-43
SLIDE 43

Auxiliary Data Attack with Rank Leakage

  • Assume access pattern + rank leakage.
  • Also assume an approximation to the distribution on values is known.
      – “Auxiliary distribution”. From aggregate data, or from another reference source.
  • We show experimentally that, under these assumptions, far fewer queries are needed.

slide-44
SLIDE 44

Auxiliary Data Attack Algorithm

Assume N = 125 values and 2 queries. Mi = set of records matched by the i-th query.

[Diagram: the set of all records, with the query result sets M1 and M2 drawn inside it.]

slide-57
SLIDE 57

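Partitioning and Matching

[Diagram: M1 and M2 split the records into segments. Rank leakage gives the % of records below each boundary: 10%, 32%, 77%, 85%. Matching these percentages with the aux. distribution gives the boundary ages 12, 43, 60, 72. The expectation under the aux. distribution then gives the estimated ages 19, 50, 65 for the records in between.]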
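A minimal sketch of the matching idea: each leaked rank, expressed as a fraction of records below a boundary, is matched to the value whose cumulative mass under the auxiliary distribution is closest, and the records between two boundaries are then guessed as the auxiliary expectation over that interval. The function names are hypothetical, and the auxiliary histogram is a made-up uniform one over 100 values, chosen only so the output is easy to verify; the slides use N = 125 and aggregate hospital data, which give different numbers.

```python
from itertools import accumulate

def match_boundary(aux_counts, fraction_below):
    """Value b whose aux. mass strictly below b is closest to the leaked fraction."""
    total = sum(aux_counts)
    below = [0] + list(accumulate(aux_counts))  # below[b] = aux. count of values < b
    return min(range(len(below)), key=lambda b: abs(below[b] / total - fraction_below))

def expected_value(aux_counts, lo, hi):
    """Aux. expectation of the value given lo <= value < hi (needs nonzero mass)."""
    weight = sum(aux_counts[lo:hi])
    return sum(v * aux_counts[v] for v in range(lo, hi)) / weight

# Hypothetical auxiliary histogram: uniform over the values 0..99.
aux_counts = [1] * 100
# Fractions of records below each boundary, as leaked by rank (from the slide).
fractions = [0.10, 0.32, 0.77, 0.85]

boundaries = [match_boundary(aux_counts, f) for f in fractions]
estimates = [expected_value(aux_counts, lo, hi)
             for lo, hi in zip(boundaries, boundaries[1:])]
print(boundaries, estimates)
```

With only two queries this already places every matched record in a narrow interval, which is the intuition behind needing far fewer queries than full reconstruction.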

slide-58
SLIDE 58

Auxiliary Data Attack: Experimental Evaluation

  • Ages, N = 125.
  • Health records from US hospitals (NIS HCUP 2009).
  • Target: age of individual hospitals' records.
  • Auxiliary data: aggregate of 200 hospitals' records.
  • Measure of success: proportion of records with value guessed within ε.

24

slide-59
SLIDE 59

Results with Imperfect Auxiliary Data

25

slide-60
SLIDE 60

Conclusions

slide-61
SLIDE 61

Reconstruction Attacks: Conclusions

  • Full reconstruction ≈ N log N queries with only access pattern!
      – Efficient, data-optimal algorithms + matching lower bound.
  • For N = 125:
      – 800 queries → full reconstruction.
      – 25 queries → majority of records within 5%, using auxiliary distribution + rank.

Attack      Leakage     Other req'ts      Suff. # queries
KKNO16      AP          Density           O(N² log N)
Full        AP + rank   Density           N · (log N + 2)
Full        AP          Density           N · (log N + 3)
ε-approx.   AP          Density           5/4 N · (log 1/ε) + O(N)
Auxiliary   AP + rank   Auxiliary dist.   Experimental

slide-62
SLIDE 62

Reconstruction Attacks: Conclusions

  • Many clever schemes have been designed, enabling range queries on encrypted data.
      – OPE, ORE schemes, POPE, [HK16], BlindSeer, [Lu12], [FJKNRS15], FH-OPE, Lewi-Wu, Arx, Cipherbase, EncKV, …
  • Second-generation schemes defeat the snapshot adversary (with caveats).
  • But as our attacks show, no known scheme offers meaningful privacy vs. a persistent adversary (including the server itself).
  • More research needed!