Encrypted Search - Seny Kamara, Brown University - PowerPoint PPT Presentation



slide-1
SLIDE 1

Encrypted Search

Seny Kamara Brown University

slide-2
SLIDE 2

2

slide-3
SLIDE 3

3

slide-4
SLIDE 4

4

slide-5
SLIDE 5

Q: Why is this happening?

5

slide-6
SLIDE 6

Big Data

► Industry and governments want more data
  ► National security
  ► Machine learning
  ► Business analytics
  ► NLP
  ► Location-based services
  ► …

6

slide-7
SLIDE 7

Big Data

7

► More intrusive & sensitive
  ► Photos, medical records
  ► Location data, email, browsing history, voicemails
► Greater need for security
► Harder to secure
  ► NSA Bluffdale holds 2EBs! (2K PBs)
  ► Facebook holds 300PBs of photos/videos
  ► Vs. nation states, intelligence agencies, organized crime, insiders, …
slide-8
SLIDE 8

Big Data

8

► Impossible to work with
  ► Lose search, DBs, IR
  ► Find your photo among 300PBs?
  ► Rank results?
► End-to-end (e2e) encryption!
  ► Reduces attack surface
  ► Secure small key instead of Big Data

slide-9
SLIDE 9

Q: Can we search on encrypted data?

9

slide-10
SLIDE 10

An Interesting Question

10

► Cryptography
► Data Structures
► Information Retrieval
► Graph Theory
► Databases
► Combinatorial Optimization
► Statistics

slide-11
SLIDE 11

A Lucrative Question

11

► Startups
  ► CipherCloud ($30M+$50M)
  ► Navajo (Salesforce)
  ► SkyHigh, Vaultive, Inpher
  ► Bitglass, Private Machines, …
► Major Corporations
  ► Microsoft, IBM,
  ► Google, Yahoo
  ► Hitachi, Fujitsu
► Funding agencies
  ► IARPA
  ► DARPA
  ► NSF

slide-12
SLIDE 12

12

“There are a lot of advancements in things like encrypted search...but in general it is a difficult problem”

- Edward Snowden @ SXSW '14
slide-13
SLIDE 13

Encrypted Search Solutions

13

slide-14
SLIDE 14

Usage

14

[Diagram: DB (database), EDB (encrypted database), tk (search token)]

slide-15
SLIDE 15

Desiderata

15

► Storage leakage
► Query leakage
► Size of EDB
► Search time
► Size of tk

slide-16
SLIDE 16

Many Approaches

► Stream ciphers [SWP01]
► Bucketing [HILM02]
► Structured and searchable encryption (StE/SSE) [SWP01,CGKO06,CK10]
► Oblivious RAM (ORAM) [GO96]
► Functional encryption (e.g., PEKS) [BCOP06]
► Multi-party computation (MPC) [Yao82,GMW87]
► Property-preserving encryption (PPE) [AKSX04,BBO06,BCLO09]
► Fully-homomorphic encryption [G09]

16

slide-17
SLIDE 17

Tradeoffs: Efficiency vs. Security

17

[Plot: Efficiency vs. Leakage, with points for STE/SSE-based, PPE-based, FHE-based, ORAM-based, skFE-based and pkFE-based solutions]

slide-18
SLIDE 18

Tradeoffs: Functionality vs. Efficiency

18

[Plot: Functionality vs. Efficiency, with labels SQL and NoSQL and points for SK-FE-based, STE/SSE-based, PPE-based, FHE-based, ORAM-based and PK-FE-based solutions]

slide-19
SLIDE 19

Leakage

19

► Theoretical Cryptography [Goldwasser-Micali82,…]
  ► A great success story
  ► Helps us reason about confidentiality, integrity, …
  ► Focused on leakage-free cryptography
► Real-world systems security relies on tradeoffs
  ► No cryptographic foundations for tradeoffs
  ► Can we leak X but not Y?
  ► How do we model leakage?

slide-20
SLIDE 20

Leakage

[Curtmola-Garay-K.-Ostrovsky06, Chase-K.10, Islam-Kuzu-Kantarcioglu12, K.15]

► Leakage analysis: what is being leaked?
► Proof: prove that solution leaks no more
► Cryptanalysis: can we exploit the leakage?

20

[Diagram labels: Leakage analysis, Proof of security, Leakage cryptanalysis]
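
As a toy illustration of the cryptanalysis step, the Python sketch below shows how equality leakage can be exploited by frequency analysis. It is not an attack taken from the cited papers. The deterministic tags built with HMAC-SHA256 stand in for any scheme that reveals when two plaintexts are equal, and the names `det_tag` and `frequency_attack`, as well as the auxiliary counts the attacker is assumed to hold, are hypothetical.

```python
import hashlib
import hmac
from collections import Counter
from typing import Dict, List


def det_tag(key: bytes, value: str) -> str:
    """Deterministic tag: equal plaintexts always map to equal tags."""
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()[:16]


def frequency_attack(tags: List[str], aux_counts: Dict[str, int]) -> Dict[str, str]:
    """Guess the plaintext behind each tag by matching frequency ranks.

    tags:       the leaked column of tags, as seen by the server
    aux_counts: attacker's assumed public statistics (value -> expected count)
    """
    tags_by_rank = [t for t, _ in Counter(tags).most_common()]
    values_by_rank = sorted(aux_counts, key=aux_counts.get, reverse=True)
    return dict(zip(tags_by_rank, values_by_rank))
```

The attacker never touches the key: the guess comes only from what the tags leak (which rows are equal) plus outside knowledge of how common each value is. The guess is reliable only when the real frequencies are well separated and close to the auxiliary ones.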

slide-21
SLIDE 21

Applications

21

slide-22
SLIDE 22

Encrypted Search Engines

► Desktop search
  ► Windows search, Apple Spotlight
► Personal cloud storage
  ► Dropbox, OneDrive, iCloud, …
► Webmail
  ► Gmail, Yahoo! Mail, Outlook.com, …

22

slide-23
SLIDE 23

Encrypted DBs

► Standard DBs
  ► DB encrypted in memory
► Cloud DBs
  ► DB encrypted in cloud

23

slide-24
SLIDE 24

Encrypted NSA Metadata Program [K.14]

► To & from numbers, time of call, duration for all US-to-US, US-to-Foreign and Foreign-to-US calls
► NSA DB can only be queried by individual phone number (seed)
► Analyst queries must be approved by small number of NSA officials


slide-25
SLIDE 25

Systems (Provably Secure)

25

slide-26
SLIDE 26

Systems

► CS2 (C++)
  ► Microsoft Research, 2012
  ► Queries: single keyword search
  ► 16MB email collection in 53ms
► BlindSeer (C++) [IARPA]
  ► Columbia & Bell Labs, 2014
  ► Queries: boolean
  ► Synthetic dataset
  ► Search time
    ► For (w1 and w2): 250ms
      ► w1 in 1 doc
      ► w2 in 10K docs

slide-27
SLIDE 27

Systems

27

► IBM-UCI (C++) [IARPA]
  ► IBM Research & UC Irvine, 2013
  ► Queries: conjunctive
  ► 1.3GB email collection
  ► Search time
    ► For (w1 and w2): 5ms
      ► w1 in 15 docs
      ► w2 in 1M docs
► Clusion (Java)
  ► Brown & Colorado St., 2016
  ► Queries: Boolean
  ► 1.3GB email collection
  ► Search time
    ► For (w1 or w2) and (w3 or w4) in 1.5ms
      ► (w1 or w2) in 10 docs
      ► (w3 or w4) in 1M docs

slide-28
SLIDE 28

Systems

► GRECS
  ► Microsoft Research, Boston U., Harvard & Ben Gurion, 2015
  ► Queries: (approximate) shortest distance on graphs
  ► 1.6M nodes & 11M edges
  ► Query time: 10ms
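
The slides do not say how approximate distances are computed. As a plaintext illustration only, with no encryption involved, the Python sketch below shows one generic way to approximate shortest distances: precompute each node's distance to a few landmark nodes, then estimate d(u, v) as the smallest d(u, l) + d(l, v) over the landmarks l. The function names, the landmark choice and the unweighted adjacency-list graph are assumptions of this sketch.

```python
from collections import deque
from typing import Dict, Hashable, List, Optional

Graph = Dict[Hashable, List[Hashable]]


def bfs_distances(graph: Graph, source: Hashable) -> Dict[Hashable, int]:
    """Exact hop distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def build_sketches(graph: Graph, landmarks: List[Hashable]) -> Dict[Hashable, Dict[Hashable, int]]:
    """For every node, store its distance to each landmark it can reach."""
    sketches = {node: {} for node in graph}
    for lm in landmarks:
        for node, d in bfs_distances(graph, lm).items():
            sketches[node][lm] = d
    return sketches


def approx_distance(sketches, u: Hashable, v: Hashable) -> Optional[int]:
    """Upper bound on d(u, v) using only the two nodes' sketches."""
    common = sketches[u].keys() & sketches[v].keys()
    if not common:
        return None  # no shared landmark, so no estimate
    return min(sketches[u][lm] + sketches[v][lm] for lm in common)
```

By the triangle inequality the estimate never underestimates the true distance in an undirected graph. A query touches only two small per-node sketches rather than the whole graph, which is what keeps query time low on graphs with millions of nodes.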

28

slide-29
SLIDE 29

Conclusions

► Exciting and active area of research
► Big potential impact in practice
► Lots of new research directions in theory and systems
► Potential for collaboration between many areas of CS
  ► Algorithms and data structures
  ► Databases
  ► Information retrieval
  ► Combinatorial optimization
  ► Statistics

29

slide-30
SLIDE 30

30