Mylobot, Detecting the Undetected Using Deep Learning - Yael Daihes



slide-1
SLIDE 1

Mylobot, Detecting the Undetected Using Deep Learning

Yael Daihes

01

slide-2
SLIDE 2

WHO AM I

Yael Daihes
Security Data Science Team Lead @ Akamai Technologies
My things - Botnets, Traffic, Data, Algorithms, Reading, Painting and a bit of Gaming (:

02

slide-3
SLIDE 3

03

DNS Empire

slide-4
SLIDE 4

AGENDA

1 Mylobot - What Is It?
2 What Is DGA? How Has the Defense Community Tackled the DGA Problem So Far?
3 How Did We Tackle This Issue? Overview of Our Detection System
4 Results in the Wild
5 Mylobot - As We See It

04

slide-5
SLIDE 5

Mylobot?

05

slide-6
SLIDE 6

2018

06

slide-7
SLIDE 7

07

Tons of Evasion Techniques

* Anti-VM techniques
* Anti-sandbox techniques
* Anti-debugging techniques
* Wrapping internal parts with an encrypted resource file
* Code injection
* Process hollowing
* CNC communication delaying mechanism - 14 days before accessing its command and control servers
* DGA

[1] www.deepinstinct.com/2018/06/20/meet-mylobot-a-new-highly-sophisticated-never-seen-before-botnet-thats-out-in-the-wild/

slide-8
SLIDE 8

Command and Control

Bot: Dear Bot master, what do I do next?
botmaster.com: Send me a screen shot

08

slide-9
SLIDE 9

Defense

Bot: Dear Bot master, what do I do next?
botmaster.com: Send me a screen shot

09

slide-10
SLIDE 10

What is DGA?

10

slide-11
SLIDE 11

Day 1 - CNC Channel for Sunday

Generated Domains: Asdiuouoi.top, Hakjhsdkjh.top, Whjrhkejwh.biz, Hjkwrjkhew.biz, ...
DNS Response: 1.2.3.4 for one of the generated domains, NX for the others

11

slide-12
SLIDE 12

Day 2 - CNC Channel for Monday

Generated Domains: ycrxmen.com, dtswwomss.eu, ljfsmaroqok.com, gvzoutzukdzth.ru, ...
DNS Response: 5.6.7.8 for one of the generated domains, NX for the others

12
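The Day 1 / Day 2 idea can be sketched with a toy date-seeded generator. This is a made-up illustration, not the algorithm of any real malware family: the seed string, the TLD list, and the function name `toy_dga` are all assumptions. The point is only that bot and botmaster run the same code on the same date and end up with the same list of names.

```python
import hashlib
from datetime import date

# Hypothetical TLD list and seed, chosen only for illustration.
TLDS = ["com", "biz", "top", "ru"]

def toy_dga(day: date, seed: str = "example-seed", count: int = 4) -> list:
    """Return `count` pseudo-random domains for the given day."""
    domains = []
    for i in range(count):
        # Same seed + same date + same index always hash to the same bytes.
        digest = hashlib.sha256(f"{seed}|{day.isoformat()}|{i}".encode()).digest()
        length = 8 + digest[0] % 6  # label length between 8 and 13
        label = "".join(chr(ord("a") + b % 26) for b in digest[1:1 + length])
        domains.append(f"{label}.{TLDS[digest[-1] % len(TLDS)]}")
    return domains

sunday = toy_dga(date(2020, 1, 5))   # candidate CNC domains for Sunday
monday = toy_dga(date(2020, 1, 6))   # a different list for Monday
print(sunday)
print(monday)
```

The botmaster only needs to register one name from each day's list; every other lookup comes back NX.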

slide-13
SLIDE 13

How has the defense community tackled the DGA problem so far?

13

slide-14
SLIDE 14

Reverse the DGA code

Example - Code for creating "Simda" domain names, created by reversing the binary and implementing the logic in Python
github.com/baderj/domain_generation_algorithms

Could be used for creating the domain names and adding them to a block list (given the seed)

How to find the seed? Cool ways that involve checking what's in the traffic and brute forcing possibilities with really strong computers

14
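A minimal sketch of the block-list idea, assuming the DGA has already been reimplemented and the seed is known. `reversed_dga` below is a stand-in invented for this example (it is NOT the Simda algorithm), and the seed strings and dates are hypothetical.

```python
import hashlib
from datetime import date, timedelta

def reversed_dga(seed: str, day: date, count: int = 3) -> list:
    """Stand-in for a reimplemented DGA; hypothetical, not a real family."""
    out = []
    for i in range(count):
        digest = hashlib.sha256(f"{seed}{day:%Y%m%d}{i}".encode()).hexdigest()
        out.append(digest[:12] + ".com")
    return out

def build_block_list(seed: str, start: date, days: int) -> set:
    """Pre-compute every domain the DGA will produce in the date window."""
    blocked = set()
    for offset in range(days):
        blocked.update(reversed_dga(seed, start + timedelta(days=offset)))
    return blocked

block_list = build_block_list("known-seed", date(2020, 1, 5), days=7)

known = reversed_dga("known-seed", date(2020, 1, 6))[0]
unknown = reversed_dga("other-seed", date(2020, 1, 6))[0]
print(known in block_list)          # generated with the seed we hold
print(unknown in block_list)        # generated with a seed we do not hold
print("example.com" in block_list)  # ordinary domain
```

The second lookup is the weak spot: the block list only covers seeds that have already been recovered.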

slide-15
SLIDE 15

But.. how does that detect new malware? What about DGAs whose seed we can't break?

15

slide-16
SLIDE 16

Our Goal - Detect DGA Domains

Detect new domains of DGAs I know, but couldn't break
Detect new DGAs never seen or reported before
Bonus - Detect what I can break and is known

16

slide-17
SLIDE 17

Let's Solve This Together

Maybe we should check if the characters used in the domain name are basically.. gibberish?

17
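One possible way to put a number on "gibberish" is character entropy plus the share of vowels in the name. This is only a sketch of the idea; the two measures and the function names are assumptions, not necessarily what was used in the talk.

```python
import math
from collections import Counter

def char_entropy(label: str) -> float:
    """Shannon entropy (bits per character) of the characters in a label."""
    counts = Counter(label)
    total = len(label)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def vowel_ratio(label: str) -> float:
    """Fraction of characters that are vowels."""
    return sum(ch in "aeiou" for ch in label) / len(label)

for domain in ["facebook.com", "goooogle.com", "gvzoutzukdzth.ru"]:
    label = domain.split(".")[0]  # look only at the left-most label
    print(domain, round(char_entropy(label), 2), round(vowel_ratio(label), 2))
```

Higher entropy and fewer vowels hint at a randomly generated name, but a DGA that glues dictionary words together would score like a normal domain.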

slide-18
SLIDE 18

Let's Solve This Together

Attackers adapt..

18

slide-19
SLIDE 19

Let's Solve This Together

OK OK OK.. hold up - we're smarter. What does the 2-character distribution look like?

Hooray!

19
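The 2-character (bigram) check can be sketched as follows: count bigrams in known benign names, then score a new name by how likely its bigrams are under those counts. The ten-name corpus and the add-one smoothing are assumptions made to keep the example tiny; a real system would learn from far more traffic.

```python
import math
from collections import Counter

# Tiny illustrative "benign" corpus (hypothetical choice of names).
BENIGN = ["facebook", "google", "wikipedia", "amazon", "youtube", "twitter",
          "instagram", "linkedin", "netflix", "microsoft"]

def bigrams(label: str) -> list:
    """All overlapping 2-character substrings of the label."""
    return [label[i:i + 2] for i in range(len(label) - 1)]

counts = Counter(bg for word in BENIGN for bg in bigrams(word))
total = sum(counts.values())
VOCAB = 26 * 26  # number of possible lowercase bigrams, for add-one smoothing

def avg_log_prob(label: str) -> float:
    """Mean log-probability of the label's bigrams under the benign counts."""
    grams = bigrams(label)
    return sum(math.log((counts[bg] + 1) / (total + VOCAB)) for bg in grams) / len(grams)

for label in ["facebook", "ycrxmen", "gvzoutzukdzth"]:
    print(label, round(avg_log_prob(label), 2))
```

Names built from common letter pairs score higher (closer to zero) than names made of rare pairs.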

slide-20
SLIDE 20

Let's Solve This Together

Or not..?

20

slide-21
SLIDE 21

Deep Learning to the Rescue

WHY DEEP LEARNING?

Sequence model (char by char)
Should be able to distinguish between a sequence generated artificially (DGA) and a sequence from a natural language
Learns the patterns (by training)

Predicting Domain Generation Algorithms with Long Short-Term Memory Networks, Endgame - https://arxiv.org/abs/1611.00791

21
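"Char by char" means the model sees a domain as a sequence of character ids. The sketch below shows only that input preparation step; the alphabet, the padding id 0, and the fixed length of 20 are assumptions for illustration, and the sequence model itself (for example an LSTM, as in the paper cited above) is not shown.

```python
import string

# Assumed vocabulary: characters valid in hostnames; id 0 is reserved for padding.
ALPHABET = string.ascii_lowercase + string.digits + "-."
CHAR_TO_ID = {ch: i + 1 for i, ch in enumerate(ALPHABET)}
MAX_LEN = 20  # hypothetical fixed input length

def encode(domain: str, max_len: int = MAX_LEN) -> list:
    """Map a domain to a fixed-length list of character ids, left-padded with 0."""
    ids = [CHAR_TO_ID[ch] for ch in domain.lower() if ch in CHAR_TO_ID]
    ids = ids[-max_len:]                     # keep the right-most characters if too long
    return [0] * (max_len - len(ids)) + ids  # pad on the left

print(encode("ycrxmen.com"))
```

Each id would then typically go through an embedding layer before the sequence layers.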

slide-22
SLIDE 22

Training the Model

Data

1.2 MILLION DGA DOMAINS - Generated by the reversed DGA code or captured in the wild [1]
1.2 MILLION BENIGN DOMAINS - Captured from normal traffic

Results

90% Accuracy
Can't classify which malware family

[1] dgarchive.caad.fkie.fraunhofer.de

22
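For reference, accuracy and false-positive rate on a labeled, balanced set can be computed like this. The labels and predictions below are made-up toy values, not the talk's data, and `evaluate` is a hypothetical helper.

```python
def evaluate(labels, predictions):
    """Accuracy and false-positive rate; 1 = DGA, 0 = benign."""
    pairs = list(zip(labels, predictions))
    correct = sum(1 for y, p in pairs if y == p)
    false_pos = sum(1 for y, p in pairs if y == 0 and p == 1)
    benign = sum(1 for y in labels if y == 0)
    return {
        "accuracy": correct / len(pairs),
        "false_positive_rate": false_pos / benign if benign else 0.0,
    }

# Made-up toy set: 5 DGA domains followed by 5 benign domains.
labels =      [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
predictions = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
print(evaluate(labels, predictions))
```

With a balanced set, accuracy is not dominated by either class, which is one reason to train on equal numbers of DGA and benign domains.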

slide-23
SLIDE 23

Intuition - The Challenges of Classifying the Malware Family

Some clusters are separable
Some clusters are too intertwined

Visualization of Domains as seen by the Deep Learning model

23

slide-24
SLIDE 24

Overview of the System

DNS Queries made by one user in specified time window

  • facebook.com
  • yeeedysb.com
  • gvzozukdzth.ru
  • goooogle.com
  • ycrxmen.com
  • ljfsaroqok.com

What does the Deep Learning model think?
Deep Learning model response - "How likely is this a DGA, between 0 and 1?"

facebook.com 0.0
yeeedysb.com 0.999
gvzozukdzth.ru 0.91
goooogle.com 0.0
ycrxmen.com 0.999
ljfsaroqok.com 0.98

Classify the domains detected

  • yeeedysb.com
  • gvzozukdzth.ru
  • ycrxmen.com
  • ljfsaroqok.com

Attribute to the specific malware: LOCKY, DYRE, UNKNOWN #1

24
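The first steps of the system can be sketched as a small pipeline: take one user's queries in a time window, ask the model for a score, and keep the domains above a threshold. Everything here is a simplification under stated assumptions: `score` is a stand-in that looks up the slide's example scores (paired with the domains in the order they appear), the 0.9 threshold and the one-hour window are hypothetical, and the final clustering / malware attribution step is not shown.

```python
from collections import defaultdict

# Example scores from the slide, paired with domains in slide order (assumed pairing).
SLIDE_SCORES = {
    "facebook.com": 0.0, "yeeedysb.com": 0.999, "gvzozukdzth.ru": 0.91,
    "goooogle.com": 0.0, "ycrxmen.com": 0.999, "ljfsaroqok.com": 0.98,
}

def score(domain: str) -> float:
    """Stand-in for the model's 'how likely is this a DGA?' output."""
    return SLIDE_SCORES.get(domain, 0.0)

def detect(queries, window_seconds=3600, threshold=0.9):
    """Group (user, timestamp, domain) queries by user and window; keep likely DGA domains."""
    suspicious = defaultdict(list)
    for user, ts, domain in queries:
        if score(domain) >= threshold:
            suspicious[(user, ts // window_seconds)].append(domain)
    return dict(suspicious)

queries = [("user-1", 10 * (i + 1), d) for i, d in enumerate(SLIDE_SCORES)]
detected = detect(queries)
print(detected)
```

The per-user, per-window groups are what a later step could cluster and attribute to a known family or to an unknown one.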

slide-25
SLIDE 25

Results in the Wild

2.5 million domains detected and blocked daily
70 million DNS requests blocked daily
~0% False Positives
±8 unknown (zero day?) DGAs detected

25

slide-26
SLIDE 26

MYLOBOT

26

slide-27
SLIDE 27

How Does Mylobot Look in Traffic

Stage 1 - Mylobot (Downloader)
Connect to C2s for grabbing and executing second stage

1 - DNS query m8.zdrussle.ru to the DNS Server; response IP: x.x.x.x

[1] www.deepinstinct.com/2018/06/20/meet-mylobot-a-new-highly-sophisticated-never-seen-before-botnet-thats-out-in-the-wild/

[2] blog.centurylink.com/mylobot-continues-global-infections/

27

slide-28
SLIDE 28

How Does Mylobot Look in Traffic

Stage 1 - Mylobot (Downloader)
Connect to C2s for grabbing and executing second stage

2 - "Where do I need to go next?" sent to IP: x.x.x.x; reply: "Go and get http://1.2.3.4/malware.gif"

[1] www.deepinstinct.com/2018/06/20/meet-mylobot-a-new-highly-sophisticated-never-seen-before-botnet-thats-out-in-the-wild/

[2] blog.centurylink.com/mylobot-continues-global-infections/

28

slide-29
SLIDE 29

How Does Mylobot Look in Traffic

Stage 1 - Mylobot (Downloader)
Connect to C2s for grabbing and executing second stage

3 - GET http://1.2.3.4/malware.gif from IP: 1.2.3.4; reply: Malware file (unknown)

[1] www.deepinstinct.com/2018/06/20/meet-mylobot-a-new-highly-sophisticated-never-seen-before-botnet-thats-out-in-the-wild/

[2] blog.centurylink.com/mylobot-continues-global-infections/

29

slide-30
SLIDE 30

How Does Mylobot Look in Traffic

Stage 2

4 - Run second malware
Unknown malicious activity
Reported to have been using Khalesi as second stage [1][2]

[1] www.deepinstinct.com/2018/06/20/meet-mylobot-a-new-highly-sophisticated-never-seen-before-botnet-thats-out-in-the-wild/

[2] blog.centurylink.com/mylobot-continues-global-infections/

30

slide-31
SLIDE 31

How Does Mylobot Look in Traffic

Stage 1 - Mylobot (Downloader)
Connect to C2s for grabbing and executing second stage

1 - DNS query m8.zdrussle.ru to the DNS Server; response IP: x.x.x.x
2 - "Where do I need to go next?" sent to IP: x.x.x.x; reply: "Go and get http://1.2.3.4/malware.gif"
3 - GET http://1.2.3.4/malware.gif from IP: 1.2.3.4; reply: Malware file (unknown)

Stage 2

4 - Run second malware
Unknown malicious activity
Reported to have been using Khalesi as second stage [1][2]

[1] www.deepinstinct.com/2018/06/20/meet-mylobot-a-new-highly-sophisticated-never-seen-before-botnet-thats-out-in-the-wild/

[2] blog.centurylink.com/mylobot-continues-global-infections/

31

slide-32
SLIDE 32

What Did We Detect in Traffic?

The first step "DNS query m8.zdrussle.ru"
The domain name is generated by a DGA, and the deep learning model detected it
~1,400 domains detected
m<number between 0 and 43>.<domain generated by a DGA>.com|in|biz|org|net|me|cc|ru

3

32

slide-33
SLIDE 33

33

slide-34
SLIDE 34

What Did We Detect in Traffic?

The first step "DNS query m8.zdrussle.ru"
The domain name is generated by a DGA, and the deep learning model detected it
~8,000 domains detected, of what we understand to be variants. The four variants differ in the DGA pattern used

1 m<number between 0 and 43>.<domain generated by a DGA>.com|in|biz|org|net|me|cc|ru
2 green<number between 0 and 43>.<domain generated by a DGA>.com|ru|
3 v1.<domain generated by a DGA>.com|ru|net|org|bz|in|biz|su|eu|cc
4 x<number between 0 and 43>.<domain generated by a DGA>.com|ru|net|org|bz|in|biz|su|eu|cc

4

34
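The four variant patterns can be written down as regular expressions, for example to triage DNS logs. This is a rough sketch: the character set assumed for the DGA-generated part (`[a-z0-9]+`) is a guess, and apart from m8.zdrussle.ru the domains used in the usage lines are invented.

```python
import re

NUM = r"(?:[0-9]|[1-3][0-9]|4[0-3])"  # a number between 0 and 43
LABEL = r"[a-z0-9]+"                   # assumed charset of the DGA-generated part
WIDE_TLDS = r"(?:com|ru|net|org|bz|in|biz|su|eu|cc)"

VARIANTS = {
    "m": re.compile(rf"m{NUM}\.{LABEL}\.(?:com|in|biz|org|net|me|cc|ru)"),
    "green": re.compile(rf"green{NUM}\.{LABEL}\.(?:com|ru)"),
    "v1": re.compile(rf"v1\.{LABEL}\.{WIDE_TLDS}"),
    "x": re.compile(rf"x{NUM}\.{LABEL}\.{WIDE_TLDS}"),
}

def mylobot_variant(domain: str):
    """Return the name of the variant pattern the domain fits, or None."""
    for name, pattern in VARIANTS.items():
        if pattern.fullmatch(domain.lower()):
            return name
    return None

print(mylobot_variant("m8.zdrussle.ru"))   # the example from the slide
print(mylobot_variant("x12.abcdefgh.su"))  # invented example
print(mylobot_variant("facebook.com"))
```

A pattern match alone is weak evidence, since the shape says nothing about whether the middle label is DGA-generated; it works best combined with the model's score.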

slide-35
SLIDE 35

As researched by us and by our findings

IOC MAP

M Variant X Variant 2019

35

slide-36
SLIDE 36

As researched by us and by our findings

IOC MAP

M Variant Green Variant 2019 V1 Variant

671 more domains resolving

36

slide-37
SLIDE 37

As researched by us and by our findings

IOC MAP

X Variant M Variant 2020

37

slide-38
SLIDE 38

How Did It Look In Our Traffic Over Time

For comparison, DNS queries a day as seen in Akamai's traffic:
Pykspa ~1 million
Qsnatch ~1 million
Emotet ~500k
Gameover Zeus ~200k

38

slide-39
SLIDE 39

How Did It Look In Our Traffic Over Time

Entities - could be single user or a NAT

39

slide-40
SLIDE 40

SUMMARY

What's DGA

A piece of code some malware has that generates domain names for forming a C&C channel

Defense System: DGA detection in traffic

Trained a Deep Learning model and used it over live traffic

Mylobot

Super active newly seen botnet this system detected, look out!

40

slide-41
SLIDE 41

Takeaways

Investigate new patterns and domains

Threat Intelligence - Network Perspective

Monitor DNS traffic
We will publish IOCs
DGA detection beasts are possible!

Mylobot - Countermeasures

41

slide-42
SLIDE 42

Reach Out

@Yael_Daihes

TWITTER

42