Into the Deep Web: Understanding E-commerce Fraud from Autonomous Chat with Cybercriminals
Peng Wang, Xiaojing Liao, Yue Qin, XiaoFeng Wang
Indiana University Bloomington
February 26, 2020


slide-1
SLIDE 1

Peng Wang, Xiaojing Liao, Yue Qin, XiaoFeng Wang Indiana University Bloomington

Into the Deep Web:

Understanding E-commerce Fraud from Autonomous Chat with Cybercriminals

slide-2
SLIDE 2

E-commerce fraud


online fraudsters

slide-3
SLIDE 3

Crowdsourcing in e-commerce fraud


Crowdsourcing

slide-4
SLIDE 4

Crowdsourcing via IM


Crowdsourcing via Instant Messaging (IM)

slide-5
SLIDE 5

Bonus hunting

[Diagram: bonus hunters, small-time workers, and e-commerce platforms, linked by payment flows ($$)]

slide-6
SLIDE 6

Fraud account trading

[Diagram: account merchants, account trading storefronts, small-time workers, and e-commerce platforms, linked by payment flows ($$)]

Account prices listed:

  • Type1: $0.5
  • Type2: $0.8
  • Type3: $1.5
  • Type4: $4.5

slide-7
SLIDE 7

SIM farming

[Diagram: SIM farmers, SIM farms (websites or software), carriers, and account merchants, linked by payment flows ($$)]

SIM sources:

  • VoIP cards

slide-8
SLIDE 8

E-commerce fraud ecosystem

[Diagram: account fraudsters, SIM farmers, fake transaction operators, small-time workers, and e-commerce platforms, linked by payment flows ($$)]

slide-9
SLIDE 9

E-commerce fraud groups

Fake review groups on Telegram

Fraud account groups on QQ
slide-10
SLIDE 10

E-commerce fraud group chat


Group chat

slide-11
SLIDE 11

Threat intelligence gathering:

collecting evidence-based threat information about an existing or emerging threat

Fraud account merchants:
1) Account types
2) Store link
3) Payment method
4) SIM card source
5) Hack tools
6) Fraud order tasks

Fraud account operators:
1) Fraud order tasks
2) Shipping address
3) Report link
4) Hack tools
5) Account merchants

SIM farmers:
1) SIM card source
2) Gateway link/tool
3) Account merchants
4) Hack tools

slide-12
SLIDE 12

Group chat vs. individual chat

[Figure: a group chat shown next to an individual chat; annotated items: account type, account store link, hack tool name, SIM card source]

slide-13
SLIDE 13

Intelligence gathering challenges

  • Active intelligence gathering
      • useful intelligence is only shared through one-on-one conversation
      • the number of new fraudsters keeps growing

slide-14
SLIDE 14

Intelligence gathering challenges

  • Active intelligence gathering
      • useful intelligence is only shared through one-on-one conversation
      • the number of new fraudsters keeps growing
  • Automated conversation with fraudsters
      • existing chatbots cannot collect e-commerce threat intelligence
      • strategically leading fraudsters to discuss the target threat intelligence is complicated

slide-15
SLIDE 15

Aubrey

Autonomous chatbot for intelligence discovery

  • first autonomous conversation system for active threat intel. gathering from e-commerce miscreants
  • effectively extracts a great number of valuable fraud-related artifacts
  • new insights into the e-commerce fraud ecosystem

slide-16
SLIDE 16

Information exchange

[Diagram: account fraudsters, SIM farmers, fake transaction operators, small-time workers, and e-commerce platforms, linked by payment flows ($$)]

slide-17
SLIDE 17

Observation


E-commerce fraudster Small-time worker

slide-18
SLIDE 18

Observation

[Figure: chat between an e-commerce fraudster and a small-time worker, with each message labeled as a question or an answer]

slide-19
SLIDE 19

Architecture


slide-20
SLIDE 20

Target Finder

keyword features, behavioral features, intent indicators

150 fraud IM groups
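The three feature families named on this slide could be turned into a feature vector roughly as follows. This is only a sketch: the keyword lists, feature names, and the `extract_features` helper are hypothetical, and the slides do not say which classifier consumes the features.

```python
# Hypothetical keyword lists; the slides do not give the actual vocabulary.
ROLE_KEYWORDS = {
    "sim_farmer": {"sim", "sms", "verification"},
    "account_merchant": {"account", "accounts", "bulk"},
}
INTENT_WORDS = {"sell", "selling", "buy", "wholesale"}


def extract_features(messages):
    """Turn one group member's messages into a flat feature dict."""
    tokens = [w.strip(".,!?").lower() for m in messages for w in m.split()]
    features = {}
    # Keyword features: how often role-specific terms appear.
    for role, keywords in ROLE_KEYWORDS.items():
        features[f"kw_{role}"] = sum(t in keywords for t in tokens)
    # Behavioral features: simple activity statistics.
    features["msg_count"] = len(messages)
    features["avg_tokens"] = len(tokens) / len(messages) if messages else 0.0
    # Intent indicators: words signalling an offer or a request.
    features["intent_hits"] = sum(t in INTENT_WORDS for t in tokens)
    return features


features = extract_features(["Selling accounts in bulk", "new accounts today!"])
print(features)
```

A dict like this could feed any off-the-shelf classifier; which one Aubrey uses is not stated here.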

slide-21
SLIDE 21

Strategy Generator

seed conversations, IM group chats, E-comm forum posts

slide-22
SLIDE 22

FSM definition

5-tuple: $(T, S, \varepsilon, t_0, F)$

  • $T$: set of states, i.e., the questions Aubrey can send to the target roles
  • $S$: set of responses from the target roles
  • $\varepsilon: T \times S \to T$: state transition function, which decides the next state
  • $t_0$: start state
  • $F$: end state
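The 5-tuple maps naturally onto a small data structure. The sketch below is illustrative only: the state names, question texts, response categories, and the `DialogFSM` class are hypothetical and are not taken from Aubrey's implementation.

```python
from dataclasses import dataclass


@dataclass
class DialogFSM:
    """FSM as a 5-tuple; every concrete value used below is illustrative."""
    states: dict       # set of states: state name -> question to send
    responses: set     # set of response categories from the target role
    transitions: dict  # transition function: (state, response) -> next state
    start: str         # start state
    end: str           # end state

    def step(self, state, response):
        if response not in self.responses:
            raise ValueError(f"unknown response category: {response}")
        # In this sketch, undefined pairs fall through to the end state.
        return self.transitions.get((state, response), self.end)

    def run(self, observed):
        """Walk the FSM over response categories; return the visited states."""
        path = [self.start]
        for response in observed:
            if path[-1] == self.end:
                break
            path.append(self.step(path[-1], response))
        return path


# Hypothetical FSM for a conversation with an account merchant.
fsm = DialogFSM(
    states={
        "account_type": "What types of accounts do you sell?",
        "store_link": "Do you have a store link?",
        "payment": "How do I pay?",
        "end": "",
    },
    responses={"positive", "negative", "no_response"},
    transitions={
        ("account_type", "positive"): "store_link",
        ("account_type", "negative"): "payment",
        ("store_link", "positive"): "payment",
        ("store_link", "negative"): "payment",
        ("payment", "positive"): "end",
    },
    start="account_type",
    end="end",
)

path = fsm.run(["positive", "negative", "positive"])
print(path)
```

Storing the transition function as a dictionary keyed by (state, response) keeps the conversation strategy as data, so a different role would only need a different table.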


slide-23
SLIDE 23

Seed conversation


slide-24
SLIDE 24

Segmentation


Seed conversation

dialog blocks

+ text clustering

slide-25
SLIDE 25

Topic detection


Seed conversation

dialog blocks

Topics: account types, storelink, Cross-role, SimSource

topic identification + text clustering
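Grouping dialog blocks by text similarity can be sketched as below. The slides do not name the clustering algorithm, so this example swaps in a simple greedy clustering over Jaccard similarity; the `cluster_blocks` helper, the threshold, and the sample blocks are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_blocks(blocks, threshold=0.3):
    """Greedily assign each dialog block to the first sufficiently similar cluster."""
    clusters = []  # list of (token set of first member, member blocks)
    for block in blocks:
        tokens = set(block.lower().split())
        for representative, members in clusters:
            if jaccard(tokens, representative) >= threshold:
                members.append(block)
                break
        else:
            # No existing cluster is similar enough: start a new one.
            clusters.append((tokens, [block]))
    return [members for _, members in clusters]


# Hypothetical dialog blocks, not taken from the seed conversations.
blocks = [
    "what account types do you have",
    "which account types do you sell",
    "where do the sim cards come from",
]
clusters = cluster_blocks(blocks)
print(clusters)
```

Each resulting cluster would then need a topic label (for example, account types or SIM source), which is the topic identification step.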

slide-26
SLIDE 26

Dialog Manager


slide-27
SLIDE 27

Retrieval model

  • FSM for retrieval model

Current state × (response is interrogative) → retrieval model state

Answers for fraudsters: Q&A pairs → sentence similarity → most relevant answer
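A retrieval step of this kind can be sketched with a bag-of-words cosine similarity. The similarity measure, the question-mark test for interrogative responses, and the sample Q&A pairs are assumptions made for illustration; the slides only state that sentence similarity selects the most relevant answer.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase tokens with surrounding punctuation stripped."""
    words = (w.strip(".,!?").lower() for w in text.split())
    return [w for w in words if w]


def cosine(a, b):
    """Cosine similarity between two token lists (bag of words)."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def is_interrogative(response):
    # Crude stand-in: treat anything ending in a question mark as a question.
    return response.strip().endswith("?")


def retrieve_answer(question, qa_pairs):
    """Return the answer whose stored question is most similar to the input."""
    q = tokenize(question)
    _, answer = max(qa_pairs, key=lambda pair: cosine(q, tokenize(pair[0])))
    return answer


# Hypothetical Q&A pairs, not taken from the collected corpora.
qa_pairs = [
    ("How many accounts do you need?", "I need about 50 accounts."),
    ("Which platform are you working on?", "Mostly the big shopping sites."),
]

incoming = "how many do you need?"
if is_interrogative(incoming):
    reply = retrieve_answer(incoming, qa_pairs)
    print(reply)
```

When the incoming message is not a question, control would stay with the strategy FSM instead of the retrieval model.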

slide-28
SLIDE 28

Evaluation

470 miscreants, 7,250 communication messages

slide-29
SLIDE 29

Threat intelligence analysis


E-commerce miscreants and corresponding threat intelligence

slide-30
SLIDE 30

Intelligence from SIM farmers

  • 90% were used for account registration
  • 72% of accounts were used to order online

slide-31
SLIDE 31

Intelligence from Account merchants

Abused private APIs and hack tools that were not previously known

slide-32
SLIDE 32

Intelligence from Fraud operators


slide-33
SLIDE 33

Hidden criminal infrastructures


Complicity of roles

slide-34
SLIDE 34

Conclusion

Lessons learned

  • Chatbots are effective for studying cybercrime that relies heavily on crowdsourcing
  • Account trading lies at the center of the fraud ecosystem; more effort should go into mitigating fraud account threats

Future work

  • The current implementation of Aubrey is simple yet effective
  • Possible extensions: more complicated conversation (jargon identification), larger open-domain corpora, and a hybrid model with human analyst involvement

https://sites.google.com/view/aubreychatbot

slide-35
SLIDE 35

Thank you!

slide-36
SLIDE 36

Discussion

  • Scope
      • collected threat intel. is related to Chinese e-commerce platforms
  • Generalization
      • with target intel. and domain-specific corpora, Aubrey can be retrained to chat with other roles (drug dealers, etc.) and in other languages
  • Impact
      • fraud-related artifacts can be used as ground truth
      • fixing exposed private APIs raises the bar for automated abuse
      • fraudulent activities can be stopped at an early stage

slide-37
SLIDE 37

FSM for fake account trading


slide-38
SLIDE 38

FSM for SIM farm and fake order operation

[Figure panels: FSM for SIM farm; FSM for fake order operation]

slide-39
SLIDE 39

Knowledge source extension

[Diagram: from IM group chats and forum discussions, candidate questions similar to the seed questions become questions for miscreants, and extracted candidate Q&A pairs provide answers to miscreants]

slide-40
SLIDE 40

Data collection

  • Datasets

Dataset               # of raw data   # of dialog pairs
Seed conversation     800             200
IM group discussion   1 million       50,000
Forum discussion      135,000         700,000

slide-41
SLIDE 41

Evaluation

  • Role identification classifier
      • Ground truth: 500 upstream, 180 downstream, 3,000 unrelated actors
      • Unknown set: 20,265 IM group members (from 150 IM groups)
      • Effectiveness:
        upstream: 87.0% precision, 91.2% recall
        downstream: 81.1% precision, 95.6% recall
        upstream actor: 89.0% precision, 92.8% recall
      • Overall: 86.2% F1 score
      • 1,044 SIM farmers, 700 account merchants, 2,648 fraud order ops
  • Accuracy
      • 545 chat attempts, 470 responded (185 SIM farmers, 130 account merchants, 155 fraud order operators); one questioned Aubrey
      • 97.4% (458) accuracy
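The accuracy figures can be checked with a few lines of arithmetic. This sketch reads 458 as the number of correct cases out of the 470 miscreants who responded, which is an assumption since the slide does not spell out the denominator; the response rate is derived here and is not stated on the slide.

```python
# Counts as reported on the slide.
responded_by_role = {
    "SIM farmers": 185,
    "account merchants": 130,
    "fraud order operators": 155,
}
chat_attempts = 545
correct_cases = 458  # assumed to be counted among the responders

responded = sum(responded_by_role.values())
accuracy = correct_cases / responded
response_rate = responded / chat_attempts  # derived, not stated on the slide

print(f"responded: {responded}")
print(f"accuracy: {accuracy:.1%}")
print(f"response rate: {response_rate:.1%}")
```

Under that reading, the per-role counts add up to the 470 responders and 458 / 470 rounds to the reported 97.4%.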


slide-42
SLIDE 42

Effectiveness

[Figure panels: CDF of interaction rounds per miscreant; CDF of interaction rounds for intel. gathering]

52%

slide-43
SLIDE 43

Case study


Account inventory and price tracking

Revenue = sales * price = $48K/month