Cloak of Visibility: Detecting When Machines Browse a Different Web

SLIDE 1

Cloak of Visibility: Detecting When Machines Browse a Different Web

Luca Invernizzi*, Kurt Thomas*, Alexandros Kapravelos†, Oxana Comanescu*, Jean-Michel Picod*, and Elie Bursztein*

* Google - Anti-fraud and abuse research

† North Carolina State University

SLIDE 2

Web cloaking

Cloaking site

SLIDE 3

Web cloaking

  • Search: effective for Search Engine Optimization
  • Ads: effective to infringe policies
  • Malware: effective to evade security crawlers

SLIDE 4

Responsive design vs cloaking

This is not cloaking.

SLIDE 5

Responsive design vs cloaking

404

This is cloaking.

SLIDE 6

Research goals

  • Keep up with arms race
  • Identify trends
  • Explore alternatives

SLIDE 7

Blackmarket Investigation

Acquired the top 10 cloaking software samples

Can’t go wrong with Cloaky McCloakyFace.

I swear by NowYouSeeMe!

SLIDE 8

$3500+ cloaking software

HTTP reverse proxy

Decision based on:

  • Network
  • Browser
  • Browsing context

SLIDE 9

$3500+ cloaking software

Admin interface

Configures Generates

HTTP reverse proxy

SLIDE 10

Admin interface

Input keywords => http://money.site

Features

  • Find similar sites through SERPs
  • Content/Template spinning
  • Drip-feeding

Added services

  • Plagiarism detection
  • SERP ranking

SLIDE 11

Cloaking techniques

SLIDE 12

Technique: referer-based cloaking

GET / Referer: ...tiffany+cheap...

GET / Referer: blank

GET / Referer: ...tiffany...
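
The three requests above can be read as a keyword check on the Referer header. The following is a minimal sketch of that idea, assuming the cloaker serves its real content only when every targeted keyword appears in the search query carried by the Referer. The keyword set, the `q` parameter, and the function names are assumptions for illustration, not details taken from any of the acquired kits.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical keywords the cloaker targets (assumption, not from a real kit).
TARGET_KEYWORDS = {"tiffany", "cheap"}


def query_terms(referer: str) -> set:
    """Extract lowercase search terms from the `q` parameter of a Referer URL."""
    if not referer:
        return set()
    params = parse_qs(urlparse(referer).query)
    terms = set()
    for value in params.get("q", []):
        # parse_qs already decodes "+" into spaces.
        terms.update(value.lower().split())
    return terms


def choose_page(referer: str) -> str:
    """Return which page a referer-based cloaker might serve (illustrative)."""
    if TARGET_KEYWORDS <= query_terms(referer):
        return "storefront"
    return "benign"
```

Under these assumptions, a visitor arriving from a search for both keywords gets the storefront, while a blank Referer or a partial keyword match gets the benign page.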

SLIDE 13

Technique: IP blacklisting

  • Blacklisted IPs: 51M
  • Subnets: 983
  • Security companies: 30
  • Hacking collectives: 2
  • Proxy networks: 3
  • Entities (companies, universities, registrars): 122
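
A subnet-based blacklist explains how fewer than a thousand entries can cover tens of millions of addresses. The sketch below shows one plausible way such a lookup could work using Python's standard `ipaddress` module. The two subnets are documentation-reserved ranges chosen for illustration, and the function names are assumptions.

```python
import ipaddress

# Hypothetical blacklist entries (documentation-reserved ranges, RFC 5737).
# A real kit would ship hundreds of subnets.
BLACKLISTED_SUBNETS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
]


def is_blacklisted(client_ip: str) -> bool:
    """Return True if the client address falls inside any blacklisted subnet."""
    address = ipaddress.ip_address(client_ip)
    return any(address in subnet for subnet in BLACKLISTED_SUBNETS)


def covered_addresses() -> int:
    """Count how many individual addresses the subnet list covers."""
    return sum(subnet.num_addresses for subnet in BLACKLISTED_SUBNETS)
```

Here two subnets already cover 384 addresses, which is the same multiplication that turns a short subnet list into a very large set of blocked IPs.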

SLIDE 14

Crowdsourced blacklist

  • Blacklisted IPs: 50k
  • Subscription: $350+
  • Honeypot

SLIDE 15

Technique: rDNS cloaking

66.249.66.1

Host 66.249.66.1? crawl.googlebot.com.

  • Google (.*1e100.*, .*google.*)
  • Microsoft
  • Yahoo
  • Yandex
  • Baidu
  • Ask
  • Rambler
  • DirectHit
  • Teoma
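
The reverse DNS check can be sketched as a hostname lookup followed by a pattern match. In the sketch below, only the two Google patterns come from the slide. The function names and the injectable `resolve` parameter are assumptions, added so the matching logic can be exercised without network access.

```python
import re
import socket

# Patterns shown on the slide for Google; other crawlers would add their own.
CRAWLER_HOST_PATTERNS = [re.compile(r".*1e100.*"), re.compile(r".*google.*")]


def reverse_lookup(ip: str) -> str:
    """Resolve an IP to its reverse DNS hostname, or '' when none exists."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return ""


def looks_like_crawler(ip: str, resolve=reverse_lookup) -> bool:
    """Return True if the reverse DNS name matches a known crawler pattern."""
    hostname = resolve(ip).lower()
    return any(p.fullmatch(hostname) for p in CRAWLER_HOST_PATTERNS)
```

A stub resolver makes the behavior easy to check, for example `looks_like_crawler("66.249.66.1", resolve=lambda ip: "crawl.googlebot.com.")`.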

SLIDE 16

Technique: browsing pattern cloaking

GET /clicked

GET /

Set-Cookie: now()

SLIDE 17

More techniques

  • Geolocation: country, city, carrier level
  • Flash/JS support & fingerprints
  • User-Agent

SLIDE 18

Prevalence and dominant techniques

404

Is this cloaking?

How do they cloak?

SLIDE 19

Browser farm

Pretend Google bots

  • User-Agent: GoogleBot
  • Referer: blank
  • Google IP

Simple honey clients

  • User-Agent: Chrome
  • Referer: blank, or simple
  • Cloud provider IPs

Realistic honey clients

  • User-Agent: Chrome
  • Referer: context-aware
  • Residential and mobile IPs

[Figure labels: "wget", "wget", "I’m real!"]
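
The three client types in the browser farm differ in the signals they present to a site. The sketch below models them as simple profiles that produce request headers. The class, field names, the simplified Chrome User-Agent string, and the Google search URL used for the context-aware Referer are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass(frozen=True)
class BrowserProfile:
    name: str
    user_agent: str
    context_aware_referer: bool  # if False, the Referer is left blank
    network: str                 # which kind of IP the request leaves from

    def headers(self, keyword: str) -> dict:
        """Build request headers for a visit triggered by a search keyword."""
        headers = {"User-Agent": self.user_agent}
        if self.context_aware_referer:
            query = urlencode({"q": keyword})
            headers["Referer"] = "https://www.google.com/search?" + query
        return headers


GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
CHROME_UA = "Mozilla/5.0 (X11; Linux x86_64) Chrome"  # simplified placeholder

PROFILES = [
    BrowserProfile("pretend-google-bot", GOOGLEBOT_UA, False, "google"),
    BrowserProfile("simple-honey-client", CHROME_UA, False, "cloud"),
    BrowserProfile("realistic-honey-client", CHROME_UA, True, "residential-or-mobile"),
]
```

Fetching the same URL once per profile and comparing the responses is what lets differences between the views surface as a cloaking signal.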

SLIDE 20

Features

            HTML                 Image
Syntactic   Content similarity   Screenshot similarity
Semantic    Topic similarity     Screenshot topic similarity
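
To make the syntactic content similarity feature concrete, the sketch below compares the text served to two browser profiles. The slides do not say how similarity is computed, so this uses Jaccard similarity over word shingles as a stand-in. The shingle size and function names are assumptions.

```python
import re


def shingles(text: str, size: int = 3) -> set:
    """Split text into overlapping word n-grams (shingles)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard_similarity(a: str, b: str, size: int = 3) -> float:
    """Return |A ∩ B| / |A ∪ B| over the shingle sets of two texts."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 1.0  # two empty pages are treated as identical
    return len(sa & sb) / len(sa | sb)
```

A low score between the page served to a crawler profile and the page served to a realistic client is one signal a classifier could use. On its own it cannot separate cloaking from ordinary dynamic content.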

SLIDE 21

Classification

95k labeled samples: 75k legitimate websites (Alexa) + 20k cloaked storefronts

  • False positive rate: 0.9%
  • True positive rate: 82%
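
For reference, the two rates are defined from a confusion matrix as follows. The counts in the usage note are hypothetical, chosen only so that the formulas reproduce the rates on the slide. They are not the authors' evaluation data.

```python
def rates(tp: int, fn: int, fp: int, tn: int) -> tuple:
    """Return (true positive rate, false positive rate) from confusion counts."""
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need at least one positive and one negative sample")
    tpr = tp / (tp + fn)  # share of cloaked pages that are detected
    fpr = fp / (fp + tn)  # share of legitimate pages flagged by mistake
    return tpr, fpr
```

For example, `rates(tp=820, fn=180, fp=9, tn=991)` corresponds to a true positive rate of 82% and a false positive rate of 0.9%.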

SLIDE 22

Prevalence

  • 11.7%: cloaking pages in Google Search, for luxury storefront keywords.
  • 4.9%: cloaking pages in Google AdWords, for health and software ads.

SLIDE 23

Traditional techniques: only IP, Referer, and User-Agent

Search: 1 out of 5

Ads: 1 out of 4

SLIDE 24

Current techniques: JavaScript support

Search: Half

Ads: 1 out of 4

SLIDE 25

Current techniques: wait for click

Search: 1 out of 10

Ads: 1 out of 5

SLIDE 26

Delivery: same-page cloaking

Uncloaked Cloaked

Search: 1 out of 5

Ads: 2 out of 3

SLIDE 27

Delivery: 40x/50x errors to bots

404

Search: 1 out of 7

Ads: 1 out of 8

SLIDE 28

Future: client-side detection

Search/Ads links add a parameter with the topics found by the bot. Check that the page matches the same topics.
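
The proposal above can be sketched in two steps: the search or ads link carries the topics the bot saw, and the client compares them with the topics of the page it actually receives. In the sketch below, the parameter name `expected_topics`, the comma-separated encoding, and the 50% overlap threshold are all assumptions. The slide does not specify any of them.

```python
from urllib.parse import urlencode, urlparse, urlunparse, parse_qs

TOPIC_PARAM = "expected_topics"  # hypothetical parameter name


def tag_link(url: str, topics: set) -> str:
    """Append the topics observed by the bot to an outgoing link."""
    parts = urlparse(url)
    extra = urlencode({TOPIC_PARAM: ",".join(sorted(topics))})
    query = parts.query + "&" + extra if parts.query else extra
    return urlunparse(parts._replace(query=query))


def topics_from_link(url: str) -> set:
    """Recover the expected topics from a tagged link."""
    values = parse_qs(urlparse(url).query).get(TOPIC_PARAM, [""])
    return {topic for topic in values[0].split(",") if topic}


def page_matches(url: str, observed_topics: set, min_overlap: float = 0.5) -> bool:
    """Check on the client that the page covers enough of the expected topics."""
    expected = topics_from_link(url)
    if not expected:
        return True  # nothing to compare against
    overlap = len(expected & set(observed_topics)) / len(expected)
    return overlap >= min_overlap
```

How the client extracts topics from the rendered page is left out here. That is where the semantic features from the classification step would come in.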

SLIDE 29

Takeaways

  • Prevalence: 5% of ads and 12% of search results for cloaking-prone keywords cloak.
  • Techniques: IP/User-Agent/Referer only gets ⅕ of cloaking.
  • Moving forward: client-side, semantic features needed for hard cases.

SLIDE 30

Thank you!

Luca Invernizzi invernizzi@google.com