

slide-1
SLIDE 1

OF CONTRASEÑAS, סיסמאות AND 密码:

CHARACTER ENCODING ISSUES FOR WEB PASSWORDS

Joseph Bonneau Rubin Xu jcb82@cl.cam.ac.uk

Computer Laboratory Web 2.0 Security & Privacy San Francisco, CA, USA May 24, 2012

Bonneau & Xu (University of Cambridge) Character encoding & web passwords May 24, 2012 1 / 26

slide-2
SLIDE 2

How passwords get created

?? correct horse battery staple

correcto caballo pila grapa
马电池主食正确
הסוס הנכון הסוללה מצרך

correct horse battery staple

correct_horse_battery_staple
CorrectHorseBatteryStaple
CORRECT-HORSE-BATTERY-STAPLE
cORRECT hORSE bATTERY sTAPLE

0x636F7272656374...

UTF-8? ASCII? ISO-8859-1?
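A minimal sketch of the last step on this slide: the same typed password turns into different bytes, or fails to encode at all, depending on the encoding chosen. The helper name `encode_or_none` and the sample list are illustrative assumptions, not part of the talk.

```python
# Sketch: one password, several candidate byte strings.
PASSWORDS = ["correct horse battery staple", "contraseña", "密码"]
ENCODINGS = ["ascii", "iso-8859-1", "utf-8"]


def encode_or_none(password, encoding):
    """Return the upper-case hex form of the encoded password.

    Returns None when the encoding cannot represent the password.
    """
    try:
        return password.encode(encoding).hex().upper()
    except UnicodeEncodeError:
        return None


for pw in PASSWORDS:
    for enc in ENCODINGS:
        print(f"{pw!r:32} {enc:11} {encode_or_none(pw, enc)}")
```

For pure-ASCII input all three encodings agree (the bytes start 0x636F7272656374), which is why the problem stays hidden for English passwords.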

slide-3
SLIDE 3

Writing systems around the world

slide-4
SLIDE 4

Surprisingly little variation in (weak) passwords!

                                       dictionary
target    de      en      es      fr      id      it      ko      pt      zh      vi      global
de        6.5%    3.3%    2.6%    2.9%    2.2%    2.8%    1.6%    2.1%    2.0%    1.6%    3.5%
en        4.6%    8.0%    4.2%    4.3%    4.5%    4.3%    3.4%    3.5%    4.4%    3.5%    7.9%
es        5.0%    5.6%    12.1%   4.6%    4.1%    6.1%    3.1%    6.3%    3.6%    2.9%    6.9%
fr        4.0%    4.2%    3.4%    10.0%   2.9%    3.2%    2.2%    3.1%    2.7%    2.1%    5.0%
id        6.3%    8.7%    6.2%    6.3%    14.9%   6.2%    5.8%    6.0%    6.7%    5.9%    9.3%
it        6.0%    6.3%    6.8%    5.3%    4.6%    14.6%   3.3%    5.7%    4.0%    3.2%    7.2%
ko        2.0%    2.6%    1.9%    1.8%    2.3%    2.0%    5.8%    2.4%    3.7%    2.2%    2.8%
pt        3.9%    4.3%    5.8%    3.8%    3.9%    4.4%    3.5%    11.1%   3.9%    2.9%    5.1%
zh        1.9%    2.4%    1.7%    1.7%    2.0%    2.0%    2.9%    1.8%    4.4%    2.0%    2.9%
vi        5.7%    7.7%    5.5%    5.8%    6.3%    5.7%    6.0%    5.8%    7.0%    14.3%   7.8%

for top 1000 passwords, greatest efficiency loss is only a factor of 4.8 (fr target, vi dictionary)
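The 4.8 figure can be reproduced from the table under one reading, which is an assumption here: each row is a target population, each column a dictionary, and "efficiency loss" is the success rate of the target's own dictionary divided by the success rate of a foreign one. The global column is left out of this sketch.

```python
# Sketch: recompute the worst-case efficiency loss from the slide's table.
LANGS = ["de", "en", "es", "fr", "id", "it", "ko", "pt", "zh", "vi"]

# SUCCESS[target][i] = % of target accounts guessed with dictionary LANGS[i]
SUCCESS = {
    "de": [6.5, 3.3, 2.6, 2.9, 2.2, 2.8, 1.6, 2.1, 2.0, 1.6],
    "en": [4.6, 8.0, 4.2, 4.3, 4.5, 4.3, 3.4, 3.5, 4.4, 3.5],
    "es": [5.0, 5.6, 12.1, 4.6, 4.1, 6.1, 3.1, 6.3, 3.6, 2.9],
    "fr": [4.0, 4.2, 3.4, 10.0, 2.9, 3.2, 2.2, 3.1, 2.7, 2.1],
    "id": [6.3, 8.7, 6.2, 6.3, 14.9, 6.2, 5.8, 6.0, 6.7, 5.9],
    "it": [6.0, 6.3, 6.8, 5.3, 4.6, 14.6, 3.3, 5.7, 4.0, 3.2],
    "ko": [2.0, 2.6, 1.9, 1.8, 2.3, 2.0, 5.8, 2.4, 3.7, 2.2],
    "pt": [3.9, 4.3, 5.8, 3.8, 3.9, 4.4, 3.5, 11.1, 3.9, 2.9],
    "zh": [1.9, 2.4, 1.7, 1.7, 2.0, 2.0, 2.9, 1.8, 4.4, 2.0],
    "vi": [5.7, 7.7, 5.5, 5.8, 6.3, 5.7, 6.0, 5.8, 7.0, 14.3],
}


def efficiency_loss(target, dictionary):
    """Own-dictionary success rate divided by foreign-dictionary rate."""
    row = SUCCESS[target]
    return row[LANGS.index(target)] / row[LANGS.index(dictionary)]


pairs = [(t, d) for t in LANGS for d in LANGS if t != d]
worst = max(pairs, key=lambda pair: efficiency_loss(*pair))
print(worst, round(efficiency_loss(*worst), 1))
```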

slide-5
SLIDE 5

Research questions

why is there so little language variation?
how do non-English speakers choose passwords?
how do websites fail for non-English characters?
how do users cope with an English-dominated world?

slide-6
SLIDE 6

Character encoding: a mercifully brief history

ASCII (ca 1960)

  English subset of Latin alphabet only
  ≈ 128 code points defined
  high-order bit reserved for parity checking

ASCII extensions

  use high-order bits for extra characters
  proprietary schemes (Windows code pages)
  1988: ISO 8859 series (16 subsets)

multi-byte encoding schemes

  defined for Chinese, Japanese, Korean, and others
  most use 2 bytes per character

the dawn of the Internet

  HTML, HTTP: ISO-8859-1 (Western Latin/Latin-1)
  DNS: ASCII subset

slide-10
SLIDE 10

Unicode and UTF-8

Unicode

  assigns a code point to every character in human writing systems
  e.g. ñ → 241
  many other features
  over 1 M code points defined

UTF-8

  maps each code point to a variable number of bytes
  e.g. 241 (ñ) → 0xC3B1
  never allows 0x00 to appear outside code point 0
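The ñ example can be checked directly. The sketch below compares Python's built-in encoder with a hand-written encoder for the two-byte UTF-8 range (U+0080 to U+07FF); `utf8_two_byte` is an illustrative helper, not a library function.

```python
# Sketch: code point of ñ and its UTF-8 byte sequence.
ch = "ñ"
code_point = ord(ch)          # Unicode code point (decimal 241)
encoded = ch.encode("utf-8")  # two bytes for this range


def utf8_two_byte(cp):
    """Encode a code point in U+0080..U+07FF as 110xxxxx 10xxxxxx."""
    if not 0x80 <= cp <= 0x7FF:
        raise ValueError("code point outside the two-byte UTF-8 range")
    return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])


print(code_point, encoded.hex(), utf8_two_byte(code_point).hex())

# Every byte of a multi-byte sequence has its high bit set, so a zero
# byte can only come from code point 0 itself.
assert b"\x00" not in "ñ密码".encode("utf-8")
```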

slide-12
SLIDE 12

Frequency of character encoding schemes today

slide-13
SLIDE 13

The password submission process: step 1

user types password
managed by OS/browser
code point and encoding known

slide-15
SLIDE 15

The password submission process: step 2

browser transcodes password to page encoding
many places for page to specify the encoding:

  HTTP header, HTML header, form attribute

characters missing from the page encoding are replaced with an HTML numeric character reference
undefined behavior if a character entity reference is also available!

IE: ñ → ñ
FF/Chrome: ñ → ñ
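A rough sketch of the two possible substitutions for a character the page encoding lacks. Python's `xmlcharrefreplace` error handler produces the numeric form; the named form is built from the standard `html.entities` table. Using US-ASCII as the page encoding is an assumption for illustration, and the sketch makes no claim about which browser emits which form.

```python
import html.entities


def submit_as(password, page_encoding):
    """Transcode to the page encoding, using numeric references as fallback."""
    return password.encode(page_encoding, errors="xmlcharrefreplace")


# Numeric character reference: the decimal code point between &# and ;
numeric = submit_as("ñ", "ascii")

# Character entity reference: a named alternative for the same character
named = "&" + html.entities.codepoint2name[ord("ñ")] + ";"

print(numeric, named)

# When the page encoding can represent the character, no reference is used.
print(submit_as("ñ", "iso-8859-1"))
```

Either way the server receives ASCII text that no longer looks like the password the user typed, so two clients that disagree on the form would produce two different stored hashes.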

slide-17
SLIDE 17

The password submission process: step 3

all characters outside of limited ASCII range are URL-encoded

  also called percent encoding

double encoding possible if characters already transcoded
direct encoding possible for multipart/form-data forms

encoding of 爱 (love)

encoding      submission        length
GB2312        %B0%AE            6
UTF-8         %E7%88%B1         9
ISO 8859-1    %26%2329233%3B    14
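The three rows of the table can be reproduced with the standard library. This is a sketch that assumes the browser falls back to a numeric character reference when the page encoding cannot represent the character, then percent-encodes every byte outside the unreserved set; `submitted_form` is an illustrative name.

```python
from urllib.parse import quote


def submitted_form(text, page_encoding):
    """Transcode to the page encoding, then percent-encode the bytes."""
    raw = text.encode(page_encoding, errors="xmlcharrefreplace")
    return quote(raw, safe="")


for enc in ("gb2312", "utf-8", "iso-8859-1"):
    wire = submitted_form("爱", enc)
    print(f"{enc:12} {wire:16} {len(wire)}")
```

The ISO 8859-1 row is the double-encoding case: the character first becomes the reference &#29233; and then its &, # and ; are percent-encoded in turn.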

slide-19
SLIDE 19

What sites need to do to support UTF-8 passwords

NOTHING

slide-20
SLIDE 20

Part 1: what can go wrong

Test of 22 sites:

  English/UTF-8: Google, Facebook, Microsoft Live, Twitter, Wikipedia, Yahoo!
  English/ISO-8859-1: Amazon, DeviantArt, Gawker, IMDB, Walmart
  Chinese/UTF-8: CSDN, Renren, Kaixin001, Sina Weibo, Tianya, Mop, Gamer.com.tw
  Chinese/GB2312: QQ, Taobao, Baidu, Youku

slide-21
SLIDE 21

Correctly supporting sites

Facebook, Twitter, Wikipedia, DeviantArt (the only non-UTF-8 site), CSDN, Renren, Kaixin001

slide-22
SLIDE 22

Explicit ban on non-ASCII passwords

UTF-8: Google, Microsoft Live, Yahoo!, Sina Weibo, Tianya

other: Amazon, Taobao, Baidu, Youku

slide-23
SLIDE 23

Counting encoded bytes instead of logical characters

IMDB, Walmart
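A small sketch of why the two counts diverge. The function names are illustrative; note that Python's `len` on a string counts code points, which is only an approximation of "logical characters" for text with combining marks.

```python
# Sketch: the same password measured two ways.
def length_in_characters(password):
    """Number of Unicode code points the user typed."""
    return len(password)


def length_in_bytes(password, encoding="utf-8"):
    """Number of bytes after encoding, which is what a byte counter sees."""
    return len(password.encode(encoding))


pw = "我的中文得很好"
print(length_in_characters(pw), length_in_bytes(pw))

# For ASCII-only passwords both counts agree, so the bug goes unnoticed.
print(length_in_characters("password"), length_in_bytes("password"))
```

A site that enforces a maximum length on the byte count would therefore reject a seven-character Chinese password far sooner than an English one of the same length.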

slide-25
SLIDE 25

Code point truncation

Weibo, QQ call charCodeAt() in JavaScript

AAAAAAAA = ŁŁŁŁŁŁŁŁ = сссссссс = ≁≁≁≁≁≁≁≁ = 屁屁屁屁屁屁屁屁
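A sketch of how such collisions arise, assuming each character's code is reduced to its low 8 bits before hashing (the slide does not state the exact width, so the `bits` parameter is an assumption). Python's `ord` stands in for JavaScript's `charCodeAt()`; the two agree for characters in the Basic Multilingual Plane.

```python
# Sketch: keep only the low-order bits of each code point.
def truncate_code_points(password, bits=8):
    """Reduce every code point to its low `bits` bits and pack as bytes."""
    mask = (1 << bits) - 1
    return bytes(ord(ch) & mask for ch in password)


for sample in ("Ł" * 8, "≁" * 8, "屁" * 8):
    codes = [hex(ord(ch)) for ch in sample[:1]]
    print(sample, codes, truncate_code_points(sample))
```

Ł (U+0141), ≁ (U+2241) and 屁 (U+5C41) all end in 0x41, so after truncation each of these passwords is indistinguishable from eight bytes of 0x41.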

slide-26
SLIDE 26

DES-crypt() truncation

Truncation to 8 characters per specification
Gamer.com.tw: 我的中 accepted for 我的中文得很好
underlying bug discovered: ÀCEMOMENT accepted for ÀLAPLAGE

À → 192 → 0xC380

present in BSD, PHP, PostgreSQL...
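A byte-level sketch of both observations. The 8-unit truncation is modeled on UTF-8 bytes, which is what the Gamer.com.tw example implies (three Chinese characters already fill nine bytes). The `buggy` branch is a hypothetical model, not the actual library code: it assumes the key stops at the first byte whose low 7 bits are zero, which is consistent with À encoding to 0xC3 0x80.

```python
# Sketch: which bytes of a password reach a DES-based crypt().
def des_key_bytes(password, buggy=False):
    """Return the bytes that would be used as key material."""
    raw = password.encode("utf-8")
    if buggy:
        # Hypothetical bug model: a byte such as 0x80 carries no key
        # bits in its low 7 bits and is treated like a terminator.
        for index, byte in enumerate(raw):
            if byte & 0x7F == 0:
                raw = raw[:index]
                break
    return raw[:8]  # traditional crypt() uses at most 8 units


print(des_key_bytes("我的中") == des_key_bytes("我的中文得很好"))
print("À".encode("utf-8").hex())
print(des_key_bytes("ÀCEMOMENT", buggy=True) == des_key_bytes("ÀLAPLAGE", buggy=True))
```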

slide-30
SLIDE 30

Down-conversion in jcrypt()

buggy version of Java implementation of bcrypt()
Gawker, Mop: ???????? accepted for 我的中文得很好
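A sketch of the down-conversion effect, using Python's `replace` error handler as a stand-in for a Java string-to-bytes conversion that substitutes `?` for unmappable characters. That the affected library behaves exactly this way is an assumption; the point is only that all non-ASCII passwords of equal length collapse to the same bytes.

```python
# Sketch: lossy conversion to ASCII before hashing.
def down_convert(password):
    """Replace every character that ASCII cannot represent with '?'."""
    return password.encode("ascii", errors="replace")


first = "我的中文得很好"
second = "密码密码密码密"  # a different password of the same length

print(down_convert(first))
print(down_convert(first) == down_convert(second))
```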

slide-32
SLIDE 32

Down-conversion in jcrypt()

majority of sites don’t support UTF-8 passwords correctly
many bugs left to find...

slide-33
SLIDE 33

Part 2: how users cope

slide-34
SLIDE 34

Case study: Chinese

Large leaked data sets now available

  70yx: gaming site, 10 M users
  CSDN: forum site, 6 M users

(nearly) all data in ASCII

  graphical Pinyin input disabled for password field

<15% of users enter valid Pinyin passwords

45% numeric only, 90% contain some digits

  compare to 15%, 45% for RockYou passwords

11% adjacent keyboard patterns

  compare to 3% for RockYou passwords
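Statistics of this kind can be computed with simple predicates. The sketch below is not the authors' method: the sample list is made up for illustration, and the keyboard-pattern test is a crude assumption (a run of at least four keys along one QWERTY row, in either direction).

```python
import string

KEY_ROWS = ["1234567890", "qwertyuiop", "asdfghjkl", "zxcvbnm"]


def numeric_only(password):
    return password != "" and all(c in string.digits for c in password)


def contains_digit(password):
    return any(c in string.digits for c in password)


def keyboard_run(password, min_len=4):
    """True if the whole password is a run along one keyboard row."""
    lowered = password.lower()
    if len(lowered) < min_len:
        return False
    return any(lowered in row or lowered in row[::-1] for row in KEY_ROWS)


def share(passwords, predicate):
    """Percentage of passwords for which the predicate holds."""
    return 100 * sum(predicate(p) for p in passwords) / len(passwords)


# Hypothetical sample, not taken from any leaked data set.
SAMPLE = ["123456", "woaini1314", "qwerty", "password1", "5201314", "asdfgh"]

for name, predicate in [("numeric only", numeric_only),
                        ("some digits", contains_digit),
                        ("keyboard run", keyboard_run)]:
    print(f"{name:13} {share(SAMPLE, predicate):.1f}%")
```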

slide-39
SLIDE 39

Case study: Hebrew

סיסמאות

Small leaked data set used

  Wondertree: spiritual site, 1 K users

2.5% of passwords included Hebrew characters

  over 90% of usernames did...

40% numeric only, 65% contain some digits

  compare to 15%, 45% for RockYou passwords

8% adjacent keyboard patterns

  compare to 3% for RockYou passwords

slide-44
SLIDE 44

Hebrew transliteration strategies

Phonetic transliteration

אהבה → ahava (love)

Keyboard transliteration

איןעודמלבדו → thigusnkcsu (There is no one else but him)
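Keyboard transliteration is mechanical: the user types the Hebrew phrase while the keyboard is in Latin mode, so each Hebrew letter is replaced by the Latin letter on the same key. The sketch below assumes the standard Israeli layout and covers only the letter keys; Hebrew letters that share a key with punctuation pass through unchanged.

```python
# Hebrew letters and the Latin letters printed on the same physical keys,
# listed in QWERTY order (punctuation keys omitted).
HEBREW_KEYS = "קראטוןםפשדגכעיחלךזסבהנמצ"
LATIN_KEYS = "ertyuiopasdfghjklzxcvbnm"
KEYBOARD_MAP = str.maketrans(HEBREW_KEYS, LATIN_KEYS)


def keyboard_transliterate(text):
    """Return what appears when Hebrew text is typed in Latin mode."""
    return text.translate(KEYBOARD_MAP)


print(keyboard_transliterate("איןעודמלבדו"))
print(keyboard_transliterate("אהבה"))
```

Unlike phonetic transliteration, the result looks random to anyone who does not know the layout, yet it is fully determined by the original phrase.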

slide-45
SLIDE 45

Case study: Spanish

Spanish alphabet: mostly English/Latin

  ñ considered a letter proper
  á, é, í, ó, ú used to indicate stress

Tens or hundreds of thousands of Spanish passwords at RockYou

impossible to compute due to cognates

slide-46
SLIDE 46

Spanish transliteration strategies

password      meaning            proper   transliterated   ratio

ñ → n
contraseña    password           408      218              34.8%
muñeca        doll               197      354              64.2%
cariño        affection, dear    104      153              59.5%
pequeña       little (girl)      87       72               45.2%
teextraño     I miss you         65       27               29.3%

á → a
teamomamá     I love you mom     2        151              98.7%

ó → o
código        code               5        110              95.7%

ú → u
música        music              2        1447             99.9%
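The ratio column appears to be the transliterated count divided by the total of both spellings; that reading is an assumption, checked below against two rows. The sketch also shows one way to derive the transliterated spelling by stripping combining marks after Unicode decomposition; the helper names are illustrative.

```python
import unicodedata


def transliterate(word):
    """Drop diacritics: decompose, then remove combining marks."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def transliterated_share(proper, transliterated):
    """Percentage of occurrences that use the transliterated spelling."""
    return 100 * transliterated / (proper + transliterated)


for word, proper, translit in [("contraseña", 408, 218), ("música", 2, 1447)]:
    print(word, transliterate(word),
          f"{transliterated_share(proper, translit):.1f}%")
```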

slide-47
SLIDE 47

Spanish transliteration strategies

ñ transliterated about half of the time

  varies by password: strongly significant!

stress accents almost always dropped

  likely greater than 99% including examples like pájaro (bird)

slide-48
SLIDE 48

Summary

multilingual passwords are poorly supported
users rarely make use of them when they are supported
evidence that security is being harmed

slide-49
SLIDE 49

Future directions

can users enter Chinese passwords securely?
how will we cope with mobile devices?
more data needed to study linguistic trends

Russian, Arabic, Japanese, Korean, Greek, Hindi, Bengali, etc.

slide-50
SLIDE 50

Thank you

jcb82@cl.cam.ac.uk

major thanks to: Noam Szpiro, Elsa Monica Treviño Ramírez, Claudia Diaz (Claudia Maria Díaz Martínez)