Apache SystemML - Declarative Large-Scale Machine Learning

slide-1
SLIDE 1

Apache SystemML - Declarative Large-Scale Machine Learning

Romeo Kienzler (IBM Watson IoT)
Berthold Reinwald (IBM Almaden Research Center)
Frederick R. Reiss (IBM Almaden Research Center)
Matthias Rieke (IBM Analytics)

Swiss Data Science Conference 16 - ZHAW - Winterthur

slide-2
SLIDE 2

Assembler vs. Python?

“High-level programming”

slide-3
SLIDE 3

Why another lib?

  • Custom machine learning algorithms
  • Declarative ML
  • Transparent distribution on data-parallel framework
  • Scale-up
  • Scale-out
  • Cost-based optimiser generates low-level execution plans

slide-4
SLIDE 4

Why on Spark?

  • Unification of SQL, Graph, Stream, ML
  • Common RDD structure
  • General DAG execution engine
  • Lazy evaluation
  • Distributed in-memory caching

slide-5
SLIDE 5

2007-2008: Multiple projects at IBM Research – Almaden involving machine learning on Hadoop.

2009: We form a dedicated team for scalable ML.

2009-2010: Through engagements with customers, we observe how data scientists create ML solutions.

slide-6
SLIDE 6

2011-2014: Research

slide-7
SLIDE 7

June 2015: IBM announces open-source SystemML
September 2015: Code available on GitHub
November 2015: SystemML enters Apache incubation
February 2016: First release (0.9) of Apache SystemML
June 2016: Second Apache release (0.10)

slide-8
SLIDE 8

SystemML at

Moved from Hadoop MapReduce to Spark

SystemML supports both frameworks
Exact same code
300X faster on 1/40th as many nodes

slide-9
SLIDE 9

R or Python Data Scientist

Results

Systems Programmer Scala

slide-10
SLIDE 10

[Figure: sparse Customers × Products matrix, indexed by customer i and product j]

Customer i bought product j.

Alternating Least Squares

slide-13
SLIDE 13

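[Figure: sparse Customers × Products matrix, indexed by customer i and product j]

Customer i bought product j.

[Figure: two factor matrices, labelled Customers Factor and Products Factor]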

slide-17
SLIDE 17

[Figure: sparse Customers × Products matrix, indexed by customer i and product j]

Customer i bought product j.

Customers Factor × Products Factor

Multiply these two factors to produce a less-sparse matrix.

New nonzero values become product suggestions.
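
The recommendation step on this slide can be sketched in a few lines of plain Python. This is a toy illustration, not SystemML code: the names `matmul` and `suggest` are made up for the example, and the two small factor matrices are hand-picked rather than learned.

```python
def matmul(a, b):
    """Multiply two dense matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def suggest(purchases, cust_factor, prod_factor, top_k=1):
    """For each customer, list up to top_k products that were not bought
    but received a nonzero (positive) score in the factor product."""
    scores = matmul(cust_factor, prod_factor)
    suggestions = []
    for i, row in enumerate(purchases):
        # Keep only cells that were empty in the original matrix
        # and became positive after multiplying the factors.
        candidates = [(scores[i][j], j) for j, bought in enumerate(row)
                      if not bought and scores[i][j] > 0]
        candidates.sort(key=lambda sj: (-sj[0], sj[1]))
        suggestions.append([j for _, j in candidates[:top_k]])
    return suggestions


# Hypothetical toy data: 3 customers, 3 products, rank-2 factors.
purchases = [[1, 1, 0],
             [0, 1, 1],
             [1, 0, 0]]
cust_factor = [[1.0, 0.0],
               [0.0, 1.0],
               [1.0, 0.0]]          # customers x rank
prod_factor = [[1.0, 1.0, 0.0],
               [0.0, 1.0, 1.0]]     # rank x products

result = suggest(purchases, cust_factor, prod_factor)
print(result)
```

In this toy case only the third customer gets a suggestion, because that is the only empty cell that turns nonzero in the product of the two factors.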

slide-18
SLIDE 18
slide-19
SLIDE 19

# Inputs assumed to be defined already: X, W, r, lambda, mi, mii, i, is_U
U = rand(nrow(X), r, min = -1.0, max = 1.0);
V = rand(r, ncol(X), min = -1.0, max = 1.0);
while(i < mi) {
   i = i + 1; ii = 1;
   # Gradient with respect to the factor being updated in this pass
   if (is_U)
      G = (W * (U %*% V - X)) %*% t(V) + lambda * U;
   else
      G = t(U) %*% (W * (U %*% V - X)) + lambda * V;
   norm_G2 = sum(G ^ 2); norm_R2 = norm_G2;
   R = -G; S = R;
   # Inner conjugate-gradient loop
   while(norm_R2 > 10E-9 * norm_G2 & ii <= mii) {
     if (is_U) {
       HS = (W * (S %*% V)) %*% t(V) + lambda * S;
       alpha = norm_R2 / sum(S * HS);
       U = U + alpha * S;
     } else {
       HS = t(U) %*% (W * (U %*% S)) + lambda * S;
       alpha = norm_R2 / sum(S * HS);
       V = V + alpha * S;
     }
     R = R - alpha * HS;
     old_norm_R2 = norm_R2; norm_R2 = sum(R ^ 2);
     S = R + (norm_R2 / old_norm_R2) * S;
     ii = ii + 1;
   }
   # Alternate between updating U and V
   is_U = ! is_U;
}
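
For readers who want to step through the script without a SystemML installation, here is a line-by-line port to plain Python. It is a sketch under assumptions the slide does not state: the helper names are invented, `W` is taken to be the 0/1 mask of observed cells, the loss function is the objective the script appears to minimize, and the default values for `lam`, `mi` and `mii` are arbitrary.

```python
import random


def matmul(a, b):
    """Dense matrix product of two lists-of-rows (the slide's %*%)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def hadamard(a, b):
    """Element-wise product (the slide's *)."""
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def axpy(alpha, a, b):
    """Return alpha * a + b, element-wise."""
    return [[alpha * x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def dot(a, b):
    """sum(a * b) over all cells."""
    return sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def loss(X, W, U, V, lam):
    """Assumed objective: weighted squared error plus L2 regularization."""
    E = axpy(-1.0, X, matmul(U, V))                      # U %*% V - X
    return 0.5 * dot(hadamard(W, E), E) + 0.5 * lam * (dot(U, U) + dot(V, V))


def als_cg(X, W, r, lam=0.1, mi=20, mii=10, seed=0):
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    U = [[rng.uniform(-1.0, 1.0) for _ in range(r)] for _ in range(n)]
    V = [[rng.uniform(-1.0, 1.0) for _ in range(m)] for _ in range(r)]
    is_U = True
    for _ in range(mi):
        E = hadamard(W, axpy(-1.0, X, matmul(U, V)))     # W * (U %*% V - X)
        if is_U:
            G = axpy(lam, U, matmul(E, transpose(V)))
        else:
            G = axpy(lam, V, matmul(transpose(U), E))
        norm_G2 = dot(G, G)
        norm_R2 = norm_G2
        R = [[-g for g in row] for row in G]
        S = R
        ii = 1
        while norm_R2 > 10e-9 * norm_G2 and ii <= mii:
            if is_U:
                HS = axpy(lam, S, matmul(hadamard(W, matmul(S, V)), transpose(V)))
                alpha = norm_R2 / dot(S, HS)
                U = axpy(alpha, S, U)
            else:
                HS = axpy(lam, S, matmul(transpose(U), hadamard(W, matmul(U, S))))
                alpha = norm_R2 / dot(S, HS)
                V = axpy(alpha, S, V)
            R = axpy(-alpha, HS, R)
            old_norm_R2 = norm_R2
            norm_R2 = dot(R, R)
            S = axpy(norm_R2 / old_norm_R2, S, R)
            ii += 1
        is_U = not is_U
    return U, V


# Hypothetical toy purchase matrix: 4 customers, 3 products.
X = [[1, 1, 0], [0, 1, 1], [1, 0, 1], [1, 1, 0]]
W = X                                   # assumption: weight 1 on observed cells
U0, V0 = als_cg(X, W, r=2, mi=0)        # same seed, no training: the random start
U, V = als_cg(X, W, r=2, mi=20)
loss_before = loss(X, W, U0, V0, 0.1)
loss_after = loss(X, W, U, V, 0.1)
print(loss_before, "->", loss_after)
```

The port materializes every intermediate as a dense list of lists, so it is only suitable for toy inputs; avoiding exactly that cost is what the optimizer slides later in the deck are about.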
slide-24
SLIDE 24

Every line has a clear purpose!
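
One way to read the purpose of each line is to write down the math the script seems to implement. The objective below is not shown on the slides; it is an assumption, chosen because its gradients match the two `G` expressions term by term. Here $\odot$ is the element-wise product (DML `*`) and $H$ denotes the Hessian with respect to the factor currently being updated.

```latex
\begin{aligned}
f(U,V) &= \tfrac{1}{2}\sum_{i,j} W_{ij}\,\bigl((UV)_{ij} - X_{ij}\bigr)^2
          + \tfrac{\lambda}{2}\bigl(\lVert U\rVert_F^2 + \lVert V\rVert_F^2\bigr) \\
\nabla_U f &= \bigl(W \odot (UV - X)\bigr)\,V^{\top} + \lambda U
  && \text{(G when is\_U)} \\
\nabla_V f &= U^{\top}\bigl(W \odot (UV - X)\bigr) + \lambda V
  && \text{(G otherwise)} \\
H_U S &= \bigl(W \odot (SV)\bigr)\,V^{\top} + \lambda S,
  \qquad H_V S = U^{\top}\bigl(W \odot (US)\bigr) + \lambda S
  && \text{(HS)} \\
\alpha &= \frac{\lVert R\rVert_F^2}{\langle S, HS\rangle},
  \qquad R \leftarrow R - \alpha\,HS,
  \qquad S \leftarrow R + \frac{\lVert R_{\text{new}}\rVert_F^2}{\lVert R_{\text{old}}\rVert_F^2}\,S
  && \text{(conjugate-gradient step)}
\end{aligned}
```

With one factor held fixed, $f$ is quadratic in the other, which is why the inner loop can use plain conjugate gradient with an exact step size.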

slide-25
SLIDE 25

https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala

slide-26
SLIDE 26

25 lines’ worth of algorithm…
…mixed with 800 lines of performance code

https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala

slide-28
SLIDE 28

[The ALS script from slide 19 is shown again]

SystemML: compile and run at scale, no performance code needed!

slide-29
SLIDE 29

[Bar chart: running time (sec, axis from 5000 to 20000) of R, MLLib, and SystemML on 1.2GB (sparse binary), 12GB, and 120GB inputs; some bars are marked >24h or OOM]

slide-30
SLIDE 30

Architecture

High-Level Algorithm → SystemML Optimizer → Parallel Spark Program

slide-31
SLIDE 31

Architecture

High-Level Operations (HOPs): General representation of statements in the data analysis language
Low-Level Operations (LOPs): General representation of operations in the runtime framework

High-level language front-ends
Multiple execution environments

Cost-Based Optimizer

slide-32
SLIDE 32
slide-33
SLIDE 33

t(U) %*% (W * (U %*% S))

U %*% S: large dense intermediate
Can compute directly from U, S, and W!

wdivmm (weighted divide matrix multiplication)

W: 1.2GB sparse
U: 800MB dense
S: 800MB dense
Result: 800MB dense
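
The idea behind the fused operator can be sketched in plain Python: because `W` is sparse, only the cells of `U %*% S` that line up with a nonzero of `W` are ever needed, so the dense intermediate never has to exist. The function names below are hypothetical, and this is an illustration of the idea rather than SystemML's actual `wdivmm` implementation.

```python
def naive(W, U, S):
    """t(U) %*% (W * (U %*% S)), materializing the dense n x m product U %*% S."""
    US = [[sum(u * s for u, s in zip(row, col)) for col in zip(*S)] for row in U]
    WUS = [[w * x for w, x in zip(rw, rx)] for rw, rx in zip(W, US)]
    Ut = list(zip(*U))
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*WUS)] for row in Ut]


def fused(W_nonzeros, U, S):
    """Same result, computed only from the nonzeros of W.
    W_nonzeros holds (i, j, w) triples; U is n x r; S is r x m."""
    r, m = len(S), len(S[0])
    out = [[0.0] * m for _ in range(r)]
    for i, j, w in W_nonzeros:
        # One cell of U %*% S, computed on demand and then discarded.
        us_ij = sum(U[i][k] * S[k][j] for k in range(r))
        for k in range(r):
            out[k][j] += U[i][k] * w * us_ij
    return out


# Hypothetical toy operands: n = 4, m = 3, r = 2.
W = [[1, 0, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]]
U = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0], [-2.0, 1.0]]
S = [[1.0, 0.0, 2.0], [-1.0, 1.0, 0.5]]
W_nonzeros = [(i, j, w) for i, row in enumerate(W) for j, w in enumerate(row) if w != 0]

expected = naive(W, U, S)
actual = fused(W_nonzeros, U, S)
print(actual)
```

The fused version does work proportional to the number of nonzeros in `W` times `r`, while the naive version builds all n × m cells first.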

slide-34
SLIDE 34

[Operator tree for t(U) %*% (W * (U %*% S)): operators %*%, *, t(), %*% over inputs W, U, S, annotated with sizes 1.2GB sparse (once), 80GB dense (twice), and 800MB dense (four times)]

All operands fit into heap → use one node

WDivMM (MapWDivMM)
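
The size annotations on this slide can be reproduced with back-of-the-envelope arithmetic. The sketch below is an illustration only: the dimensions (100,000 × 100,000 with rank 1,000) are not given on the slides and were picked because they yield 800MB and 80GB for dense 8-byte doubles, the 16GB heap budget is hypothetical, and `choose_backend` is a made-up stand-in for the optimizer's much richer cost model.

```python
GB = 10**9


def dense_bytes(rows, cols, bytes_per_cell=8):
    """Rough in-memory size of a dense matrix of 8-byte doubles."""
    return rows * cols * bytes_per_cell


def choose_backend(operand_bytes, heap_budget_bytes):
    """Pick 'single-node' when all operands together fit into the heap budget."""
    return "single-node" if sum(operand_bytes) <= heap_budget_bytes else "distributed"


# Hypothetical dimensions, not taken from the slides.
n, m, r = 100_000, 100_000, 1_000
w_bytes = 1_200_000_000                 # W: 1.2GB sparse, as annotated on the slide
u_bytes = dense_bytes(n, r)             # U: n x r
s_bytes = dense_bytes(r, m)             # S: r x m
out_bytes = dense_bytes(r, m)           # result of t(U) %*% (...): r x m
intermediate = dense_bytes(n, m)        # U %*% S and W * (U %*% S): n x m each

heap = 16 * GB                          # hypothetical heap budget
unfused_plan = choose_backend(
    [w_bytes, u_bytes, s_bytes, intermediate, intermediate, out_bytes], heap)
fused_plan = choose_backend([w_bytes, u_bytes, s_bytes, out_bytes], heap)
print(unfused_plan, fused_plan)
```

Under these assumptions the two 80GB intermediates are what force the unfused plan off a single node; once the fused operator removes them, the remaining operands add up to a few gigabytes.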

slide-35
SLIDE 35
slide-39
SLIDE 39

Browse the source!
Try out some tutorials!
Contribute to the project!
Download the binary release!

slide-40
SLIDE 40

Demo