SCALING GILT From Monolith Ruby App to Distributed Scala - - PowerPoint PPT Presentation


SCALING GILT From Monolith Ruby App to Distributed Scala Micro-Services QCon - Brooklyn - 2014 Yoni (Jonathan) Goldberg - GiltDirect, Sale Personalization, Loyalty, SEO, Post-purchase, Login/Registration - MIT CS BS/Meng | Google | IBM | IDF


slide-1
SLIDE 1

SCALING GILT

From Monolith Ruby App to Distributed Scala Micro-Services QCon - Brooklyn - 2014 Yoni (Jonathan) Goldberg

slide-2
SLIDE 2
  • GiltDirect, Sale Personalization, Loyalty, SEO, Post-purchase, Login/Registration
  • MIT CS BS/Meng | Google | IBM | IDF
  • Israel | Brooklyn | Coffee | JS/Node | Arduino | Running | Kite Surfing | Poker

slide-3
SLIDE 3

The lessons and challenges that we had/have with micro-service architecture

slide-4
SLIDE 4
slide-5
SLIDE 5

WHAT IS GILT?

  • Flash Sales Business
  • Founded in 2007
  • Top 50 Internet-Retailer
  • ~150 Engineers

slide-6
SLIDE 6
slide-7
SLIDE 7

ANOTHER WAY TO LOOK AT GILT

slide-8
SLIDE 8

THE CLASSIC STARTUP STORY

slide-9
SLIDE 9

THE EARLY DAYS

  • 2007 - Ruby on Rails the hottest new thing
  • The goal was to get to market fast

slide-10
SLIDE 10
slide-11
SLIDE 11

We were able to handle our traffic pretty well

slide-12
SLIDE 12

UNTIL LOUBOUTIN CAME TO GILT

slide-13
SLIDE 13

TECHNOLOGY PAIN POINTS - 2009

  • Handling a spike required launching 1,000s of Ruby processes
  • Postgres was overloaded
  • Routing traffic between Ruby processes sucked

|Note to self| hide from the ruby fans

slide-14
SLIDE 14

DEV PAIN POINTS

  • 1000 Models/Controllers, 200K LOC, 100s of jobs
  • Lots of contributors + no ownership
  • Difficult deployments with long integration cycles
  • Hard to identify root causes

slide-15
SLIDE 15

WE NEEDED TO SOLVE THE PROBLEM FAST

slide-16
SLIDE 16

THREE THINGS HAPPENED

  • Started the transition to the JVM
  • M(a/i)cro-Service Era Started
  • Dedicated data stores

slide-17
SLIDE 17

WHY JVM?

  • Widely adopted
  • Stable
  • Better support for concurrency
  • Better GC vs MRI

slide-18
SLIDE 18

FIRST 10 SERVICES

slide-19
SLIDE 19
slide-20
SLIDE 20

We solved 90% of our architecture scaling problem, but not the dev pain points

slide-21
SLIDE 21

SOLVED PAIN POINTS

  • Handling a spike required launching 1,000s of Ruby processes
  • Postgres was overloaded
  • Routing traffic between Ruby processes sucked

slide-22
SLIDE 22

STILL OPEN PAIN POINTS

  • New services became semi-monolithic
  • 1000 Models/Controllers, 200K LOC, 100s of jobs
  • Lots of contributors + no ownership
  • Difficult deployments with long integration cycles

slide-23
SLIDE 23

WHY WE DOUBLED DOWN ON MICRO-SERVICES

  • Empower teams and ownership
  • Smaller scope
  • Simpler and easier deployments and rollbacks

slide-24
SLIDE 24

As of last week we have around 400 services in Prod

slide-25
SLIDE 25

We began the transition to Scala and Play

LOSA - Lots Of Small (Web) Apps

Same as micro-services but for web-apps

slide-26
SLIDE 26

DEMO

slide-27
SLIDE 27
slide-28
SLIDE 28
slide-29
SLIDE 29
slide-30
SLIDE 30
slide-31
SLIDE 31
slide-32
SLIDE 32
slide-33
SLIDE 33
slide-34
SLIDE 34

why the increase?

slide-35
SLIDE 35

APP BOOTSTRAP

rake bootstrap:admin-web           # Bootstrap a admin-web service
rake bootstrap:babylon-docs        # Bootstrap a babylon-docs service
rake bootstrap:client-server-core  # Bootstrap a client-server-core service
rake bootstrap:jersey-java         # Bootstrap a jersey-java service
rake bootstrap:jersey-scala        # Bootstrap a jersey-scala service
rake bootstrap:play                # Bootstrap a play service
rake bootstrap:play-ui-build       # Bootstrap a play-ui-build service
rake bootstrap:sbt-library         # Bootstrap a sbt-library service
rake bootstrap:schema              # Bootstrap a schema service

slide-36
SLIDE 36

HOW TO DEFINE A MICROSERVICE?

  • Functionality scope
  • Number of devs involved

slide-37
SLIDE 37
slide-38
SLIDE 38

NEW CHALLENGES

  • Deployments and Testing (Functional/Integration)
  • Dev/Integration Environments
  • Who owns this service!?
  • Monitoring

slide-39
SLIDE 39

ON DEPLOYMENTS AND TESTING

"Testing is HARD" - the dev that sits on your left

slide-40
SLIDE 40

THE CHALLENGES THAT WE FACED:

  • Hard to execute functional tests between services
  • Frustrating to deploy semi-manually (Capistrano)
  • Scary to deploy other teams' services

slide-41
SLIDE 41

SBT

  • Motivation: Scala adoption
  • Complex Scala syntax
  • Cool features: ~test, shell, console
  • Hard to debug

slide-42
SLIDE 42

GILT-SBT-BUILD

  • Simple config for all the services
  • Pulls many plugins: [nexus, testing, RPMs, run scripts, Monitoring, SemVer, ...]
  • Custom commands (e.g. 'sbt release')

slide-43
SLIDE 43

ION-CANNON + SBT

  • Run tests on dedicated Env
  • Supports Canary releases
  • Easy rollbacks
  • Integrated health checks
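
The slides do not show how Ion-Cannon works internally. As a rough illustration of how a canary release, a health check and a rollback fit together, here is a minimal Scala sketch. Every name in it (`CanaryDeploy`, `Host`, `release`, `isHealthy`) is hypothetical and not part of Gilt's tooling.

```scala
object CanaryDeploy {
  // Hypothetical model: a host runs exactly one version of a service.
  final case class Host(name: String, version: String)

  sealed trait Outcome
  final case class Promoted(hosts: List[Host]) extends Outcome
  final case class RolledBack(hosts: List[Host]) extends Outcome

  // Upgrade only the first host (the canary) and run the health check on it.
  // If it passes, roll the new version out to every host; if it fails,
  // return the original hosts unchanged, which is the rollback.
  def release(
      hosts: List[Host],
      newVersion: String,
      isHealthy: Host => Boolean
  ): Outcome = hosts match {
    case Nil => Promoted(Nil)
    case first :: rest =>
      val canary = first.copy(version = newVersion)
      if (isHealthy(canary))
        Promoted(canary :: rest.map(_.copy(version = newVersion)))
      else
        RolledBack(first :: rest)
  }
}
```

In this sketch a failed health check never touches the remaining hosts, which is what keeps the rollback cheap.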

slide-44
SLIDE 44
slide-45
SLIDE 45

On Dev/Integration Environments

  • The hardware is not strong enough
  • No one wants to compile 20 services
  • Service Dependencies

slide-46
SLIDE 46

EACH TEAM HAS A STAGING ENV

SERVICE_PORTS=[
  4001, #listing-service
  8235, #svc-user-set
  9420, #svc-free-fall
  7895, #svc-Loyalty
  8155, #web-loyalty
  9410, #web inventory status
  7898, #admin-loyalty
  7899, #notification
  7102, #rouge
  9530, #svc-component
  6802, #svc-waitlist-submit
  4066, #svc-action-sale
  ....
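
With this many services sharing one staging box, a port list like the one above is easy to break by accident. The sketch below is one possible way to keep such a list as a typed map and detect collisions. It is an assumption for illustration only: the `StagingPorts` object and the `collisions` helper are hypothetical, and only a few entries are copied from the slide.

```scala
object StagingPorts {
  // A few entries copied from the slide; the original list is longer.
  val servicePorts: Map[String, Int] = Map(
    "listing-service" -> 4001,
    "svc-user-set"    -> 8235,
    "svc-free-fall"   -> 9420,
    "svc-Loyalty"     -> 7895,
    "web-loyalty"     -> 8155
  )

  // Ports claimed by more than one service, with the names of the
  // services that claim them (sorted for stable output).
  def collisions(ports: Map[String, Int]): Map[Int, List[String]] =
    ports.toList
      .groupBy { case (_, port) => port }
      .collect {
        case (port, entries) if entries.size > 1 =>
          port -> entries.map(_._1).sorted
      }
}
```

A check like `StagingPorts.collisions(...)` could run before a team adds a new service to its staging environment.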

slide-47
SLIDE 47

STAGING DIFFICULTIES:

  • Hard to keep all the services up to date
  • Maxed out our staging env capacity
  • Some services (e.g. LOSA apps) require an internet connection

slide-48
SLIDE 48

Dependency Fun [Demo]

slide-49
SLIDE 49
slide-50
SLIDE 50
slide-51
SLIDE 51
slide-52
SLIDE 52

THE FUTURE

GO Reactive

slide-53
SLIDE 53

Docker

An extension to Linux Containers (LXC)

  • Decentralization
  • Simple Configurations
  • Much lighter than a VM
  • Immutable
  • Supports multiple platforms

slide-54
SLIDE 54

ON OWNERSHIP

"code stays much longer than people" - SB

slide-55
SLIDE 55

CODE OWNERSHIP

slide-56
SLIDE 56

CURRENT APPROACH

  • Code Review! Code Review! Code Review!
  • Team owns services, not individual developers
  • Ownership transfer

slide-57
SLIDE 57
slide-58
SLIDE 58
slide-59
SLIDE 59

DATA OWNERSHIP

slide-60
SLIDE 60

WE TRANSITIONED TO MICRO-DBS

A third of the services have their own MongoDB | Postgres | Voldemort

slide-61
SLIDE 61

MANAGE MICRO-RELATIONAL DBS

SCHEMA EVOLUTION MANAGER

https://github.com/gilt/schema-evolution-manager

slide-62
SLIDE 62

PRINCIPLES OF SCHEMA EVOLUTION MANAGER

  • Schema evolutions are managed in a Git repo
  • Schema changes are deployed as tar files
  • No rollbacks
  • Schema changes are required to be incremental
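
To make the "incremental, no rollbacks" principle concrete, here is a minimal Scala sketch of forward-only migrations. It is not the implementation of schema-evolution-manager. The names (`SchemaEvolutions`, `Script`, `pending`, `upgrade`) and the convention that script file names sort in application order are assumptions made for the example.

```scala
object SchemaEvolutions {
  // Hypothetical model: a script is identified by its file name, and
  // file names are assumed to sort in the order they must be applied.
  final case class Script(name: String, sql: String)

  // Scripts that still need to run, in order. Applied scripts are never
  // re-run or reversed; a mistake is fixed by adding a new script.
  def pending(all: List[Script], applied: Set[String]): List[Script] =
    all.sortBy(_.name).filterNot(s => applied.contains(s.name))

  // Run every pending script through `run` and return the new applied set.
  def upgrade(all: List[Script], applied: Set[String])(
      run: Script => Unit
  ): Set[String] =
    pending(all, applied).foldLeft(applied) { (done, script) =>
      run(script)
      done + script.name
    }
}
```

Because there is no "down" direction, the only state to track is the set of scripts already applied.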

slide-63
SLIDE 63

ON MONITORING

slide-64
SLIDE 64
slide-65
SLIDE 65

THE TOOLS WE USE

graphite / openTSDB

slide-66
SLIDE 66
slide-67
SLIDE 67
slide-68
SLIDE 68
slide-69
SLIDE 69
slide-70
SLIDE 70
slide-71
SLIDE 71

Cheat Sheet

  • Your organization has > 30 developers
  • Deployments and integrations are difficult [You need a team for that]
  • You can abstractly separate features and parts of your site
  • Special hardware or performance needs for some features

slide-72
SLIDE 72

MAIN TAKEAWAYS

  • Simplicity - Do you really need it?
  • MicroServices promise works for most cases
  • As of 2014 - You will need to invest in Tools!
  • We feel that it was the right choice for us

slide-73
SLIDE 73

WHAT'S NEXT?

BUILD YOUR NEXT FEATURE IN A NEW SERVICE

slide-74
SLIDE 74

@yoni_goldberg jgoldberg@gilt.com

QUESTION TIME

We are hiring... www.yonigoldberg.com

slide-75
SLIDE 75

SCALA BREAK

slide-76
SLIDE 76

Why switch to Scala from Java

  • Object-Functional Programming
  • Akka
  • Immutability that leads to easier concurrency
  • Great libraries like Salat, Scalaz
  • Less boilerplate code - e.g. Case classes, App
  • Scala's Collections
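
A small sketch of three of the points above: case classes, immutability and the collections library. The `Sale` type and its data are made up for illustration and do not come from Gilt's code.

```scala
object SaleExample {
  // A case class gives equality, copy and pattern matching for free.
  final case class Sale(brand: String, price: BigDecimal, soldOut: Boolean)

  val sales: List[Sale] = List(
    Sale("A", BigDecimal(120), soldOut = false),
    Sale("B", BigDecimal(80), soldOut = true),
    Sale("C", BigDecimal(200), soldOut = false)
  )

  // Collections are immutable by default: each step returns a new list.
  def availableBrandsByPrice(all: List[Sale]): List[String] =
    all.filterNot(_.soldOut).sortBy(_.price).map(_.brand)

  // copy builds a modified value and leaves the original untouched,
  // so values can be shared between threads without locking.
  def markSoldOut(sale: Sale): Sale = sale.copy(soldOut = true)
}
```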

slide-77
SLIDE 77

  • Traits
  • Cake Pattern
  • Console
  • SBT (in Scala, release process)
  • Option
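
For readers who have not seen them, traits, the Cake Pattern and `Option` can be shown together in one short sketch. This is a generic textbook-style example under assumed names (`CakeExample`, `UserRepositoryComponent`, `GreetingServiceComponent`), not code from the talk.

```scala
object CakeExample {
  final case class User(id: Long, email: String)

  // Component trait: declares the dependency it provides.
  trait UserRepositoryComponent {
    def userRepository: UserRepository
    trait UserRepository {
      // Option makes "no such user" explicit instead of returning null.
      def find(id: Long): Option[User]
    }
  }

  // The self-type says: this component can only be mixed in where a
  // UserRepositoryComponent is also present.
  trait GreetingServiceComponent { self: UserRepositoryComponent =>
    object greetingService {
      def greet(id: Long): String =
        userRepository
          .find(id)
          .map(u => s"Hello, ${u.email}")
          .getOrElse("Hello, guest")
    }
  }

  // Wiring: mix the components together and supply an in-memory repository.
  object TestApp extends GreetingServiceComponent with UserRepositoryComponent {
    val userRepository: UserRepository = new UserRepository {
      private val users = Map(1L -> User(1L, "a@example.com"))
      def find(id: Long): Option[User] = users.get(id)
    }
  }
}
```

Swapping the in-memory repository for a real one only changes the wiring object, and the compiler rejects any wiring that omits a required component.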