CS 754 Advanced Distributed Systems Overview

SLIDE 1

CS 754 Advanced Distributed Systems Overview

SLIDE 2

Samer Al‐Kiswany

  • PhD, UBC, 2013
  • Postdoc, U. Wisconsin ‐ Madison
  • 5 internships at: Microsoft research labs, IBM Research, NEC Labs, Argonne National Labs

Research Interests

Storage and file systems, operating systems, distributed systems, cloud computing, data processing engines, high performance computing.

Intro

SLIDE 3

Distributed Software Systems

Computation Landscape

  • Provide easy‐to‐use abstractions
  • Hide complexities
  • Handle failures
  • Efficiently use resources
  • Leverage emerging technologies

SLIDE 4

Course Overview

How to build systems that are:

  • High‐performance
  • Scalable
  • Reliable
  • Secure
  • Easy to manage
  • Useful

Reality: Very hard and complex task. But: What is hard about it?

SLIDE 5

Communication

  • UDP
  • TCP
  • Messaging or pub/sub systems
  • Remote procedure calls (RPC), remote method invocation (RMI).
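
To make the last item in the list above concrete, here is a minimal sketch of what an RPC layer does: a client stub marshals a call into bytes, a server handler unmarshals and dispatches it, and the reply travels back the same way. Everything here is an assumption for illustration and not part of the course material: `CalculatorService`, `RpcStub`, and `handle_request` are hypothetical names, JSON is an arbitrary wire format, and the "network" is an in-memory function call standing in for a TCP or UDP transport.

```python
import json


class CalculatorService:
    """Hypothetical server-side object whose methods are exposed remotely."""

    def add(self, a, b):
        return a + b


def handle_request(service, request_bytes):
    """Server side: unmarshal the request, dispatch it, marshal the reply."""
    request = json.loads(request_bytes.decode("utf-8"))
    name = request["method"]
    method = None if name.startswith("_") else getattr(service, name, None)
    if method is None:
        reply = {"error": "unknown method: " + name}
    else:
        reply = {"result": method(*request["args"])}
    return json.dumps(reply).encode("utf-8")


class RpcStub:
    """Client side: makes a remote call look like a local method call."""

    def __init__(self, transport):
        self._transport = transport  # callable taking bytes and returning bytes

    def __getattr__(self, name):
        # Called only for attributes that are not found normally,
        # so every unknown name becomes a remote method.
        def remote_call(*args):
            request = json.dumps({"method": name, "args": list(args)})
            reply_bytes = self._transport(request.encode("utf-8"))
            reply = json.loads(reply_bytes.decode("utf-8"))
            if "error" in reply:
                raise RuntimeError(reply["error"])
            return reply["result"]

        return remote_call


service = CalculatorService()
# In-memory stand-in for the network (assumption): a real system would
# send these bytes over a socket instead of calling the handler directly.
stub = RpcStub(lambda data: handle_request(service, data))
print(stub.add(2, 3))
```

The caller writes `stub.add(2, 3)` as if `add` were local. Swapping the lambda for a socket-based transport would not change the calling code, which is the abstraction RPC and RMI aim for.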
SLIDE 6

Fault Tolerance

Failure model: partial failure.

Goal: continue running correctly (maybe slower).

Fault tolerance main questions:

  • Where to recover? End‐to‐end principle: keep network core simple/fast; application features (e.g., reliability, security) reside in the end nodes, not in the network.
  • Which state to recover to? Depends on application (e.g., bank, Facebook).
  • When to recover? Eager, lazy, when needed?
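
As a rough illustration of the end‐to‐end principle, the sketch below puts reliability entirely in the end nodes: the sender attaches a checksum and retries until the receiver acknowledges, while the network stays best-effort. The names `FlakyNetwork`, `Receiver`, and `reliable_send` are hypothetical, the loss pattern (drop the first few sends) is a simplifying assumption, and SHA-256 is just one possible integrity check.

```python
import hashlib


def checksum(payload: bytes) -> str:
    return hashlib.sha256(payload).hexdigest()


class Receiver:
    """End node: verifies integrity and deduplicates by message id."""

    def __init__(self):
        self.delivered = {}  # message id -> payload

    def on_message(self, msg_id, payload, digest):
        if checksum(payload) != digest:
            return None  # corrupted in transit: withhold the ack
        # setdefault keeps the first copy, so a retried message is harmless.
        self.delivered.setdefault(msg_id, payload)
        return msg_id  # the ack


class FlakyNetwork:
    """Hypothetical best-effort network that loses the first `failures` sends."""

    def __init__(self, receiver, failures):
        self.receiver = receiver
        self.failures = failures

    def send(self, msg_id, payload, digest):
        if self.failures > 0:
            self.failures -= 1
            return None  # message lost, so no ack comes back
        return self.receiver.on_message(msg_id, payload, digest)


def reliable_send(network, msg_id, payload, max_attempts=5):
    """End node: retry until acknowledged; return the number of attempts used."""
    digest = checksum(payload)
    for attempt in range(1, max_attempts + 1):
        if network.send(msg_id, payload, digest) == msg_id:
            return attempt
    raise TimeoutError("no ack after %d attempts" % max_attempts)


receiver = Receiver()
network = FlakyNetwork(receiver, failures=2)
attempts = reliable_send(network, 1, b"transfer 10")
print(attempts, receiver.delivered[1])
```

The network class contains no recovery logic at all. Retrying, integrity checking, and deduplication live in `reliable_send` and `Receiver`, which is where the end‐to‐end argument places them.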

SLIDE 7

Concurrency

Modern systems are fundamentally concurrent.

Goal: Utilize multiple levels of concurrency (data center, cluster, node, multiple CPUs, CPU cores, and accelerators) to build faster systems.

Challenge: correctness (consistency).
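
The correctness challenge shows up even at the smallest scale. The sketch below, which is an illustrative assumption rather than course material, has several threads increment a shared counter; the lock makes each read-modify-write step atomic so no update is lost. `Counter` and `run_workers` are hypothetical names.

```python
import threading


class Counter:
    """Shared state protected by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # `self.value += 1` is a read, an add, and a write. Without the
        # lock, two threads can interleave these steps and lose an update.
        with self._lock:
            self.value += 1


def run_workers(counter, num_threads=4, increments=10_000):
    """Start num_threads threads that each increment the counter."""

    def work():
        for _ in range(increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


total = run_workers(Counter())
print(total)  # 4 threads x 10,000 increments = 40000
```

The lock trades some parallelism for a consistent result. The same tension between speed and consistency recurs at every level listed above, from CPU cores up to data centers.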

SLIDE 8

SLIDE 9

Topics

  • Distributed middleware
  • Fault tolerance
  • Consensus
  • Storage systems
  • Scalability
  • Scheduling
  • Security
  • Data processing engines
  • Case studies of production systems
SLIDE 10

Course Format

  • Lecture‐based
  • 3 Mini projects
  • Assignments

Lectures are a mix of:

  • Core algorithms and techniques
  • Case studies from core systems at Google, Facebook, and Amazon.

SLIDE 11

Course Focus and Objectives

How to build high-performance, scalable, reliable, secure, easy-to-manage, and useful systems.

Objectives

  • Gain deep theoretical background
  • Gain hands-on experience
  • Gain research experience

Start your career in systems research with confidence, or gain skills that are in high demand in industry.

SLIDE 12

https://cs.uwaterloo.ca/~alkiswan/Classes/CS754-20F