CS 754 Advanced Distributed Systems Overview

Dec 16, 2023

SLIDE 1

CS 754 Advanced Distributed Systems Overview

SLIDE 2

Samer Al‐Kiswany

PhD, UBC, 2013
Postdoc, U. Wisconsin ‐ Madison
5 internships at: Microsoft Research labs, IBM Research, NEC Labs, Argonne National Labs

Research Interests

Storage and file systems, operating systems, distributed systems, cloud computing, data processing engines, high performance computing.

Intro

SLIDE 3

Distributed Software Systems

Computation Landscape

Provide easy-to-use abstractions
Hide complexities
Handle failures
Efficiently use resources
Leverage emerging technologies

SLIDE 4

Course Overview

How to build systems that are:

High‐performance
Scalable
Reliable
Secure
Easy to manage
Useful

Reality: a very hard and complex task.
But: what is hard about it?

SLIDE 5

Communication

UDP
TCP
Messaging or pub/sub systems
Remote procedure calls (RPC), remote method invocation (RMI).
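
To make the contrast between the first two options concrete, here is a minimal sketch using Python's standard `socket` module over the loopback interface. The function names (`udp_round_trip`, `tcp_round_trip`, `recv_exact`) are illustrative choices, not part of the course material. UDP exchanges self-contained datagrams with no connection setup, while TCP establishes a connection and delivers a byte stream, so the receiver has to decide where a message ends.

```python
import socket


def udp_round_trip(payload: bytes) -> bytes:
    """Send one datagram to a loopback socket and read the echoed reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as server, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as client:
        server.bind(("127.0.0.1", 0))          # port 0: let the OS pick a free port
        server.settimeout(2.0)
        client.settimeout(2.0)
        client.sendto(payload, server.getsockname())   # no connection setup
        data, addr = server.recvfrom(4096)             # one call, one datagram
        server.sendto(data, addr)                      # echo back to the sender
        reply, _ = client.recvfrom(4096)
        return reply


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """TCP is a byte stream: keep reading until exactly n bytes have arrived."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        buf += chunk
    return buf


def tcp_round_trip(payload: bytes) -> bytes:
    """Connect to a loopback listener, send the payload, read the echo."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        listener.settimeout(2.0)
        with socket.create_connection(listener.getsockname(), timeout=2.0) as client:
            conn, _ = listener.accept()        # connection is established first
            with conn:
                conn.settimeout(2.0)
                client.sendall(payload)
                conn.sendall(recv_exact(conn, len(payload)))
                return recv_exact(client, len(payload))


if __name__ == "__main__":
    print(udp_round_trip(b"hello"))
    print(tcp_round_trip(b"hello"))
```

Both ends run in one process only to keep the sketch short. On a real network the UDP datagram could be lost or reordered, which is why the UDP calls carry a timeout.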

SLIDE 6

Fault Tolerance

Failure model: partial failure.
Goal: continue running correctly (maybe slower).

Fault tolerance main questions:

Where to recover? End‐to‐end principle: keep the network core simple and fast; application features (e.g., reliability, security) reside in the end nodes, not in the network.
Which state to recover to? Depends on the application (e.g., bank, Facebook).
When to recover? Eager, lazy, when needed?
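
As one possible illustration of the end-to-end principle applied to reliability, the sketch below keeps the "network" deliberately simple and puts the checking and retrying in the end nodes. Everything here is hypothetical and exists only for the example: `LossyChannel` simulates a network that may drop or corrupt a message, and `send_reliably` plays both sender and receiver, attaching a SHA-256 digest and retransmitting until a copy arrives intact.

```python
import hashlib
import random
from typing import Optional, Tuple


class LossyChannel:
    """Hypothetical unreliable network: it may drop or corrupt a message."""

    def __init__(self, drop_rate: float, corrupt_rate: float, seed: int = 0):
        self.drop_rate = drop_rate
        self.corrupt_rate = corrupt_rate
        self.rng = random.Random(seed)   # seeded so runs are repeatable

    def deliver(self, data: bytes) -> Optional[bytes]:
        r = self.rng.random()
        if r < self.drop_rate:
            return None                                  # message lost
        if r < self.drop_rate + self.corrupt_rate:
            return bytes([data[0] ^ 0xFF]) + data[1:]    # first byte flipped
        return data                                      # delivered intact


def send_reliably(channel: LossyChannel, payload: bytes,
                  max_attempts: int = 50) -> Tuple[bytes, int]:
    """End nodes add a digest, verify it on receipt, and retry on failure."""
    message = hashlib.sha256(payload).digest() + payload
    for attempt in range(1, max_attempts + 1):
        received = channel.deliver(message)
        if received is None:
            continue                     # treated as a timeout: retransmit
        digest, body = received[:32], received[32:]
        if hashlib.sha256(body).digest() == digest:
            return body, attempt         # receiver verified the payload
        # digest mismatch: the receiver rejects the copy, the sender retries
    raise TimeoutError("gave up after %d attempts" % max_attempts)


if __name__ == "__main__":
    body, attempts = send_reliably(LossyChannel(0.3, 0.3), b"transfer $10")
    print(body, "delivered after", attempts, "attempt(s)")
```

The channel itself never retries or checks anything; correctness comes entirely from the endpoints, which is the design choice the principle argues for.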

SLIDE 7

Concurrency

Modern systems are fundamentally concurrent.
Goal: utilize multiple levels of concurrency (data center, cluster, node, multiple CPUs, CPU cores, and accelerators) to build faster systems.
Challenge: correctness (consistency).
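
The correctness challenge shows up even at the smallest level of concurrency, threads on one node. The following is a minimal sketch, with illustrative names (`Counter`, `run_workers`) that are not from the course material: several threads update shared state, and a lock makes each read-modify-write step atomic so that no update is lost.

```python
import threading


class Counter:
    """Shared counter whose updates are serialized by a lock."""

    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        # Without the lock, two threads could read the same old value and
        # both write back old + 1, silently losing one of the updates.
        with self._lock:
            self.value += 1


def run_workers(n_threads: int, increments_each: int) -> int:
    """Start n_threads workers that each increment the shared counter."""
    counter = Counter()

    def work() -> None:
        for _ in range(increments_each):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


if __name__ == "__main__":
    print(run_workers(4, 10_000))
```

The lock trades some parallelism for consistency. The same tension reappears at larger scales, where coordination between nodes or data centers is far more expensive than a local lock.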

SLIDE 8

SLIDE 9

Topics

Distributed middleware
Fault tolerance
Consensus
Storage systems
Scalability
Scheduling
Security
Data processing engines
Case studies of production systems

SLIDE 10

Course Format

Lecture‐based
3 Mini projects
Assignments

Lectures are a mix of:

Core algorithms and techniques
Case studies from core systems at Google, Facebook, and Amazon.

SLIDE 11

Course Focus and Objectives

How to build high-performance, scalable, reliable, secure, easy to manage, and useful systems.

Objectives

Gain deep theoretical background
Gain hands-on experience
Gain research experience

Start your career in systems research with confidence, or gain skills that are in high demand in industry.

SLIDE 12

https://cs.uwaterloo.ca/~alkiswan/Classes/CS754‐20F