CS 754 Advanced Distributed Systems: Overview
Samer Al‐Kiswany
- PhD, UBC, 2013
- Postdoc, U. Wisconsin ‐ Madison
- 5 internships at: Microsoft Research labs, IBM Research, NEC Labs, Argonne National Labs
Research Interests
Storage and file systems, operating systems, distributed systems, cloud computing, data processing engines, high performance computing.
Intro
Distributed Software Systems
Computation Landscape
- Provide easy‐to‐use abstractions
- Hide complexities
- Handle failures
- Efficiently use resources
- Leverage emerging technologies
Course Overview
How to build systems that are:
- High‐performance
- Scalable
- Reliable
- Secure
- Easy to manage
- Useful
Reality: this is a very hard and complex task. But what is hard about it?
Communication
- UDP
- TCP
- Messaging or pub/sub systems
- Remote procedure calls (RPC), remote method invocation (RMI).
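To make the RPC idea concrete, here is a minimal sketch of what a client stub and a server dispatcher do: marshal a call into a message, send it, and unmarshal the reply. It is illustrative only. The names `Stub`, `handle_request`, and `PROCEDURES` are hypothetical, JSON stands in for a real wire format, and the "transport" is a plain function call rather than UDP or TCP.

```python
import json

# Hypothetical server-side procedures; the names are illustrative only.
def add(a, b):
    return a + b

PROCEDURES = {"add": add}

def handle_request(raw: bytes) -> bytes:
    """Server side: unmarshal the request, dispatch it, marshal the reply."""
    request = json.loads(raw.decode("utf-8"))
    func = PROCEDURES.get(request["method"])
    if func is None:
        reply = {"error": f"unknown method {request['method']}"}
    else:
        reply = {"result": func(*request["params"])}
    return json.dumps(reply).encode("utf-8")

class Stub:
    """Client side: makes a remote procedure look like a local call."""

    def __init__(self, transport):
        # transport is any callable mapping request bytes to reply bytes.
        # Here it is an in-process function; a real system would use a socket.
        self._transport = transport

    def call(self, method, *params):
        message = {"method": method, "params": list(params)}
        raw = json.dumps(message).encode("utf-8")
        reply = json.loads(self._transport(raw).decode("utf-8"))
        if "error" in reply:
            raise RuntimeError(reply["error"])
        return reply["result"]

stub = Stub(handle_request)
print(stub.call("add", 2, 3))
```

Because the stub only sees bytes going in and bytes coming out, the in-process transport could be swapped for a network one without changing the calling code, which is the abstraction RPC is meant to provide.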
Fault Tolerance
Failure model: partial failure.
Goal: continue running correctly (maybe slower).
Fault tolerance main questions:
- Where to recover? End‐to‐end principle: keep the network core simple and fast; application features (e.g., reliability, security) reside in the end nodes, not in the network.
- Which state to recover to? Depends on the application (e.g., bank, Facebook).
- When to recover? Eager, lazy, or when needed?
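The end‐to‐end principle can be illustrated with a small simulation in which the network is allowed to corrupt data and reliability is enforced only by the endpoints. This is a sketch under stated assumptions: `FlakyChannel` and `send_end_to_end` are hypothetical names, the channel corrupts a fixed number of transmissions so the run is deterministic, and the checksum is assumed to reach the receiver intact.

```python
import hashlib

class FlakyChannel:
    """Hypothetical network core: simple, fast, and allowed to corrupt data."""

    def __init__(self, bad_sends):
        self.bad_sends = bad_sends  # how many initial transmissions to corrupt
        self.sends = 0

    def transmit(self, payload: bytes) -> bytes:
        self.sends += 1
        if self.sends <= self.bad_sends:
            # Flip every bit of the first byte to simulate corruption.
            return bytes([payload[0] ^ 0xFF]) + payload[1:]
        return payload

def send_end_to_end(channel, data: bytes, max_attempts=5) -> bytes:
    """Endpoints, not the network, detect corruption and retry."""
    # Checksum computed by the sending application.
    digest = hashlib.sha256(data).digest()
    for _ in range(max_attempts):
        received = channel.transmit(data)
        # Check performed by the receiving application (simplification:
        # the digest itself is assumed to arrive uncorrupted).
        if hashlib.sha256(received).digest() == digest:
            return received
    raise IOError(f"gave up after {max_attempts} attempts")

channel = FlakyChannel(bad_sends=2)
result = send_end_to_end(channel, b"transfer $100 to account 42")
print(result, channel.sends)
```

The channel contains no recovery logic at all; the decision about when data is "good enough" and how often to retry stays with the application, which is where the slide places it.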
Concurrency
Modern systems are fundamentally concurrent.
Goal: utilize multiple levels of concurrency (data center, cluster, node, multiple CPUs, CPU cores, and accelerators) to build faster systems.
Challenge: correctness (consistency).
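The correctness challenge shows up even at the smallest scale, threads on a single node. The sketch below is a minimal illustration, not course material: several threads update a shared counter, and a lock makes each read‐modify‐write step atomic so no update is lost. The names `Counter` and `run` are hypothetical.

```python
import threading

class Counter:
    """Shared state protected by a lock so increments are not lost."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Without the lock, two threads could read the same old value
        # and both write back old + 1, losing one update.
        with self._lock:
            self.value += 1

def run(n_threads=4, n_increments=10000):
    counter = Counter()

    def worker():
        for _ in range(n_increments):
            counter.increment()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value

print(run())  # 40000: every increment is applied exactly once
```

The lock buys correctness at the cost of serializing the updates, which is the same trade‐off between consistency and performance that reappears at the cluster and data center levels.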
Topics
- Distributed middleware
- Fault tolerance
- Consensus
- Storage systems
- Scalability
- Scheduling
- Security
- Data processing engines
- Case studies of production systems
Course Format
- Lecture‐based
- 3 Mini projects
- Assignments
Lectures are a mix of:
- Core algorithms and techniques
- Case studies from core systems at Google, Facebook, and Amazon.
Course Focus and Objectives
How to build high-performance, scalable, reliable, secure, easy to manage, and useful systems.
Objectives
- Gain deep theoretical background
- Gain hands‐on experience
- Gain research experience
Start your career in systems research with confidence, or gain skills that are in high demand in industry.
https://cs.uwaterloo.ca/~alkiswan/Classes/CS754‐20F