DryadLINQ
A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language
Arman Idani
14 Feb 2012
R202 – Data Centric Networking
Background
Major Distributed Computing Frameworks
MapReduce
Dryad
Building Blocks (original paper)
Processing vertices
Channels (file, pipe, shared memory)
Inputs
Outputs
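The building blocks above can be sketched as a tiny dataflow graph. This is only an illustration in Python, not Dryad's API: the names Vertex, connect and run are hypothetical, channels are modelled as in-memory queues (standing in for files, pipes or shared memory), and the vertices run one after another in a caller-supplied topological order rather than in parallel.

```python
from collections import deque


class Vertex:
    """A processing vertex: a function from a list of input items to output items."""

    def __init__(self, name, fn):
        self.name = name
        self.fn = fn
        self.inputs = []   # channels arriving at this vertex
        self.outputs = []  # channels leaving this vertex


def connect(src, dst):
    """Create a channel (an in-memory queue in this sketch) from src to dst."""
    channel = deque()
    src.outputs.append(channel)
    dst.inputs.append(channel)
    return channel


def run(vertices, graph_inputs):
    """Run vertices in the given topological order.

    graph_inputs maps a vertex name to the items it reads from outside the graph.
    Vertices with no outgoing channel contribute to the graph outputs.
    """
    results = {}
    for v in vertices:
        items = list(graph_inputs.get(v.name, []))
        for ch in v.inputs:
            items.extend(ch)  # drain everything upstream vertices produced
            ch.clear()
        out = v.fn(items)
        if v.outputs:
            for ch in v.outputs:
                ch.extend(out)
        else:
            results[v.name] = out
    return results


# Two input vertices feed one merging vertex.
read_a = Vertex("read_a", lambda xs: [x * 2 for x in xs])
read_b = Vertex("read_b", lambda xs: [x * 2 for x in xs])
merge = Vertex("merge", lambda xs: sorted(xs))
connect(read_a, merge)
connect(read_b, merge)

output = run([read_a, read_b, merge], {"read_a": [3, 1], "read_b": [2]})
print(output)
```

In a real deployment the two input vertices could run on different machines and the channel type would be chosen per edge; the sketch only shows how vertices, channels, inputs and outputs relate.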
[Diagram: LINQ stack. A .NET program (C#, VB, F#, etc.) builds query objects, which pass through the LINQ provider interface to execution engines on the local machine: PLINQ, LINQ-to-SQL, LINQ-to-Objects. Scalability: single-core, multi-core.]
programs for a computer cluster?
single computer
[Diagram: LINQ stack with DryadLINQ added. A .NET program (C#, VB, F#, etc.) builds query objects, which pass through the LINQ provider interface to execution engines: PLINQ, LINQ-to-SQL, DryadLINQ, LINQ-to-Objects. Scalability: single-core, multi-core, cluster.]
[Diagram: a query is split into subqueries. Labels: Query, DryadLINQ, PLINQ, Subquery.]
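The idea of splitting one query into subqueries can be sketched as follows. This is a Python illustration under stated assumptions, not DryadLINQ code: partition, subquery and run_query are hypothetical names, the query itself (keep even numbers, square them) is made up for the example, and a thread pool stands in for cluster machines.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(records, n):
    """Split the input into n partitions (round-robin, for simplicity)."""
    return [records[i::n] for i in range(n)]


def subquery(part):
    """The per-partition piece of the query: filter, then project."""
    return [x * x for x in part if x % 2 == 0]


def run_query(records, n_partitions=4):
    """Run the same subquery on every partition, then merge the partial results."""
    parts = partition(records, n_partitions)
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        partial = list(pool.map(subquery, parts))
    return sorted(y for chunk in partial for y in chunk)


result = run_query(list(range(10)))
print(result)
```

Filter and projection need no communication between partitions, which is why each subquery can run independently; only the final merge brings the partial results together.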
[Diagram: labels Query, LINQ-to-SQL (twice), PLINQ.]
[Diagram: the same query can run through LINQ-to-Objects on the local machine (debug) or through DryadLINQ on the cluster (production).]
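A minimal sketch of the debug/production idea: the query is written once and handed to whichever engine is convenient. Everything here is hypothetical Python rather than the LINQ provider interface; run_local plays the role of a single-machine engine and run_cluster imitates a partitioned engine by running the query per partition and once more over the merged partial results.

```python
def query(source):
    """The query, written once against any iterable of words."""
    return sorted({w.lower() for w in source if len(w) > 3})


def run_local(records):
    """Debug path: evaluate the query directly on one machine."""
    return query(records)


def run_cluster(records, n_partitions=3):
    """Production path (simulated): evaluate per partition, then merge.

    Re-applying the query to the merged partial results is valid here because
    the filter, lower-casing and de-duplication all give the same answer when
    applied a second time.
    """
    parts = [records[i::n_partitions] for i in range(n_partitions)]
    partial = [query(p) for p in parts]
    return query(w for chunk in partial for w in chunk)


words = ["Dryad", "LINQ", "is", "dryad", "Cluster", "SQL"]
local_result = run_local(words)
cluster_result = run_cluster(words)
print(local_result, cluster_result)
```

Because both paths return the same result for the same input, a query can be checked on a small local sample before it is run at scale.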
Computers         1      2      10     20     40     80     240
Time (s)          119    241    242    245    271    294    319
Data Sorted (GB)  3.87   7.74   38.7   77.4   154.8  309.6  926.4
GB/s              0.03   0.03   0.16   0.32   0.57   1.16   2.90
Network: Local, One switch, More than one switch
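The GB/s row can be rechecked from the other two rows as data sorted divided by time. The short Python sketch below does that arithmetic, assuming the Time row is in seconds; the variable names are illustrative.

```python
computers = [1, 2, 10, 20, 40, 80, 240]
time_s = [119, 241, 242, 245, 271, 294, 319]               # assumed to be seconds
data_gb = [3.87, 7.74, 38.7, 77.4, 154.8, 309.6, 926.4]

# Aggregate throughput: total data sorted divided by elapsed time.
throughput = [round(d / t, 2) for d, t in zip(data_gb, time_s)]

for n, gbps in zip(computers, throughput):
    print(f"{n:>3} computers: {gbps:.2f} GB/s")
```

Recomputed this way, most entries match the listed GB/s figures. The 80-computer entry comes out at about 1.05 rather than 1.16, so one of the figures in that column may have been mistranscribed. Elapsed time grows from 119 to 319 while the data grows 240-fold, which is why aggregate throughput rises almost in step with the number of computers.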
table in Dryad and DryadLINQ
updates)
Azure, Cosmos DFS)