SLIDE 1 Performance Modeling of High Performance Computing [HPC]
- S. Amirhossein Abtahizadeh
March 18th, 2016
SLIDE 2
Section 1 Failed Attempt
SLIDE 3
Section 1 Ontological Performance Modeling
SLIDE 4
Section 1 Ontology
- Internet of Things (IoT)
- Semantic Web
- Web of Data
SLIDE 5
Section 1 Web of Data
- Web 1.0: HTTP
- Web 2.0: Social Networks
- Web 3.0: Web of Data
SLIDE 6
Section 1 Web of Data
- Collections of data
- Distributed among machines
- How to present data?
SLIDE 7
Section 1
<?xml version="1.0"?>
<!DOCTYPE owlx:Ontology [
  <!ENTITY xsd "http://www.w3.org/2001/XMLSchema#" > ]>
<owlx:Ontology owlx:name="http://www.example.org/wine"
  xmlns:owlx="http://www.w3.org/2003/05/owl-xml">
</owlx:Ontology>
SLIDE 8
Section 1 Web of Data
- Resource Description Framework (RDF)
- Query another dataset (OWL) to understand what “Wine” is
SLIDE 9
Section 1 Semantic Web
- Interconnected ontologies
- Languages: RDF (serialized as RDF/XML or Turtle)
- Query: SPARQL
SLIDE 10
Section 1 Semantic Web
Google: “polymtl gigl professors”
SLIDE 11
Section 1 Semantic Web
What about: “How many professors at polymtl are working on the research topic of cloud computing now?”
SLIDE 12
Section 1 SPARQL
For machines, not humans!
SLIDE 13
Section 1
PREFIX ex: <http://example.com/exampleOntology#>
SELECT ?capital ?country
WHERE {
  ?x ex:cityname ?capital ;
     ex:isCapitalOf ?y .
  ?y ex:countryname ?country ;
     ex:isInContinent ex:Africa .
}
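The query above joins four triple patterns on the shared variables ?x and ?y. The following is a minimal pure-Python sketch of those semantics. It does not use a SPARQL engine or RDF library. It evaluates the same pattern over a tiny hand-written list of triples. The `ex:` resources and the helper names `objects` and `match_capitals_in` are hypothetical and exist only for illustration.

```python
# Toy triple store: (subject, predicate, object) tuples.
# Illustrative data only; resource names like ex:c1 are hypothetical.
TRIPLES = [
    ("ex:c1", "ex:cityname", "Nairobi"),
    ("ex:c1", "ex:isCapitalOf", "ex:k1"),
    ("ex:k1", "ex:countryname", "Kenya"),
    ("ex:k1", "ex:isInContinent", "ex:Africa"),
    ("ex:c2", "ex:cityname", "Ottawa"),
    ("ex:c2", "ex:isCapitalOf", "ex:k2"),
    ("ex:k2", "ex:countryname", "Canada"),
    ("ex:k2", "ex:isInContinent", "ex:NorthAmerica"),
]


def objects(subject, predicate):
    """Return every o such that (subject, predicate, o) is a stored triple."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def match_capitals_in(continent):
    """Join the same four triple patterns as the SPARQL WHERE clause."""
    results = []
    for x, p, capital in TRIPLES:                    # ?x ex:cityname ?capital
        if p != "ex:cityname":
            continue
        for y in objects(x, "ex:isCapitalOf"):       # ?x ex:isCapitalOf ?y
            if continent not in objects(y, "ex:isInContinent"):
                continue                             # ?y ex:isInContinent <continent>
            for country in objects(y, "ex:countryname"):  # ?y ex:countryname ?country
                results.append((capital, country))
    return results


print(match_capitals_in("ex:Africa"))  # [('Nairobi', 'Kenya')]
```

A real triple store indexes triples and plans the join order. The nested loops here only make the variable bindings visible.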
SLIDE 14
Section 1 Internet of Things (IoT)
- Semantic Web infrastructure
- Interconnected machines
- Raspberry Pi*
* https://www.raspberrypi.org
SLIDE 15
Section 1 Research Objective: Represent “performance” with an ontology
SLIDE 16
Section 1 Research Objective: Share a common understanding of application performance
SLIDE 17
Section 1 Research Objective: Tackle the ambiguity of “WHAT IS PERFORMANCE?”
A clear description in terms of OWL/XML fields
SLIDE 18
Section 1
Define quality attributes (e.g., what is the availability of the system?)
(Figure: quality attributes QA 1 to QA 5)
SLIDE 19
Section 1 Research Methodology:
- Develop a cloud-based app
- Define scenarios
- Measure performance with OWL using logical axioms
SLIDE 20
Section 1 Axioms: Inference
Response time < 2 ms → P = 90%
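The axiom on this slide reads as an if-then rule. This is a minimal sketch in plain Python, not OWL or a description-logic reasoner. Forward chaining applies such rules until no new fact appears. Two assumptions are made here:

- “P = 90%” is read as a performance level of 0.90.
- The rule is split into two hypothetical steps (`fast_response`, then `performance`) to show how inferred facts can cascade.

These fact names are not taken from the original ontology.

```python
# Each rule: (name, condition over the known facts, facts it concludes).
# Rule names and fact keys are hypothetical, for illustration only.
RULES = [
    ("fast", lambda f: f.get("response_time_ms", float("inf")) < 2,
     {"fast_response": True}),
    ("perf", lambda f: f.get("fast_response") is True,
     {"performance": 0.90}),
]


def infer(facts):
    """Forward chaining: fire rules repeatedly until a fixed point is reached."""
    facts = dict(facts)
    changed = True
    while changed:
        changed = False
        for _name, condition, conclusions in RULES:
            if condition(facts):
                for key, value in conclusions.items():
                    if facts.get(key) != value:
                        facts[key] = value
                        changed = True
    return facts


print(infer({"response_time_ms": 1.4}))
# {'response_time_ms': 1.4, 'fast_response': True, 'performance': 0.9}
print(infer({"response_time_ms": 5.0}))
# {'response_time_ms': 5.0}
```

Looping to a fixed point makes the result independent of rule order. Cascaded rules work the same way: one rule's conclusion becomes another rule's condition.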
SLIDE 21
Section 1 Building the Ontology:
- 100+ axioms
- Cascaded classification
- 10+ inferred rules
SLIDE 22
Section 1
SLIDE 23
Section 1
SLIDE 24
Section 1 FAILED!
- OWL/XML is not efficiently designed
- Performance is subjective
SLIDE 25
Section 2 Performance and Energy Modeling
High Performance Computing (HPC)
SLIDE 26
Section 2 What is HPC?
- Parallel processing
- Advanced applications
- Massive computations
- Scientific programs
SLIDE 27
Section 2 What is HPC?
- Super-fast transactions
- Distributed algorithms
- Message Passing Interface (MPI)
SLIDE 28
Section 2 In the domain of HPC:
- We deal with CPU-intensive apps
- Data might be just an array!
- Computational cost might grow exponentially.
SLIDE 29
Section 2 Aggregate computing power
- Clustering at very large scale
- “Computing at the speed of innovation!”*
* IBM (www.ibm.com)
SLIDE 30
Section 2 Message Passing Interface (MPI)
SLIDE 31
Section 2 MPI is a library:
- To write parallel programs
- It provides collective functions (e.g., broadcast, scatter, reduce)
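The following single-process Python simulation illustrates what a scatter/reduce pair of collectives computes. It is not MPI and does not use mpi4py. It only mimics the semantics:

1. The root splits the data across `size` ranks.
2. Each rank computes a partial result.
3. The partial results are combined at the root.

With mpi4py, the corresponding calls would be `comm.scatter` and `comm.reduce` on `MPI.COMM_WORLD`, with the script launched by an MPI launcher such as `mpiexec`. The helper names below are hypothetical.

```python
import operator
from functools import reduce


def scatter(data, size):
    """Split `data` into `size` nearly equal chunks, one per simulated rank."""
    k, r = divmod(len(data), size)
    chunks, start = [], 0
    for rank in range(size):
        end = start + k + (1 if rank < r else 0)  # first r ranks get one extra item
        chunks.append(data[start:end])
        start = end
    return chunks


def simulated_reduce(partials, op=operator.add):
    """Combine one partial result per rank into a single value at the root."""
    return reduce(op, partials)


data = list(range(1, 101))           # work held by the root (rank 0)
chunks = scatter(data, size=4)       # what each of 4 ranks would receive
partials = [sum(c) for c in chunks]  # local computation on each rank
print(simulated_reduce(partials))    # 5050
```

In a real MPI run, each rank would execute the same script. Each rank would see only its own chunk, and the runtime, not a Python list, would carry the partial results to the root.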
SLIDE 32
Section 2 MPI is available in many languages: C, C++, Java, Python, R
SLIDE 33
Section 2
- When we already have networking libraries, why bother using MPI?!
- Optimized for performance
- Uses the fastest network transport available
- Within a computer: MPI uses shared memory (not the network!)
- Fast cluster interconnects: MPI uses InfiniBand, …
- Enforces guarantees (reliable, in-order messages)
- Think about the problem, forget about the network
SLIDE 34
Section 2
Research Objective: Given a set of input variables
(network bandwidth, CPU power, throughput, disk speed, memory, …),
what is the optimal configuration for the best performance/energy trade-off?
SLIDE 35
Section 2 Def (Performance && Energy): a multi-objective optimization problem
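Performance and energy can pull in opposite directions, so a multi-objective formulation returns a set of trade-offs rather than a single optimum. The following is a minimal sketch of the non-dominated (Pareto) filter in Python. It assumes both objectives are minimized: run time in seconds and energy in joules. The configuration names and numbers are made-up placeholders, not measurements.

```python
# Hypothetical candidate configurations: name -> (run_time_s, energy_j).
# Both objectives are minimized. Values are placeholders, not measurements.
CONFIGS = {
    "2 nodes": (120.0, 900.0),
    "4 nodes": (70.0, 1000.0),
    "8 nodes": (45.0, 1400.0),
    "8 nodes, slow disk": (60.0, 1500.0),
}


def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(configs):
    """Names of the configurations that no other configuration dominates."""
    return [
        name for name, obj in configs.items()
        if not any(dominates(other, obj) for other in configs.values())
    ]


print(pareto_front(CONFIGS))  # ['2 nodes', '4 nodes', '8 nodes']
```

Picking one point from the front is a separate, application-specific decision, for example the lowest-energy configuration that still meets a deadline.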
SLIDE 36
Section 2 Why?
- Efficient resource provisioning (what to choose?)
- Predict changes in your system (what will happen if...?)
- Performance becomes part of the design
- Itemized scenarios (what is important?)
- Avoid performance surprises when deploying
- Enterprise reputation (risk management!)
SLIDE 37
Section 2 How?
SLIDE 38
Section 2
(Figure: architecture with Dataset, Scientific Application, Master, Node 1 to Node 100, MPI, Benchmark, Model)
SLIDE 39
Section 2 Architecture (not yet accessible!)
- 100 nodes
- Oracle Solaris Cluster
- Oracle VirtualBox
SLIDE 40
Section 2 Test Architecture
- DigitalOcean cloud platform
- 10 nodes
- Oracle Solaris Cluster
- mpi4py
SLIDE 41
Section 2 Scientific Application: Schaffer problem
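For reference, the Schaffer problem (often called Schaffer N.1) has one decision variable and two objectives to minimize. The bounds below match the `Real(-10, 10)` type used in the slide 43 code. Both objectives are convex, so a weighted-sum scalarization recovers the whole Pareto-optimal set. Setting the derivative to zero gives it in closed form:

```latex
\min_{x \in [-10,\,10]} \bigl(f_1(x),\, f_2(x)\bigr), \qquad f_1(x) = x^2, \qquad f_2(x) = (x-2)^2

g_w(x) = w\,x^2 + (1-w)\,(x-2)^2, \qquad w \in [0,1]

g_w'(x) = 2w\,x + 2(1-w)(x-2) = 2x - 4(1-w) = 0 \;\Longrightarrow\; x^*(w) = 2(1-w)

w \in [0,1] \;\Longrightarrow\; x^* \in [0,\,2] \quad \text{(Pareto-optimal set)}
```

Any solver output (such as NSGA-II's final population) should therefore cluster in x ∈ [0, 2].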
SLIDE 42
Section 2 Run time
- Your laptop: a lifetime
- HPC XT-Cluster: less than a minute
SLIDE 43
Section 2
from platypus.algorithms import NSGAII
from platypus.core import Problem, evaluator
from platypus.types import Real

class Schaffer(Problem):
    def __init__(self):
        # 1 decision variable, 2 objectives
        super(Schaffer, self).__init__(1, 2)
        # the decision variable x is a real number in [-10, 10]
        self.types[:] = Real(-10, 10)

    @evaluator
    def evaluate(self, solution):
        x = solution.variables[:]
        # minimize f1 = x^2 and f2 = (x - 2)^2
        solution.objectives[:] = [x[0]**2, (x[0]-2)**2]

# run NSGA-II for 10,000 function evaluations
algorithm = NSGAII(Schaffer())
algorithm.run(10000)
SLIDE 44
Section 2 Methodology
- Memetic Algorithm
- Recurrent Neural Network
- Prediction Model
SLIDE 45
Section 2 Memetic Algorithm
Genomes: set of observed values of Performance & Energy
SLIDE 46
Section 2 Memetic Algorithm
Genomes: NumPy array
SLIDE 47
Section 2 Memetic Algorithm
Fitness Function: Schaffer optimization
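A memetic algorithm is a genetic algorithm whose offspring are refined by a local search. The following is a minimal standard-library sketch on the Schaffer objectives. It is built on several assumptions, none of which the slides specify:

- a weighted-sum fitness with equal weights (the slides do not say how the two objectives are combined);
- truncation selection;
- blend crossover;
- Gaussian mutation;
- hill climbing as the local search.

All names and parameter values are hypothetical. The slides use NumPy arrays as genomes; plain floats keep this sketch dependency-free.

```python
import random


def fitness(x, w=0.5):
    """Weighted-sum Schaffer fitness to minimize; equal weights are an assumption."""
    return w * x ** 2 + (1 - w) * (x - 2) ** 2


def clip(x, lo=-10.0, hi=10.0):
    """Keep a gene inside the Schaffer bounds [-10, 10]."""
    return max(lo, min(hi, x))


def local_search(x, step=0.5, iters=30):
    """Hill climbing: move to a better neighbour, otherwise halve the step."""
    for _ in range(iters):
        for candidate in (clip(x - step), clip(x + step)):
            if fitness(candidate) < fitness(x):
                x = candidate
                break
        else:
            step /= 2
    return x


def memetic(pop_size=20, generations=30, seed=1):
    """Genetic algorithm whose children are refined by local_search."""
    rng = random.Random(seed)
    pop = [rng.uniform(-10, 10) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        parents = pop[: pop_size // 2]  # truncation selection (keeps the elite)
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            t = rng.random()
            # blend crossover plus Gaussian mutation
            child = clip(t * a + (1 - t) * b + rng.gauss(0, 0.5))
            children.append(local_search(child))  # memetic refinement step
        pop = parents + children
    return min(pop, key=fitness)


best = memetic()
print(round(best, 3), round(fitness(best), 3))
```

With w = 0.5 the fitness simplifies to (x - 1)² + 1, so the search should settle near x = 1. Changing `w` moves the target along the Schaffer trade-off curve. A Pareto-based selection such as NSGA-II's avoids fixing `w` up front.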
SLIDE 48
Section 2 Memetic Algorithm
Crossover: Alternating-Position Operator
SLIDE 49
Section 2 Recurrent Neural Network
- Bidirectional: uses both past and future data
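A bidirectional RNN runs one recurrent pass forward in time and another backward, so the representation at step t reflects both past and future inputs. The following is a minimal plain-Python sketch with a scalar hidden state and hand-picked weights. The weights and input values are hypothetical; a real model would learn its weights, typically with a deep-learning library.

```python
import math


def rnn_pass(xs, w_x, w_h, b):
    """Simple (Elman) RNN, scalar state: h_t = tanh(w_x*x_t + w_h*h_{t-1} + b)."""
    h, states = 0.0, []
    for x in xs:
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


def birnn(xs):
    """Pair each step's forward (past) state with its backward (future) state."""
    forward = rnn_pass(xs, w_x=0.8, w_h=0.5, b=0.0)               # hypothetical weights
    backward = rnn_pass(xs[::-1], w_x=0.6, w_h=0.4, b=0.1)[::-1]  # reversed, then re-aligned
    return list(zip(forward, backward))


series = [0.2, 0.5, 0.9, 0.4]  # placeholder normalized measurements
for t, (h_f, h_b) in enumerate(birnn(series)):
    print(t, round(h_f, 3), round(h_b, 3))
```

At the first step the forward state has seen only x_0, while the backward state already summarizes the whole series. The reverse holds at the last step.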
SLIDE 50
Section 2 Prediction Model
- Compare with benchmarks
- Linear trend estimation
- Least squares error
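Linear trend estimation by least squares fits y ≈ a + b·t by minimizing the sum of squared residuals. The closed form is:

- b = Σ(t - t̄)(y - ȳ) / Σ(t - t̄)²
- a = ȳ - b·t̄

The following is a minimal standard-library sketch. Using the sum of squared errors against a benchmark series is one possible comparison metric; it is an assumption about how “compare with benchmarks” is done.

```python
def fit_linear_trend(ts, ys):
    """Ordinary least squares for y = a + b*t; returns (a, b)."""
    n = len(ts)
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    sxx = sum((t - t_mean) ** 2 for t in ts)
    b = sxy / sxx
    a = y_mean - b * t_mean
    return a, b


def sse(predicted, observed):
    """Sum of squared errors between a prediction and benchmark measurements."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))


ts = [1, 2, 3, 4, 5]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]  # exactly y = 1 + 2t
a, b = fit_linear_trend(ts, ys)
print(a, b)  # 1.0 2.0
prediction = [a + b * t for t in ts]
print(sse(prediction, ys))  # 0.0
```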
SLIDE 51
Section 2 Measuring Energy Consumption
- Power-API
- Physical devices
SLIDE 52
Section 2 Correlation between performance & energy?
Coefficient: +0.217
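This assumes the coefficient on this slide is Pearson's r; the slide does not name it. Pearson's r is the covariance of the two series divided by the product of their standard deviations. The following is a minimal standard-library sketch. The sample values are placeholders, not the measurements behind +0.217.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


run_time = [10.0, 12.0, 9.0, 15.0, 11.0]      # placeholder performance samples
energy = [200.0, 230.0, 190.0, 260.0, 240.0]  # placeholder energy samples
print(round(pearson_r(run_time, energy), 3))
```

For scale, r = +0.217 gives r² ≈ 0.047. A linear relationship would therefore explain less than 5% of the variance of one series in terms of the other.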
SLIDE 54
Section 2
Per slice of one separate run
Noise? I’m working on it...
SLIDE 55
Conclusion
Multi-objective optimization in High Performance Computing
SLIDE 56
Conclusion
- Predict performance and energy
- SAVE MONEY!
- Show that this approach is scalable
SLIDE 57
Conclusion Cloud resource selection