Design Patterns Leveraging Spark in PDI
Chris Skirde, Pentaho Director of Sales Engineering, Hitachi Vantara
Rakesh Saha, Pentaho Senior Product Manager, Hitachi Vantara
Quiz Time!
• What is Spark?
  A. A good way to start a fire.
  B. Necessary for a well-running internal combustion engine.
  C. A fast, general-purpose engine for large-scale data processing.
  D. All of the above.
• True or false: Pentaho supports Spark?
• Who is using Spark today (with or without Pentaho)?
Agenda
• Introduction to Spark
• Common design patterns
• How to leverage Spark with Pentaho
Introduction to Spark
• Why are we interested?
• What is it really?
• What’s been done?
Spark Application Architecture
(Diagram: PDI/Server Daemon)
What Do Those Applications Have in Common?
Common Design Patterns
• Filter/Organize
• Join
• Sum
• Transform/Enrich
• Query
• Machine Learning/Data Science
Filter/Organize
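As a rough illustration of the filter/organize pattern, here is a plain-Python stand-in (not Spark or PDI code). The sample rows, the field names `region` and `amount`, and the helper `filter_and_sort` are all hypothetical. The sketch keeps rows that pass a predicate and then orders them by a key, roughly what a filter step followed by a sort step would do.

```python
from operator import itemgetter

# Hypothetical sample rows; the field names are assumptions for illustration.
ORDERS = [
    {"order_id": 1, "region": "EMEA", "amount": 120.0},
    {"order_id": 2, "region": "APAC", "amount": 35.5},
    {"order_id": 3, "region": "EMEA", "amount": 80.0},
]


def filter_and_sort(rows, region, sort_key="amount"):
    """Keep rows for one region, then order them by sort_key (ascending)."""
    kept = [row for row in rows if row["region"] == region]
    return sorted(kept, key=itemgetter(sort_key))


emea_orders = filter_and_sort(ORDERS, "EMEA")
```

On a cluster the same filter would run per partition and the sort would require a shuffle; the sketch only shows the row-level logic.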
Join
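The join pattern can be sketched the same way. This is a minimal hash-based inner join in plain Python, assuming both inputs fit in memory; it is not the Spark or PDI implementation. The tables, the `customer_id` key, and the helper `hash_join` are hypothetical.

```python
from collections import defaultdict


def hash_join(left, right, key):
    """Inner-join two lists of dicts on `key` using a hash index over `right`."""
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)

    joined = []
    for row in left:
        # Rows on the left with no match on the right are dropped (inner join).
        for match in index.get(row[key], []):
            joined.append({**row, **match})
    return joined


# Hypothetical sample data for illustration only.
CUSTOMERS = [
    {"customer_id": 1, "name": "Acme"},
    {"customer_id": 2, "name": "Globex"},
]
PURCHASES = [
    {"customer_id": 1, "amount": 10.0},
    {"customer_id": 1, "amount": 5.0},
    {"customer_id": 3, "amount": 7.0},
]

joined_rows = hash_join(PURCHASES, CUSTOMERS, "customer_id")
```

Indexing the smaller input and streaming the larger one past it is the same idea behind a broadcast-style join, where the small side is copied to every worker.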
Sum (and Other Aggregations)
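A grouped sum is the simplest aggregation to sketch. The plain-Python stand-in below (again, not Spark or PDI code) accumulates one running total per group key; the sample rows and the helper `sum_by_key` are hypothetical.

```python
from collections import defaultdict


def sum_by_key(rows, key, value):
    """Return {group: total} by adding `value` for every row sharing `key`."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)


# Hypothetical sample rows; the field names are assumptions for illustration.
SALES = [
    {"region": "EMEA", "amount": 120.0},
    {"region": "APAC", "amount": 35.5},
    {"region": "EMEA", "amount": 80.0},
]

totals = sum_by_key(SALES, "region", "amount")
```

Because addition is associative, partial totals computed on separate partitions can be merged afterwards, which is why this kind of aggregation distributes well.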
Transform/Enrich
• Any step you like!
Query – Easy!
• Cloudera: use Hive-on-Spark with Hive2
• Hortonworks: use SparkSQL via Simba
Machine Learning/Data Science
Recap
What we covered today:
• Reviewed what Spark is and why organizations are adopting it
• Discussed several common data integration design patterns
• Linked those design patterns to Pentaho features for you to try
Questions?
Next Steps
Want to learn more?
• “Meet the Experts” Matt Casters and Mark Hall!
• Adaptive Execution Layer http://www.pentaho.com/blog/introducing-adaptive-execution-layer-spark-architecture
• SQL on Spark http://www.pentaho.com/blog/operationalize-spark-big-data-newest-enhancements