Roadmap: Operating Pentaho at Scale Jens Bleuel Senior Product - - PowerPoint PPT Presentation

slide-1
SLIDE 1

Roadmap: Operating Pentaho at Scale

Jens Bleuel Senior Product Manager, Pentaho

slide-2
SLIDE 2

Agenda – Worker Nodes

Hear about new upcoming capabilities for scaling out the Pentaho platform in large enterprise operations. This will cover 8.0 and roadmap topics.

  • Worker Nodes: Overview and Business Benefits
  • How is this different from AEL / Hadoop MapReduce
  • Typical Customer Scenarios
  • Architecture & Capabilities including Monitoring & Logging
  • Improvements in Related Areas
  • Demonstration
  • Availability & Roadmap
slide-3
SLIDE 3

Worker Nodes – Overview

  • Worker Nodes can scale work items across multiple nodes (containers), such as:

– PDI jobs and transformations (in 8.0)
– Report executions (not in 8.0)
– […]

  • It operates easily and securely across an elastic architecture, which adds machine resources as they are required for processing
  • Worker Nodes can operate on premise or in the cloud
  • Uses popular technologies under the hood, such as Docker (container platform), Chronos (scheduler) and Mesos/Marathon (container orchestration)

[Diagram: "Distribute and Scale" across Worker Node (a), Worker Node (b), Worker Node (c…)]
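To make the container orchestration layer more concrete, the following is a minimal sketch of how a containerized worker could be described to Marathon as an app definition. The field names (id, instances, cpus, mem, container) follow Marathon's public app definition format; the app id, the image name and the resource figures are hypothetical placeholders, and this is not the descriptor that ships with Pentaho Worker Nodes.

```python
import json


def marathon_app(app_id, image, instances, cpus=1.0, mem_mb=2048):
    """Build a Marathon-style app definition for a Docker container.

    All values passed in are illustrative assumptions, not Pentaho defaults.
    """
    return {
        "id": app_id,
        "instances": instances,  # number of containers to keep running
        "cpus": cpus,            # CPU share per container
        "mem": mem_mb,           # memory per container, in MB
        "container": {
            "type": "DOCKER",
            "docker": {"image": image},
        },
    }


# Hypothetical app id and image name, used only for illustration.
app = marathon_app("/pentaho/worker-node", "example.org/pentaho-worker:8.0", instances=3)
print(json.dumps(app, indent=2))
```

Raising or lowering "instances" is what spinning containers up and down amounts to in this kind of setup.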

slide-4
SLIDE 4

Worker Nodes – Business Benefits

Large enterprises need the ability to seamlessly and efficiently spin up resources to handle hundreds of work items at different times, with different dependencies and processing requirements. Worker Nodes addresses these needs and delivers:

  • Faster time to value and reduced TCO, because it enables customers to deploy their own scale-out processes without required services
  • More efficient management of changing workloads by spinning resources up and down as needed
  • Increased business agility thanks to containerization, which enables portability of processes across on-prem and cloud environments without the need to re-engineer them

– Even in pure on-prem environments, WN provides elasticity and resource optimization.

slide-5
SLIDE 5

How Is This Different from AEL / Hadoop MapReduce?

AEL / Hadoop MapReduce (simplified): SCALE OUT ON DATA

  • Data is distributed across nodes
  • The processing takes place at the node level
  • Helps to scale out data volume

Worker Nodes (simplified): SCALE OUT ON PROCESSES (WORK ITEMS)

  • Work items such as PDI jobs and PDI transformations are distributed across nodes; this is about the processing and orchestration (in contrast to distributing data)
  • Helps to scale out Pentaho processes

These two architectures can also be combined: within a Worker Node, a PDI transformation can also scale out with AEL or MapReduce.
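The difference between the two scale-out styles can be sketched in a few lines of Python. This is only an analogy under stated assumptions: a local thread pool stands in for a cluster, summing numbers stands in for data processing, and the work item names are hypothetical. It does not use any Pentaho, AEL or Hadoop API.

```python
from concurrent.futures import ThreadPoolExecutor


def scale_out_on_data(rows, partitions):
    """Data-centric: ONE job, its rows are split across partitions."""
    chunks = [rows[i::partitions] for i in range(partitions)]
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        partial_sums = list(pool.map(sum, chunks))  # each partition works on its slice
    return sum(partial_sums)                        # partial results are combined


def scale_out_on_work_items(work_items, workers):
    """Process-centric: MANY independent work items, each runs whole on one worker."""
    def run(item):
        return f"{item}: done"                      # placeholder for executing a job
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run, work_items))      # results keep the input order


print(scale_out_on_data(list(range(10)), partitions=3))
print(scale_out_on_work_items(["load_sales.kjb", "clean_orders.ktr"], workers=2))
```

In the first function the data is what gets divided; in the second the list of jobs and transformations is what gets divided, and each one still processes its own data in full.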
slide-6
SLIDE 6

Typical Customer Scenarios

Customer Type                          Typical Number of Work Items   Scale-Out Need
Small                                  Up to 10                       No
Medium                                 10 through 100                 Sometimes
Enterprise with one department         +/- 100                        Yes
Enterprise with multiple departments   Hundreds or thousands          Yes

slide-7
SLIDE 7

Typical Customer Examples – SLAs and Time Windows

  • Need to meet customer SLAs

– Data from hundreds of sources needs to be collected and aggregated
– This is done by hundreds of PDI jobs and transformations
– All these jobs and transformations need to finish within a defined time window (for example, between 5am and 7am) so that the data is available and accurate for the target audience

  • Worker Nodes provides the technology to run processes in parallel and scale out when needed, for example at peak times (end of month)
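The time-window constraint above comes down to simple arithmetic. The sketch below estimates a lower bound on the number of parallel workers needed to finish a batch inside a window. The job count and average runtime are made-up illustration values, and the estimate assumes independent work items of similar length with no dependencies between them.

```python
import math


def workers_needed(num_items, avg_minutes_per_item, window_minutes):
    """Lower bound on parallel workers so all items finish within the window."""
    total_minutes = num_items * avg_minutes_per_item  # total sequential runtime
    return math.ceil(total_minutes / window_minutes)


# Hypothetical example: 300 jobs of about 4 minutes each, window 5am to 7am (120 min).
print(workers_needed(300, 4, 120))  # 1200 / 120 -> 10
```

Real schedules need headroom on top of this bound, since dependencies and uneven runtimes prevent perfect packing.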
slide-8
SLIDE 8

Typical Customer Examples – Shared Services

Example of one project:

  • 800 daily batches from different departments in an enterprise
  • One server with 120GB memory and many CPUs
  • This machine hosts many VMs in parallel

Issue: When the workload is too high, one machine is not enough

  • Worker Nodes solves this by scaling out on a cluster
slide-9
SLIDE 9

Typical Customer Examples – Scalable on Demand

  • Need to support growing data volumes and customer requirements
  • Worker Nodes provides a flexible and scalable architecture, on premise or in the cloud, for growing demand
  • This is seamless and does not require changes to the underlying architecture

[Diagram: BASE TIMES: "Distribute and Scale" across Worker Nodes (1) to (3). PEAK TIMES: "Distribute and Scale" across Worker Nodes (1) to (3) plus Worker Nodes (4) and (5).]
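The base-time versus peak-time picture can be expressed as a small sizing rule. This is a sketch under assumptions: the function name, the per-node capacity and the choice to size by pending work items are all hypothetical, and the limits of 3 and 5 nodes simply mirror the example on the slide. It does not describe how Worker Nodes actually decides when to scale.

```python
import math


def desired_nodes(pending_items, items_per_node, min_nodes=3, max_nodes=5):
    """Pick a node count for the current backlog, clamped to [min_nodes, max_nodes]."""
    needed = math.ceil(pending_items / items_per_node)
    return max(min_nodes, min(max_nodes, needed))


print(desired_nodes(12, items_per_node=10))   # base times: stays at the minimum of 3
print(desired_nodes(48, items_per_node=10))   # peak times: grows to 5
print(desired_nodes(200, items_per_node=10))  # capped at the maximum of 5
```

The clamp is the important part: the cluster never shrinks below the base size and never grows past what the environment can provide.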

slide-10
SLIDE 10

Worker Nodes – New in 8.0

  • Containerized scale-out
  • Pentaho PDI “work items”

[Architecture diagram. Components shown: Pentaho Clients, Pentaho Server, Pentaho Repository, and WORKER NODES. WORKER NODES comprises an Orchestration Framework (Controller; Orchestration: scheduler, monitoring, security, etc.; three Masters, one working and two on standby) and a Container Framework with “Executor” containers WN 1 (e.g. KJB), WN 2 (e.g. KTR), through WN …n.]

slide-11
SLIDE 11

Worker Nodes Capabilities

  • Deploy consistently in physical, virtual, and cloud environments

Adapts to customer needs (bare metal vs. virtualization vs. cloud), with no need to modify the product when the strategy changes.

  • Scale and load balance services

This helps to deal with peaks and limited time windows by allocating the resources that are needed.

  • Hybrid deployments can be used to distribute load

When the on-premise resources are not sufficient, scaling out into the cloud can provide more resources.

slide-12
SLIDE 12

Monitoring and Logging

slide-13
SLIDE 13

Monitoring – Overview

slide-14
SLIDE 14

Monitoring – Worker Node Example

slide-15
SLIDE 15

Improvements in Related Areas: Open and Save Dialogs

slide-16
SLIDE 16

Pain Point: Save a New Job/Transformation

  • Whenever you save a new transformation/job into the repository, the default folder is set to the user’s home folder.

In previous versions, the user needs to change the folder every time they save a new transformation or job.

slide-17
SLIDE 17

New Save Dialog in 8.0 – Overview

  • Remembers the last opened folder!
  • Just enter the filename! (and/or change the folder)
  • Similar to the Open Dialog with additional functionality (see next slide).

slide-18
SLIDE 18

New Open Dialog in 8.0 – Overview

  • Recents: Open shows the last opened folder. This is a big time saver!
  • Search

slide-19
SLIDE 19

Improvements in Related Areas: Run Configurations

slide-20
SLIDE 20

Pain Point: Remote Pentaho Server Execution before 8.0

To execute on the Pentaho Server before 8.0, you need to define a Slave server and provide its credentials, then execute on the selected server.

slide-21
SLIDE 21

Execute on the Pentaho Server

  • By selecting the Pentaho server option, you no longer need to define a Slave server when you want to execute remotely.
  • Behind the scenes, this option executes the transformation or job via the Scheduler. This is the same as doing a “Schedule Now.”

This new functionality improves ease of use, including for Worker Nodes.

slide-22
SLIDE 22

Run Configurations within Job Entries

  • Run Configuration can be used in the Run dialog and also in the job entries that can execute jobs or transformations remotely and on Worker Nodes

[Screenshots: example job entry in 7.1 vs. 8.0]

slide-23
SLIDE 23

Demonstration

slide-24
SLIDE 24

Availability and Roadmap

slide-25
SLIDE 25

Availability

  • Worker Nodes is EE only
  • Initially, 8.0 Worker Nodes will be Limited Availability

– Fully supported, production deployment
– Distribution to a limited number of customers

  • Requires additional download and implementation services
slide-26
SLIDE 26

Roadmap

  • Pentaho Server & Repository as a Service including High Availability
  • Improved Monitoring and Logging
  • Extend to other Pentaho work items such as Reports
  • Integrated with other Hitachi Vantara Services and Products

[Diagram. Components shown: Pentaho Server, Pentaho Repository, and a Container Framework with “Executor” containers WN 1 (e.g. KJB), WN 2 (e.g. KTR), through WN …n.]

slide-27
SLIDE 27

Summary

What we covered today:

  • The upcoming capabilities for scaling out the Pentaho platform and when to use them
  • How to use the new way of scaling out work items (Pentaho processes such as PDI jobs and transformations) across multiple nodes

slide-28
SLIDE 28

Next Steps

Want to learn more?

  • Meet-the-Expert:

– Pedro Teixera

  • Other recommended breakout sessions:

– Matt Howard: Pentaho 8.0 and Roadmap
– Rakesh Saha and Jens Bleuel: Roadmap: Processing Big Data
– Matt Casters: PDI Best Architecture Practices
– Steve Szabo: PDI Sizing Overview and Case Study
– Jonathan Jarvis: Understanding Parallelism with PDI and Adaptive Execution with Spark
– Mark Burnett: Understanding the Big Data Technology Ecosystem

slide-29
SLIDE 29