Roadmap: Operating Pentaho at Scale
Jens Bleuel, Senior Product Manager, Pentaho
Agenda – Worker Nodes
Hear about upcoming capabilities for scaling out the Pentaho platform in large enterprise operations. This session covers 8.0 and roadmap topics.
- Worker Nodes: Overview and Business Benefits
- How is this different from AEL / Hadoop MapReduce
- Typical Customer Scenarios
- Architecture & Capabilities including Monitoring & Logging
- Improvements in Related Areas
- Demonstration
- Availability & Roadmap
Worker Nodes – Overview
- Worker Nodes can scale work items across multiple nodes (containers), such as:
– PDI jobs and transformations (in 8.0)
– Report executions (not in 8.0)
– […]
- It operates easily and securely across an elastic architecture, which adds machine resources as they are required for processing
- Worker Nodes can operate on premise or in the cloud
- Uses popular technologies under the hood, such as Docker (container platform), Chronos (scheduler), and Mesos/Marathon (container orchestration)
[Diagram: Distribute and Scale across Worker Node (a), Worker Node (b), Worker Node (c…)]
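To give a feel for the "under the hood" technologies, here is a minimal sketch of what a Docker-based Chronos job definition generally looks like, built as a Python dictionary. The slides do not show how Worker Nodes actually defines its jobs, so everything specific here is an assumption for illustration: the job name, the container image `example/pdi-worker:8.0`, the file path, and the resource values are all hypothetical.

```python
import json


def chronos_docker_job(name, image, command, schedule):
    """Build a Chronos-style job definition that runs a command in a Docker container.

    Illustration only: this is not the definition Worker Nodes generates.
    """
    return {
        "name": name,
        # ISO 8601 repeating interval: repeat forever, first run at 05:00 UTC, then daily.
        "schedule": schedule,
        "command": command,
        "container": {"type": "DOCKER", "image": image},
        "cpus": 1.0,   # hypothetical resource request
        "mem": 2048,   # hypothetical memory request in MB
    }


job = chronos_docker_job(
    name="nightly-sales-load",                      # hypothetical job name
    image="example/pdi-worker:8.0",                 # hypothetical image name
    command="kitchen.sh -file=/jobs/nightly_sales_load.kjb",  # hypothetical path
    schedule="R/2017-11-01T05:00:00Z/P1D",
)

print(json.dumps(job, indent=2))
```

The point of the sketch is the division of labor: the scheduler decides when a work item runs, the container carries everything needed to run it, and the orchestration layer decides on which node it lands.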
Worker Nodes – Business Benefits
Large enterprises need the ability to seamlessly and efficiently spin up resources to handle hundreds of work items at different times, with different dependencies and processing requirements. Worker Nodes addresses these needs and delivers:
- Faster time to value and reduced TCO because it enables customers to deploy their own scale-out processes without required services
- More efficient management of changing workloads by spinning resources up and down as needed
- Increased business agility thanks to containerization – which enables
portability of processes across on-prem and cloud environments without the need to re-engineer them.
– Even in pure on-prem, WN provides elasticity and resource optimization.
How Is This Different from AEL / Hadoop MapReduce?
These two architectures can also be combined: within a Worker Node, a PDI transformation can itself scale out with AEL or MapReduce.
SCALE OUT ON DATA
AEL / Hadoop MapReduce (simplified):
- Data is distributed across nodes
- The processing takes place at the node level
- Helps to scale out data volume
SCALE OUT ON PROCESSES (WORK ITEMS)
Worker Nodes (simplified):
- Work items such as PDI jobs and transformations are distributed across nodes; this is about the processing and orchestration (in contrast to distributing data)
- Helps to scale out Pentaho processes
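The contrast between the two approaches can be sketched in a few lines of Python. This is a conceptual illustration only, not how AEL, MapReduce, or Worker Nodes are implemented; the function names and the work item file names are hypothetical, and round-robin assignment is assumed purely for simplicity.

```python
def run_scaled_on_data(transform, rows, node_count):
    """Scale out on data: ONE piece of logic, the rows are split across nodes."""
    partitions = [rows[i::node_count] for i in range(node_count)]
    # Every node runs the same transformation on its own slice of the data.
    return [transform(partition) for partition in partitions]


def run_scaled_on_processes(work_items, node_count):
    """Scale out on processes: MANY work items, each runs whole on one node."""
    plan = {node: [] for node in range(node_count)}
    for index, item in enumerate(work_items):
        plan[index % node_count].append(item)  # simple round-robin assignment
    return plan


# One aggregation over ten rows, split across two nodes.
print(run_scaled_on_data(sum, list(range(10)), 2))
# → [20, 25]

# Three independent jobs/transformations distributed across two nodes.
print(run_scaled_on_processes(
    ["load_sales.kjb", "clean_orders.ktr", "load_hr.kjb"], 2))
# → {0: ['load_sales.kjb', 'load_hr.kjb'], 1: ['clean_orders.ktr']}
```

In the first case the unit of distribution is a slice of data; in the second it is a complete job or transformation.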
Typical Customer Scenarios
Customer Type                        | Typical Number of Work Items | Scale-Out Need
Small                                | Up to 10                     | No
Medium                               | 10 through 100               | Sometimes
Enterprise with one department       | +/- 100                      | Yes
Enterprise with multiple departments | Hundreds or thousands        | Yes
Typical Customer Examples – SLAs and Time Windows
- Need to meet customer SLAs
– Data from hundreds of sources needs to be collected and aggregated
– This is done by hundreds of PDI jobs and transformations
– All these jobs and transformations need to finish within a defined time window (for example, between 5am and 7am) so that the data is available and accurate for the target audience
- Worker Nodes provides the technology to run processes in parallel and scale out when needed, for example at peak times (end of month)
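A rough way to reason about a time window is to estimate how many parallel workers are needed so that all jobs finish in time. The sketch below is a back-of-the-envelope estimate, not the algorithm Worker Nodes uses: it assumes independent jobs with known durations (real workloads have dependencies), and it uses a simple greedy heuristic (longest job first, onto the least-loaded worker). The job count and durations are made up.

```python
import heapq


def makespan(durations, workers):
    """Finish time when jobs are placed longest-first on the least-loaded worker."""
    loads = [0] * workers
    heapq.heapify(loads)
    for duration in sorted(durations, reverse=True):
        least_loaded = heapq.heappop(loads)
        heapq.heappush(loads, least_loaded + duration)
    return max(loads)


def workers_needed(durations, window_minutes, max_workers=1000):
    """Smallest worker count whose greedy schedule fits the window, or None."""
    if not durations:
        return 0
    if max(durations) > window_minutes:
        return None  # a single job already exceeds the window
    for workers in range(1, max_workers + 1):
        if makespan(durations, workers) <= window_minutes:
            return workers
    return None


# Hypothetical workload: 300 jobs of 4 minutes each, window 5am to 7am (120 minutes).
print(workers_needed([4] * 300, 120))
# → 10
```

Run sequentially, the same workload would take 1,200 minutes, which is why parallel execution matters for a two-hour window.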
Typical Customer Examples – Shared Services
Example of one project:
- 800 daily batches from different departments in an enterprise
- One server with 120GB memory and many CPUs
- This machine hosts lots of VMs in parallel
Issue: When there is too much workload, one machine is not enough
- Worker Nodes solves this by scaling out on a cluster
Typical Customer Examples – Scalable on Demand
- Need to support growing data volumes and customer requirements
- Worker Nodes provides a flexible and scalable architecture on-premise or in the cloud for growing demand
- This is seamless and does not require changes to the underlying architecture
[Diagram: at BASE TIMES, Distribute and Scale across Worker Nodes (1) to (3); at PEAK TIMES, Distribute and Scale across Worker Nodes (1) to (5)]
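The base-time versus peak-time picture can be expressed as a small sizing rule. This is a hypothetical sketch, not the scaling policy Worker Nodes applies: the per-node capacity is an assumed parameter, and the bounds of 3 and 5 nodes simply mirror the diagram.

```python
import math


def desired_worker_nodes(pending_items, items_per_node, min_nodes=3, max_nodes=5):
    """Number of nodes to run, clamped between a base size and a peak size."""
    needed = math.ceil(pending_items / items_per_node)
    return max(min_nodes, min(max_nodes, needed))


print(desired_worker_nodes(20, 10))   # quiet period: stays at the base size
# → 3
print(desired_worker_nodes(48, 10))   # peak: grows to cover the backlog
# → 5
print(desired_worker_nodes(200, 10))  # beyond capacity: capped at the peak size
# → 5
```

Keeping a lower bound avoids cold starts at base times, while the upper bound caps resource usage at peak times.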
WORKER NODES
Orchestration Framework Container Framework
Worker Nodes – New in 8.0
- Containerized scale-out
- Pentaho PDI “work items”
[Architecture diagram. Components shown: Pentaho Clients; Pentaho Repository; Pentaho Server; Orchestration (scheduler, monitoring, security, etc.) with a Controller and three Masters (one working, two on standby); Worker Node “Executors” WN 1 (e.g. KJB), WN 2 (e.g. KTR), WN …n]
Worker Nodes Capabilities
- Deploy consistently in physical, virtual, and cloud environments
Adapts to customer needs (bare metal vs. virtualization vs. cloud), with no need to modify the product when the strategy changes.
- Scale and load balance services
This helps to deal with peaks and limited time windows by allocating the resources that are needed.
- Hybrid deployments can be used to distribute load
When the on-premise resources are not sufficient, scaling out into the cloud can provide more resources.
Monitoring and Logging
Monitoring – Overview
Monitoring – Worker Node Example
Improvements in Related Areas – Open and Save Dialogs
Pain Point: Save a New Job/Transformation
- In previous versions, whenever you save a new transformation or job into the repository, the default folder is set to the user’s home folder.
- The user needs to change the folder every time they save a new transformation or job.
New Save Dialog in 8.0 – Overview
- Remembers the last opened folder!
- Just enter the filename (and/or change the folder)!
- Similar to the Open Dialog with additional functionality (see next slide).
New Open Dialog in 8.0 – Overview
Recents
- Open shows the last opened folder. This is a big time saver!
Search
Improvements in Related Areas – Run Configurations
Pain Point: Remote Pentaho Server Execution before 8.0
Before 8.0, to execute on the Pentaho Server you needed to define a Slave server and provide its credentials, then execute on the selected server.
Execute on the Pentaho Server
- By selecting the Pentaho server option, you no longer need to define a Slave server when you want to execute remotely.
- Behind the scenes, this option executes the transformation or job via the Scheduler. This is the same as doing a “Schedule Now.”
This new functionality improves ease of use, including for Worker Nodes.
Run Configurations within Job Entries
- Run Configurations can be used in the Run dialog and also in the job entries that can execute jobs or transformations remotely and on Worker Nodes
[Screenshots: example job entry in 7.1 and in 8.0]
Demonstration
Availability and Roadmap
Availability
- Worker Nodes is EE only
- Initially, 8.0 Worker Nodes will be Limited Availability
– Fully supported, production deployment
– Distribution to a limited number of customers
- Requires additional download and implementation services
Roadmap
- Pentaho Server & Repository as a Service including High Availability
- Improved Monitoring and Logging
- Extend to other Pentaho work items such as Reports
- Integrated with other Hitachi Vantara Services and Products
[Diagram. Components shown: Container Framework; Pentaho Server; Pentaho Repository; Worker Node “Executors” WN 1 (e.g. KJB), WN 2 (e.g. KTR), WN …n]
Summary
What we covered today:
- The upcoming capabilities for scaling out the Pentaho platform and when to use them
- How to use the new way of scaling out work items (Pentaho processes such as PDI jobs and transformations) across multiple nodes
Next Steps
Want to learn more?
- Meet-the-Expert:
– Pedro Teixera
- Other recommended breakout sessions: