Relational Non-Relational Rational Agile Predictable Flexible - - PowerPoint PPT Presentation



slide-1
SLIDE 1

BIG DATA ANALYTICS

REFERENCE ARCHITECTURES AND CASE STUDIES

slide-2
SLIDE 2

Relational vs. Non-Relational Architecture

Relational

  • Rational
  • Predictable
  • Traditional

Non-Relational

  • Agile
  • Flexible
  • Modern
slide-3
SLIDE 3

Agenda

  • Big Data Challenges
  • Big Data Reference Architectures
  • Case Studies
  • Tips for Designing Big Data Solutions

slide-4
SLIDE 4

Big Data Challenges

Figure: data source types arranged from structured to unstructured and rated low, medium, or high by velocity, variety, volume, and complexity.

  • Data Storages: RDBMS, NoSQL, Hadoop, file systems, etc.
  • Machine Log Data: application logs, event logs, server data, CDRs, clickstream data, etc.
  • Sensor Data: smart electric meters, medical devices, car sensors, road cameras, etc.
  • Archives: scanned documents, statements, medical records, e-mails, etc.
  • Docs: XLS, PDF, CSV, HTML, JSON, etc.
  • Business Apps: CRM, ERP systems, HR, project management, etc.
  • Social Networks: Twitter, Facebook, Google+, LinkedIn, etc.
  • Public Web: Wikipedia, news, weather, public finance, etc.
  • Media: images, video, audio, etc.

slide-5
SLIDE 5

Big Data Analytics

Traditional Analytics (BI)

  • Focus on: descriptive analytics, diagnostic analytics
  • Data sets: limited data sets, cleansed data, simple models
  • Supports: causation (what happened, and why?)

vs

Big Data Analytics

  • Focus on: predictive analytics, data science
  • Data sets: large-scale data sets, more types of data, raw data, complex data models
  • Supports: correlation (new insight, more accurate answers)

slide-6
SLIDE 6

Big Data Analytics Use Cases

  • Data Discovery: Data Scientists/Analysts; drivers: Volume, Performance
  • Business Reporting: Business Users; drivers: Data Quality, Self-Service
  • Real Time Intelligence: Intelligent Agents, Consumers; drivers: Low Latency, Reliability

slide-7
SLIDE 7

Big Data Analytics Reference Architectures

Architecture Drivers:

  • Volume
  • Sources
  • Throughput
  • Latency
  • Extensibility
  • Data Quality
  • Reliability
  • Security
  • Self-Service
  • Cost

Reference Architectures:

  • Extended Relational
  • Non-Relational
  • Hybrid

slide-8
SLIDE 8

Relational Reference Architecture

  • Data Sources: Structured, Semi-Structured, Unstructured
  • Integration: Replication, API/ODBC, Messaging, ETL
  • Data Storages: Operational Data Stores, Data Marts, Data Warehouses
  • Analytics: Advanced Analytics, OLAP Cubes, Query & Reporting
  • Presentation: Web Services, Mobile Devices, Native Desktop, Web Browsers

slide-9
SLIDE 9

Extended Relational Reference Architecture

  • Data Sources: Structured, Semi-Structured, Unstructured
  • Integration: Replication, API/ODBC, Messaging, ETL
  • Data Storages: Operational Data Stores, Data Marts, Data Warehouses
  • Analytics: Advanced Analytics, OLAP Cubes, Query & Reporting
  • Presentation: Web Services, Mobile Devices, Native Desktop, Web Browsers

Legend: key components affected by Big Data challenges

slide-10
SLIDE 10

Non-Relational Reference Architecture

  • Data Sources: Structured, Semi-Structured, Unstructured
  • Integration: API, Messaging, ETL
  • Data Storages and Analytics: Search Engines, Distributed File Systems, NoSQL Databases, Map Reduce, Query & Reporting, Advanced Analytics
  • Presentation: Web Services, Mobile Devices, Native Desktop, Web Browsers

Legend: key components introduced with the non-relational movement

slide-11
SLIDE 11

Extended Relational vs. Non-Relational Architecture

Architecture drivers compared for Extended Relational and Non-Relational:

  • Large data volume
  • Self-service (ad-hoc reporting)
  • Unstructured data processing
  • High data model extensibility
  • High data quality and consistency
  • Extensive security
  • Reliability and fault-tolerance
  • Low latency (near-real time)
  • Low cost
  • Skills availability

slide-14
SLIDE 14

Relational vs. Non-Relational Architecture

Relational

  • Rational
  • Predictable
  • Traditional

Non-Relational

  • Agile
  • Flexible
  • Modern
slide-15
SLIDE 15

Big Data Analytics Use Cases

Use cases: Data Discovery, Business Reporting, Real Time Intelligence

Users: Data Scientists, Business Users, Intelligent Agents, Consumers

Drivers: Performance, Volume

slide-16
SLIDE 16

Data Discovery: Non-Relational Architecture

  • Data Sources: Structured, Semi-Structured, Unstructured
  • Integration: API, Messaging, ETL
  • Data Storages and Analytics: Search Engines, Distributed File Systems, NoSQL Databases, Map Reduce, Query & Reporting, Advanced Analytics
  • Presentation: Web Services, Mobile Devices, Native Desktop, Web Browsers

slide-17
SLIDE 17

Big Data Analytics Use Cases

Use cases: Data Discovery, Business Reporting, Real Time Intelligence

Users: Data Scientists, Business Users, Intelligent Agents, Consumers

Drivers: Data Quality, Self-Service

slide-18
SLIDE 18

Business Reporting: Hybrid Architecture

  • Data Sources: Structured, Semi-Structured, Unstructured
  • Integration: API, Messaging, ETL
  • Data Storages and Analytics: Distributed File Systems, Relational DWH/DM, Map Reduce, SQL, Query & Reporting, Advanced Analytics, Search Engines
  • Presentation: Web Services, Mobile Devices, Native Desktop, Web Browsers

Legend: Extended Relational components, Non-Relational components

slide-19
SLIDE 19

Big Data Analytics Use Cases

Use cases: Data Discovery, Business Reporting, Real Time Intelligence

Users: Data Scientists, Business Users, Intelligent Agents, Consumers

Drivers: Low Latency, Reliability

slide-20
SLIDE 20

Lambda Architecture
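
As a rough illustration of the pattern named on this slide: the Lambda Architecture, as commonly described, keeps an append-only master dataset, recomputes batch views from it, covers not-yet-batched events with a speed layer, and merges both views at query time. The sketch below is a minimal in-memory model under those assumptions. The class and method names are hypothetical, and no specific framework from the deck is implied.

```python
from collections import Counter


class LambdaCounter:
    """Toy Lambda-style event counter (hypothetical names, in-memory only)."""

    def __init__(self):
        self.master_log = []         # append-only raw events (batch layer input)
        self.batch_view = Counter()  # view recomputed from the whole log
        self.speed_view = Counter()  # incremental view for events since last batch

    def ingest(self, key):
        """Record an event in the master log and update the speed layer."""
        self.master_log.append(key)
        self.speed_view[key] += 1

    def run_batch(self):
        """Recompute the batch view from scratch, then reset the speed layer."""
        self.batch_view = Counter(self.master_log)
        self.speed_view.clear()

    def query(self, key):
        """Serving step: merge the batch and speed views."""
        return self.batch_view[key] + self.speed_view[key]
```

Queries stay correct between batch runs because the speed layer only holds events the batch view has not absorbed yet. A real deployment would bound the batch run by a cutoff timestamp rather than resetting the speed layer atomically.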

Source:

slide-21
SLIDE 21

Case Study #1: Usage & Billing Analysis

Business Area:

Cloud-based platform for building, deploying, hosting, and managing mobile applications

Business Goals:

  • Provide a visual environment for building custom mobile applications
  • Charge customers based on the platform they are using, the number of consumers' applications, etc.

slide-22
SLIDE 22

Architectural Decisions

Architecture Drivers:

  • Volume (> 10 TB)
  • Sources (Semi-structured - JSON)
  • Throughput (> 10K/sec)
  • Latency (2 min)
  • Extensibility (Custom metrics)
  • Data Quality (Consistency)
  • Reliability (24/7)
  • Security (Multitenancy)
  • Self-Service (Ad-Hoc reports)
  • Cost (The less the better)
  • Constraints (Public Cloud)

Trade-off:

Driver          Extended Relational   Non-Relational
Extensibility   -                     +
Data Quality    +                     -
Self-Service    +                     -

Decisions:

  • Extended Relational Architecture
  • Extensibility via Pre-allocated Fields pattern
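
The Pre-allocated Fields pattern named above can be sketched as follows: a relational table reserves a fixed set of generic spare columns up front, and a mapping table records which tenant-defined custom metric occupies which spare column. This is a minimal sketch, not the case study's implementation. It uses Python's built-in `sqlite3` as a stand-in for the warehouse, and the table names, column names, and helper functions are all hypothetical.

```python
import sqlite3

# Hypothetical spare columns reserved in advance for custom metrics.
SPARE_COLUMNS = ["custom_1", "custom_2", "custom_3"]


def create_schema(conn):
    """Create a usage table with fixed columns plus pre-allocated spare ones."""
    spare = ", ".join(f"{name} REAL" for name in SPARE_COLUMNS)
    conn.execute(
        f"CREATE TABLE usage_events (tenant TEXT, event_time TEXT, {spare})"
    )
    # Per-tenant mapping from a custom metric name to the spare column it uses.
    conn.execute(
        "CREATE TABLE metric_map (tenant TEXT, metric TEXT, column_name TEXT,"
        " PRIMARY KEY (tenant, metric))"
    )


def register_metric(conn, tenant, metric):
    """Assign the next free spare column to a tenant's custom metric."""
    used = {
        row[0]
        for row in conn.execute(
            "SELECT column_name FROM metric_map WHERE tenant = ?", (tenant,)
        )
    }
    for column in SPARE_COLUMNS:
        if column not in used:
            conn.execute(
                "INSERT INTO metric_map VALUES (?, ?, ?)", (tenant, metric, column)
            )
            return column
    raise ValueError(f"no spare column left for tenant {tenant!r}")


def insert_event(conn, tenant, event_time, metrics):
    """Store an event, routing each custom metric to its mapped column."""
    mapping = dict(
        conn.execute(
            "SELECT metric, column_name FROM metric_map WHERE tenant = ?", (tenant,)
        )
    )
    columns = [mapping[name] for name in metrics]  # KeyError if unregistered
    column_list = ", ".join(["tenant", "event_time", *columns])
    placeholders = ", ".join("?" for _ in range(2 + len(columns)))
    conn.execute(
        f"INSERT INTO usage_events ({column_list}) VALUES ({placeholders})",
        (tenant, event_time, *metrics.values()),
    )
```

The schema never changes when a tenant adds a metric, which keeps ad-hoc SQL reporting simple. The cost is a hard cap on custom metrics per tenant and sparse columns for tenants that use few of them.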

slide-23
SLIDE 23

Solution Architecture

Technologies:

  • Amazon Redshift
  • Amazon SQS
  • Amazon S3
  • Elastic Beanstalk
  • Jaspersoft BI Professional
  • Python
slide-24
SLIDE 24

Case Study #2: Clickstream for retail website

Business Area:

  • Retail. A platform for e-commerce and collecting feedback from customers

Business Goals:

  • Build an in-house Analytics Platform for ROI measurement and performance analysis of every product and feature delivered by the e-commerce platform
  • Provide the ability to understand how end-users are interacting with service content, products, and features on sites
  • Do clickstream analysis
  • Perform A/B testing

slide-25
SLIDE 25

Architectural Decisions

Architecture Drivers:

  • Volume (45 TB)
  • Sources (Semi-structured - JSON)
  • Throughput (> 20K/sec)
  • Latency (1 hour)
  • Extensibility (Custom tags)
  • Data Quality (Not critical)
  • Reliability (24/7)
  • Security (Multitenancy)
  • Self-Service (Canned reports, Data science)
  • Cost (The less the better)
  • Constraints (Public Cloud)

Trade-off:

Driver               Extended Relational   Non-Relational
Volume/Scalability   +/-                   +
Throughput           +                     +
Self-Service         +                     +/-
Extensibility        -                     +

Decisions:

  • Non-Relational Architecture
  • Reporting via Materialized View pattern
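
The Materialized View pattern named above can be illustrated like this: a periodic batch job aggregates raw clickstream events into a precomputed view, and canned reports read only that view instead of scanning raw data. The sketch is a simplified in-memory version under stated assumptions. The event fields (`timestamp`, `page`) and the function names are hypothetical and not taken from the case study.

```python
from collections import Counter


def build_page_view_report(events):
    """Batch step: aggregate raw events into a {(day, page): count} view."""
    view = Counter()
    for event in events:
        day = event["timestamp"][:10]  # assumes ISO 8601 timestamps
        view[(day, event["page"])] += 1
    return dict(view)


def query_report(view, day):
    """Read path: answer a canned report from the precomputed view only."""
    return {page: count for (d, page), count in view.items() if d == day}


# Hypothetical sample events for illustration.
events = [
    {"timestamp": "2014-03-01T10:00:00", "page": "/home"},
    {"timestamp": "2014-03-01T10:05:00", "page": "/home"},
    {"timestamp": "2014-03-01T11:00:00", "page": "/cart"},
    {"timestamp": "2014-03-02T09:00:00", "page": "/home"},
]
view = build_page_view_report(events)
print(query_report(view, "2014-03-01"))
```

The view is only as fresh as the last batch run, which fits a latency driver measured in an hour rather than seconds.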

slide-26
SLIDE 26

Solution Architecture

Technologies:

  • Amazon S3
  • Flume
  • Hadoop/HDFS, MapReduce
  • HBase
  • Oozie
  • Hive

Figure: solution diagram with nodes labeled Node 1, Node 2, ..., Node N.

slide-27
SLIDE 27

Tips for Designing Big Data Solutions

  • Understand data users and sources
  • Discover architecture drivers
  • Select proper reference architecture
  • Do trade-off analysis, address cons
  • Map reference architecture to technology stack
  • Prototype, re-evaluate architecture
  • Estimate implementation efforts
  • Set up DevOps practices from the very beginning
  • Advance in solution development through "small wins"
  • Be ready for changes, big data technologies are evolving rapidly
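
The trade-off analysis tip can be made concrete with a simple tally. The sketch below transcribes the +, +/-, and - ratings from the two case-study trade-off tables and sums them per architecture. The numeric scoring (+1, 0, -1) and the optional per-driver weights are assumptions made for illustration, not a method stated in the deck.

```python
# Assumed numeric value for each rating symbol.
SCORES = {"+": 1, "+/-": 0, "-": -1}

# Ratings as (Extended Relational, Non-Relational), from the case studies.
CASE_1 = {
    "Extensibility": ("-", "+"),
    "Data Quality": ("+", "-"),
    "Self-Service": ("+", "-"),
}
CASE_2 = {
    "Volume/Scalability": ("+/-", "+"),
    "Throughput": ("+", "+"),
    "Self-Service": ("+", "+/-"),
    "Extensibility": ("-", "+"),
}


def tally(trade_off, weights=None):
    """Sum weighted ratings per architecture; unlisted drivers weigh 1."""
    weights = weights or {}
    relational = non_relational = 0
    for driver, (rel, non_rel) in trade_off.items():
        weight = weights.get(driver, 1)
        relational += weight * SCORES[rel]
        non_relational += weight * SCORES[non_rel]
    return {"Extended Relational": relational, "Non-Relational": non_relational}


print(tally(CASE_1))  # {'Extended Relational': 1, 'Non-Relational': -1}
print(tally(CASE_2))  # {'Extended Relational': 1, 'Non-Relational': 3}
```

With equal weights, the tally points the same way as the decisions recorded in both case studies. In practice the weights matter more than the arithmetic, and a negative rating on the chosen side still needs a mitigation, such as the patterns each case study names.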

slide-28
SLIDE 28

Leading global Product and Application Development partner founded in 1993

3,300+ employees across North America, Ukraine and Western Europe

Thousands of successful outsourcing projects!

  • SaaS/Cloud Solutions
  • Mobility Solutions
  • UX/UI
  • BI/Analytics/Big Data
  • Software Architecture
  • Security

Clients include:

slide-29
SLIDE 29

Thank You!

SoftServe US Office

One Congress Plaza, 111 Congress Avenue, Suite 2700
Austin, TX 78701
Tel: 512.516.8880

Contacts

Serhiy Haziyev: shaziyev@softserveinc.com
Olha Hrytsay: ohrytsay@softserveinc.com