Relational Non-Relational Rational Agile Predictable Flexible - - PowerPoint PPT Presentation
Big Data Analytics Reference Architectures and Case Studies
Relational vs. Non-Relational Architecture
Relational
- Rational
- Predictable
- Traditional
Non-Relational
- Agile
- Flexible
- Modern
Agenda
Big Data Challenges
Big Data Reference Architectures
Case Studies
Tips for Designing Big Data Solutions
Big Data Challenges
[Figure: data source types arranged from UNSTRUCTURED to STRUCTURED and graded HIGH / MEDIUM / LOW]
Data Storages
RDBMS, NoSQL, Hadoop, file systems etc.
Machine Log Data
Application logs, event logs, server data, CDRs, clickstream data etc.
Sensor Data
Smart electric meters, medical devices, car sensors, road cameras etc.
Archives
Scanned documents, statements, medical records, e-mails etc.
Docs
XLS, PDF, CSV, HTML, JSON etc.
Business Apps
CRM, ERP systems, HR, project management etc.
Social Networks
Twitter, Facebook, Google+, LinkedIn etc.
Public Web
Wikipedia, news, weather, public finance etc.
Media
Images, video, audio etc.
Velocity, Variety, Volume, Complexity
Big Data Analytics
Traditional Analytics (BI)
Focus on
- Descriptive analytics
- Diagnostic analytics
Data Sets
- Limited data sets
- Cleansed data
- Simple models
Supports
- Causation: what happened, and why?
vs.
Big Data Analytics
Focus on
- Predictive analytics
- Data science
Data Sets
- Large-scale data sets
- More types of data
- Raw data
- Complex data models
Supports
- Correlation: new insight
- More accurate answers
Big Data Analytics Use Cases
Data Discovery
- Users: Data Scientists/Analysts
- Drivers: Volume, Performance
Business Reporting
- Users: Business Users
- Drivers: Data Quality, Self Service
Real Time Intelligence
- Users: Intelligent Agents, Consumers
- Drivers: Low Latency, Reliability
Big Data Analytics Reference Architectures
Architecture Drivers:
▪ Volume
▪ Sources
▪ Throughput
▪ Latency
▪ Extensibility
▪ Data Quality
▪ Reliability
▪ Security
▪ Self-Service
▪ Cost
Reference Architectures:
▪ Extended Relational
▪ Non-Relational
▪ Hybrid
Relational Reference Architecture
Data Sources: Structured, Semi-Structured, Unstructured
Integration: ETL, Messaging, API/ODBC, Replication
Data Storages: Operational Data Stores, Data Marts, Data Warehouses
Analytics: Query & Reporting, OLAP Cubes, Advanced Analytics
Presentation: Web Browsers, Native Desktop, Mobile Devices, Web Services
Extended Relational Reference Architecture
Data Sources: Structured, Semi-Structured, Unstructured
Integration: ETL, Messaging, API/ODBC, Replication
Data Storages: Operational Data Stores, Data Marts, Data Warehouses
Analytics: Query & Reporting, OLAP Cubes, Advanced Analytics
Presentation: Web Browsers, Native Desktop, Mobile Devices, Web Services
Key components affected with Big Data challenges
Non-Relational Reference Architecture
Data Sources: Structured, Semi-Structured, Unstructured
Integration: ETL, Messaging, API
Data Storages: NoSQL Databases, Distributed File Systems, Search Engines
Analytics: Query & Reporting, Map Reduce, Advanced Analytics
Presentation: Web Browsers, Native Desktop, Mobile Devices, Web Services
Key components introduced with non-relational movement
Extended Relational vs. Non-Relational Architecture
Architecture Drivers (compared for Extended Relational and Non-Relational):
▪ Large data volume
▪ Self-service (ad-hoc reporting)
▪ Unstructured data processing
▪ High data model extensibility
▪ High data quality and consistency
▪ Extensive security
▪ Reliability and fault-tolerance
▪ Low latency (near-real time)
▪ Low cost
▪ Skills availability
Big Data Analytics Use Cases
Data Discovery, Business Reporting, Real Time Intelligence
Focus: Data Discovery
- Users: Data Scientists
- Drivers: Performance, Volume
Data Discovery: Non-Relational Architecture
Data Sources: Structured, Semi-Structured, Unstructured
Integration: ETL, Messaging, API
Data Storages: NoSQL Databases, Distributed File Systems, Search Engines
Analytics: Query & Reporting, Map Reduce, Advanced Analytics
Presentation: Web Browsers, Native Desktop, Mobile Devices, Web Services
Big Data Analytics Use Cases
Data Discovery, Business Reporting, Real Time Intelligence
Focus: Business Reporting
- Users: Business Users
- Drivers: Data Quality, Self Service
Business Reporting: Hybrid Architecture
Data Sources: Structured, Semi-Structured, Unstructured
Integration: ETL, Messaging, API
Data Storages: Distributed File Systems, Relational DWH/DM, Search Engines
Analytics: Map Reduce, SQL, Query & Reporting, Advanced Analytics
Presentation: Web Browsers, Native Desktop, Mobile Devices, Web Services
Extended Relational components Non-relational components
Big Data Analytics Use Cases
Data Discovery, Business Reporting, Real Time Intelligence
Focus: Real Time Intelligence
- Users: Intelligent Agents, Consumers
- Drivers: Low Latency, Reliability
Lambda Architecture
Source:
Case Study #1: Usage & Billing Analysis
Business Area:
Cloud-based platform for building, deploying, hosting and managing mobile applications
Business Goals:
▪ Provide a visual environment for building custom mobile applications
▪ Charge customers based on the platform they are using, the number of consumers' applications etc.
Architectural Decisions
Architecture Drivers:
▪ Volume (> 10 TB)
▪ Sources (Semi-structured - JSON)
▪ Throughput (> 10K/sec)
▪ Latency (2 min)
▪ Extensibility (Custom metrics)
▪ Data Quality (Consistency)
▪ Reliability (24/7)
▪ Security (Multitenancy)
▪ Self-Service (Ad-Hoc reports)
▪ Cost (The less the better)
▪ Constraints (Public Cloud)
Trade-off (Extended Relational / Non-Relational):
▪ Extensibility: - / +
▪ Data Quality: + / -
▪ Self-Service: + / -
Decision: Extended Relational Architecture, with extensibility via the Pre-allocated Fields pattern
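The deck names the Pre-allocated Fields pattern but does not show its schema. A minimal sketch of the idea, assuming a fact table with a fixed block of spare columns and a per-tenant mapping from custom metric names to those columns. The table names, column names, and the choice of four spare fields are hypothetical, and `sqlite3` stands in for the relational warehouse purely so the example runs anywhere.

```python
import sqlite3

# Hypothetical: four spare columns reserved up front for tenant-defined metrics.
SPARE_COLUMNS = [f"custom_{i}" for i in range(1, 5)]


def create_schema(conn):
    """Create a fact table with pre-allocated spare columns plus a mapping table."""
    spare = ", ".join(f"{col} REAL" for col in SPARE_COLUMNS)
    conn.execute(
        f"CREATE TABLE usage_events (tenant_id TEXT, event_ts TEXT, {spare})"
    )
    conn.execute(
        "CREATE TABLE metric_map ("
        "tenant_id TEXT, metric_name TEXT, column_name TEXT, "
        "PRIMARY KEY (tenant_id, metric_name))"
    )


def column_for(conn, tenant_id, metric_name):
    """Return the spare column mapped to a tenant's metric, or None."""
    row = conn.execute(
        "SELECT column_name FROM metric_map "
        "WHERE tenant_id = ? AND metric_name = ?",
        (tenant_id, metric_name),
    ).fetchone()
    return row[0] if row else None


def register_metric(conn, tenant_id, metric_name):
    """Map a custom metric to the tenant's next free spare column."""
    existing = column_for(conn, tenant_id, metric_name)
    if existing:
        return existing
    used = {
        row[0]
        for row in conn.execute(
            "SELECT column_name FROM metric_map WHERE tenant_id = ?",
            (tenant_id,),
        )
    }
    for col in SPARE_COLUMNS:
        if col not in used:
            conn.execute(
                "INSERT INTO metric_map VALUES (?, ?, ?)",
                (tenant_id, metric_name, col),
            )
            return col
    raise RuntimeError(f"no spare columns left for tenant {tenant_id!r}")


def insert_event(conn, tenant_id, event_ts, metrics):
    """Store one event, translating metric names to spare columns."""
    cols, vals = ["tenant_id", "event_ts"], [tenant_id, event_ts]
    for name, value in metrics.items():
        col = column_for(conn, tenant_id, name)
        if col is None:
            raise KeyError(f"metric {name!r} is not registered")
        cols.append(col)
        vals.append(value)
    placeholders = ", ".join("?" for _ in cols)
    # Column names come only from SPARE_COLUMNS, never from user input.
    conn.execute(
        f"INSERT INTO usage_events ({', '.join(cols)}) VALUES ({placeholders})",
        vals,
    )


def metric_total(conn, tenant_id, metric_name):
    """Sum one custom metric for one tenant, e.g. as an input to billing."""
    col = column_for(conn, tenant_id, metric_name)
    if col is None:
        raise KeyError(f"metric {metric_name!r} is not registered")
    row = conn.execute(
        f"SELECT COALESCE(SUM({col}), 0) FROM usage_events WHERE tenant_id = ?",
        (tenant_id,),
    ).fetchone()
    return row[0]
```

The schema never changes when a tenant adds a metric, which keeps ad-hoc SQL reporting simple. The cost is a hard cap on custom metrics per tenant and column names that only make sense through the mapping table.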
Solution Architecture
Technologies:
- Amazon Redshift
- Amazon SQS
- Amazon S3
- Elastic Beanstalk
- Jaspersoft BI Professional
- Python
Case Study #2: Clickstream for retail website
Business Area:
Retail. A platform for e-commerce and collecting feedback from customers
Business Goals:
▪ Build an in-house analytics platform for ROI measurement and performance analysis of every product and feature delivered by the e-commerce platform
▪ Provide the ability to understand how end users interact with service content, products, and features on sites
▪ Do clickstream analysis
▪ Perform A/B testing
Architectural Decisions
Architecture Drivers:
▪ Volume (45 TB)
▪ Sources (Semi-structured - JSON)
▪ Throughput (> 20K/sec)
▪ Latency (1 hour)
▪ Extensibility (Custom tags)
▪ Data Quality (Not critical)
▪ Reliability (24/7)
▪ Security (Multitenancy)
▪ Self-Service (Canned reports, Data science)
▪ Cost (The less the better)
▪ Constraints (Public Cloud)
Trade-off (Extended Relational / Non-Relational):
▪ Volume/Scalability: +/- / +
▪ Throughput: + / +
▪ Self-Service: + / +/-
▪ Extensibility: - / +
Decision: Non-Relational Architecture, with reporting via the Materialized View pattern
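The deck names the Materialized View pattern for reporting but does not describe how the view is built. A minimal sketch of the idea, assuming raw clickstream events arrive as JSON lines and canned reports read a precomputed table of page views per site, page, and hour. The field names `site`, `page`, and `ts` are hypothetical, and plain Python dictionaries stand in for whatever store actually holds the view.

```python
import json
from collections import Counter


def view_key(event):
    """Key of the precomputed view: (site, page, hour bucket)."""
    # Assumes ISO-8601 timestamps; the first 13 characters are "YYYY-MM-DDTHH".
    return (event["site"], event["page"], event["ts"][:13])


def build_view(raw_lines):
    """Scan raw JSON events once and precompute page-view counts."""
    view = Counter()
    for line in raw_lines:
        view[view_key(json.loads(line))] += 1
    return view


def refresh_view(view, new_lines):
    """Fold a new batch of raw events into the existing view."""
    view.update(build_view(new_lines))  # Counter.update adds counts
    return view


def report(view, site, hour):
    """Canned report: page views for one site in one hour, read from the view."""
    return {
        page: count
        for (s, page, h), count in view.items()
        if s == site and h == hour
    }
```

Reports never touch the raw events, so query cost depends on the size of the view rather than on the 45 TB of source data. The view is only as fresh as its last refresh, which fits the one-hour latency driver.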
Solution Architecture
Technologies:
- Amazon S3
- Flume
- Hadoop/HDFS, MapReduce
- HBase
- Oozie
- Hive
[Diagram: cluster of Node 1, Node 2, ... Node N]
Tips for Designing Big Data Solutions
- Understand data users and sources
- Discover architecture drivers
- Select proper reference architecture
- Do trade-off analysis, address cons
- Map reference architecture to technology stack
- Prototype, re-evaluate architecture
- Estimate implementation efforts
- Set up devops practices from the very beginning
- Advance in solution development through “small wins”
- Be ready for changes, big data technologies are evolving rapidly
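The "do trade-off analysis, address cons" tip can be sketched with the ratings from the two case-study trade-off tables. The deck does not say how the ratings were combined, so the scoring below (+ as 1, +/- as 0, - as -1, summed with optional weights) is an assumption made for illustration, and the function names are hypothetical.

```python
# Assumed numeric scale for the deck's +, +/- and - ratings.
SCORE = {"+": 1, "+/-": 0, "-": -1}

ER, NR = "Extended Relational", "Non-Relational"

# Ratings copied from the Case Study #1 trade-off table.
CASE_STUDY_1 = {
    "Extensibility": {ER: "-", NR: "+"},
    "Data Quality": {ER: "+", NR: "-"},
    "Self-Service": {ER: "+", NR: "-"},
}

# Ratings copied from the Case Study #2 trade-off table.
CASE_STUDY_2 = {
    "Volume/Scalability": {ER: "+/-", NR: "+"},
    "Throughput": {ER: "+", NR: "+"},
    "Self-Service": {ER: "+", NR: "+/-"},
    "Extensibility": {ER: "-", NR: "+"},
}


def totals(table, weights=None):
    """Sum weighted scores per architecture; unlisted drivers weigh 1."""
    weights = weights or {}
    result = {}
    for driver, ratings in table.items():
        for arch, mark in ratings.items():
            result[arch] = result.get(arch, 0) + weights.get(driver, 1) * SCORE[mark]
    return result


def pick(table, weights=None):
    """Return the architecture with the highest total score."""
    scores = totals(table, weights)
    return max(scores, key=scores.get)


def cons(table, architecture):
    """Drivers where the chosen architecture is not rated '+'."""
    return [d for d, r in table.items() if r[architecture] != "+"]
```

Under this scoring, `pick(CASE_STUDY_1)` gives Extended Relational with Extensibility as the con to address, and `pick(CASE_STUDY_2)` gives Non-Relational with Self-Service as the con. Both line up with the decisions and mitigation patterns recorded in the case studies.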
▪ Leading global Product and Application Development partner founded in 1993
▪ 3,300+ employees across North America, Ukraine and Western Europe
▪ Thousands of successful outsourcing projects!
SaaS/Cloud Solutions · Mobility Solutions · UX/UI · BI/Analytics/Big Data · Software Architecture · Security
Thank You!
SoftServe US Office
One Congress Plaza, 111 Congress Avenue, Suite 2700
Austin, TX 78701
Tel: 512.516.8880
Contacts
Serhiy Haziyev: shaziyev@softserveinc.com
Olha Hrytsay: ohrytsay@softserveinc.com