GoDataDriven
PROUDLY PART OF THE XEBIA GROUP
@fzk frisovanvollenhoven@godatadriven.com
Building a Big Data DWH
Friso van Vollenhoven CTO
Data Warehousing on Hadoop
“In computing, a data warehouse or enterprise data warehouse (DW, DWH, or EDW) is a database used for reporting and data analysis.”
ETL
How to:
to hour?
aggregation of facts?
Schemas are designed with questions in mind. Changing a schema requires redoing the ETL.
Push things to the facts level. Keep all source data available at all times.
And now?
distributed storage
distributed processing
metadata + query engine
Deployment
git push jenkins master
'februari-22 2013'
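A date like the one above has to be normalized before it can be used as a partition key or joined on. The following is a minimal sketch of one way to do that, assuming the source uses Dutch month names in a `month-day year` layout; the function name `parse_source_date` and the month table are illustrative and not taken from the talk.

```python
import re
from datetime import date

# Dutch month names mapped to month numbers (assumed source locale).
DUTCH_MONTHS = {
    "januari": 1, "februari": 2, "maart": 3, "april": 4,
    "mei": 5, "juni": 6, "juli": 7, "augustus": 8,
    "september": 9, "oktober": 10, "november": 11, "december": 12,
}

# Matches strings such as 'februari-22 2013': month name, dash, day, space, year.
SOURCE_DATE = re.compile(r"^\s*([a-z]+)-(\d{1,2})\s+(\d{4})\s*$", re.IGNORECASE)


def parse_source_date(text: str) -> date:
    """Parse a 'month-day year' string with a Dutch month name into a date."""
    match = SOURCE_DATE.match(text)
    if match is None:
        raise ValueError(f"unrecognized date layout: {text!r}")
    month_name, day, year = match.groups()
    month = DUTCH_MONTHS.get(month_name.lower())
    if month is None:
        raise ValueError(f"unknown month name: {month_name!r}")
    return date(int(year), month, int(day))


if __name__ == "__main__":
    print(parse_source_date("februari-22 2013").isoformat())
```

Rejecting anything that does not match, rather than guessing, keeps bad rows visible so they can be repaired from the retained source data.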
A: Yes, sometimes as
week at 3K files / day.
CREATE TABLE browsers (
  browser_id STRING,
  browser STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '-2';
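The delimiter `'-2'` looks odd at first sight. One plausible reading, which is an assumption here and not something the slide states, is that the value is interpreted as a signed byte, so `-2` denotes the raw byte 0xFE (254). The sketch below shows that arithmetic and how a row delimited this way could be split outside Hive; `hive_delimiter_byte` and `split_row` are hypothetical helper names.

```python
def hive_delimiter_byte(spec: str) -> bytes:
    """Turn a signed-byte spec such as '-2' into the single raw byte it denotes.

    Assumption: the delimiter is read as a signed 8-bit value.
    """
    value = int(spec)
    if not -128 <= value <= 127:
        raise ValueError(f"not a signed byte: {spec!r}")
    # Two's complement: -2 and 254 share the same bit pattern in 8 bits.
    return bytes([value & 0xFF])


def split_row(raw: bytes, spec: str = "-2") -> list:
    """Split one raw row on the delimiter byte and decode each field as UTF-8."""
    delimiter = hive_delimiter_byte(spec)
    return [field.decode("utf-8") for field in raw.split(delimiter)]


if __name__ == "__main__":
    row = b"42" + b"\xfe" + "Firefox".encode("utf-8")
    print(split_row(row))
```

The byte 0xFE never occurs in valid UTF-8, so splitting on it before decoding cannot cut a multi-byte character in half.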
data from it)
Independent jobs
source (external) -> staging (HDFS) -> hive-staging (HDFS) -> Hive
source -> staging: HDFS upload + move in place
staging -> hive-staging: MapReduce + HDFS move
hive-staging -> Hive: Hive map external table + SELECT INTO
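The "upload + move in place" step can be illustrated as follows: write the file somewhere downstream jobs do not look, then rename it into the directory they read, so a job never sees a half-written file. This is a minimal sketch that uses the local filesystem as a stand-in for HDFS; the function `publish` and the directory names are assumptions made for illustration.

```python
import os
import tempfile
from pathlib import Path


def publish(data: bytes, incoming_dir: Path, final_dir: Path, name: str) -> Path:
    """Write data to a scratch file, then rename it into the final directory."""
    incoming_dir.mkdir(parents=True, exist_ok=True)
    final_dir.mkdir(parents=True, exist_ok=True)
    # Step 1: "upload" into a scratch location that readers ignore.
    fd, tmp_name = tempfile.mkstemp(dir=incoming_dir, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        # Step 2: move in place. Both directories must be on the same
        # filesystem for the rename to be a single atomic step.
        target = final_dir / name
        os.replace(tmp_name, target)
    except BaseException:
        # Clean up the scratch file if the write or the rename failed.
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        path = publish(b"1\tFirefox\n", Path(root) / "_incoming",
                       Path(root) / "staging", "browsers.tsv")
        print(path.name, path.read_bytes())
```

The same two-step shape applies between every pair of stages: each job only picks up files that have been fully moved into its input directory.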
Out of order jobs
to Hive
delivery is going to be three hours late
later in the day
Fixable data store
drop the partition and re-insert
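Dropping a partition and re-inserting it can be scripted so that a repair is the same operation as the original load. Below is a minimal sketch that only generates the two HiveQL statements; the table names, the `dt` partition column, and the function `reload_partition_statements` are hypothetical, and values are interpolated without escaping, so it assumes trusted input.

```python
def reload_partition_statements(table, staging_table, partition_col,
                                partition_value, columns):
    """Build HiveQL that drops one partition and re-inserts it from staging."""
    spec = f"{partition_col}='{partition_value}'"
    column_list = ", ".join(columns)
    return [
        # IF EXISTS makes the drop harmless on the very first load.
        f"ALTER TABLE {table} DROP IF EXISTS PARTITION ({spec})",
        # Re-insert only the rows that belong to this partition.
        f"INSERT INTO TABLE {table} PARTITION ({spec}) "
        f"SELECT {column_list} FROM {staging_table} WHERE {spec}",
    ]


if __name__ == "__main__":
    for statement in reload_partition_statements(
            "facts", "facts_staging", "dt", "2013-02-22",
            ["browser_id", "visits"]):
        print(statement + ";")
```

Because the result does not depend on what the partition held before, the same pair of statements can be rerun after a late or corrected delivery.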
transactional, repair afterwards
purpose
Metrics
Metrics service
time
We’re hiring / Questions? / Thank you!
@fzk frisovanvollenhoven@godatadriven.com Friso van Vollenhoven CTO