CS 591: Data Systems Architectures
Prof. Manos Athanassoulis
mathan@bu.edu http://manos.athanassoulis.net/classes/CS591
CS591 progress bar
Storage Layouts: Rows vs Cols vs Hybrid
[Figure: layouts of a table with columns A, B, C, D]
New Hardware: Flash Storage, Multi-core
Indexing: When to use? UpBit
[Figure label: index scan]
[Figure: bitvectors for A=10, A=20, A=30, each paired with an update bitvector (UB)]
NoSQL Engines: LSM-Trees, Hash-based
[Figure: LSM-tree with a buffer, Bloom filters, and fence pointers in memory, and the data on storage]
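The roles of the buffer, Bloom filters, and fence pointers named in the LSM-tree figure can be illustrated with a minimal sketch of the lookup path. Everything here is an assumption made for illustration (the class names `BloomFilter`, `Run`, `TinyLSM`, the page size, the hash scheme); merging of runs and deletes are omitted, so this is not a full LSM-tree.

```python
import hashlib
from bisect import bisect_right


class BloomFilter:
    """Toy Bloom filter (hypothetical sizing: 1024 bits, 3 hash functions)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, key):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for p in self._positions(key):
            self.bits[p] = 1

    def might_contain(self, key):
        # False means "definitely absent"; True may be a false positive.
        return all(self.bits[p] for p in self._positions(key))


class Run:
    """An immutable sorted run, split into fixed-size pages."""

    def __init__(self, items, page_size=4):
        pairs = sorted(items.items())
        self.pages = [pairs[i:i + page_size] for i in range(0, len(pairs), page_size)]
        self.fences = [page[0][0] for page in self.pages]  # smallest key of each page
        self.bloom = BloomFilter()
        for key, _ in pairs:
            self.bloom.add(key)
        self.page_reads = 0  # counts simulated storage accesses

    def get(self, key):
        if not self.bloom.might_contain(key):
            return None  # skipped without reading any page
        i = bisect_right(self.fences, key) - 1  # the only page that can hold the key
        if i < 0:
            return None  # key is smaller than every key in the run
        self.page_reads += 1
        for k, v in self.pages[i]:
            if k == key:
                return v
        return None


class TinyLSM:
    def __init__(self, buffer_capacity=4):
        self.buffer = {}  # in-memory buffer
        self.buffer_capacity = buffer_capacity
        self.runs = []  # newest run first

    def put(self, key, value):
        self.buffer[key] = value
        if len(self.buffer) >= self.buffer_capacity:
            self.runs.insert(0, Run(self.buffer))  # flush the buffer as a new run
            self.buffer = {}

    def get(self, key):
        if key in self.buffer:
            return self.buffer[key]
        for run in self.runs:  # newest to oldest, so recent values win
            value = run.get(key)
            if value is not None:
                return value
        return None
```

In this sketch a Bloom filter lets a lookup skip a run entirely, and the fence pointers narrow a lookup to at most one page per run that is not skipped.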
[Figure 5: Logical Address Space in (caption truncated). Labels: Stable (LA = 0), Read-Only, Mutable (LA = ∞); In-Memory, Disk; Increasing Logical Address; Read-Copy-Update, In-Place-Update]
Indexing: Data Skipping, Adaptive Indexing
[Figure: four tuples t1 to t4 with attributes year, grade, course: (2012, A, DB), (2011, A, AI), (2011, B, OS), (2013, C, DB), shown in the order t1, t2, t3, t4 and reordered as t2, t3, t1, t4]
[Figure: adaptive indexing of an index column. Q0=[13,42) splits the column into < 13, >= 13, < 42, >= 42; Q1=[6,27) adds splits at 6 and 27; after Q2 ... Qn the index column is sorted]
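The two queries in the figure, Q0=[13,42) and Q1=[6,27), can be traced with a minimal sketch of this style of adaptive indexing, where each range query partitions only the piece of the column that contains its bounds. The class name `CrackedColumn`, its methods, and the sample values are hypothetical; the sketch rebuilds pieces with list comprehensions for readability instead of partitioning in place, and it assumes `low < high`.

```python
from bisect import bisect_left


class CrackedColumn:
    """Toy adaptively indexed column: data plus a sorted list of split points."""

    def __init__(self, values):
        self.data = list(values)  # copy of the column, reorganized by queries
        self.pivots = []          # sorted pivot values seen so far
        self.positions = []       # positions[i]: first index holding a value >= pivots[i]

    def _crack(self, pivot):
        """Make all values < pivot precede all values >= pivot; return the split position."""
        i = bisect_left(self.pivots, pivot)
        if i < len(self.pivots) and self.pivots[i] == pivot:
            return self.positions[i]  # already split here by an earlier query
        # Only the piece that contains the pivot needs to be partitioned.
        lo = self.positions[i - 1] if i > 0 else 0
        hi = self.positions[i] if i < len(self.positions) else len(self.data)
        piece = self.data[lo:hi]
        smaller = [v for v in piece if v < pivot]
        larger = [v for v in piece if v >= pivot]
        self.data[lo:hi] = smaller + larger
        pos = lo + len(smaller)
        self.pivots.insert(i, pivot)
        self.positions.insert(i, pos)
        return pos

    def range_query(self, low, high):
        """Return the values in [low, high), splitting the column on both bounds."""
        start = self._crack(low)
        end = self._crack(high)
        return self.data[start:end]


column = CrackedColumn([45, 7, 13, 30, 2, 42, 19, 6, 27, 50])
q0 = column.range_query(13, 42)  # Q0 = [13, 42)
q1 = column.range_query(6, 27)   # Q1 = [6, 27)
```

After both queries the column has split points at 6, 13, 27, and 42, matching the pieces in the figure; each further query refines the pieces it touches, which is how the column moves toward sorted order.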
Scientific Data Management: In-situ Query Processing
[Figure: Raw Data File, Positional Map, BF and BTree indexes, Adaptive Partitioning, Cache]
Today: Array Data
Up to now: uni-dimensional data (integers, real, string)
Array Data: multi-dimensional data
No unique order (cannot sort!)
How to store?
Concepts: multi-dimensional arrays, storage manager, tiles, thread-safe, dense vs. sparse arrays, global cell order, fragments, dense vs. sparse fragments, consolidation
Why is this a challenge?
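One way to see why the missing unique order matters: any storage layout has to pick some linear order for the cells, and each choice favors different queries. The sketch below compares a plain row-major order with a tile-based order on a 4x4 dense array. The function names, the 2x2 tile size, and the "span" metric are illustrative assumptions, not the definitions used by any particular storage manager.

```python
def row_major_order(rows, cols):
    """Cell coordinates in row-major order."""
    return [(r, c) for r in range(rows) for c in range(cols)]


def tiled_order(rows, cols, tile_rows, tile_cols):
    """Row-major over tiles, then row-major over the cells inside each tile."""
    order = []
    for tr in range(0, rows, tile_rows):
        for tc in range(0, cols, tile_cols):
            for r in range(tr, min(tr + tile_rows, rows)):
                for c in range(tc, min(tc + tile_cols, cols)):
                    order.append((r, c))
    return order


def span_on_disk(order, cells):
    """Length of the smallest contiguous stretch of the linear layout covering `cells`."""
    position = {cell: i for i, cell in enumerate(order)}
    hits = [position[cell] for cell in cells]
    return max(hits) - min(hits) + 1


rm = row_major_order(4, 4)
tl = tiled_order(4, 4, 2, 2)

block = [(r, c) for r in range(2) for c in range(2)]  # top-left 2x2 subarray
first_row = [(0, c) for c in range(4)]                # the whole first row

block_spans = (span_on_disk(rm, block), span_on_disk(tl, block))
row_spans = (span_on_disk(rm, first_row), span_on_disk(tl, first_row))
```

Here the 2x2 subarray spans 6 cells of the row-major layout but only 4 of the tiled one, while the full first row spans 4 cells in row-major order and 6 in tiled order, so neither order is best for every query.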
Distributed DB: Database Systems at Global Scale
MapReduce: Computing at Scale
Systems for ML: ML building blocks
ML for Systems: Automatic Data System Design
Learned Indexes: Learn Data Distributions for Indexing
Data Calculator: Synthesize Indexes
You can skip up to 3 reviews
18 classes: 5 long + 10 short + 3 skipped
New rule: you can do extra long reviews, 1 long counts as 3 short
Normally for full marks: 5 long + 10 short
Do not leave your project work for the last minute!
Until Tuesday, April 16th: every group in OH to discuss progress
April 30 and May 2: project presentations (problem + approach + results + open questions)
Project presentations will also be peer-evaluated