Log-structured Merge Tree (LSM) - PowerPoint PPT Presentation

SLIDE 1

Log-structured Merge Tree (LSM)

SLIDE 2

Big Data Indexing

- We covered the two-layered global/local indexing scheme
  - Ideal for static data
- Question: How do we update these indexes?
  - HDFS limitation: random updates are not allowed
  - Naïve approach: rebuild the index after each (batch) insert
  - A better approach: the Log-structured Merge Tree

SLIDE 3

DBMS Indexing

[Figure: a new record is added to both the index and the log]

SLIDE 4

Index Update

[Figure: inserting a new record randomly updates disk page(s) in the index, whereas the log only appends a disk page]

SLIDE 5

LSM Tree

- Key idea: use the log as the index
- Regularly merge the logs to consolidate the index (i.e., remove redundant entries)
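A minimal sketch of the merge step follows. It assumes each log is a list of (key, value) pairs sorted by key with at most one entry per key, and that the logs are ordered from oldest to newest. It merges them into one bigger log and keeps only the newest entry for each key, which is how redundant entries are dropped. The function name `merge_logs` is hypothetical, and deletes (tombstones) are not modeled. The sketch uses the standard library's `heapq.merge`, so each input log is read sequentially, once.

```python
import heapq
from typing import List, Tuple

Entry = Tuple[str, str]  # (key, value); hypothetical record format


def merge_logs(runs: List[List[Entry]]) -> List[Entry]:
    """Merge sorted logs into one bigger sorted log (sketch).

    Assumes `runs` is ordered oldest to newest, and each run is sorted
    by key with at most one entry per key. For keys present in several
    runs, only the entry from the newest run is kept.
    """
    # Tag entries with the negated run index so that, for equal keys,
    # the newest run's entry sorts first in the merged stream.
    tagged = [
        [(key, -i, value) for key, value in run]
        for i, run in enumerate(runs)
    ]
    merged: List[Entry] = []
    last_key = None
    for key, _neg_age, value in heapq.merge(*tagged):
        if key != last_key:  # first occurrence is the newest version
            merged.append((key, value))
            last_key = key
    return merged


old_log = [("a", "1"), ("c", "1")]
new_log = [("a", "2"), ("b", "2")]
print(merge_logs([old_log, new_log]))  # [('a', '2'), ('b', '2'), ('c', '1')]
```

The older value for key `a` is discarded because a newer log also contains `a`.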

[Figure: new records are flushed into a series of logs, which are merged into a bigger log]

O’Neil, Patrick, Edward Cheng, Dieter Gawlick, and Elizabeth O’Neil. "The log-structured merge-tree (LSM-tree)." Acta Informatica 33, no. 4 (1996): 351-385.

SLIDE 6

LSM in Big Data

First major application: BigTable (Google)

[Figure: Citations chart, annotated "First report from Google mentioning LSM" and "BigTable paper"]

SLIDE 7

LSM in Big Data

- Buffer data in memory (memory component)
- Flush the buffered records to disk as a new disk component (sequential write)
- Disk components are sorted by key
- Compact (merge) disk components in the background (sequential read/write)
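The four steps of this slide can be sketched as a toy, single-process Python class. This is a minimal sketch under stated assumptions. It is not how BigTable or any particular system implements an LSM tree. The class name `TinyLSM` and the flush threshold `memtable_limit` are hypothetical. Disk components are modeled as in-memory sorted lists rather than files, and deletes are not supported. For brevity, compaction merges all components at once through a dictionary instead of a streaming sorted merge.

```python
import bisect
from typing import Dict, List, Optional, Tuple


class TinyLSM:
    """Toy LSM tree: a memory component plus sorted, immutable disk components.

    Hypothetical sketch: disk components are plain sorted lists here; a real
    system would write each one sequentially to disk (e.g., a file on HDFS).
    """

    def __init__(self, memtable_limit: int = 4) -> None:
        self.memtable: Dict[str, str] = {}  # memory component (buffer)
        self.components: List[List[Tuple[str, str]]] = []  # oldest first
        self.memtable_limit = memtable_limit  # hypothetical flush threshold

    def put(self, key: str, value: str) -> None:
        self.memtable[key] = value  # buffer the write in memory
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        if not self.memtable:
            return
        # Sort by key and append as a new disk component (sequential write).
        self.components.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key: str) -> Optional[str]:
        if key in self.memtable:
            return self.memtable[key]
        # Search the newest component first; each one is sorted by key,
        # so a binary search finds the key's position.
        for comp in reversed(self.components):
            i = bisect.bisect_left(comp, (key,))
            if i < len(comp) and comp[i][0] == key:
                return comp[i][1]
        return None

    def compact(self) -> None:
        """Merge all disk components into one, keeping the newest value per key."""
        merged: Dict[str, str] = {}
        for comp in self.components:  # oldest to newest: later values overwrite
            merged.update(comp)
        self.components = [sorted(merged.items())] if merged else []


lsm = TinyLSM(memtable_limit=2)
for k, v in [("b", "1"), ("a", "1"), ("b", "2"), ("c", "2")]:
    lsm.put(k, v)
print(len(lsm.components), lsm.get("b"))  # 2 2
lsm.compact()
print(lsm.components)  # [[('a', '1'), ('b', '2'), ('c', '2')]]
```

Reads check the memory component first, then the disk components from newest to oldest, so the most recent value for a key wins. Compaction reduces the number of components a read has to search.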
