FAWN - Fast Array of Wimpy Nodes, David G. Andersen et al. - PowerPoint PPT Presentation



SLIDE 1

FAWN - Fast Array of Wimpy Nodes

David G. Andersen et al.

Presented by: Ravi Kiran Boggavarapu 1001541261

SLIDE 2
  • A cluster architecture for low-power and data-intensive computing.
  • Wimpy nodes = A combination of low-power CPUs and small flash.

○ Design centers around log-structured datastores that provide high performance on flash.

  • Goal of the architecture?

○ Increase performance while minimizing power consumption -- save on the data centers' electricity bills!

  • How is performance measured?

○ The paper uses queries per Joule as its metric. FAWN handles roughly 350 key-value queries per Joule.

SLIDE 3

The above photo is taken from: http://www.cs.cmu.edu/~fawnproj/

SLIDE 4
  • Flash provides a non-volatile memory store with several significant benefits over traditional magnetic disks:

○ Fast random reads. ○ Efficient power consumption for I/O.

  • But it also introduces challenges:

○ Small writes on flash are very expensive. ○ Updating a single page requires first erasing the entire block of pages and then writing back the entire modified block.


Trade-offs of using Flash:

SLIDE 5
  • An append-only file system.
  • Writes are appended to a sequential Data Log.
  • Reads require a single random access.


Log-structured datastore
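The bullets above can be sketched as a minimal append-only store in Python: every write is a sequential append to a data log file, and an in-memory dict (a stand-in for FAWN-DS's Hash Index) maps each key to its offset, so a read costs one random access. The class name and record layout are illustrative, not the paper's on-flash format.

```python
import os

class LogStore:
    """Minimal append-only datastore: sequential writes, one-seek reads."""

    def __init__(self, path):
        self.index = {}                 # key -> offset of its newest record
        self.log = open(path, "ab+")    # the append-only Data Log

    def put(self, key: bytes, value: bytes):
        # Writes always go to the end of the log (sequential, flash-friendly).
        self.log.seek(0, os.SEEK_END)
        offset = self.log.tell()
        self.log.write(len(key).to_bytes(4, "big") + key)
        self.log.write(len(value).to_bytes(4, "big") + value)
        self.log.flush()
        self.index[key] = offset        # point the index at the new record

    def get(self, key: bytes) -> bytes:
        # Reads require a single random access at the indexed offset.
        self.log.seek(self.index[key])
        klen = int.from_bytes(self.log.read(4), "big")
        self.log.read(klen)             # skip the stored key
        vlen = int.from_bytes(self.log.read(4), "big")
        return self.log.read(vlen)
```

Updating a key simply appends a fresh record and repoints the index; the stale record stays in the log until garbage collection, which is how log-structured stores trade space for sequential-write performance.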

SLIDE 6

Q1) “The workloads these systems support share several characteristics, they are:

  • I/O, not computation, intensive,
  • requiring random access over large datasets,
  • and the size of objects stored is typically small.”

Why do workloads with these characteristics pose a challenge to system design?

SLIDE 7

Ans - Q1)

  • Increasing gap between CPU performance and I/O bandwidth.
  • "For data-intensive workloads, storage, network, and memory bandwidth bottlenecks often cause low CPU utilization."
  • The "small-write problem": multiple random disk writes (very slow).

SLIDE 8

Q2) “The key design choice in FAWN-KV is the use of a log structured per-node datastore called FAWN-DS that provides high performance reads and writes using flash memory.” “These performance problems motivate log-structured techniques for flash filesystems and data structures” What key benefit does a log-structured data organization bring to the KV store design?

SLIDE 9

Ans - Q2)

  • get() = random read.
  • put() and delete() = append.
  • Log-structured design = append-only filesystem.
  • Hence, using a log-structured datastore avoids small random writes on disk.

SLIDE 10

Q3) “To provide this property, FAWN-DS maintains an in-DRAM hash table (Hash Index) that maps keys to an offset in the append-only Data Log on flash.” What concerns arise from maintaining this Hash Index in DRAM?

SLIDE 11

Ans - Q3)

  • Large metadata - long buckets (nodes) and multiple pointers for each node (linked list).
  • RAM is volatile - in case of a failure, the whole Hash Table will be lost!

SLIDE 12

Q4) “It stores only a fragment of the actual key in memory to find a location in the log;” Is there any concern about the correctness of this design?

SLIDE 13

Ans - Q4)

  • What if multiple keys share the same fragment?

○ FAWN-DS reads the full key from the log and verifies it against the requested key. ○ Therefore, correctness is preserved.
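The verification step can be sketched in Python. The 15-bit fragment width and the SHA-1 hash below are assumptions for illustration; the point is that a matching fragment is only a hint, and reading the full key from the log settles correctness.

```python
import hashlib

FRAGMENT_BITS = 15  # assumed width; only a fraction of the key hash lives in DRAM

def key_fragment(key: bytes) -> int:
    # Keep only the low-order bits of the key's hash as the in-memory fragment.
    h = int.from_bytes(hashlib.sha1(key).digest()[:4], "big")
    return h & ((1 << FRAGMENT_BITS) - 1)

def lookup(key, hash_index, read_key_at):
    """hash_index: fragment -> log offset; read_key_at: offset -> full stored key."""
    offset = hash_index.get(key_fragment(key))
    if offset is None:
        return None          # no entry with this fragment
    if read_key_at(offset) != key:
        return None          # fragment collision: a different key lives here
    return offset            # verified: the log entry really is ours
```

(Real FAWN-DS would probe another hash bucket on a fragment collision rather than give up; this sketch stops at the verification step the slide describes.)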

SLIDE 14

Q5) Explain the "Basic functions": Store, Lookup, Delete.

SLIDE 15

Ans - Q5)

  • Store:

○ appends the entry to the log, then updates the corresponding hash table entry.

  • Lookup:

○ gets the offset from the hash entry, indexes into the Data Log, and returns the data blob.

  • Delete:

○ invalidates the hash entry by clearing the valid flag. ○ appends Delete entry to the log.

  • Why append delete? - Discussed in the answer to the next question.

Figure copied from http://vijay.vasu.org/static/talks/fawn-sosp2009-slides.pdf
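A minimal Python sketch of the three functions, with the Data Log modeled as an in-memory list (illustrative only; FAWN-DS appends to flash and clears a valid flag rather than removing the index entry outright):

```python
TOMBSTONE = b"\xff"  # assumed marker distinguishing a Delete entry in the log

def store(key, value, hash_index, log):
    log.append((key, value))            # append the entry to the Data Log
    hash_index[key] = len(log) - 1      # update the hash table entry

def lookup(key, hash_index, log):
    offset = hash_index.get(key)        # offset from the hash entry
    return None if offset is None else log[offset][1]

def delete(key, hash_index, log):
    log.append((key, TOMBSTONE))        # append a Delete entry to the log
    hash_index.pop(key, None)           # invalidate the in-memory entry
```

Note that delete never touches the old record in place: appending a tombstone keeps all flash writes sequential, and the tombstone is what lets recovery learn that the key was deleted.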

SLIDE 16

Q6) “As an optimization, FAWN-DS periodically checkpoints the index by writing the Hash Index and a pointer to the last log entry to flash.” Why does this checkpointing improve recovery efficiency? Why is a Delete entry needed in the log for correct recovery?

SLIDE 17

Ans - Q6)

  • How does checkpointing make recovery efficient?

○ After a failure, only the log entries written after the checkpoint need to be replayed to rebuild the Hash Index.

  • Why the Delete entry?

○ Fault tolerance. ○ Avoids random writes to disk.
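Both answers can be sketched in Python, with the log as a list and the checkpoint as an in-memory snapshot (real FAWN-DS writes the Hash Index plus a log pointer to flash). Recovery replays only the suffix after the checkpoint, and that replay must see the Delete entry or a deleted key would wrongly reappear:

```python
TOMBSTONE = object()  # stand-in for a Delete entry's payload

def checkpoint(hash_index, log):
    # Persist the Hash Index and a pointer to the last log entry it covers.
    return dict(hash_index), len(log)

def recover(snapshot, log):
    saved_index, covered = snapshot
    index = dict(saved_index)
    # Only entries appended after the checkpoint need to be replayed.
    for offset in range(covered, len(log)):
        key, value = log[offset]
        if value is TOMBSTONE:
            index.pop(key, None)    # without the Delete entry, the stale
        else:                       # pre-checkpoint value would come back
            index[key] = offset
    return index
```
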

SLIDE 18

Thank you
