FAWN - Fast Array of Wimpy Nodes, David G. Andersen et al. - PowerPoint PPT Presentation



SLIDE 1

FAWN - Fast Array of Wimpy Nodes

David G. Andersen et al.

Presented by: Ravi Kiran Boggavarapu 1001541261

SLIDE 2
  • A cluster architecture for low-power and data-intensive computing.
  • Wimpy nodes = A combination of low-power CPUs and small flash.

○ Design centers around log-structured datastores that provide high performance on flash.

  • Goal of the architecture?

○ Increase performance while minimizing power consumption -- save on the data centers' electricity bills!

  • How is performance measured?

○ The paper uses queries per Joule as its metric. FAWN handles roughly 350 key-value queries per Joule.

SLIDE 3

The above photo is taken from: http://www.cs.cmu.edu/~fawnproj/

SLIDE 4
  • Flash provides a non-volatile memory store with several significant benefits over traditional magnetic disks:

○ Fast random reads. ○ Efficient power consumption for I/O.

  • But it also introduces challenges:

○ Small writes on flash are very expensive. ○ Updating a single page requires first erasing the entire block of pages and then writing back the entire modified block.


Trade-offs of using Flash:

SLIDE 5
  • An append-only file system.
  • Writes are appended to a sequential Data Log.
  • Reads require a single random access.


Log-structured datastore
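The bullets above can be sketched as a minimal append-only store in Python: every write is a sequential append to a data log file, and an in-memory dict (a stand-in for FAWN-DS's Hash Index) maps each key to its offset, so a read costs one random access. The class name and record layout are illustrative, not the paper's on-flash format.

```python
import os

class LogStore:
    """Minimal append-only datastore: sequential writes, one-seek reads."""

    def __init__(self, path):
        self.index = {}                 # key -> offset of its newest record
        self.log = open(path, "ab+")    # the append-only Data Log

    def put(self, key: bytes, value: bytes):
        # Writes always go to the end of the log (sequential, flash-friendly).
        self.log.seek(0, os.SEEK_END)
        offset = self.log.tell()
        self.log.write(len(key).to_bytes(4, "big") + key)
        self.log.write(len(value).to_bytes(4, "big") + value)
        self.log.flush()
        self.index[key] = offset        # point the index at the new record

    def get(self, key: bytes) -> bytes:
        # Reads require a single random access at the indexed offset.
        self.log.seek(self.index[key])
        klen = int.from_bytes(self.log.read(4), "big")
        self.log.read(klen)             # skip the stored key
        vlen = int.from_bytes(self.log.read(4), "big")
        return self.log.read(vlen)
```

Updating a key simply appends a fresh record and repoints the index; the stale record stays in the log until garbage collection, which is how log-structured stores trade space for sequential-write performance.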

SLIDE 6

Q1) “The workloads these systems support share several characteristics, they are:

  • I/O, not computation, intensive,
  • requiring random access over large datasets,
  • and the size of objects stored is typically small.”

Why do workloads with these characteristics pose a challenge to system design?

SLIDE 7

Ans - Q1)

  • Increasing gap between CPU performance and I/O bandwidth.
  • "For data-intensive workloads, storage, network, and memory bandwidth bottlenecks often cause low CPU utilization."
  • The "small-write problem": multiple random disk writes (very slow).

SLIDE 8

Q2) “The key design choice in FAWN-KV is the use of a log structured per-node datastore called FAWN-DS that provides high performance reads and writes using flash memory.” “These performance problems motivate log-structured techniques for flash filesystems and data structures” What key benefit does a log-structured data organization bring to the KV store design?

SLIDE 9

Ans - Q2)

  • get() = random read.
  • put() and delete() = append.
  • Log-structured design = append-only filesystem.
  • Hence, using a log-structured datastore avoids small random writes on disk.

SLIDE 10

Q3) “To provide this property, FAWN-DS maintains an in-DRAM hash table (Hash Index) that maps keys to an offset in the append-only Data Log on flash.” What concerns arise from maintaining this Hash Index in DRAM?

SLIDE 11

Ans - Q3)

  • Large metadata - long buckets (nodes) and multiple pointers for each node (linked list).
  • RAM is volatile - in case of a failure, the whole Hash Table will be lost!

SLIDE 12

Q4) “It stores only a fragment of the actual key in memory to find a location in the log;” Is there any concern about the correctness of this design?

SLIDE 13

Ans - Q4)

  • What if multiple keys share the same fragment?

○ FAWN-DS reads the full key from the log and verifies it against the requested key. ○ Therefore, correctness is preserved.
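The verification step can be sketched in Python. The 15-bit fragment width and the SHA-1 hash below are assumptions for illustration; the point is that a matching fragment is only a hint, and reading the full key from the log settles correctness.

```python
import hashlib

FRAGMENT_BITS = 15  # assumed width; only a fraction of the key hash lives in DRAM

def key_fragment(key: bytes) -> int:
    # Keep only the low-order bits of the key's hash as the in-memory fragment.
    h = int.from_bytes(hashlib.sha1(key).digest()[:4], "big")
    return h & ((1 << FRAGMENT_BITS) - 1)

def lookup(key, hash_index, read_key_at):
    """hash_index: fragment -> log offset; read_key_at: offset -> full stored key."""
    offset = hash_index.get(key_fragment(key))
    if offset is None:
        return None          # no entry with this fragment
    if read_key_at(offset) != key:
        return None          # fragment collision: a different key lives here
    return offset            # verified: the log entry really is ours
```

(Real FAWN-DS would probe another hash bucket on a fragment collision rather than give up; this sketch stops at the verification step the slide describes.)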

SLIDE 14

Q5) Explain the "Basic functions": Store, Lookup, Delete.

SLIDE 15

Ans - Q5)

  • Store:

○ appends the entry to the log, then updates the corresponding hash table entry.

  • Lookup:

○ gets the offset from the hash entry, indexes into the Data Log, and returns the data blob.

  • Delete:

○ invalidates the hash entry by clearing the valid flag. ○ appends Delete entry to the log.

  • Why append delete? - Discussed in the answer to the next question.

Figure copied from http://vijay.vasu.org/static/talks/fawn-sosp2009-slides.pdf
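A minimal Python sketch of the three functions, with the Data Log modeled as an in-memory list (illustrative only; FAWN-DS appends to flash and clears a valid flag rather than removing the index entry outright):

```python
TOMBSTONE = b"\xff"  # assumed marker distinguishing a Delete entry in the log

def store(key, value, hash_index, log):
    log.append((key, value))            # append the entry to the Data Log
    hash_index[key] = len(log) - 1      # update the hash table entry

def lookup(key, hash_index, log):
    offset = hash_index.get(key)        # offset from the hash entry
    return None if offset is None else log[offset][1]

def delete(key, hash_index, log):
    log.append((key, TOMBSTONE))        # append a Delete entry to the log
    hash_index.pop(key, None)           # invalidate the in-memory entry
```

Note that delete never touches the old record in place: appending a tombstone keeps all flash writes sequential, and the tombstone is what lets recovery learn that the key was deleted.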

SLIDE 16

Q6) “As an optimization, FAWN-DS periodically checkpoints the index by writing the Hash Index and a pointer to the last log entry to flash.” Why does this checkpointing improve recovery efficiency? Why is a Delete entry needed in the log for correct recovery?

SLIDE 17

Ans - Q6)

  • How does checkpointing make recovery efficient?

○ After a failure, only the log entries written after the checkpoint need to be replayed to rebuild the Hash Index.

  • Why the Delete entry?

○ Fault tolerance. ○ Avoids random writes to disk.
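Both answers can be sketched in Python, with the log as a list and the checkpoint as an in-memory snapshot (real FAWN-DS writes the Hash Index plus a log pointer to flash). Recovery replays only the suffix after the checkpoint, and that replay must see the Delete entry or a deleted key would wrongly reappear:

```python
TOMBSTONE = object()  # stand-in for a Delete entry's payload

def checkpoint(hash_index, log):
    # Persist the Hash Index and a pointer to the last log entry it covers.
    return dict(hash_index), len(log)

def recover(snapshot, log):
    saved_index, covered = snapshot
    index = dict(saved_index)
    # Only entries appended after the checkpoint need to be replayed.
    for offset in range(covered, len(log)):
        key, value = log[offset]
        if value is TOMBSTONE:
            index.pop(key, None)    # without the Delete entry, the stale
        else:                       # pre-checkpoint value would come back
            index[key] = offset
    return index
```
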

SLIDE 18

Thank you
