Gigascope: A Stream Database for Network Applications



SLIDE 1

Gigascope: A Stream Database for Network Applications

Authors: Cranor, Johnson, Spataschek (AT&T Labs), Shkapenyuk (CMU)
Presented by: Brian Agala

SLIDE 2

Overview

  • Problem
  • Goals
  • Background: Data Streams
  • Gigascope Data Stream Management System
  • Conclusions

Brian Agala 10/28/2014 2

SLIDE 3

Problem: Managing a Large Data Communications Network

  • Requires constant network monitoring
  • Decentralized, hence difficult to manage
  • Analyze network trace dumps
  • Limited set of network monitoring reports

SLIDE 4

Goals

Develop a network data analysis tool that:

  • Has the speed and flexibility that network analysts require
  • Provides a structured querying environment to make complex analysis easy to control

SLIDE 5

Goals

Create a data analysis engine that will be used in many settings:

  • traffic analysis
  • performance monitoring
  • debugging
  • protocol analysis and development
  • router configuration
  • intrusion detection
  • network monitoring

SLIDE 6

Data Streams: Why Now?

  • Haven’t data feeds into databases always existed? Yes:
      • They modify underlying databases and data warehouses
      • Complex queries are specified over stored data
  • With traditional data feeds:
      • Simple queries are needed in real time
      • Complex queries are performed offline

[Figure: a database (DB) with queries posed against it]

SLIDE 7

Data Streams: Real-Time Queries, High-Volume and High-Velocity Data

  • Two recent developments, one application-driven and one technology-driven:
      • Need for sophisticated real-time queries and analyses
      • Massive data volumes of transactions and measurements

[Figure: massive volumes of data arriving at a database (DB) at high velocity, with the need for real-time queries]

SLIDE 8

Databases vs Data Streams

Database Systems

  • Relation: tuple set
  • Data Update: modifications
  • Query: transient
  • Query Answer: exact
  • Query Evaluation: arbitrary

Data Stream Systems

  • Relation: tuple sequence
  • Data Update: appends
  • Query: persistent
  • Query Answer: approximate
  • Query Evaluation: one pass
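
The "persistent query, one pass" column can be illustrated with a minimal Python sketch. It is not Gigascope code: the function name `running_mean` and the sample packet sizes are hypothetical. The query stays registered while tuples are appended, reads each tuple exactly once, and keeps only constant state. This particular aggregate happens to be exact; answers become approximate when the state a query needs no longer fits in memory.

```python
from typing import Iterable, Iterator


def running_mean(stream: Iterable[float]) -> Iterator[float]:
    """Persistent query: after every appended tuple, emit the mean so far."""
    count = 0
    total = 0.0
    for value in stream:  # one pass: each tuple is read once and then dropped
        count += 1
        total += value
        yield total / count


# Hypothetical packet sizes in bytes, arriving as appends.
packet_sizes = [40, 1500, 60]
means = list(running_mean(packet_sizes))
```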

SLIDE 9

Gigascope: Data Stream Management System (DSMS) for Network Applications

  • Designed for monitoring high-rate data streams
  • Pure stream database (no stored relations or continuous queries)
  • Pipelined operators that rely on properties of the stream
  • Uses an SQL-like language named GSQL
  • Input is a data stream, output is a data stream
  • Simple implementation: does not transform the input data stream into a windowed table, but operates on the data stream directly
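
As a rough illustration of pipelined operators that rely on a stream property, the Python sketch below chains a selection into an aggregation using generators. It assumes timestamps arrive in nondecreasing order, so a time bucket can be emitted as soon as a later bucket appears, without buffering a window. The tuple layout and the names `select_large` and `count_per_bucket` are hypothetical and are not part of Gigascope.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple

Packet = Dict[str, int]  # hypothetical layout: {"time": seconds, "len": bytes}


def select_large(packets: Iterable[Packet], min_len: int) -> Iterator[Packet]:
    """Selection operator: stream in, stream out."""
    for p in packets:
        if p["len"] >= min_len:
            yield p


def count_per_bucket(packets: Iterable[Packet], width: int) -> Iterator[Tuple[int, int]]:
    """Aggregation operator that relies on nondecreasing timestamps."""
    current: Optional[int] = None
    count = 0
    for p in packets:
        bucket = p["time"] // width
        if current is not None and bucket != current:
            yield (current, count)  # ordering guarantees this bucket is complete
            count = 0
        current = bucket
        count += 1
    if current is not None:
        yield (current, count)  # flush the last open bucket at end of input


packets = [
    {"time": 0, "len": 100},
    {"time": 1, "len": 1500},
    {"time": 5, "len": 1500},
    {"time": 61, "len": 40},
    {"time": 62, "len": 1500},
    {"time": 130, "len": 1500},
]
result = list(count_per_bucket(select_large(packets, min_len=1000), width=60))
```

Each operator holds only the state for the bucket currently open, which is the point of operating on the stream directly.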

SLIDE 10

The Language

  • Supports selection, join, aggregation, and stream merge
  • The GSQL processor is a code generator: it translates the query to C or C++ code, resulting in a fast execution system
  • Example 1: Get the destination IP, port, and timestamp from TCP packets on the first Ethernet interface card

    DEFINE { query_name tcpDest0; }
    Select destIP, destPort, time
    From eth0.TCP
    Where IPVersion = 4 and Protocol = 6

  • Example 2: Combine streams from multiple sources into a single stream

    DEFINE { query_name tcpDest; }
    Merge tcpDest0.time : tcpDest1.time
    From tcpDest0, tcpDest1
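
The effect of the Merge query in Example 2 can be approximated in Python with the standard-library `heapq.merge`. This is only an analogy, not what the GSQL processor generates. It assumes each input stream is already ordered on `time`, and the sample tuples are made up.

```python
import heapq
from operator import itemgetter

# Two hypothetical input streams, each already ordered on "time".
tcp_dest0 = [
    {"time": 1, "destIP": "10.0.0.1", "destPort": 80},
    {"time": 4, "destIP": "10.0.0.2", "destPort": 443},
]
tcp_dest1 = [
    {"time": 2, "destIP": "10.0.0.3", "destPort": 22},
    {"time": 3, "destIP": "10.0.0.4", "destPort": 80},
]

# heapq.merge is lazy: it only looks at the head of each input,
# so the combined stream stays ordered on "time" without sorting everything.
tcp_dest = list(heapq.merge(tcp_dest0, tcp_dest1, key=itemgetter("time")))
times = [t["time"] for t in tcp_dest]
```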

SLIDE 11

Gigascope Architecture

  • Two-layer architecture for early data reduction
      • High-level queries for expensive processing (High-level Filtering, Transformation, and Aggregation, HFTA)
      • Fast, lightweight data reduction queries (Low-level Filtering, Transformation, and Aggregation, LFTA)
  • As an optimization, a query can be pushed as far down as the NIC
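
The split between the two layers can be sketched in Python as follows. This is a simplified model under stated assumptions, not Gigascope's generated code: the low-level stage applies a cheap filter and keeps partial counts in a small fixed-size table, pushing a partial result upward whenever two groups collide in a slot, and the high-level stage combines the partials. The names `lfta` and `hfta`, the table size, and the packet layout are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

Partial = Tuple[str, int]  # (destIP, partial packet count)


def lfta(packets: Iterable[dict], slots: int = 2) -> Iterator[Partial]:
    """Low level: cheap filter plus partial aggregation in bounded memory."""
    table: List[Optional[Partial]] = [None] * slots
    for p in packets:
        if p["protocol"] != 6:  # keep TCP only
            continue
        key = p["destIP"]
        i = hash(key) % slots
        entry = table[i]
        if entry is not None and entry[0] == key:
            table[i] = (key, entry[1] + 1)
        else:
            if entry is not None:
                yield entry  # slot collision: hand the partial count upward
            table[i] = (key, 1)
    for entry in table:  # end of input: flush what is left
        if entry is not None:
            yield entry


def hfta(partials: Iterable[Partial]) -> Dict[str, int]:
    """High level: combine partial counts into final per-group totals."""
    totals: Counter = Counter()
    for key, n in partials:
        totals[key] += n
    return dict(totals)


packets = [
    {"protocol": 6, "destIP": "a"},
    {"protocol": 6, "destIP": "b"},
    {"protocol": 17, "destIP": "a"},  # UDP, dropped by the low-level filter
    {"protocol": 6, "destIP": "a"},
    {"protocol": 6, "destIP": "c"},
    {"protocol": 6, "destIP": "a"},
]
totals = hfta(lfta(packets))
```

The final totals do not depend on which groups happen to collide; collisions only change how many partial tuples travel from the low level to the high level.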

[Figure: architecture diagram showing the application (App), high-level and low-level query nodes, the NIC, and a ring buffer]

SLIDE 12

Gigascope: Hidden P2P Traffic Detection

  • Business Challenge: An AT&T IP customer wanted to accurately monitor peer-to-peer (P2P) traffic within their network
  • Previous Approach: Using the TCP port number found in Netflow data
  • Issues: P2P traffic might not use known P2P port numbers
  • Solution:
      • Use Gigascope to search for P2P-related keywords within each TCP datagram
      • Identified 3 times more P2P traffic than when using Netflow
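
The difference between the two approaches can be shown with a small Python sketch. The port numbers and keywords below are illustrative assumptions only; the slides do not say which ports or keywords were actually used, and the sample segments are made up.

```python
from typing import Sequence, Tuple

# Illustrative assumptions, not taken from the deployment described above.
KNOWN_P2P_PORTS = {6346, 6881}
P2P_KEYWORDS = (b"BitTorrent protocol", b"GNUTELLA")

Segment = Tuple[int, bytes]  # (destination port, TCP payload)


def p2p_by_port(segments: Sequence[Segment]) -> int:
    """Port-based classification, as with flow records."""
    return sum(1 for port, _ in segments if port in KNOWN_P2P_PORTS)


def p2p_by_payload(segments: Sequence[Segment]) -> int:
    """Payload-based classification: look for keywords inside each segment."""
    return sum(
        1 for _, payload in segments
        if any(keyword in payload for keyword in P2P_KEYWORDS)
    )


segments = [
    (6881, b"\x13BitTorrent protocol"),     # P2P on a well-known port
    (80, b"GNUTELLA CONNECT/0.6\r\n"),      # P2P hidden on port 80
    (80, b"GET /index.html HTTP/1.1\r\n"),  # ordinary web traffic
]
port_hits = p2p_by_port(segments)
payload_hits = p2p_by_payload(segments)
```

In this toy input the payload check finds the segment on port 80 that the port check misses.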

SLIDE 13

Gigascope: Web Client Performance Monitoring

  • Business Challenge: An AT&T IP customer wanted to monitor the latency observed by clients to find performance problems
  • Previous Approach: Measure latency from “active clients” that establish network connections with servers
  • Issues: Use of “active clients” is not very representative
  • Solution:
      • Use Gigascope to track TCP synchronization (SYN) and acknowledgement (ACK) packets
      • Report round-trip time statistics as the latency measure
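
One way to derive a round-trip time from handshake packets is sketched below in Python. It is a simplification under stated assumptions: the monitor sees both directions of a connection, and the time from a client's SYN to the server's SYN-ACK is taken as the round-trip time. The packet layout and the name `handshake_rtts` are hypothetical; the slides do not specify which packet pairs Gigascope matched.

```python
from typing import Dict, Iterable, List, Tuple

FlowKey = Tuple[str, int, str, int]  # (client IP, client port, server IP, server port)


def handshake_rtts(packets: Iterable[dict]) -> List[float]:
    """Pair each SYN with the SYN-ACK that answers it and return the time gaps."""
    pending: Dict[FlowKey, float] = {}
    rtts: List[float] = []
    for p in packets:
        if p["syn"] and not p["ack"]:
            key = (p["src"], p["sport"], p["dst"], p["dport"])
            pending.setdefault(key, p["time"])  # keep the first SYN, ignore retransmissions
        elif p["syn"] and p["ack"]:
            # The SYN-ACK travels in the reverse direction, so swap the endpoints.
            key = (p["dst"], p["dport"], p["src"], p["sport"])
            start = pending.pop(key, None)
            if start is not None:
                rtts.append(p["time"] - start)
    return rtts


packets = [
    {"time": 1.0, "src": "10.0.0.5", "sport": 40000, "dst": "192.0.2.1", "dport": 80,
     "syn": True, "ack": False},
    {"time": 1.25, "src": "192.0.2.1", "sport": 80, "dst": "10.0.0.5", "dport": 40000,
     "syn": True, "ack": True},
    {"time": 1.26, "src": "10.0.0.5", "sport": 40000, "dst": "192.0.2.1", "dport": 80,
     "syn": False, "ack": True},
]
rtts = handshake_rtts(packets)
```

State is kept only for handshakes still in flight, so memory stays proportional to the number of open connection attempts rather than to the trace length.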

SLIDE 14

Gigascope: Other Applications

Desired goals for Gigascope:

  • traffic analysis (e.g., Hidden P2P Traffic Detection)
  • performance monitoring (e.g., Web Client Performance Monitoring)
  • debugging
  • protocol analysis and development
  • router configuration
  • intrusion detection
  • network monitoring

SLIDE 15

Conclusions

  • Querying and finding patterns in massive streams is a real problem with many real-world applications
      • Need for sophisticated real-time queries
      • Massive data volumes of transactions
  • Fundamentally rethink data management issues under stringent constraints:
      • Single-pass algorithms with limited memory resources
      • Resource limitations at the low level
  • Important to think of the end-to-end architecture
