The CEAS The CEAS Programming Language Programming Language Luis - - PowerPoint PPT Presentation

the ceas the ceas programming language programming
SMART_READER_LITE
LIVE PREVIEW

The CEAS The CEAS Programming Language Programming Language Luis - - PowerPoint PPT Presentation

The CEAS The CEAS Programming Language Programming Language Luis Alonso (lra2103) Hila Becker (hb2143) Kate McCarthy (km2302) Isa Muqattash (imm2104) Motivation Webpages contain clutter extraneous menus, links, corporate logos,


slide-1
SLIDE 1

The CEAS The CEAS Programming Language Programming Language

Luis Alonso (lra2103) Hila Becker (hb2143) Kate McCarthy (km2302) Isa Muqattash (imm2104)

slide-2
SLIDE 2

Motivation

  • Webpages contain clutter – extraneous

menus, links, corporate logos, banners, etc.

  • Maj or disadvantages for

− visually disabled users − users of handheld devices with constrained

screens

slide-3
SLIDE 3

Solution – Crunch

  • Collection of heuristic filters that

recognize and remove “ clutter”

  • Implementation

− Web proxy − GUI − Document Obj ect Model

  • Avoid “ one size fits all” pitfall by

adapting to the target application

slide-4
SLIDE 4

CEAS

  • A programmatic interface to the Crunch

system

  • Defines a simple and rich set of

commands for content extraction

  • Features

− Tabbed Browsing − Offline Browsing − User defined filters via functions

slide-5
SLIDE 5

Language Characteristics Language Characteristics

  • Program flow control
  • Proceed from the first statement in a file to the last
  • Data types
  • int, boolean, String, void, Page and List
  • Operators
  • +, -, *, /, %, >, <, >=, <=, ==, !=, !, &, |
  • Control structures
  • if, while, do while, for
  • Functions
  • createPage(), extract(), append(), title(), rank(), status(),

length(), print(), println(), show(), savePage()

  • Extraction constants
  • IMAGES, ADS, FLASH, SCRIPTS, TXTLINKS,

IMGLINKS, XTRNSTYLE, STYLES, FORMS, LINKLISTS, EMPTYTBLS, INPUT, META, BUTTON, and IFRAME

slide-6
SLIDE 6

Page Data Type Page Data Type

  • Creating a Page

Page createPage(“ http:/ / www.cnn.com/ ” ); Page createPage(webpage); Page createPage(“ http:/ / www.nytimes.com/ ” , “ pages/ world/ index.html” );

  • Manipulating a Page

extract(webpage, IMAGES , ADS ); append(webpage, “ next” ); title(webpage, “ My Webpage” ); rank(webpage, 2); status(webpage);

slide-7
SLIDE 7

Some Simple Examples Some Simple Examples

Page blog1 = createPage(“ http:/ / www.myblog.com/ latestpost” ); extract(blog1, IMAGES ); show(blog1); List pl[3]; pl[0] = createPage(“ http:/ / www.myblog.com/ latestpost” ); pl[1] = createPage(” http:/ / www.otherblog.com/ latest” ); pl[2] = createPage(” http:/ / www.newblog.com/ ” ); for (int i=0 : 3) { extract(pl[i], IMAGES ); } show(pl);

slide-8
SLIDE 8

High-Level Implementation

http://... CEAS source Lexer Parser AST Semantic Analyzer If No Errors AST Interpreter Crunch File to disk File to Browser

slide-9
SLIDE 9

Implementation Details

  • S

emantic Analyzer − Uses simplified data type − Every line checked

  • Interpreter

− Data types actually store values − Performs calculations − Uses Crunch to extract from web pages − Displays HTML using built-in browser

slide-10
SLIDE 10

Implementation Details

  • Functions

− S

tatic symbol table for functions

− Function body, as AS

T, stored in table

  • Variables

− Nested symbol tables for each scope − Create new interpreter for included files − Page types are treated as Java-like

  • bj ects
slide-11
SLIDE 11

Testing

  • Challenges

− LRM − Complexity

  • What to look for

− Code that should compile − Code that should not compile

  • Unit Testing

− Lexer/ Parser − Overall Compiler

slide-12
SLIDE 12

Testing

  • Methodology

− Overall process − Test hardness

  • Types of Tests

− False positives − False negatives − Visual cases (title, show, rank)

slide-13
SLIDE 13

Lessons Learned

  • Good planning is extremely important
  • Really evaluate design before

implementation

  • Keep everyone in the loop
  • S

tick to your deadlines

  • Good communication is key