Database-Powered Web Servers Alexandros Labrinidis Univ. of - - PowerPoint PPT Presentation

database powered web servers
SMART_READER_LITE
LIVE PREVIEW

Database-Powered Web Servers Alexandros Labrinidis Univ. of - - PowerPoint PPT Presentation

Database-Powered Web Servers Alexandros Labrinidis Univ. of Pittsburgh labrinid@cs.pitt.edu Examples of db-powered web sites Google.com (search engine) Amazon.com (shopping) eBay.com (auctions) WellsFargo.com (online banking)


slide-1
SLIDE 1

Database-Powered Web Servers

Alexandros Labrinidis

  • Univ. of Pittsburgh

labrinid@cs.pitt.edu

slide-2
SLIDE 2

Alexandros Labrinidis, Univ. of Pittsburgh

2

CMU . CS 415 . 07 November 2002

Examples of db-powered web sites

Google.com (search engine) Amazon.com (shopping) eBay.com (auctions) WellsFargo.com (online banking) weather.com (forecasts) expedia.com (travel) NSF.gov (proposal submission) my.yahoo.con (personalized newspaper) NYTimes.com (electronic newspaper)

slide-3
SLIDE 3

Alexandros Labrinidis, Univ. of Pittsburgh

3

CMU . CS 415 . 07 November 2002

Motivating Example: nytimes.com

slide-4
SLIDE 4

Alexandros Labrinidis, Univ. of Pittsburgh

4

CMU . CS 415 . 07 November 2002

Why is WebDB so popular?

Arguments from web “producers”:

easy to “publish” databases over the Web wealth of information available enable personalization allow targeted advertising

Arguments from web “consumers”:

no need to install special software easy to learn - uniform user interface personalized content

Today – even seemingly static sites have WebDB

slide-5
SLIDE 5

Alexandros Labrinidis, Univ. of Pittsburgh

5

CMU . CS 415 . 07 November 2002

Typical WebDB architecture – 3 tiers

Web server: handle HTTP requests Application server: web workflow DB server: data storage & queries

User 1 User 2 User 3 User … User n

internet web server app server db server

slide-6
SLIDE 6

Alexandros Labrinidis, Univ. of Pittsburgh

6

CMU . CS 415 . 07 November 2002

Typical WebDB architecture – 2 tiers

2-tiers: incorporate application server within web server

User 1 User 2 User 3 User … User n

internet web & app server db server

slide-7
SLIDE 7

Alexandros Labrinidis, Univ. of Pittsburgh

7

CMU . CS 415 . 07 November 2002

HTTP: HyperText Transfer Protocol

Client request:

nitrogen{ 4} telnet www.google.com 80 Trying 216.239.37.101... Connected to www.google.com. Escape character is '^ ]'. GET /index.html HTTP/1.0 [two carriage returns]

slide-8
SLIDE 8

Alexandros Labrinidis, Univ. of Pittsburgh

8

CMU . CS 415 . 07 November 2002

HTTP: HyperText Transfer Protocol

Server response:

HTTP/1.0 200 OK Content-Length: 2532 Connection: Close Server: GWS/2.0 Date: Thu, 07 Nov 2002 16:57:59 GMT Content-Type: text/html Cache-control: private Set-Cookie:

PREF= ID= 24bce47555c1db8b:TM= 1036688279:LM= 1036688279:S= t4XqRr3VPTPwK MEp; expires= Sun, 17-Jan-2038 19:14:07 GMT; path= /; domain= .google.com

< html> …. < /html> Connection closed by foreign host.

slide-9
SLIDE 9

Alexandros Labrinidis, Univ. of Pittsburgh

9

CMU . CS 415 . 07 November 2002

HTML: HyperText Markup Language

page wrapped in < html> < /html> many formatting commands:

< font color= “red”> SOMETHING IN RED< /font> CSS: Cascading StyleSheets

Form example:

<form action=“/activate.cgi” method=POST> Please give your name: <input type=text name=“username” size=15 maxlength=30> <input type=submit value=“Submit Name”> </form>

slide-10
SLIDE 10

Alexandros Labrinidis, Univ. of Pittsburgh

10

CMU . CS 415 . 07 November 2002

CGI: Common Gateway Interface

  • Introduced in the early 90s
  • Protocol for exchanging form data between

client and server

User requests form Web Server sends form to user User submits form CGI forwards form data to application process data & generate output Web Server sends

  • utput to user

User receives output network db server

slide-11
SLIDE 11

Alexandros Labrinidis, Univ. of Pittsburgh

11

CMU . CS 415 . 07 November 2002

CGI encoding

method= GET

Form data are encoded as part of the URL

(e.q. http://www.google.com/search?q=cmu)

method= POST

Form data are passed via environment variables

encoding of query string:

(name,value) pairs from FORM name1= value1&name2= value2&… space +

  • ther controls characters in hexadecimal format
slide-12
SLIDE 12

Alexandros Labrinidis, Univ. of Pittsburgh

12

CMU . CS 415 . 07 November 2002

Cookies

HTTP is connection-less

a new connection is established with every request HTTP/1.1 supports persistent connections (but not popular)

Q: How to maintain state over a session? A: Cookies Cookies are text-only strings that are stored at the

browsers’ memory (and disk)

  • .google.com TRUE / FALSE 2147368447 PREF

ID=0a2612f3162aa05f:TM=1035999053:LM=1035999053:S=ko0- i9cOsU6nMaW6

slide-13
SLIDE 13

Alexandros Labrinidis, Univ. of Pittsburgh

13

CMU . CS 415 . 07 November 2002

Cookies and Databases

Use cookie to store client-ID Storing client-ID enables personalization

Example: NYTimes.com

Storing client-ID enables access to restricted areas

Example: NYTimes.com (subscription-based access to articles)

Use cookie to prohibit duplicate submissions on polls

Randomly generated ID Matched against database of those already “voted”

slide-14
SLIDE 14

Alexandros Labrinidis, Univ. of Pittsburgh

14

CMU . CS 415 . 07 November 2002

Cookies and Privacy

Cookies only store text-strings given by servers

Include domain, expiration date, etc

Only a server from the same domain can access a

previously stored cookie

The case of Doubleclick:

Same cookie used by an advertising agency Match individuals with browsing profiles that span multiple sites Huge data mining opportunity Huge controversy

slide-15
SLIDE 15

Alexandros Labrinidis, Univ. of Pittsburgh

15

CMU . CS 415 . 07 November 2002

Client-side programs

Extend web user interface by running programs at clients

Allow for sophisticated UIs Must be careful of malicious code (from untrusted servers)

Java

Full-fledged programming language Protection capabilities

Other client-side scripting languages:

JavaScript (SUN) VBScript (MSFT) Flash/Shockwave (Macromedia)

slide-16
SLIDE 16

Alexandros Labrinidis, Univ. of Pittsburgh

16

CMU . CS 415 . 07 November 2002

Server-side programs

Implement server applications and workflow

Programs that generate HTML Embedded HTML

Java:

Servlets Java Server Pages (JSP – SUN)

Server-side scripting:

Active Server Pages (ASP – MSFT) PHP mod_perl

slide-17
SLIDE 17

Alexandros Labrinidis, Univ. of Pittsburgh

17

CMU . CS 415 . 07 November 2002

Embedded HTML: PHP

PHP stands for “PHP Hypertext Processor” http://www.php.net Example:

<html> <head> <title>Example</title> </head> <body> <?php echo "Hi, I'm a PHP script!"; ?> </body> </html>

slide-18
SLIDE 18

Alexandros Labrinidis, Univ. of Pittsburgh

18

CMU . CS 415 . 07 November 2002

Generating HTML: mod_perl

mod_perl: efficiently use perl to generate HTML http://perl.apache.org Example:

#!/bin/perl print << EOF; <html> <head> <title>Example</title> </head> <body> Hi, I was generated by mod_perl! </body> </html> EOF

slide-19
SLIDE 19

Alexandros Labrinidis, Univ. of Pittsburgh

19

CMU . CS 415 . 07 November 2002

DB interface: mod_perl

Two modules – allow for portability:

DBI: database independent library DBD: database dependent driver

Example:

use DBI; my $dbh = DBI->connect(“oracle”, “user”, “pass”); my $stmt = $dbh->prepare(“SELECT * from foo;”); $stmt->execute(); while (@row = $stmt->fetchrow_array() ) { print “Row: @row\n”; } $dbh->disconnect();

slide-20
SLIDE 20

Alexandros Labrinidis, Univ. of Pittsburgh

20

CMU . CS 415 . 07 November 2002

Improving Performance: CGI

CGI forks new process on every call

context switch re-connect to db server

mod_perl: maintain pool of processes

no context switch – assign call to existing process scales better – 10 times faster

[Labrinidis & Roussopoulos, SIGMOD Record, Mar 2000]

mod_perl: re-use db connections

no need to re-connect to db server twice faster than mod_perl

slide-21
SLIDE 21

Alexandros Labrinidis, Univ. of Pittsburgh

21

CMU . CS 415 . 07 November 2002

Improving Performance: Caching

Web Caching

store static content close to the users avoid re-transmitting over the network consistency: Time-To-Live (TTL)

Dynamic Web Caching

store dynamic content (from db) and re-use if possible avoid re-computing from database what to cache:

entire web pages query results HTML fragments