Ali Kamandi kamandi@ce.sharif.edu Spring 2007 Sharif University - - PowerPoint PPT Presentation




slide-1
SLIDE 1
  • Ali Kamandi

kamandi@ce.sharif.edu Spring 2007 Sharif University of Technology

slide-2
SLIDE 2
  • Part 1: XML and DTD
slide-3
SLIDE 3
  • SGML (Standard Generalized Markup Language)

ISO standard, 1986, for data storage & exchange
Meta-language for defining languages
A famous SGML language: HTML!!
Separation of content and display
SGML reference is 600 pages long

XML (eXtensible Markup Language)

W3C (World Wide Web Consortium, http://www.w3.org/XML/) recommendation in 1998
Simple subset (80/20 rule) of SGML
XML specification is 26 pages long

slide-4
SLIDE 4
  • eXtensible Markup Language

Metalanguage: used to create other languages
Has become a universal data-exchange format

slide-5
SLIDE 5
<bibliography>
  <paper ID="object-fusion">
    <authors>
      <author>Y.Papakonstantinou</author>
      <author>S. Abiteboul</author>
      <author>H. Garcia-Molina</author>
    </authors>
    <fullPaper source="fusion"/>
    <title>Object Fusion in Mediator Systems</title>
    <booktitle>VLDB 96</booktitle>
  </paper>
</bibliography>

slide-6
SLIDE 6
  • Human-readable

Machine-readable (easy to parse)
Standard format for data interchange
Possible to validate
Extensible
  can represent any data
  can add new tags for new data formats
Hierarchical structure (nesting)
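
To make "machine-readable" and "hierarchical" concrete, here is a minimal sketch that parses the bibliography example with Python's standard `xml.etree.ElementTree` module. The function name `summarize` and the choice of Python are assumptions made for illustration; the slides do not prescribe a parser.

```python
import xml.etree.ElementTree as ET

# The bibliography document from the slides, as a Python string.
DOC = """
<bibliography>
  <paper ID="object-fusion">
    <authors>
      <author>Y.Papakonstantinou</author>
      <author>S. Abiteboul</author>
      <author>H. Garcia-Molina</author>
    </authors>
    <fullPaper source="fusion"/>
    <title>Object Fusion in Mediator Systems</title>
    <booktitle>VLDB 96</booktitle>
  </paper>
</bibliography>
"""

def summarize(xml_text):
    """Return one (id, title, authors) tuple per <paper> element."""
    root = ET.fromstring(xml_text)
    papers = []
    for paper in root.findall("paper"):
        # Nesting is followed with a path: authors, then each author.
        authors = [a.text for a in paper.findall("authors/author")]
        papers.append((paper.get("ID"), paper.findtext("title"), authors))
    return papers

for paper_id, title, authors in summarize(DOC):
    print(paper_id, "|", title, "|", ", ".join(authors))
```

The same few lines work for any number of `<paper>` elements, which is the practical benefit of explicit structure.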

slide-7
SLIDE 7
  • Elements

Callouts on the example:
element name: paper in <paper ...>
character content: the text inside <author>, <title>, and <booktitle>
element content: <authors> contains only <author> child elements
empty element: <fullPaper source="fusion"/>

<bibliography>
  <paper ID="object-fusion">
    <authors>
      <author>Y.Papakonstantinou</author>
      <author>S. Abiteboul</author>
      <author>H. Garcia-Molina</author>
    </authors>
    <fullPaper source="fusion"/>
    <title>Object Fusion in Mediator Systems</title>
    <booktitle>VLDB 96</booktitle>
  </paper>
</bibliography>

slide-8
SLIDE 8

  • Attributes

<bibliography>
  <paper ID="object-fusion">
    <authors>
      <author>Y.Papakonstantinou</author>
      <author>S. Abiteboul</author>
      <author>H. Garcia-Molina</author>
    </authors>
    <fullPaper source="fusion"/>
    <title>Object Fusion in Mediator Systems</title>
    <booktitle>VLDB 96</booktitle>
  </paper>
</bibliography>

Callouts on the example:
attribute name: for example, ID in <paper ID="object-fusion">
attribute value: for example, "object-fusion"

slide-9
SLIDE 9

  • Tags and elements

A tag is a name, enclosed by angle brackets, with optional attributes

<foo id="123">

An element is a tree, containing an open tag, contents, and a close tag

<foo id="123">This is an element</foo>
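
A minimal sketch of the tag/element distinction, assuming Python's standard `xml.etree.ElementTree` parser (an illustrative choice, not something the slides specify). The parsed element exposes the pieces named above: the tag name, the attributes from the open tag, and the contents.

```python
import xml.etree.ElementTree as ET

elem = ET.fromstring('<foo id="123">This is an element</foo>')

print(elem.tag)     # name taken from the open tag
print(elem.attrib)  # attributes from the open tag, as a dict
print(elem.text)    # character content between the open and close tags
print(len(elem))    # number of child elements (none in this example)
```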

slide-10
SLIDE 10

  • A basic XML document is an XML element

Example:

<books>
  <book isbn="123">
    <title> Second Chance </title>
    <author> Matthew Dunn </author>
  </book>
</books>

slide-11
SLIDE 11
<BOOKS>
  <book id="123" loc="library">
    <author>Hull</author>
    <title>California</title>
    <year> 1995 </year>
  </book>
  <article id="555" ref="123">
    <author>Su</author>
    <title> Purdue</title>
  </article>
</BOOKS>

[Figure: the document drawn as a tree. BOOKS is the root with children book (123) and article (555); book has author (Hull), title (California), year (1995), and loc="library"; article has author (Su), title (Purdue), and a ref link to 123.]

slide-12
SLIDE 12
  • XML syntax rules

Tags properly nested
Tag names case-sensitive
All tags must be closed or self-closing
  <foo/> is the same as <foo></foo>
Attributes enclosed in quotes
Document consists of a single (root) element
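
The rules above can be checked mechanically. The sketch below assumes Python's standard `xml.etree.ElementTree` parser, which rejects input that is not well-formed; the helper name `is_well_formed` and the sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# One sample per rule; the labels describe what each string exercises.
SAMPLES = {
    "properly nested": "<a><b>text</b></a>",
    "self-closing": "<a><foo/></a>",
    "badly nested": "<a><b>text</a></b>",
    "case mismatch": "<a>text</A>",
    "unclosed tag": "<a><b>text</a>",
    "unquoted attribute": "<a id=123/>",
    "two root elements": "<a/><b/>",
}

for label, text in SAMPLES.items():
    print(f"{label:20} {is_well_formed(text)}")
```

Only the first two samples are accepted; every other sample breaks exactly one rule from the list.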

slide-13
SLIDE 13
  • Well-formed vs. valid

Well-formed: structure follows XML syntax rules
Valid: structure conforms to a DTD

slide-14
SLIDE 14
<DT>
  <IMG SRC="greenball.gif">&nbsp;
  <A NAME="object-fusion"></A>
  Y.Papakonstantinou, S. Abiteboul, H. Garcia-Molina.
  <A HREF="http://www-cse.ucsd.edu/~yannis/papers/fusion.ps">"ObjectFusion in Mediator Systems".</A>
  In <I>VLDB 96.</I>
</DT>

  • HTML confuses presentation with content
  • No explicit structure or semantics: nothing marks which text is the author, the title, or the conference

slide-15
SLIDE 15
  • XML vs. HTML

XML                           | HTML
Extensible set of tags        | Fixed set of tags
Content oriented              | Presentation oriented
Standard data infrastructure  | No data validation capabilities
Allows multiple output forms  | Single presentation

slide-16
SLIDE 16
  • XML DTDs and XML Schema

XML Document Type Definitions (DTDs)
XML Schema
  defines structure and data types
  allows developers to build their own libraries of interchanged data types

slide-17
SLIDE 17
  • An XML document may have an optional DTD.

A grammar for XML documents
Defines
  which elements can contain which other elements
  which attributes are allowed/required/permitted on which elements

slide-18
SLIDE 18

Both sides must agree on DTD
DTD can be part of document or stored separately

slide-19
SLIDE 19

Consider an XML document:

<db>
  <person>
    <name>Alan</name>
    <age>42</age>
    <email>agb@usa.net </email>
  </person>
  <person>………</person>
  ……….
</db>

slide-20
SLIDE 20

DTD for it might be:

<!DOCTYPE db [
  <!ELEMENT db (person*)>
  <!ELEMENT person (name, age, email)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT age (#PCDATA)>
  <!ELEMENT email (#PCDATA)>
]>

slide-21
SLIDE 21
  • Occurrence indicators

Indicator       | Occurrence           | Meaning
(no indicator)  | Required             | One and only one
?               | Optional             | None or one
*               | Optional, repeatable | None, one, or more
+               | Required, repeatable | One or more

slide-22
SLIDE 22
  • Element declarations

<!ELEMENT bibliography (paper*)>
<!ELEMENT paper (authors, fullPaper?, title, booktitle)>
<!ELEMENT authors (author+)>
<!ELEMENT author (#PCDATA)>

bibliography: sequence of 0 or more paper
paper: authors, followed by optional fullPaper, followed by title, followed by booktitle
authors: sequence of 1 or more author
author: character content
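
The occurrence indicators ?, * and + behave like regular-expression quantifiers applied to the sequence of child element names. The sketch below illustrates that idea only: the content models are translated by hand into Python regexes, and the `CONTENT_MODELS` table and `conforms` function are hypothetical names. It is not a DTD validator; Python's standard library does not include one, and attributes and mixed content are ignored here.

```python
import re
import xml.etree.ElementTree as ET

# Hand-translated content models. Each child name is followed by a comma
# so that adjacent names cannot run together in the matched string.
CONTENT_MODELS = {
    "db": r"(person,)*",                               # (person*)
    "person": r"name,age,email,",                      # (name, age, email)
    "bibliography": r"(paper,)*",                      # (paper*)
    "paper": r"authors,(fullPaper,)?title,booktitle,", # fullPaper? is optional
    "authors": r"(author,)+",                          # (author+)
}

def conforms(elem):
    """Check elem's children against its content model, recursively."""
    model = CONTENT_MODELS.get(elem.tag)
    if model is None:
        # No model listed: treat as a leaf that may hold only text.
        return len(elem) == 0
    sequence = "".join(child.tag + "," for child in elem)
    if re.fullmatch(model, sequence) is None:
        return False
    return all(conforms(child) for child in elem)

good = ET.fromstring(
    "<db><person><name>Alan</name><age>42</age>"
    "<email>agb@usa.net</email></person></db>"
)
bad = ET.fromstring("<db><person><name>Alan</name></person></db>")
print(conforms(good), conforms(bad))
```

A real application would hand this job to a validating parser; the point of the sketch is only that a content model is a pattern over child names.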

slide-23
SLIDE 23
  • XML Schema example

<type name="Order" >
  <element name="name" type="string" />
  <element name="street" type="string" />
  <element name="zip" type="integer" />
  <...>
  <attribute name="orderDate" type="date" />
</type>

slide-24
SLIDE 24
  • XML Schema example

<type name="personName">
  <element name="title" minOccurs="0"/>
  <element name="forename" minOccurs="0" maxOccurs="*"/>
  <element name="surname"/>
</type>

<type name="extendedName" source="personName" derivedBy="extension">
  <element name="generation" minOccurs="0"/>
</type>

<type name="simpleName" source="personName" derivedBy="restriction">
  <restrictions>
    <element name="title" maxOccurs="0"/>
    <element name="forename" minOccurs="1" maxOccurs="1"/>
  </restrictions>
</type>

slide-25
SLIDE 25
  • Part 2:

XSL: XML Transformation

slide-26
SLIDE 26
  • XSL

The eXtensible Stylesheet Language
Transforms XML into HTML
Actually, transforms XML into a tree, then turns that tree into another tree, then outputs that tree as XML

slide-27
SLIDE 27
  • XSL processing

[Figure: the XSL processor takes the XML source and the XSL stylesheet as input and produces the HTML output.]

slide-28
SLIDE 28

<?xml version="1.0"?>
<!DOCTYPE menu SYSTEM "menu.dtd">
<menu>
  <meal name="breakfast">
    <food>Scrambled Eggs</food>
    <food>Hash Browns</food>
    <drink>Orange Juice</drink>
  </meal>
  <meal name="snack">
    <food>Chips</food>
  </meal>
</menu>

[Figure: the document as a tree. menu contains meal nodes; the first meal has name "breakfast" and children food ("Scrambled Eggs"), food ("Hash Browns"), and drink ("Orange Juice").]

slide-29
SLIDE 29

  • XPath

The stuff inside the quotes in XSL patterns

"/person/name/firstname"

A sensible way to locate content in an XML document

slide-30
SLIDE 30

  • XPath syntax

book/title
  title child of book child of current node
/book/title
  title child of book child of document root
@language
  language attribute of current node
chapter/@language
  language attribute of chapter child of current node
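
A small sketch of these paths in use, assuming Python's standard `xml.etree.ElementTree`, which implements only a subset of XPath. Two limitations shape the example: ElementTree paths return elements rather than attribute nodes, so the `@language` cases are read with `.get()`, and absolute paths such as `/book/title` are not supported on an element, so only the relative form is shown. The sample document is invented for illustration.

```python
import xml.etree.ElementTree as ET

DOC = """
<library>
  <book>
    <title>Book One</title>
    <chapter language="en"><para>first</para></chapter>
    <chapter language="fr"><para>second</para></chapter>
  </book>
</library>
"""
root = ET.fromstring(DOC)  # <library> plays the role of the current node

# book/title: title child of book child of the current node
titles = [t.text for t in root.findall("book/title")]

# chapter/@language, relative to <book>: select the chapter elements,
# then read the attribute from each one.
book = root.find("book")
languages = [c.get("language") for c in book.findall("chapter")]

print(titles, languages)
```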

slide-31
SLIDE 31
  • XPath syntax (cont.)

chapter[3]/para
  all the para children of the third chapter
book/*/title
  all title children of all children of book (but not of their children)
chapter//para
  all para descendants of chapter, at any depth
../../title
  title child of parent of parent
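
The four expressions can be tried on a small document. This is a sketch assuming Python's standard `xml.etree.ElementTree` and an invented sample document. Because ElementTree cannot climb above the element a search starts from, the `../../title` case is written as a longer path from the root that first descends to a para and then steps up twice.

```python
import xml.etree.ElementTree as ET

DOC = """
<library>
  <book>
    <chapter><title>One</title><para>a</para></chapter>
    <chapter><title>Two</title><para>b</para></chapter>
    <chapter>
      <title>Three</title>
      <para>c</para>
      <section><para>d</para></section>
    </chapter>
  </book>
</library>
"""
root = ET.fromstring(DOC)

def texts(path):
    """Return the text of every element selected by path, in order."""
    return [e.text for e in root.findall(path)]

third = texts("book/chapter[3]/para")     # para children of the third chapter
titles = texts("book/*/title")            # title children of book's children
deep = texts("book/chapter//para")        # para descendants at any depth
up = texts(".//section/para/../../title") # from para: parent, parent, title

print(third, titles, deep, up)
```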

slide-32
SLIDE 32
  • XPath abbreviations

Abbreviation | Full form
//           | /descendant-or-self::node()/
..           | parent::node()
@            | attribute::
.            | self::node()

slide-33
SLIDE 33
  • XPath predicates

para[1] or para[position()=1]
  the first para child of the current node
para[last()]
  the last para child of the current node
para[count(child::note)>0]
  all paragraphs with one or more notes
para[@id="abstract"]
  selects all child nodes like <para id="abstract">
para[@type='secret'] or para[attribute::type='secret']
  selects all child nodes like <para type="secret">
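
A sketch of predicates in practice, assuming Python's standard `xml.etree.ElementTree`. It supports positional predicates, `last()`, and attribute tests, but not `count()`; the child-existence form `para[note]` is used here as the equivalent of `para[count(child::note)>0]`. The sample document is invented for illustration.

```python
import xml.etree.ElementTree as ET

DOC = """
<section>
  <para>intro</para>
  <para type="secret">hidden</para>
  <para><note>n1</note>with note</para>
  <para>last</para>
</section>
"""
section = ET.fromstring(DOC)

first = section.find("para[1]").text            # first para child
final = section.find("para[last()]").text       # last para child
secret = [p.text for p in section.findall("para[@type='secret']")]
# para[note]: paragraphs that have at least one note child
noted = [p.findtext("note") for p in section.findall("para[note]")]

print(first, final, secret, noted)
```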

slide-34
SLIDE 34
  • XPath predicates (cont.)

para[not(title)]
  selects all child paragraphs with no title elements
para[position() >= 2 and position() < last()]
  selects all but the first and last paragraphs
para[lang("en")]
  matches <para xml:lang="en-uk">…</para>
note[contains(., "alex")]
  . is the string value of the note, so the text of its children is tested too, recursively
note[starts-with(., "hello")]
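
Python's standard `xml.etree.ElementTree` does not implement `contains()`, `starts-with()`, or `position()` ranges, so the sketch below expresses the same tests in plain Python. The helper `string_value` is a hypothetical name; it mimics what `.` means in these predicates by joining all text inside the element, including text in descendants.

```python
import xml.etree.ElementTree as ET

def string_value(elem):
    """Concatenate all text inside elem, descendants included."""
    return "".join(elem.itertext())

DOC = (
    "<notes>"
    "<note>hello <b>alex</b></note>"
    "<note>hello world</note>"
    "<note>bye</note>"
    "</notes>"
)
notes = ET.fromstring(DOC).findall("note")

# note[contains(., "alex")]: "alex" sits inside <b>, yet still matches
with_alex = [n for n in notes if "alex" in string_value(n)]

# note[starts-with(., "hello")]
greetings = [n for n in notes if string_value(n).startswith("hello")]

# note[position() >= 2 and position() < last()]: drop first and last
middle = notes[1:-1]

print(len(with_alex), len(greetings), len(middle))
```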

slide-35
SLIDE 35
  • XSL rules

XSL is a series of rules or templates
Each template matches an element
Templates can contain XML commands

slide-36
SLIDE 36
  • XSL: apply-templates

Main rule: apply-templates
  looks for a template match
  applies it
Usually the template calls apply-templates recursively on its children
If not, then processing stops at that node (but continues for its other siblings that matched this template)

slide-37
SLIDE 37
  • Default rules

For a leaf node, output its contents
For a branch node, apply templates (recursively) (including default rule)
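
The processing model can be imitated in a few lines of Python. This is a toy sketch, not an XSLT processor: templates are plain functions keyed by element name, and `TEMPLATES`, `template`, `apply_templates`, and `process` are names invented for illustration. Elements without a template fall through to the default rule, which recurses into children and outputs text.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

TEMPLATES = {}  # element name -> function(elem) returning output text

def template(match):
    """Register a function as the template for elements named `match`."""
    def register(func):
        TEMPLATES[match] = func
        return func
    return register

def apply_templates(elem):
    """Process elem's text and child elements in document order."""
    out = [escape(elem.text or "")]
    for child in elem:
        out.append(process(child))
        out.append(escape(child.tail or ""))
    return "".join(out)

def process(elem):
    """Use the matching template, else fall back to the default rule."""
    rule = TEMPLATES.get(elem.tag)
    if rule is not None:
        return rule(elem)
    return apply_templates(elem)  # default rule: recurse and output text

@template("meal")
def meal(elem):
    name = escape(elem.get("name", ""))
    return "<H2>" + name + "</H2><UL>" + apply_templates(elem) + "</UL>"

@template("food")
def food(elem):
    return "<LI>" + apply_templates(elem) + "</LI>"

MENU = (
    '<menu><meal name="breakfast"><food>Scrambled Eggs</food>'
    "<drink>Orange Juice</drink></meal></menu>"
)
html = process(ET.fromstring(MENU))
print(html)
```

Neither menu nor drink has a template here, so both are handled by the default rule: menu just recurses, and the drink text appears without an `<LI>` wrapper.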

slide-38
SLIDE 38

  • XSL commands

value-of
  grabs raw value, good for text elements and attributes
if
  executes conditionally
number
  counts position of element in group
  good for ordered list numbering, table of contents, etc.

slide-39
SLIDE 39

<?xml version="1.0"?>
<!DOCTYPE menu SYSTEM "menu.dtd">
<menu>
  <meal name="breakfast">
    <food>Scrambled Eggs</food>
    <food>Hash Browns</food>
    <drink>Orange Juice</drink>
  </meal>
  <meal name="snack">
    <food>Chips</food>
  </meal>
</menu>

[Figure: the document as a tree. menu contains meal nodes; the first meal has name "breakfast" and children food ("Scrambled Eggs"), food ("Hash Browns"), and drink ("Orange Juice").]

slide-40
SLIDE 40

  • XSL stylesheet

<?xml version="1.0"?>
<!DOCTYPE xsl:stylesheet [
  <!ENTITY background "#99FFFF">
]>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

slide-41
SLIDE 41
  • Stylesheet (cont.)

<xsl:template match="menu">
  <HTML>
    <HEAD>
      <TITLE>Menu: <xsl:value-of select="@name"/> </TITLE>
    </HEAD>
    <BODY BGCOLOR="&background;">
      <H1> Menu <xsl:value-of select="@name"/> </H1>

slide-42
SLIDE 42
  • Stylesheet (cont.)

      <xsl:apply-templates />
    </BODY>
  </HTML>
</xsl:template>

slide-43
SLIDE 43
  • Stylesheet (cont.)

<xsl:template match="meal">
  <H2><xsl:value-of select="@name"/></H2><br />
  <UL>
    <xsl:apply-templates/>
  </UL>
</xsl:template>

slide-44
SLIDE 44
  • Stylesheet (cont.)

<xsl:template match="food">
  <LI><xsl:apply-templates/></LI>
</xsl:template>

<xsl:template match="drink">
  <LI><xsl:apply-templates/></LI>
</xsl:template>

</xsl:stylesheet>

slide-45
SLIDE 45
<BOOKS>
  <book id="123" loc="library">
    <author>Hull</author>
    <title>California</title>
    <year> 1995 </year>
  </book>
  <article id="555" ref="123">
    <author>Su</author>
    <title> Purdue</title>
  </article>
</BOOKS>

[Figure: the document drawn as a tree. BOOKS is the root with children book (123) and article (555); book has author (Hull), title (California), year (1995), and loc="library"; article has author (Su), title (Purdue), and a ref link to 123.]

slide-46
SLIDE 46
  • XSL: if

<xsl:if test="author">
  by <xsl:apply-templates select="author" />
</xsl:if>

Note: no else (?!?); use xsl:choose with xsl:when and xsl:otherwise for an either/or branch

slide-47
SLIDE 47
  • XSL: for-each

<xsl:for-each select="chapter">
  <h2><xsl:value-of select="@title"/></h2>
</xsl:for-each>
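
For readers more at home in a general-purpose language, the two control constructs correspond roughly to an ordinary conditional and loop. The sketch below is a Python analogy under that assumption, not XSLT; the function names `byline` and `chapter_headings` and the sample elements are invented for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def byline(item):
    """Roughly <xsl:if test="author">: output only if an author exists."""
    authors = item.findall("author")
    if not authors:
        return ""  # xsl:if has no else branch; nothing is produced
    return "by " + "".join(escape(a.text or "") for a in authors)

def chapter_headings(book):
    """Roughly <xsl:for-each select="chapter">: one <h2> per chapter."""
    return "".join(
        "<h2>" + escape(chapter.get("title", "")) + "</h2>"
        for chapter in book.findall("chapter")
    )

book = ET.fromstring(
    '<book><author>Hull</author>'
    '<chapter title="Intro"/><chapter title="XPath"/></book>'
)
print(byline(book))
print(chapter_headings(book))
```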

slide-48
SLIDE 48

  • Resources

XML Spec: http://www.w3.org/TR/REC-xml
XML FAQ: http://www.ucc.ie/xml/
Café con Leche: http://metalab.unc.edu/xml/
http://www.xml.com/
xml.org
ibm.com/xml
Servlet FAQ in XSL: http://www.purpletech.com/servlet-faq/