
Java Web Services, Software Park Thailand, 2004

  • Dr. Kanda Runapongsa, Khon Kaen University

JAXP (Java API for XML Processing)

  • Dr. Kanda Runapongsa
  • Department of Computer Engineering, Khon Kaen University

Overview

  • What are XML Parsers?
  • What is JAXP?
  • SAX: Simple API for XML
  • DOM: Document Object Model
  • SAX vs. DOM
  • When to Use DOM?
  • When to Use SAX?
  • Transforming with XSLT

What are XML Parsers?

  • In order to process XML data, every program or server process needs an XML parser
  • The parser extracts the actual data out of the textual representation
  • It is essential for the automatic processing of XML documents

What are XML Parsers? (Cont.)

  • Parsers also check whether documents conform to the XML standard and have a correct structure
  • There are two types of XML parsers
      • Validating: check documents against a DTD or an XML Schema
      • Non-validating: do not check documents against a DTD or an XML Schema
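The structural check that every parser performs can be sketched with a non-validating JAXP parser. This is a minimal illustration, not part of the original slides: the class name WellFormedCheck and its method are hypothetical, and the sample documents are made up.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reports whether a string is well-formed XML.
class WellFormedCheck {
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(false); // non-validating: no DTD or schema check
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the parser rejected the document structure
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<a><b/></a>")); // properly nested
        System.out.println(isWellFormed("<a><b></a>"));  // <b> is never closed
    }
}
```

Calling factory.setValidating(true) instead would ask for a validating parser, which additionally checks the document against its DTD.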

What is JAXP?

  • JAXP: the Java API for XML Processing
  • JAXP = SAX + DOM + XSLT (Java API)
  • As of 24/09/2004, the current version is JAXP v.1.2.6
  • JAXP allows you to use any XML-compliant parser from within your application
  • A thin and lightweight Java API for parsing and transforming XML documents
  • Allows for pluggable parsers and transformers
  • Allows parsing of XML documents using:
      • Event-driven processing (SAX 2.0)
      • Tree-based processing (DOM Level 2)

JAXP Pluggable Framework for Parsers and Transformers

[Figure: diagram with boxes labeled User Application, JAXP Interface, Reference Parsers, and Other Parser]

Packages in JAXP

  • The JAXP API model is quite easy to understand and simple to use
  • javax.xml.parsers
      • Provides a common interface for different vendors' SAX and DOM parsers
  • org.w3c.dom
      • Defines the Document interface (DOM) as well as interfaces for all of the components of a DOM

Packages in JAXP (Cont.)

  • org.xml.sax
      • Defines the basic SAX APIs
  • javax.xml.transform
      • Defines the XSLT APIs that let you transform XML into other forms

Current Parsing Approaches

  • SAX (Simple API for XML) and DOM (Document Object Model) allow programmers to access the information stored in XML documents
      • Using any programming language and a parser for that language
  • The two take very different approaches to giving you access to your information

Overview

  • XML Parsers
  • What is JAXP?
  • SAX: Simple API for XML
  • DOM: Document Object Model
  • SAX vs. DOM
  • When to Use DOM?
  • When to Use SAX?
  • Transforming with XSLT

SAX (Simple API for XML)

Overview

  • What is SAX?
  • SAX Operational Model
  • Processing XML with JAXP SAX
  • Callback Interfaces
  • Handling SAX events
      • startDocument, endDocument, characters
      • startElement, endElement
  • What the ContentHandler Doesn't Tell You

SAX (Simple API for XML)

  • The SAX API is based on an event-driven processing model where
      • The data elements are interpreted on a sequential basis
      • The callbacks are called based on selected constructs
  • It uses a sequential read-only approach and does not support random access to the XML elements

SAX Operational Model

[Figure: XML Document (input) → Parser → (events) → Provided Handler]

Processing XML with JAXP SAX

  • Major steps for parsing using JAXP:
      • Getting the factory and parser classes to perform XML parsing
      • Setting options such as namespaces, validation, and features
      • Creating a DefaultHandler implementation class

Getting a Factory Class

  • Obtain a factory class using the SAXParserFactory's static newInstance() method
      • SAXParserFactory factory = SAXParserFactory.newInstance();
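As a rough sketch of the "setting options" step, the factory can be configured before any parser is created. The class name SaxParserSetup is hypothetical, and the particular choice of options (namespace awareness on, validation off, the secure-processing feature from javax.xml.XMLConstants, which newer JAXP releases provide) is only an assumption made for illustration.

```java
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;

// Hypothetical helper: builds a SAXParser from a configured factory.
class SaxParserSetup {
    static SAXParser newConfiguredParser() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);  // report namespace URIs and local names
            factory.setValidating(false);     // do not validate against a DTD
            // Features are switched on or off by name
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            return factory.newSAXParser();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SAXParser parser = newConfiguredParser();
        System.out.println("namespace aware: " + parser.isNamespaceAware());
        System.out.println("validating: " + parser.isValidating());
    }
}
```

Options set on the factory apply to every parser it creates afterwards, which is why they are set before newSAXParser() is called.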

Getting and Using a SAXParser Class

  • Obtain the SAX parser class from the factory by calling its newSAXParser() method
      • SAXParser parser = factory.newSAXParser();
  • Parse the XML data by calling the parse method
      • parser.parse("methodCall.xml", handler);
  • The second argument is the handler, a DefaultHandler subclass (DefaultHandler implements ContentHandler)
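Putting the factory, parser, and handler together might look like the following sketch. It parses an in-memory string rather than the methodCall.xml file so that it runs on its own; the class name ElementCounter and the sample document are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: counts the elements reported by the parser.
class ElementCounter extends DefaultHandler {
    private int count;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        count++; // one callback per start tag or empty-element tag
    }

    static int countElements(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            SAXParser parser = factory.newSAXParser();
            ElementCounter handler = new ElementCounter();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.count;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<methodCall><methodName>x</methodName><params/></methodCall>";
        System.out.println(countElements(xml)); // prints 3
    }
}
```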

Callback Interfaces

  • SAX uses the Observer design pattern to tell client applications what's in a document
  • Java developers are most familiar with this pattern from the event architecture of the AWT and Swing
      • MouseListener as the Observer
      • Button as the Subject

Callback Interfaces (Cont.)

  • In SAX, XMLReader plays the role of the Subject and org.xml.sax.ContentHandler plays the role of the Observer
  • The biggest difference between the AWT and SAX is that SAX does not allow more than one listener to be registered with each XMLReader

ContentHandler and DefaultHandler

  • There are eleven methods declared in the ContentHandler interface
  • Few SAX programs actually use all eleven methods
  • SAX includes the org.xml.sax.helpers.DefaultHandler class that implements the ContentHandler interface
  • By extending DefaultHandler, we only have to override the methods we actually care about
Extending from Class DefaultHandler

  • The following code lists the methods that are often overridden when defining a class that extends DefaultHandler
      • public void startDocument()
      • public void endDocument()

Extending from Class DefaultHandler (Cont.)

  • Methods that are often overridden
      • public void characters(char[] text, int start, int length)
      • public void startElement(String namespaceURI, String localName, String qualifiedName, Attributes atts)
      • public void endElement(String namespaceURI, String localName, String qualifiedName)

Receiving Documents

  • The parser invokes startDocument() as soon as it begins parsing a new document, before it invokes any other methods in ContentHandler (except setDocumentLocator())
  • It calls endDocument() after it's finished parsing the document and will not report any further content from that document

Receiving Elements

  • When the parser encounters a start tag, it calls the startElement() method
  • When the parser encounters an end tag, it calls the endElement() method
  • When the parser encounters an empty-element tag, it calls the startElement() method and then the endElement() method
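The order of the document and element callbacks can be observed with a small logging handler. This is a sketch only: the class name EventLogger, its trace method, and the sample document are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: records each callback in the order it arrives.
class EventLogger extends DefaultHandler {
    private final List<String> events = new ArrayList<>();

    @Override
    public void startDocument() { events.add("startDocument"); }

    @Override
    public void endDocument() { events.add("endDocument"); }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add("start:" + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("end:" + qName);
    }

    static String trace(String xml) {
        try {
            EventLogger handler = new EventLogger();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return String.join(" ", handler.events);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // <b/> is an empty-element tag, so it produces a start and an end event
        System.out.println(trace("<a><b/></a>"));
        // prints: startDocument start:a start:b end:b end:a endDocument
    }
}
```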

Handling Attributes

  • Attributes are not reported through separate callbacks
  • Instead, an Attributes object containing all the attributes of an element is passed to the startElement() method for the start-tag or empty-element tag of the element that has the attributes
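A handler might read the Attributes object along these lines. The class name AttributeCollector, the key format "element/@attribute", and the sample document are all assumptions made for this sketch.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects every attribute it sees, keyed by "element/@attribute".
class AttributeCollector extends DefaultHandler {
    private final Map<String, String> values = new TreeMap<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // All attributes of the element arrive together in this one callback
        for (int i = 0; i < atts.getLength(); i++) {
            values.put(qName + "/@" + atts.getQName(i), atts.getValue(i));
        }
    }

    static Map<String, String> collect(String xml) {
        try {
            AttributeCollector handler = new AttributeCollector();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.values;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(collect("<book id=\"42\" lang='en'><title/></book>"));
    }
}
```

The two attributes in the sample use different quote styles on purpose; the handler receives only the values, not the quotes.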

Receiving Characters

  • When the parser reads #PCDATA, it passes this text to the characters() method as an array of chars
  • You must not assume that the parser will pass you the maximum contiguous run of text in a single call to characters()
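Because text may arrive in several pieces, a common approach is to append each piece to a buffer instead of treating one call as the whole text. The sketch below assumes a hypothetical TextCollector class and a made-up sample document.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: accumulates all character data in the document.
class TextCollector extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) {
        // Only the slice [start, start + length) belongs to this event
        text.append(ch, start, length);
    }

    static String textOf(String xml) {
        try {
            TextCollector handler = new TextCollector();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.text.toString();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The entity reference and the nested element may split the text into several calls
        System.out.println(textOf("<msg>Hello, <b>SAX</b> &amp; DOM</msg>"));
        // prints: Hello, SAX & DOM
    }
}
```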

What the ContentHandler Doesn't Tell You

  • The type of quotes that surround attributes
  • Whether empty elements are represented as <name></name> or <name/>
  • Whether an attribute was specified in the instance document or defaulted in from the DTD or schema

Overview

  • XML Parsers
  • What is JAXP?
  • SAX: Simple API for XML
  • DOM: Document Object Model
  • SAX vs. DOM
  • When to Use DOM?
  • When to Use SAX?
  • Transforming with XSLT

DOM (Document Object Model)

Overview

  • DOM and Programming Languages
  • The Evolution of DOM
  • Trees
  • DOM in Action
  • DOM Parsers for Java
  • Parsing Documents with a DOM Parser
  • The Node Interface
  • The NodeList Interface

DOM and Programming Languages

  • DOM is defined in the Interface Definition Language (IDL) so that it's language neutral
  • DOM bindings exist for most object-oriented languages including Java, JavaScript, C++, Python, and Perl

The Evolution of DOM

  • The first version wasn't an official specification, just the object model that Netscape Navigator 3 and Internet Explorer 3 implemented in their browsers. This is sometimes called DOM Level 0
  • DOM Level 0 only applied to HTML documents and only in the context of JavaScript

The Evolution of DOM (Cont.)

  • The growing incompatibility between the two browser object models made it obvious that something more standard was needed
  • Hence, the W3C launched the W3C DOM Activity and began working on DOM Level 1

The Evolution of DOM (Cont.)

  • DOM Level 2 cleaned up DOM Level 1
      • The big change was namespace support in the Element and Attr interfaces
  • DOM2 also added a number of supplementary interfaces for events, traversal, ranges, views, and style sheets
  • From this point, we will learn about DOM2

Trees

  • According to DOM, an XML document is a tree made up of nodes of several types
  • The tree has a single root node, and all nodes in this tree except for the root have a single parent node
  • Each node has a list of child nodes
  • What do we call a node that has an empty list of children?
      • A leaf node

Tree and Nodes

  • There can also be nodes that are not part of the tree structure
      • Each attribute node belongs to one element node but is not considered to be a child of that element
  • A full DOM document is composed of a tree of nodes, various nodes that are somehow associated with other nodes but are not themselves part of the tree, and a random assortment of disconnected nodes

Tree Nodes

  • Besides its tree connections, each node has a local name, a namespace URI, and a prefix, though for several kinds of nodes these may be null
      • For instance, the local name, namespace URI, and prefix of a comment are always null

Tree Nodes (Cont.)

  • Each node has a string value
      • For text-ish things like text nodes and comments, this tends to be the text of the node
      • For attributes, it's the normalized value of the attribute
      • For everything else, including elements and documents, the value is null
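The string values described above can be inspected with Node.getNodeValue(). The following is a minimal sketch; the class name NodeValues and the sample document are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo: prints the node value of several kinds of nodes.
class NodeValues {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<!--note--><p title=\"x\">hi</p>");
        Element p = doc.getDocumentElement();
        System.out.println("document:  " + doc.getNodeValue());                    // null
        System.out.println("element:   " + p.getNodeValue());                      // null
        System.out.println("text:      " + p.getFirstChild().getNodeValue());      // hi
        System.out.println("attribute: " + p.getAttributeNode("title").getNodeValue()); // x
        System.out.println("comment:   " + doc.getFirstChild().getNodeValue());    // note
    }
}
```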

Tree Node Types

  • DOM divides nodes into twelve types, seven of which can potentially be part of a DOM tree
      • Document nodes
      • Element nodes
      • Text nodes
      • Attribute nodes
      • Processing instruction nodes

Tree Node Types (Cont.)

  • Types of a node in DOM
      • Comment nodes
      • Document type nodes
      • Document fragment nodes
      • Notation nodes
      • CDATA section nodes
      • Entity nodes
      • Entity reference nodes

Document Nodes

  • Each DOM tree has a single root document node
  • This node has children
      • Since all documents have root elements, a document node always has exactly one element node child
      • If the document has a document type declaration, then it will also have one document type node child

Document Nodes (Cont.)

  • If the document contains any comments or processing instructions before or after the root element, then these will also be child nodes of the document node
  • The order of all children is maintained

Element Nodes

  • Each element node has a name, a local name, a namespace URI (which may be null if the element is not in any namespace), and a prefix (which may also be null)
  • An element node can contain other element nodes, text nodes, comment nodes, and processing instruction nodes

Attribute Nodes

  • An attribute node has a name, a local name, a prefix, a namespace URI, and a string value
  • The attribute value is normalized
      • Each white space character (tab, line feed, carriage return) is converted into a space
  • Attributes are not considered to be children of the element they are attached to

Leaf Nodes

  • Only document, document fragment, element, attribute, entity, and entity reference nodes can have children
  • The remaining node types do not have children
  • There are several types of leaf nodes, such as text nodes, comment nodes, processing instruction nodes, and CDATA section nodes

DOM Parsers for Java

  • JAXP, the Java API for XML Processing, provides a standard, parser-independent means to parse existing documents, create documents, and serialize in-memory DOM trees to XML files
  • JAXP is a standard part of Java 1.4 or higher
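Creating a document and serializing it can be sketched with a DocumentBuilder and an identity Transformer. The class name DomSerialize and the element and attribute names are hypothetical, and the output goes to a string rather than a file to keep the example self-contained.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo: builds a small DOM tree in memory and serializes it.
class DomSerialize {
    static String buildAndSerialize() {
        try {
            // Create an empty document and add a root element with text content
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElement("greeting");
            root.setAttribute("lang", "en");
            root.appendChild(doc.createTextNode("hello"));
            doc.appendChild(root);

            // A Transformer created without a stylesheet copies its input unchanged
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildAndSerialize());
    }
}
```

Passing a StreamResult built from a java.io.File instead of a StringWriter would write the same markup to disk.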
Parsing Documents with a DOM Parser

  • Unlike SAX, DOM does not have a class or interface that represents the XML parser
  • Each parser vendor provides their own unique class (org.apache.xerces.parsers.DOMParser, oracle.xml.parser.v2.DOMParser)
  • Since these classes do not share a common interface or superclass, the methods they use to parse documents vary too

JAXP DOM Parser

  • The lack of a standard means of parsing an XML document is one of the holes that JAXP fills
  • If your parser implements JAXP, then instead of using the parser-specific classes, you can use the javax.xml.parsers.DocumentBuilderFactory and javax.xml.parsers.DocumentBuilder classes to parse the documents
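A vendor-neutral DOM parse with these two classes might look like the following sketch. The class name DomParseExample and the sample document are hypothetical; the input is an in-memory string so the example runs without any file.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: parses XML text into a DOM Document through JAXP only.
class DomParseExample {
    static Document parse(String xml) {
        try {
            // The factory locates whichever JAXP-compliant parser is installed
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<order id=\"7\"><item/></order>");
        System.out.println(doc.getDocumentElement().getTagName()); // prints order
    }
}
```

No vendor class name appears in the code, so swapping the underlying parser requires no source changes.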

The Node Interface

  • Once you've parsed the document and formed an org.w3c.dom.Document object, you can forget the differences between the various parsers and just work with the standard DOM interfaces

Common DOM Methods

  • When you're working with the DOM, you'll often use the following methods
      • Document.getDocumentElement(): returns the root of the DOM tree
      • Node.getFirstChild() and Node.getLastChild()
      • Node.getNextSibling()
      • Element.getAttribute(String attrName)
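The methods listed above are enough to walk the children of an element. This is a sketch under assumptions: the class name DomWalk, its helper methods, and the sample document are all hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo: walks child nodes with getFirstChild() and getNextSibling().
class DomWalk {
    static Element rootOf(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    static List<String> childElementNames(Element parent) {
        List<String> names = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) { // skip text and other node types
                names.add(n.getNodeName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Element root = rootOf("<order id=\"7\"><item/>text<note/></order>");
        System.out.println(root.getAttribute("id"));             // prints 7
        System.out.println(childElementNames(root));             // prints [item, note]
        System.out.println(root.getLastChild().getNodeName());   // prints note
    }
}
```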

The NodeList Interface

  • DOM stores the lists of children of each node in NodeList objects
  • Indexes start from 0 and continue to one less than the length of the list, just like Java arrays
  • The interface declaration:

      package org.w3c.dom;

      public interface NodeList {
          public Node item(int index);
          public int getLength();
      }
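Iterating over a NodeList with item() and getLength() could be sketched as follows. The class name NodeListExample and the sample documents are hypothetical. The second sample shows that whitespace between elements is reported as text nodes when no DTD is involved.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo: lists the names of the root element's child nodes.
class NodeListExample {
    static List<String> childNames(String xml) {
        try {
            NodeList children = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement()
                    .getChildNodes();
            List<String> names = new ArrayList<>();
            // Valid indexes run from 0 to getLength() - 1
            for (int i = 0; i < children.getLength(); i++) {
                names.add(children.item(i).getNodeName());
            }
            return names;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(childNames("<list><a/><b/></list>"));
        // prints: [a, b]
        System.out.println(childNames("<list> <a/> <b/> </list>"));
        // prints: [#text, a, #text, b, #text]
    }
}
```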

Overview

  • XML Parsers
  • What is JAXP?
  • SAX: Simple API for XML
  • DOM: Document Object Model
  • SAX vs. DOM
  • When to Use DOM?
  • When to Use SAX?
  • Transforming with XSLT

SAX vs. DOM

  • In the case of DOM, the parser does almost everything
      • Reads the XML document in
      • Creates an object model on top of it
      • Gives you a reference to this object model (a Document object) so that you can manipulate it
  • SAX doesn't expect the parser to do much

SAX vs. DOM (Cont.)

  • For SAX, the parser should
      • Read in the XML document
      • Fire a bunch of events depending on what tags it encounters in the XML document
  • Then, the programmer needs to make sense of all the tag events and create objects in their own object model
Overview

  • XML Parsers
  • What is JAXP?
  • SAX: Simple API for XML
  • DOM: Document Object Model
  • SAX vs. DOM
  • When to Use DOM?
  • When to Use SAX?

When to Use DOM?

  • DOM is quite easy to implement
      • Good when development must be done in a short amount of time
  • DOM creates a tree of nodes
      • When you need to quickly access the children and parent of the current node
      • When you need to modify an XML structure
  • What are the disadvantages of using DOM?

slide-29
SLIDE 29

Java Web Services, Software Park Thailand, 2004

  • Dr. Kanda Runapongsa, Khon Kaen University

29

57

When to Use SAX?

  • SAX requires little memory because it does not construct an internal representation of the XML data
  • It works well when you simply want to read data and have the application act on it
  • You see the data as it streams in, but you can’t go back to an earlier position or leap ahead to a different position
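
As a small illustration of this read-and-act style, the sketch below counts elements of a given name while the document streams past. The class name `SaxElementCounter` and the sample element names are assumptions made for the example. The handler keeps only a counter, so memory use does not grow with the size of the document.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: counts start tags with a given name, storing nothing else.
class SaxElementCounter extends DefaultHandler {
    private final String target;
    private int count;

    SaxElementCounter(String target) {
        this.target = target;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Each event is seen exactly once, in document order; there is no way to rewind.
        if (target.equals(qName)) {
            count++;
        }
    }

    static int count(String xml, String elementName) {
        SaxElementCounter handler = new SaxElementCounter(elementName);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return handler.count;
    }

    public static void main(String[] args) {
        System.out.println(count("<log><entry/><entry/><note/><entry/></log>", "entry"));
    }
}
```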

58

Overview

  • XML Parsers
  • What is JAXP?
  • SAX: Simple API for XML
  • DOM: Document Object Model
  • SAX vs. DOM
  • When to Use DOM?
  • When to Use SAX?
  • Transforming with XSLT


59

XSL - The Style Sheet of XML

  • XML does not use predefined tags (we can use any tags we want)
  • <table> could mean an HTML table, a piece of furniture, or something else
  • XSL: something in addition to the XML document that describes how the document should be displayed

60

What is XSLT?

  • XSLT transforms an XML document into another XML document, such as an XHTML document
  • XSLT can
      • Add new elements into the output file
      • Remove elements
      • Rearrange and sort elements
      • Test and make decisions about which elements to display
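
The four capabilities listed above can be seen together in one small stylesheet. The sketch below is an illustration only: the stylesheet, the `<order>` document, and the class name `XsltSortDemo` are invented, and the stylesheet is kept in a Java string so the example stays self-contained. It adds a new `<cheap>` wrapper element, removes the `<price>` children, sorts items by name, and tests the price to decide which items appear.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltSortDemo {
    // Hypothetical stylesheet, written for this example.
    static final String XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/order'>"
        + "<cheap>"                                   // add a new element
        + "<xsl:for-each select='item'>"
        + "<xsl:sort select='name'/>"                 // rearrange and sort
        + "<xsl:if test='price &lt; 10'>"             // test and decide
        + "<name><xsl:value-of select='name'/></name>" // <price> is left out
        + "</xsl:if>"
        + "</xsl:for-each>"
        + "</cheap>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static final String XML =
          "<order>"
        + "<item><name>pen</name><price>3</price></item>"
        + "<item><name>desk</name><price>120</price></item>"
        + "<item><name>ink</name><price>5</price></item>"
        + "</order>";

    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XSL, XML));
    }
}
```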


61

How Does XSLT Work?

  • XSLT transforms an XML source tree into an XML result tree
  • XSLT uses XPath to define parts of the source document that match one or more predefined templates

62

How Does XSLT Work?

  • When a match is found, XSLT will transform the matching part of the source document into the result document
  • The parts of the source document that do not match any template are handled by XSLT's built-in rules, which pass their text content through to the result document without the surrounding markup
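
A short experiment can make the matching behaviour concrete. In the sketch below (the stylesheet, the `<doc>` document, and the class name `XsltMatchDemo` are all invented for illustration), only `<title>` has a template. Under the XSLT 1.0 built-in template rules, elements with no matching template are walked through and only their text is copied, so the text of `<body>` reaches the result while its tags do not.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltMatchDemo {
    // Hypothetical stylesheet with a single template, matching <title> only.
    static final String XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='title'>"
        + "<h1><xsl:value-of select='.'/></h1>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static final String XML = "<doc><title>Hi</title><body>text</body></doc>";

    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // <title> is rewritten by the template; <doc> and <body> have no template,
        // so the built-in rules emit only the text they contain.
        System.out.println(transform(XSL, XML));
    }
}
```

To copy unmatched elements through with their tags intact, a stylesheet would normally add an explicit identity template; that is a design choice outside what the slide covers.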


63

XPath and XSLT

  • XPath is a standard that provides the mechanism for accessing the elements of an XML document
  • XPath identifies the parts of the input document to be transformed
  • XPath enables you to traverse an XML document and select the set of elements
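
XPath can also be tried on its own from Java. The sketch below uses the `javax.xml.xpath` package, which ships with the JDK from Java 5 onward and may therefore be newer than the JAXP release these slides were written for. The sample document, the expression, and the class name `XPathSelectDemo` are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathSelectDemo {
    // Evaluates an XPath expression against an XML string and returns the selected nodes.
    static NodeList select(String xml, String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return (NodeList) xpath.evaluate(
                    expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad expression or document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<order><item price='3'>pen</item><item price='120'>desk</item></order>";
        // Select every <item> under <order> whose price attribute is below 10.
        NodeList cheap = select(xml, "/order/item[@price < 10]");
        for (int i = 0; i < cheap.getLength(); i++) {
            System.out.println(cheap.item(i).getTextContent());
        }
    }
}
```

The same kind of expression appears in the `match` and `select` attributes of an XSLT stylesheet.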

64

Transforming with XSLT

  • XSL provides the syntax and semantics for specifying formatting
  • An XSLT processor performs the formatting task
  • XSLT is often used to generate various output formats for an application that serves heterogeneous client types


65

XSLT supported in JAXP

  • A transformation follows these logical steps
      • Obtain a transformer factory, used for instantiating a transformer class
      • Create a new transformer
      • Use the transformer to transform the data by specifying the XML input source and the output result

66

Getting the Factory and Transformer Class

  • Use the factory class for instantiating a transformer implementation class
      TransformerFactory factory = TransformerFactory.newInstance();
  • Use the transformer class for applying the stylesheet to the input XML data
      Transformer transformer = factory.newTransformer(new StreamSource("order.xsl"));


67

Transforming the XML

  • Call the transformer's transform method to run the transformation
  • The transform method takes two parameters: the input source and the output result
      transformer.transform(new StreamSource("PurchoseOrder.xml"), new StreamResult(System.out));
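
Put together, the factory, transformer, and transform steps might look like the sketch below. It is one possible arrangement rather than the course's own solution: to stay self-contained it reads the stylesheet and the document from in-memory strings instead of the `order.xsl` and XML files named on the slides, and the stylesheet, sample data, and class name `JaxpTransformDemo` are invented for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class JaxpTransformDemo {
    // Hypothetical stylesheet: renders each <item> of an <order> as a list entry.
    static final String XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/order'>"
        + "<ul><xsl:for-each select='item'><li><xsl:value-of select='.'/></li></xsl:for-each></ul>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String transform(String xsl, String xml) {
        try {
            // Step 1: obtain a transformer factory.
            TransformerFactory factory = TransformerFactory.newInstance();
            // Step 2: create a transformer bound to the stylesheet.
            Transformer transformer = factory.newTransformer(new StreamSource(new StringReader(xsl)));
            // Step 3: transform, naming the XML input source and the output result.
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XSL, "<order><item>pen</item><item>ink</item></order>"));
    }
}
```

Swapping the `StringReader` sources for file-based `StreamSource` objects, and the `StringWriter` for `System.out`, gives the form shown on the slides.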

68

Resources

  • MSXML: Microsoft XML Parser: http://msdn.microsoft.com/xml/
  • Apache Xerces: XML parsers in Java and C++: http://xml.apache.org
  • IBM AlphaWorks: http://www.alphaworks.ibm.com/tech/xml4j
  • expat: http://www.jclark.com/xml/expat.html
  • XP: http://www.jclark.com/xml/xp/
  • Other sources
      • XML.com web site
      • Cover Pages: XML web site


69

Exercises

  • Compile and run these files
      • SAX: ex1.java, ex2.java, ex3.java, ex4.java, ex5.java, ex6.java, ex7.java
      • DOM: dom1.java, dom2.java, dom3.java, dom4.java, dom5.java
  • Download Exercises: http://gear.kku.ac.th/~krunapon/178375/exercises/jaxp_exer.zip