XPT 2006 XML APIs: SAX 1
2. XML Processor APIs
- How can (Java) applications manipulate structured (XML) documents?
  – An overview of XML processor interfaces
2.1 SAX: an event-based interface
2.2 DOM: an object-based interface
2.3 JAXP: Java API for XML Processing
Document Parser Interfaces
- Every XML application contains some kind of parser
  – editors, browsers
  – transformation/style engines, DB loaders, ...
- XML parsers have become standard tools of application development frameworks
  – JDK 1.4 contains JAXP, with its default parser (Apache Crimson)
(See, e.g., Leventhal, Lewis & Fuchs: Designing XML Internet Applications, Chapter 10, and D. Megginson: Events vs. Trees [online])
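JAXP hides which parser is actually in use. The short sketch below shows this: it asks SAXParserFactory for the default SAX parser and reports the implementation class it received. The class name DefaultParserInfo is hypothetical. The code assumes Java 8 or later. On such JDKs the bundled parser is typically a Xerces derivative, not Crimson.

```java
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

// Hypothetical helper: reports which SAX parser JAXP selects by default.
public class DefaultParserInfo {
    public static String defaultSaxParserClass() throws Exception {
        // The application names no implementation class; JAXP chooses one.
        SAXParserFactory factory = SAXParserFactory.newInstance();
        SAXParser parser = factory.newSAXParser();
        return parser.getClass().getName();
    }

    public static void main(String[] args) throws Exception {
        // On JDK 1.4 this named a Crimson class; later JDKs report their own bundled parser.
        System.out.println(defaultSaxParserClass());
    }
}
```

The application never names a parser class. A different implementation can therefore be plugged in by setting the javax.xml.parsers.SAXParserFactory system property, with no change to the code.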
Tasks of a Parser
- Document instance decomposition
  – elements, attributes, text, processing instructions, entities, ...
- Verification
  – well-formedness checking
    » syntactical correctness of XML markup
  – validation (against a DTD or Schema; optional)
- Access to contents of the DTD (if supported)
  – SAX 2.0 Extensions provide information about declarations: element type names and their content model expressions
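The following is a minimal sketch of the verification tasks and of DTD access. It assumes the JDK's built-in JAXP parser on Java 8 or later, and the class and method names are hypothetical. The three methods work as follows:
- isWellFormed relies on fatal errors.
- isValid switches on the optional DTD validation and treats recoverable errors as failures.
- declaredElements registers a SAX 2.0 extension DeclHandler through the standard declaration-handler property.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.ext.DeclHandler;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class illustrating the parser tasks listed above.
public class ParserTasks {
    private static InputSource source(String xml) {
        return new InputSource(new StringReader(xml));
    }

    // Well-formedness: syntax errors are fatal, and DefaultHandler.fatalError rethrows them.
    public static boolean isWellFormed(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        try {
            parser.parse(source(xml), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    // Validation (optional): DTD violations are reported as recoverable errors,
    // which DefaultHandler ignores, so this handler turns them into exceptions.
    public static boolean isValid(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true); // validate against the document's DTD
        DefaultHandler strict = new DefaultHandler() {
            @Override
            public void error(SAXParseException e) throws SAXException {
                throw e;
            }
        };
        try {
            factory.newSAXParser().parse(source(xml), strict);
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    // DTD access through the SAX 2.0 extension interface DeclHandler:
    // collects the declared element type names in declaration order.
    public static List<String> declaredElements(String xml) throws Exception {
        List<String> names = new ArrayList<>();
        DeclHandler decls = new DeclHandler() {
            public void elementDecl(String name, String model) { names.add(name); }
            public void attributeDecl(String eName, String aName, String type,
                                      String mode, String value) { }
            public void internalEntityDecl(String name, String value) { }
            public void externalEntityDecl(String name, String publicId, String systemId) { }
        };
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        // Standard SAX2 property; a parser without support throws SAXNotRecognizedException.
        parser.setProperty("http://xml.org/sax/properties/declaration-handler", decls);
        parser.parse(source(xml), new DefaultHandler());
        return names;
    }
}
```

elementDecl also receives the content model expression as its second argument. The sketch records only the names, because the exact spelling of the model string may vary between parsers.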
Document Parser Interfaces
I: Event-based interfaces
  – Command-line and ESIS interfaces
    » Element Structure Information Set, the traditional interface to stand-alone SGML parsers
  – Event call-back interfaces: SAX
II: Tree-based (object model) interfaces
  – W3C DOM Recommendation
  – Java-specific object models: JAXB, JDOM, dom4j
Command-line ESIS interface
[Figure: the application invokes an SGML/XML parser with a command-line call on the document <E i="1">Hi!</E>; the parser returns an ESIS stream consisting of the lines (E, Ai CDATA 1, -Hi! and )E]
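The sketch below makes the ESIS line format concrete by producing ESIS-style lines from SAX events. It is only an approximation built on the JDK's SAX parser, not a stand-alone SGML parser, and the class name EsisWriter is hypothetical. It roughly follows the output convention of James Clark's nsgmls. Under that convention, each A (attribute) line comes before the ( line of the element it belongs to. Newlines inside text are escaped as \n.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical converter from SAX events to ESIS-style lines.
public class EsisWriter extends DefaultHandler {
    private final List<String> lines = new ArrayList<>();
    private final StringBuilder text = new StringBuilder(); // pending character data

    // Emit buffered text as one "-" line; SAX may deliver a text run in several chunks.
    private void flushText() {
        if (text.length() > 0) {
            String escaped = text.toString().replace("\\", "\\\\").replace("\n", "\\n");
            lines.add("-" + escaped);
            text.setLength(0);
        }
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        flushText();
        for (int i = 0; i < atts.getLength(); i++) {
            // "A" line: attribute name, type (CDATA when undeclared), value
            lines.add("A" + atts.getQName(i) + " " + atts.getType(i) + " " + atts.getValue(i));
        }
        lines.add("(" + qName); // start of element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flushText();
        lines.add(")" + qName); // end of element
    }

    public static List<String> toEsis(String xml) throws Exception {
        EsisWriter writer = new EsisWriter();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), writer);
        return writer.lines;
    }

    public static void main(String[] args) throws Exception {
        toEsis("<E i=\"1\">Hi!</E>").forEach(System.out::println);
    }
}
```

For the document <E i="1">Hi!</E>, the sketch prints the lines Ai CDATA 1, (E, -Hi! and )E, one per line.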
Event Call-Back Interfaces
- Application implements a set of call-back methods for handling parse events
  – the parser notifies the application by calling these methods
  – the calls are qualified further by parameters:
    » element type name
    » names and values of attributes
    » values of content strings, …
- Idea behind "SAX" (Simple API for XML)
  – a de facto industry-standard API for XML parsers
  – one could also read it as "Serial Access XML"
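Here is a minimal call-back application, assuming the JDK's JAXP parser on Java 8 or later. The application subclasses DefaultHandler, which supplies empty implementations of all call-backs, and overrides only the method it needs. ElementCounter is a hypothetical name, and the sketch uses only the element type name parameter. Attribute names and values would arrive through the Attributes argument, and content strings through characters().

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical call-back application: counts how often each element type occurs.
public class ElementCounter extends DefaultHandler {
    private final Map<String, Integer> counts = new TreeMap<>();

    // Called by the parser for every start tag and empty-element tag;
    // qName carries the element type name.
    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        counts.merge(qName, 1, Integer::sum);
    }

    public static Map<String, Integer> count(String xml) throws Exception {
        ElementCounter handler = new ElementCounter();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.counts;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(count("<doc><A i=\"1\">Hi!</A><A/><B/></doc>"));
    }
}
```

The handler sees each event exactly once, in document order, and keeps only a small map. This reflects the "serial access" reading of SAX.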
An event call-back application
[Figure: the application's main routine calls Parse() on the document <?xml version='1.0'?><A i="1">Hi!</A>; the parser then invokes the application's callback routines in document order: startDocument(), startElement() with "A",[i="1"], characters() with "Hi!", and endElement() with "A"]
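The call sequence in the figure can be reproduced with a handler that logs each call-back. EventTrace is a hypothetical class. It assumes Java 8 or later and the default, non-namespace-aware SAXParserFactory, so element names are taken from the qName parameter.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that records the call-backs it receives.
public class EventTrace extends DefaultHandler {
    private final List<String> events = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();

    // Consecutive characters() calls are merged into one trace entry,
    // because a parser may split a single text run across several calls.
    private void flushText() {
        if (text.length() > 0) {
            events.add("characters(\"" + text + "\")");
            text.setLength(0);
        }
    }

    @Override
    public void startDocument() { events.add("startDocument()"); }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        flushText();
        StringBuilder sb = new StringBuilder("startElement(\"" + qName + "\", [");
        for (int i = 0; i < atts.getLength(); i++) {
            if (i > 0) sb.append(", ");
            sb.append(atts.getQName(i)).append("=\"").append(atts.getValue(i)).append('"');
        }
        events.add(sb.append("])").toString());
    }

    @Override
    public void characters(char[] ch, int start, int length) { text.append(ch, start, length); }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flushText();
        events.add("endElement(\"" + qName + "\")");
    }

    @Override
    public void endDocument() { flushText(); events.add("endDocument()"); }

    public static List<String> trace(String xml) throws Exception {
        EventTrace handler = new EventTrace();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        trace("<?xml version='1.0'?><A i=\"1\">Hi!</A>").forEach(System.out::println);
    }
}
```

Running main prints five lines, one per event:
- startDocument()
- startElement("A", [i="1"])
- characters("Hi!")
- endElement("A")
- endDocument()

The XML declaration itself is not reported as an event.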
Object Model Interfaces
- The parser builds ...
  – a document object consisting of sub-objects such as document, elements, attributes, text, …
- Abstraction level higher than in event-based interfaces; more powerful access
  – to descendants, following siblings, …
- Drawback: higher memory consumption, since the whole document tree is kept in memory
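For contrast with event-based processing, here is a minimal DOM sketch, assuming the JDK's DocumentBuilderFactory on Java 8 or later. The parser first builds the whole document as a tree of Node objects. After that, descendants and following siblings can be reached directly. DomNavigation and its methods are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helpers for navigating a DOM tree.
public class DomNavigation {
    // Builds the complete document tree in memory.
    public static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Text content of all descendants of the root element with the given name, in document order.
    public static List<String> descendantTexts(Document doc, String tag) {
        NodeList nodes = doc.getDocumentElement().getElementsByTagName(tag);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            out.add(nodes.item(i).getTextContent());
        }
        return out;
    }

    // Names of the element siblings that follow the first element with the given name.
    public static List<String> followingSiblingNames(Document doc, String tag) {
        List<String> out = new ArrayList<>();
        Node start = doc.getElementsByTagName(tag).item(0);
        if (start == null) return out;
        for (Node n = start.getNextSibling(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) out.add(n.getNodeName());
        }
        return out;
    }
}
```

For <doc><A>1</A><B/><A>2</A><C><A>3</A></C></doc>, descendantTexts(doc, "A") returns [1, 2, 3] and followingSiblingNames(doc, "A") returns [B, A, C]. Both calls work only because the whole tree is kept in memory, which is the drawback noted above.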