Extreme DocBook


SLIDE 1

Extreme DocBook

Version 1.0

http://www.sun.com/

Norman Walsh

XML Standards Architect

Extreme Markup Languages 2004 01-06 August 2004

SLIDE 2

This presentation explores some of the design choices made in recasting DocBook from an XML DTD to a RELAX NG Grammar.

  • What is DocBook?
  • History and Purpose
  • State of the Art
  • DTD vs. RELAX NG
  • Compatibility
  • Conclusions

Table of Contents

SLIDE 3

  • What is DocBook?
  • A DocBook Document

What is DocBook?

SLIDE 4
  • DocBook is an XML vocabulary for writing documentation. It is particularly well-suited to books and papers about computer hardware and software, though it is by no means limited to them.
  • It has been subset down to something that resembles HTML.
  • It has been extended to do things as different as websites and, well, presentations like this one [colorized.html].

What is DocBook?

SLIDE 5

<book>
  <bookinfo>
    <title>A Book Title</title>
    <author>
      <firstname>John</firstname>
      <surname>Doe</surname>
    </author>
  </bookinfo>
  <chapter>
    <title>The First Chapter</title>
    <para>Some <emphasis>text</emphasis>.</para>
  </chapter>
</book>

A DocBook Document

SLIDE 6

  • DocBook History
  • DocBook’s Purpose
  • Who’s Responsible for DocBook?
  • DocBook NG is My Fault
  • DocBook Development

History and Purpose

SLIDE 7
  • DocBook has been actively maintained for more than a decade.
  • It has always been maintained by a committee of some sort. It is now being developed by an OASIS Technical Committee.
  • DocBook was an SGML DTD for many years; it is now principally an XML DTD.

DocBook History

SLIDE 8
  • DocBook documents are mostly hand authored. Unlike SOAP envelopes, purchase orders, and XML/RPC invocations, humans write DocBook.
  • It’s mostly read by humans. DocBook documents aren’t usually consumed by unmarshalling processes building object graphs.
  • DocBook contains a lot of mixed content. Very few elements have “simple content”: dates, numbers, etc.

DocBook’s Purpose

SLIDE 9
  • Current committee members: Paul Grosso, Adam Di Carlo, Mark Johnson, Dick Hamilton, Larry Rowland, Nancy Harrison, Gary Cornelius, Jirka Kosek, Michael Smith, Robert Stayton, Steven Cogorno, Scott Hudson, Norman Walsh
  • Selected “alumni”: Terry Allen, Jon Bosak, Dale Dougherty, Ralph Ferris, Dave Hollander, Eve Maler, Murray Maloney, Conleth O'Connell, Mike Rogers, Jean Tappan

Who’s Responsible for DocBook?

SLIDE 10
  • The bugs are mine.
  • The current release is “Eaux-de-vie” from a few days ago.
  • The Technical Committee plans to move to RELAX NG for DocBook V5.0.

DocBook NG is My Fault

SLIDE 11
  • There have been about 15 releases in roughly ten years.
  • Four of those releases have been “major” releases.
  • That means we’ve added new stuff about […] of the time!

DocBook Development

SLIDE 12

  • DocBook Growth
  • A DocBook DTD Fragment
  • Growing Pains
  • DocBook DTD Shortcomings
  • Design Goals

State of the Art

SLIDE 13

“DocBook is like a pearl, it grows by accretion.”

DocBook Growth

SLIDE 14

<!ENTITY % chapter.module "INCLUDE">
<![%chapter.module;[
<!ENTITY % local.chapter.attrib "">
<!ENTITY % chapter.role.attrib "%role.attrib;">
<!ENTITY % chapter.element "INCLUDE">
<![%chapter.element;[
<!ELEMENT chapter %ho; (beginpage?,
                        chapterinfo?,
                        (%bookcomponent.title.content;),
                        (%nav.class;)*,
                        tocchap?,
                        (%bookcomponent.content;),
                        (%nav.class;)*)

A DocBook DTD Fragment

SLIDE 15

                        %ubiq.inclusion;>
<!--end of chapter.element-->]]>
<!ENTITY % chapter.attlist "INCLUDE">
<![%chapter.attlist;[
<!ATTLIST chapter
    %label.attrib;
    %status.attrib;
    %common.attrib;
    %chapter.role.attrib;
    %local.chapter.attrib;
>
<!--end of chapter.attlist-->]]>
<!--end of chapter.module-->]]>

A DocBook DTD Fragment (Continued)

SLIDE 16
  • Growth by accretion has resulted in some content models that are at best odd and at worst broken in pretty obvious ways.
  • Ten years of incremental growth has also changed the scale of DocBook. Designing a schema of roughly 400 elements is different from designing a schema of roughly 100. Logically extending decisions that looked regular and consistent when DocBook had 100 elements has not always resulted in a design that continues to look regular and consistent.

Growing Pains

SLIDE 17
  • The DTD fails to capture some significant constraints.
  • Originally designed as an exchange DTD, it has largely become an authoring DTD. Exchange and authoring aren’t opposing design centers, but they are different.
  • While DocBook is a shining example of parameter entity customization, parameter entity customization is fiendishly hard.

DocBook DTD Shortcomings

SLIDE 18

The result of recasting DocBook should…

1. “feel like” DocBook.
2. enforce as many constraints as possible.
3. clean up the content models.
4. give users the flexibility to extend or subset the schema in an easy and straightforward way.
5. be able to generate XML DTD and W3C XML Schema versions of DocBook.

Design Goals

SLIDE 19

  • Uniform Info Elements
  • Uniform Info Elements
  • Info Elements in More Contexts
  • Info Elements in More Contexts
  • Required Titles (Valid)
  • Required Titles (Invalid)
  • Required Titles
  • Co-Constraints (DTD)
  • Co-Constraints
  • Untangling Tables
  • Untangling Tables
  • Untangling Tables
  • …

DTD vs. RELAX NG

SLIDE 20
  • DocBook V4.x has setinfo, bookinfo, chapterinfo, appendixinfo, sectioninfo, etc.
  • Many people think it would be nicer if there were just one info element.
  • In DTDs, this can’t be done without sacrificing the ability to customize the info elements on a contextual basis.
  • In RELAX NG, we can have different patterns that each define an element named info.

Uniform Info Elements

SLIDE 21

book.info = element info { ... }
chapter.info = element info { ... }
book = element book { book.info, ... }
chapter = element chapter { chapter.info, ... }

Notes

  • RELAX NG Compact Syntax fits better on the slides.
  • The examples are slightly simplified from the DocBook NG schema.

Uniform Info Elements

SLIDE 22

It might be nice to have info elements in more contexts:

<para><info>
  <indexterm>
    <primary>Extreme Markup Languages</primary>
  </indexterm>
</info>Some text.</para>

Info Elements in More Contexts

SLIDE 23

In DTDs, we’d have to say

(#PCDATA|...|info|...)*

which would allow:

<para>Some<info>...</info> text.</para>

In RELAX NG, we can say:

(info?, (text|...)*)

which has the semantic we want.

Info Elements in More Contexts
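
To make the difference concrete, here is a minimal Python sketch of the rule that the RELAX NG pattern (info?, (text|...)*) expresses and the DTD mixed-content model cannot: an info element, if present, must come before any text or other child. The function name info_position_ok is hypothetical, and the check is a hand-written stand-in for illustration, not how a RELAX NG validator works internally.

```python
import xml.etree.ElementTree as ET


def info_position_ok(element_xml):
    """Return True if <info>, when present, is the first thing in the element.

    Mirrors the intent of (info?, (text|...)*): at most one info, and nothing
    but whitespace may precede it.
    """
    element = ET.fromstring(element_xml)
    children = list(element)
    info_count = sum(1 for child in children if child.tag == "info")
    if info_count == 0:
        return True  # info is optional
    if info_count > 1 or children[0].tag != "info":
        return False  # repeated, or preceded by another element
    # Text before the first child is stored in element.text.
    leading_text = element.text or ""
    return leading_text.strip() == ""


print(info_position_ok("<para><info/>Some text.</para>"))
print(info_position_ok("<para>Some<info/> text.</para>"))
```

The first call models the markup from the previous slide; the second models the unwanted case that the DTD content model would accept.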

SLIDE 24

Some elements must have titles, but they can appear in one place or another:

<article>
  <title>Some Article Title</title>
  <para>Some content.</para>
</article>

<article>
  <articleinfo>
    <title>Some Article Title</title>
    <author><firstname>Jane</firstname>
    <surname>Doe</surname></author>
  </articleinfo>
  <para>Some content.</para>
</article>

Required Titles (Valid)

SLIDE 25

I said “in one place or another”:

<article>
  <para>Some content without a title.</para>
</article>

<article>
  <title>Is This the Title?</title>
  <articleinfo>
    <title>Or Is This?</title>
  </articleinfo>
  <para>Some content.</para>
</article>

Required Titles (Invalid)

SLIDE 26

title.opt = title? & titleabbrev? & subtitle?
title.req = title & titleabbrev? & subtitle?

info.notitle = element info { (author|...)* }
info.titlereq = element info { title.req, (author|...)* }

element article {
  ((title.req, info.notitle?) | info.titlereq),
  ...
}

(This isn’t exactly the same semantic.)

Required Titles
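
The "one place or another" rule can be stated as "exactly one title, either directly in the article or inside its info wrapper". The Python sketch below checks that against the valid and invalid examples from the preceding slides. It is an illustration only: the function name has_exactly_one_title is hypothetical, and accepting both articleinfo (V4.x) and info (NG) as the wrapper name is an assumption made so the same check covers both vocabularies.

```python
import xml.etree.ElementTree as ET


def has_exactly_one_title(article_xml):
    """Return True if the article has one title, directly or in its info element."""
    article = ET.fromstring(article_xml)
    direct_titles = article.findall("title")
    # Assumption: look in both the V4.x wrapper (articleinfo) and the NG wrapper (info).
    wrapped_titles = article.findall("articleinfo/title") + article.findall("info/title")
    return len(direct_titles) + len(wrapped_titles) == 1


valid = "<article><title>Some Article Title</title><para>Some content.</para></article>"
untitled = "<article><para>Some content without a title.</para></article>"
print(has_exactly_one_title(valid), has_exactly_one_title(untitled))
```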

SLIDE 27

DTDs don’t support co-constraints:

<!ENTITY biblio.class.attribute "
    class (doi|isbn|issn|libraryofcongress
           |pubnumber|uri|other) #IMPLIED
    otherclass CDATA #IMPLIED
">

The desired semantic is:

  • If class is “other”, then otherclass must be specified,

otherwise

  • The otherclass must not be specified.

Co-Constraints (DTD)

SLIDE 28

RELAX NG does:

biblio.class-enum.attribute =
  attribute class {
    "doi" | "isbn" | "issn" | "libraryofcongress" | "pubnumber" | "uri"
  }?

biblio.class-other.attributes =
  attribute class { "other" }?,
  attribute otherclass { xsd:NMTOKEN }

Co-Constraints

SLIDE 29

biblio.class.attrib =
  (biblio.class-enum.attribute | biblio.class-other.attributes)

Co-Constraints (Continued)
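
As a plain-code restatement of the desired semantic from the DTD slide (otherclass is required when class is “other” and forbidden otherwise), here is a small Python sketch. The function name class_attributes_ok and the dictionary representation of attributes are hypothetical; the enumerated values are copied from the slides. It follows the stated semantic, which is slightly stricter than the simplified pattern shown above.

```python
# Enumerated values for the class attribute, taken from the slides.
ENUMERATED_CLASSES = {"doi", "isbn", "issn", "libraryofcongress", "pubnumber", "uri"}


def class_attributes_ok(attributes):
    """Check the class/otherclass co-constraint on a dict of attribute values."""
    class_value = attributes.get("class")
    otherclass_value = attributes.get("otherclass")
    if class_value == "other":
        # "other" must be explained by an otherclass attribute.
        return otherclass_value is not None
    # Any other (or absent) class: otherclass must not appear.
    return otherclass_value is None and (
        class_value is None or class_value in ENUMERATED_CLASSES
    )


print(class_attributes_ok({"class": "isbn"}))
print(class_attributes_ok({"class": "other"}))
```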

SLIDE 30
  • DocBook uses CALS Tables. In DocBook V4.3, we added HTML Tables.
  • CALS and HTML tables have overlapping element names with different content models.
  • CALS and HTML tables have attributes with the same name and intentionally disjoint enumerated values.
  • In DTDs, we just make a union...

Untangling Tables

SLIDE 31

<!ELEMENT table ((thead?, tfoot?, (tbody|tr+)) | tgroup)>

<!ATTLIST table
    frame (above | all | below | ...
           ... | void | vsides) #IMPLIED
>

<!ELEMENT tbody (tr+ | row+)>

Untangling Tables

SLIDE 32

html.table =
  element table {
    attribute frame {
      "void" | "above" | "below" | "hsides" | "vsides"
      | "lhs" | "rhs" | "box" | "border"
    }?,
    ((html.thead?, html.tfoot?, html.tbody) | html.tr+)
  }

html.tbody = element tbody { html.tr+ }

Untangling Tables

SLIDE 33

cals.table =
  element table {
    attribute frame {
      "all" | "bottom" | "none" | "sides" | "top" | "topbot"
    }?,
    cals.tgroup
  }

cals.tbody = element tbody { cals.row+ }

Untangling Tables

SLIDE 34

table = html.table | cals.table

This allows any HTML table or any CALS table, but no invalid mixture of the two models.

Untangling Tables
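
The effect of the choice between the two table patterns can be approximated in a few lines of Python. The sketch below decides which model a table element follows and rejects mixtures, using the frame values and child element names from the slides. The function name table_model is hypothetical, and the check is deliberately shallow: it looks only at the frame attribute and the names of the direct children, not at ordering or nested content.

```python
import xml.etree.ElementTree as ET

# Frame values copied from the two RELAX NG patterns on the slides.
HTML_FRAMES = {"void", "above", "below", "hsides", "vsides", "lhs", "rhs", "box", "border"}
CALS_FRAMES = {"all", "bottom", "none", "sides", "top", "topbot"}


def table_model(table_xml):
    """Return "cals", "html", or None when the table mixes the two models."""
    table = ET.fromstring(table_xml)
    child_tags = [child.tag for child in table]
    frame = table.get("frame")
    if child_tags == ["tgroup"]:
        # CALS: a single tgroup and, if present, a CALS frame value.
        return "cals" if frame is None or frame in CALS_FRAMES else None
    if child_tags and set(child_tags) <= {"thead", "tfoot", "tbody", "tr"}:
        # HTML: only HTML table children and, if present, an HTML frame value.
        return "html" if frame is None or frame in HTML_FRAMES else None
    return None


print(table_model('<table frame="all"><tgroup cols="1"/></table>'))
print(table_model('<table frame="all"><tr><td>x</td></tr></table>'))
```

The second call is the kind of mixture the DTD union accepts: an HTML row structure carrying a CALS frame value.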

SLIDE 35

A small number of elements and attributes benefit from real datatypes.

  • date, pubdate, etc. are real dates (maybe).
  • startinglinenumber is an integer.
  • cols on tgroup is a positive integer.

Real Datatyping
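
For a sense of what the integer datatypes buy, the following Python sketch checks attribute values the way an integer or positive-integer datatype roughly would. The helper names parse_integer and is_positive_integer are hypothetical, and the lexical rule used (optional sign, ASCII digits, surrounding whitespace ignored) is a simplified approximation of the XML Schema datatypes rather than a complete implementation.

```python
import re

# Simplified lexical form: optional sign followed by ASCII digits.
_INTEGER = re.compile(r"[+-]?[0-9]+")


def parse_integer(value):
    """Return the integer value of an attribute string, or None if it is not one."""
    stripped = value.strip()
    if not _INTEGER.fullmatch(stripped):
        return None
    return int(stripped)


def is_positive_integer(value):
    """True for values such as cols="3"; False for 0, negatives, and non-numbers."""
    number = parse_integer(value)
    return number is not None and number > 0


print(is_positive_integer("3"), is_positive_integer("0"), parse_integer("three"))
```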

SLIDE 36

Grammar-based validation technologies (like RELAX NG) and rule-based validation technologies (like Schematron) are naturally complementary. Mixing them allows us to play to the strengths of each without stretching either to enforce constraints that they aren’t readily designed to enforce.

Extra-Grammatical Constraints

SLIDE 37
  • Exclusions. (High on my list of features for a future version of RELAX NG.)
  • A version attribute on the root element.
  • Enforcing implicit constraints. (In a segmented list, the number of segments in each list item has to be the same as the number of titles specified.)
  • Enforcing referential integrity constraints. (A cross-reference on a footnoteref must point to a footnote.)

Extra-Grammatical Constraints
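
Two of these rules are easy to state as code, which shows why they suit rule-based checking better than a grammar. The Python sketch below is an illustration only, not the Schematron rules from the schema: the function names are hypothetical, and it assumes the DocBook V4.x names segmentedlist, segtitle, seglistitem, seg, footnoteref, and the id and linkend attributes, without namespaces.

```python
import xml.etree.ElementTree as ET


def segmentedlist_ok(seglist_xml):
    """Every seglistitem must have as many seg children as there are segtitles."""
    seglist = ET.fromstring(seglist_xml)
    title_count = len(seglist.findall("segtitle"))
    return all(
        len(item.findall("seg")) == title_count
        for item in seglist.findall("seglistitem")
    )


def footnoterefs_ok(document_xml):
    """Every footnoteref must point, via linkend, at an element named footnote."""
    root = ET.fromstring(document_xml)
    # Index every element that carries an id attribute.
    by_id = {el.get("id"): el for el in root.iter() if el.get("id") is not None}
    return all(
        ref.get("linkend") in by_id and by_id[ref.get("linkend")].tag == "footnote"
        for ref in root.iter("footnoteref")
    )


doc = '<para><footnote id="fn1"><para>x</para></footnote><footnoteref linkend="fn1"/></para>'
print(footnoterefs_ok(doc))
```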

SLIDE 38

db.book =
  [
    s:rule [
      context = "/db:book"
      s:assert [
        test = "@version"
        "The root element must have a version attribute."
      ]
    ]
  ]
  element book {
    book.attlist,
    book.info,

RELAX NG + Schematron

SLIDE 39

    (navigation.components | components | divisions)+
  }

There are validators that will enforce both sets of constraints.

RELAX NG + Schematron (Continued)

SLIDE 40
  • The ability to customize DocBook is critically important.
  • Many users subset DocBook.
  • Many users extend DocBook.
  • To be successful, these operations must be (at least) as easy as DTD customization, preferably easier.

Customization

SLIDE 41

1. Remove procedure.
2. Add an exercise element.

Two Customization Examples

SLIDE 42

<!-- DocBook XML V4.3 No Procedures Subset -->
<!ENTITY % ebnf.block.hook "">
<!ENTITY % local.compound.class "">
<!ENTITY % compound.class
  "msgset|sidebar|qandaset
   %ebnf.block.hook;
   %local.compound.class;">
<!ENTITY % procedure.content.module "IGNORE">
<!ENTITY % task.content.module "IGNORE">
<!ENTITY % sidebar.element "IGNORE">
<!ENTITY % qandaset.element "IGNORE">
<!ENTITY % qandadiv.element "IGNORE">

Removing Procedures from the DTD

SLIDE 43

<!ENTITY % question.element "IGNORE">
<!ENTITY % answer.element "IGNORE">
<!ENTITY % revdescription.element "IGNORE">
<!ENTITY % caution.element "IGNORE">
<!ENTITY % important.element "IGNORE">
<!ENTITY % note.element "IGNORE">
<!ENTITY % tip.element "IGNORE">
<!ENTITY % warning.element "IGNORE">
<!ENTITY % docbook.dtd PUBLIC
  "-//OASIS//DTD DocBook XML V4.3//EN"
  "http://docbook.org/xml/4.3/docbookx.dtd">
%docbook.dtd;
<!ENTITY % my.sidebar.mix

Removing Procedures from the DTD (Continued)

SLIDE 44

  "%list.class; |%admon.class;
   |%linespecific.class; |%synop.class;
   |%para.class; |%informal.class;
   |%formal.class; |%genobj.class;
   |%ndxterm.class; |beginpage
   %local.sidebar.mix;">
<!ELEMENT sidebar (sidebarinfo?,
                   (%formalobject.title.content;)?,
                   (%my.sidebar.mix;)+)>
<!ENTITY % my.qandaset.mix
  "%list.class; |%admon.class;

Removing Procedures from the DTD (Continued)

SLIDE 45

   |%linespecific.class; |%synop.class;
   |%para.class; |%informal.class;
   |%formal.class; |%genobj.class;
   |%ndxterm.class;
   %local.qandaset.mix;">
<!ELEMENT qandaset (blockinfo?,
                    (%formalobject.title.content;)?,
                    (%my.qandaset.mix;)*,
                    (qandadiv+|qandaentry+))>
<!ELEMENT qandadiv (blockinfo?,
                    (%formalobject.title.content;)?,
                    (%my.qandaset.mix;)*,
                    (qandadiv+|qandaentry+))>

Removing Procedures from the DTD (Continued)

SLIDE 46

<!ELEMENT question (label?, (%my.qandaset.mix;)+)>
<!ELEMENT answer (label?, (%my.qandaset.mix;)*, qandaentry*)>
<!ENTITY % my.revdescription.mix
  "%list.class; |%admon.class;
   |%linespecific.class; |%synop.class;
   |%para.class; |%informal.class;
   |%formal.class; |%genobj.class;
   |%ndxterm.class;
   %local.revdescription.mix;">

Removing Procedures from the DTD (Continued)

SLIDE 47

<!ELEMENT revdescription ((%my.revdescription.mix;)+)>
<!ENTITY % my.admon.mix
  "%list.class;
   |%linespecific.class; |%synop.class;
   |%para.class; |%informal.class;
   |%formal.class; |sidebar
   |anchor|bridgehead|remark
   |%ndxterm.class; |beginpage
   %local.admon.mix;">
<!ELEMENT caution (title?, (%my.admon.mix;)+)
                  %admon.exclusion;>

Removing Procedures from the DTD (Continued)

SLIDE 48

<!ELEMENT important (title?, (%my.admon.mix;)+)
                    %admon.exclusion;>
<!ELEMENT note (title?, (%my.admon.mix;)+)
               %admon.exclusion;>
<!ELEMENT tip (title?, (%my.admon.mix;)+)
              %admon.exclusion;>
<!ELEMENT warning (title?, (%my.admon.mix;)+)
                  %admon.exclusion;>

Removing Procedures from the DTD (Continued)

SLIDE 49

# DocBook NG "Bourbon" No Procedures Subset
namespace db = "http://docbook.org/docbook-ng"
default namespace = "http://docbook.org/docbook-ng"

include "docbook.rnc" {
  db.procedure = notAllowed
}

SLIDE 50

<!ENTITY % local.formal.class "|exercise">
<!ENTITY % docbook.dtd PUBLIC
  "-//OASIS//DTD DocBook XML V4.3//EN"
  "http://docbook.org/xml/4.3/docbookx.dtd">
%docbook.dtd;
<!ELEMENT exercise %ho; (blockinfo?,
                         (%formalobject.title.content;),
                         (%example.mix;)+)
                   %formal.exclusion;>
<!ATTLIST exercise
    role CDATA #IMPLIED
    %common.attrib;
>

Adding Exercises to the DTD

SLIDE 51

# DocBook NG "Bourbon" Exercises Extension
namespace db = "http://docbook.org/docbook-ng"
default namespace = "http://docbook.org/docbook-ng"

include "docbook.rnc" {
  extension.blocks |= exercise
}

exercise = element exercise { db.title, all.blocks+ }

Adding Exercises to the RELAX NG Schema

SLIDE 52
  • XSLT can convert DocBook V4.x to DocBook NG.
  • It successfully converts about 94% of the DocBook test cases.
  • It doesn’t convert elements that use entity attributes. It also doesn’t convert old style toc markup.
  • It can’t convert tests that use block elements in inline contexts.

Converting to NG

SLIDE 53

  • Creating XML Schemas
  • Creating DTDs

Compatibility

SLIDE 54

Use trang:

<xs:element name="book">
  <xs:complexType>
    <xs:sequence>
      <xs:group ref="db:db.book.info"/>
      <xs:choice maxOccurs="unbounded">
        <xs:group ref="db:db.navigation.components"/>
        <xs:element ref="db:db.components"/>
        <xs:element ref="db:db.divisions"/>
      </xs:choice>
    </xs:sequence>
    <xs:attributeGroup ref="db:db.book.attlist"/>

Creating XML Schemas

SLIDE 55

  </xs:complexType>
</xs:element>

(Approximates the RELAX NG patterns for MathML/SVG extensions because of wildcard limitations.)

Creating XML Schemas (Continued)

SLIDE 56
  • Trang can’t.
  • One possible solution: declarative markup in the schema and use XSLT.
  • Another solution: flatten aggressively and “reconstitute”.

Creating DTDs

SLIDE 57

  • Other Approaches
  • Conclusions
  • Things I Haven’t Done
  • References

Conclusions

SLIDE 58
  • It might be technically possible to achieve some of the goals using DTDs simply by refactoring the parameter entities (again).
  • W3C XML Schema would be better than DTDs.
  • I’m not sure the abstraction is right for schemas with lots of mixed content.
  • Determinism would still be a problem.
  • Local element declarations force customizations to “cascade” up the tree.
  • No support for co-constraints.
  • Schematron could do it all, but it would require a lot of rules.

Other Approaches

SLIDE 59
  • The DocBook RELAX NG schema satisfies the redesign goals to a large extent: it looks and feels like DocBook while at the same time having simpler, more logical content models and better constraints.
  • The RELAX NG grammar is demonstrably easier to customize, at least for those applications that can use the RELAX NG grammar directly.
  • DocBook NG has only 356 elements and 1,701 patterns.

Conclusions

SLIDE 60
  • Decided to do it ODDly.
  • Satisfactorily address the “ubiquitous linking” problem.
  • Build the DTD version.
  • Finish fiddling with the build system.

Things I Haven’t Done

SLIDE 61
  • http://docbook.org/docbook-ng/, the DocBook NG Schemas.
  • http://sourceforge.net/projects/docbook/, the DocBook SourceForge project.
  • http://docbook.org/tdg/en/html-ng/, a special edition of DocBook: The Definitive Guide showing the DTD and (flattened) RELAX NG content models.
  • http://www.oasis-open.org/committees/docbook/, the DocBook Technical Committee.
  • http://norman.walsh.name/threads/refactorDocBook, a thread through the essays I’ve written about redesigning DocBook.

References