Generation Of Simplified DTDs From A Set Of XML Sample Files

Occurrences

Paper

../papers/03-01-03/03-01-03.html

Date of Presentation

Wednesday, 22 May

Time of Presentation

11.00

Presentation Level

In-The-Middle

Abstract

We describe a method, and the related software, for automatically generating simplified DTDs from a source DTD and a set of sample marked-up files. The purpose is to create the minimum DTD with which the sample files comply. New files can then be created and parsed using this simplified DTD while remaining compliant with the original, more general one. The simplified DTD can make the task of markup easier, especially for inexperienced XML writers.
Our approach is to automatically select only those DTD features that are used by a set of valid documents (validated against the more general DTD) and eliminate the rest, obtaining a narrow-scope DTD that defines a subset of the original markup scheme. This 'pruned' DTD can be used to build new documents of the same markup subclass, which in turn still comply with the original general DTD.
Because the method is automated, the simplified DTD can be updated immediately when new features are added to (or eliminated from) the sample set of XML files; modifications to files in the sample set must be validated against the general DTD. This process can be repeated to incrementally produce a final narrow-scope DTD. In this way, we use a complex DTD as a general markup-design frame to build a simpler working DTD that suits a specific project's markup needs.
Another use of this technique is to build a one-document DTD, i.e. the minimum DTD, derived from the general DTD, with which a given XML document complies. A further benefit of the tool is that it produces statistical data that may help markup designers improve their markup schemes. For example, the frequency of use of certain elements within others helps to detect unusual structures that could reflect markup mistakes, misuse of the DTD, or DTD features that allow unwanted generalization.
This tool was used at the Miguel de Cervantes digital library of the University of Alicante to obtain simplified versions of the TEI.DTD (Sperberg-McQueen and Burnard, 1994). This work is part of a larger project in the field of text markup and derived applications.
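
The pruning idea in the abstract can be illustrated with a minimal sketch. It is not the authors' software. It assumes a hypothetical, flattened stand-in for a general DTD (`GENERAL_DTD`, mapping each element to the set of children it may contain, ignoring ordering and occurrence indicators) and two made-up sample documents (`SAMPLES`). The function names `collect_usage`, `prune`, and `to_declarations` are likewise illustrative. The parent/child counts stand in for the kind of usage statistics the abstract mentions.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, flattened stand-in for a general DTD: each element maps to the
# set of child elements it may contain. Ordering and occurrence indicators of
# real content models are deliberately ignored in this sketch.
GENERAL_DTD = {
    "text": {"front", "body", "back"},
    "front": {"head", "p"},
    "body": {"div", "p", "lg"},
    "back": {"div", "p"},
    "div": {"head", "p", "lg", "div"},
    "lg": {"l"},
    "head": set(),
    "p": set(),
    "l": set(),
}

# Made-up sample documents, assumed to be valid against the general DTD.
SAMPLES = [
    "<text><body><div><head>One</head><p>a</p><p>b</p></div></body></text>",
    "<text><body><p>c</p><div><p>d</p></div></body></text>",
]


def collect_usage(xml_strings):
    """Return the set of element names used and a count of (parent, child) pairs."""
    used = set()
    pairs = Counter()
    for source in xml_strings:
        root = ET.fromstring(source)
        used.add(root.tag)
        for parent in root.iter():      # every element, including the root
            for child in parent:        # direct children only
                pairs[(parent.tag, child.tag)] += 1
                used.add(child.tag)
    return used, pairs


def prune(dtd, used, pairs):
    """Keep only the elements and parent/child relations seen in the samples."""
    return {
        name: {child for child in children if (name, child) in pairs}
        for name, children in dtd.items()
        if name in used
    }


def to_declarations(pruned):
    """Render the pruned model as simplified element declarations."""
    lines = []
    for name in sorted(pruned):
        children = sorted(pruned[name])
        model = "(" + " | ".join(children) + ")*" if children else "(#PCDATA)"
        lines.append(f"<!ELEMENT {name} {model}>")
    return "\n".join(lines)


used, pairs = collect_usage(SAMPLES)
pruned = prune(GENERAL_DTD, used, pairs)
print(to_declarations(pruned))

# Usage statistics: how often each element occurs inside another.
for (parent, child), count in sorted(pairs.items()):
    print(f"{child} inside {parent}: {count}")
```

In this sketch the elements `front`, `back`, `lg`, and `l` disappear from the pruned model because no sample uses them. A real tool would have to prune the actual content models of the source DTD, so that every document valid against the simplified DTD remains valid against the general one; the flattened model here does not guarantee that.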